
Java Sketchbook: The HTML Renderer Shootout, Part 1

The last ten years of software development have seen the rise of
  the Internet and open standards, most prominently HTML. To most
  non-technical people, web pages (just HTML over a TCP/IP connection)
  are The Internet. And now HTML is so pervasive, its usefulness
  outweighing its flaws, that we find it in many applications that aren't
  strictly web browsers. Chat programs, help files, and even a certain online music store are all built on top of the flexibility and ubiquity of HTML.
  There's just one problem with HTML, though. To display it, you need some
  kind of web browser, usually a component that actually renders the HTML on
  screen. For Java developers, a good HTML renderer can be hard to come
  by. The built-in viewer, the Swing HTMLEditorKit, is quite lacking, and there
  aren't many alternatives. However, the situation isn't as bleak as you
  might think; there are other renderers out there, and we just have to look
  harder. In this article, we will review 11 different HTML renderers,
  comparing their features, compliance, and speed in search of the best one for any project. Part one will consider free (as in either "speech" or "beer") products, while part two will consider
  licensed commercial offerings.
  What Features Are We Looking For?
  When deciding how to rate each renderer, we should consider why we need one. What do we need to do with it? HTML is essentially styled text and images, loaded over a network, with hyperlinks. Java, often called the networked programming language, works with all sorts of network components, including URLs, quite easily. So the key point we are lacking is styled text. HTML (and by HTML I mean HTML, XHTML, and CSS 1/2) has become the standard for styled text. And it's everywhere.
  As processor speed and display quality have increased during the last
  ten years, more and more applications have some form of styled text in them,
  either for editing or display. A quick look through my Start menu turns up
  the following: Outlook (HTML email and the "Outlook Today" screen), Media
  Player (advertising and shortcuts), iTunes (the music store), File Explorer
  (the stylized sidebar), Trillian chat (for message display), the Address
  Book, Microsoft Office (Word, Excel, and PowerPoint), and the Palm Desktop. This
  list doesn't even include the styled wizard text and help files for
  virtually every application on my computer. These are all programs that
  don't really have anything to do with HTML. If we count programs that in
  some way edit or produce HTML, then I've got my editor, jEdit, Photoshop,
  Flash, and iPhoto. The common thread between all of these is that they have
  styled text that could be (and often is) HTML.
  The other thing these programs have in common is that they don't view
  normal pages on the Web. They each have specific functions, and the HTML
  they use is tailored to that function. The browser in iTunes only has to
  display the HTML coming out of Apple's music database, not the HTML of the average
  broken web page out there. For that reason the first criterion for our
  roundup will be "an adherence to standards, as modern standards as possible." This
  principally means full XHTML support with as much of CSS1 and CSS2 as
  possible. We want to use fewer table hacks and more divs with style. Being
  able to show malformed HTML on the Web is nice, but not essential. The most
  important thing is that we can get attractive display using standard
  mechanisms. To test this, we will run the browsers against an XHTML and CSS2
  site known to push the envelope while being compliant, the CSS Zen Garden, a showcase for the possibilities of CSS-only style. To measure compliance with older web sites
  (which may be required for some applications), we will also run against the front pages of Amazon and Slashdot, since these are two heavily used web sites with a good mixture of text and graphics.
  Each product we survey will have figures showing how these three sites
  are rendered. The small figure shown in the page links to an image of
  a full-size browser window, so you can get a complete picture of how
  the browser handles layout, images, blocks of text, etc.
  Next, we care about speed. A lot of our non-traditional uses for HTML
  only require small portions of pages (such as a chat program's message display)
  but speed still matters. It's especially important for larger text blocks
  such as help files and book readers. To test speed we will use a copy of
  Shakespeare's Hamlet (from ClassicReader.com)
  formatted as one gigantic HTML file. The styling is simple, but it's a large file (over 10,000 lines) to parse into memory and scroll.
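  As a rough illustration of this kind of measurement, here is a minimal sketch that times the Swing HTML parser on a large, simply styled page. It is not the harness used for the figures in this article. The class name ParseTimer and the synthetic page are assumptions made for the example (the Hamlet file itself is not reproduced here), and the sketch only measures parsing into memory, not layout, painting, or scrolling.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

/** Hypothetical helper: times the Swing HTML parser on a large page. */
class ParseTimer {

    /** Builds a synthetic stand-in for a long, simply styled text. */
    static String buildLargeDocument(int lines) {
        StringBuilder sb = new StringBuilder("<html><body><h1>Sample</h1>\n");
        for (int i = 0; i < lines; i++) {
            sb.append("<p>Line ").append(i).append(" of a long, simply styled text.</p>\n");
        }
        return sb.append("</body></html>").toString();
    }

    /** Parses the markup into a fresh HTMLDocument and returns it. */
    static HTMLDocument parse(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // Ignore any <meta> charset tag; the input is already a Java String.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        try {
            kit.read(new StringReader(html), doc, 0);
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException("could not parse HTML", e);
        }
        return doc;
    }

    /** Wall-clock parse time in milliseconds (parsing only, no rendering). */
    static long parseMillis(String html) {
        long start = System.nanoTime();
        parse(html);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        String html = buildLargeDocument(10_000);
        System.out.println("Parsed " + html.length() + " characters in "
                + parseMillis(html) + " ms");
    }
}
```

  A single run like this includes class loading and JIT warm-up, so repeating the measurement a few times and discarding the first result gives a fairer number.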
  We won't test JavaScript, Flash, or applets because most of our embedded
  browsers give us direct programmatic control from Java. Plus, the back end
  for the HTML is often our application itself, which reduces the need for
  validation or content generation. Some of the browsers below do support
  JavaScript, though, and I will make a mention when they do. More important
  is how hackable they are. How much can we control or change from the Java
  side? Can we capture click events or trigger pop-up menus? Can we extend the
  rendering at all? This will all be under the heading "Hackability."
  The final condition for this article is that there must be a freely
  downloadable demo. Some of the commercial products we'll see in part two
  come with licensing fees, but they all have something you can download right now and
  try out. I've also added the condition that there must have been some
  update to the package, or at least the web page, in the last year. There are a lot of
  dead projects with questionable status out there that we want to
  avoid.
  The Types of Renderers
  There are two types of HTML renderers: 100 percent Java and native
  wrappers. The 100-percent-Java renderers are just what they sound like, HTML
  renderers written completely in Java without calls to any native libraries.
  They have the advantage of being portable to almost anywhere, depending
  only on the standard JRE libraries (usually Swing). The second type are
  actually wrappers to a native platform web browser like Internet Explorer
  or Mozilla. They have the advantage of using a fast and reliable browser
  that can handle virtually any HTML you throw at it. The downside is that you may
  be tied to one platform and there is less opportunity for hacking the
  display from Java. Plus, loading a full web browser may be overkill (and
  slow) for something like a chat program.
  The license is another distinguishing feature between these renderers.
  Some are open source or at least available for no cost. Some are free for
  non-commercial use, and some require licensing fees. Depending on your needs,
  one type may be preferable to another, so be sure to read the actual
  license before you decide.
  On the rendering tests, we will use a recent build of Mozilla Firebird as
  our control program. Figures 1, 2, and 3 show Mozilla viewing Amazon, Slashdot, and the
  CSS Zen Garden.
  
  Figure 1. Amazon in Mozilla (You can click on the screen shot to open a full-size view.)
  
  Figure 2. Slashdot in Mozilla (You can click on the screen shot to open a full-size view.)
  
  Figure 3. CSS Zen Garden in Mozilla (You can click on the screen shot to open a full-size view.)
  Free HTML Renderers
  The Swing HTMLEditorKit
  Company: Sun Microsystems
  License: Part of the standard JRE
  URL: java.sun.com/j2se/1.4.2/docs/api/index.html
  Type: 100 percent Java
  Our first renderer is the venerable Swing HTMLEditorKit. Though it has a bad rap, it is a lot better than it used to be. Recent revisions (I tested using the Java 1.4.2 JDK) have added preliminary XHTML and CSS support, though it
  still fails on a lot of complicated web sites. Since it is an EditorKit that plugs into a standard
  JEditorPane, it can integrate easily with any application, and the use of
  Views and Documents from javax.swing.text gives it a high hackability factor. Most
  importantly, it's included with every Java Runtime, so you can depend on it
  being there. Its one downside is that while you can view the source and
  modify it for your own use, you can't recompile it and distribute it to others along with
  your application. I'm not a big fan of the idea that we need to
  open source Java, but I do think that there would be a lot to gain from
  open sourcing the HTML component (or perhaps all of Swing).
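  To give a feel for that hackability, here is a small sketch that parses a page into an HTMLDocument and walks its anchor tags from Java, with no visible component involved. The class name LinkLister is an assumption made for this example; the calls themselves (HTMLEditorKit.read and HTMLDocument.getIterator) are part of the standard javax.swing.text.html package.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

/** Hypothetical helper: lists the href of every anchor in a page. */
class LinkLister {

    static List<String> listLinks(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // Ignore any <meta> charset tag; the input is already a Java String.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        try {
            kit.read(new StringReader(html), doc, 0);
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException("could not parse HTML", e);
        }

        List<String> hrefs = new ArrayList<>();
        // The document model exposes an iterator over all occurrences of a tag.
        for (HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.A);
                it.isValid(); it.next()) {
            Object href = it.getAttributes().getAttribute(HTML.Attribute.HREF);
            if (href != null) {
                hrefs.add(href.toString());
            }
        }
        return hrefs;
    }

    public static void main(String[] args) {
        String page = "<html><body><p><a href=\"a.html\">One</a> and "
                + "<a href=\"b.html\">Two</a></p></body></html>";
        System.out.println(listLinks(page));
    }
}
```

  The same document object can be handed to a JEditorPane for display, so an application can inspect or rewrite the content before the user ever sees it.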
  Here's how our three tested pages look with the HTMLEditorKit (Figures 4, 5, and 6).
  
  Figure 4. Amazon in HTMLEditorKit
  
  Figure 5. Slashdot in HTMLEditorKit
  
  Figure 6. CSS Zen Garden in HTMLEditorKit
  Not too bad. The HTMLEditorKit clearly has some issues with horizontal tables, but it's passable. There is almost no modern CSS support, but it shows the degraded version of the Zen Garden properly (the @import hack notwithstanding). If you use it, be sure to call setEditable(false) on your JEditorPane, or else all of the
  script tags will be visible. Speedwise, the HTMLEditorKit pulled up Hamlet in about one second, no slower than Mozilla, so it's pretty speedy with large amounts of text, at least.
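  A minimal sketch of that setup follows. The class name ReadOnlyHtmlPane and the idea of collecting clicked links into a list are assumptions made for the example; a real application would more likely open the link or dispatch a command. Making the pane read-only has a second benefit: JEditorPane only generates hyperlink events for HTML content when it is not editable.

```java
import java.util.List;
import javax.swing.JEditorPane;
import javax.swing.event.HyperlinkEvent;

/** Hypothetical factory for a read-only HTML view backed by the HTMLEditorKit. */
class ReadOnlyHtmlPane {

    /**
     * Creates a non-editable pane showing the given markup. Every activated
     * (clicked) link has its href text appended to clickedLinks.
     * In a real application, call this on the event dispatch thread.
     */
    static JEditorPane create(String html, List<String> clickedLinks) {
        JEditorPane pane = new JEditorPane();
        pane.setContentType("text/html"); // installs the HTMLEditorKit
        pane.setEditable(false);          // display mode; enables hyperlink events
        pane.setText(html);
        pane.addHyperlinkListener(e -> {
            if (e.getEventType() == HyperlinkEvent.EventType.ACTIVATED) {
                clickedLinks.add(e.getDescription());
            }
        });
        return pane;
    }
}
```

  To put it on screen, wrap the returned pane in a JScrollPane and add that to a JFrame as you would any other Swing component.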
  All in all, I would say that the HTMLEditorKit's presence in the JRE trumps its failings, and if you can work around its CSS bugs, then use it. It's probably best used in applications with simple styling, such as chat programs or help windows. I wouldn't use it for web previews or anything where you want lots of graphics or tricky alignment.
  Modern Compliance: Virtually none
  Legacy Web: Passable
  JavaScript: None
  Hackability: Lots
  Speed: Pretty good
  JRex
  Company: MozDev
  License: Mozilla Public License
  URL: jrex.mozdev.org
  Type: Native Wrapper
  JRex is a complete wrapper for Mozilla. It is still very much under
  development, but shows real potential. I was not able to get it to work
  with Mozilla Firebird, but it worked flawlessly with Mozilla 1.4. I'm
  guessing that this is just a version issue and hopefully will be worked out
  soon. Since it uses Mozilla underneath, the rendering and JavaScript
  support are perfect. Plugins are also supported except, strangely, the Java
  Plugin for Windows.
  In terms of hackability, JRex stacks up pretty well. There are APIs to
  receive events and direct DOM access is under development. Since this is
  Mozilla, we also get support for XUL, which may be useful for some
  developers. My only real complaints are the problems dealing with version
  issues, and lack of a way to auto-detect an existing installation of Mozilla.
  However, since you can simply include an entire copy of Mozilla (about 5MB of DLLs and binaries) with your application, this may not be as much of an
  issue.
  For people who need to embed a true browser into an application, either
  for general websurfing or proofing in a dev tool, I recommend JRex. And
  since it's still a work in progress, if you are an open source developer
  looking for a project to contribute to, this is one to consider. In particular,
  one of the leaders mentioned wanting contributors with "knowledge of XPCOM, SWING/AWT, and JNI."
  He also said that "knowing JUNIT would be an added advantage."
  Figures 7, 8, and 9 show JRex's handling of our sample sites:
  
  Figure 7. Amazon in JRex (You can click on the screen shot to open a full-size view.)
  
  Figure 8. Slashdot in JRex (You can click on the screen shot to open a full-size view.)
  
  Figure 9. CSS Zen Garden in JRex (You can click on the screen shot to open a full-size view.)
  Modern Compliance: Excellent
  Legacy Web: Excellent
  JavaScript: Excellent
  Hackability: Pretty good
  Speed: Excellent
  Multivalent
  Company: UC Berkeley's Digital Library Project
  License: Open source (GPL)
  URL: multivalent.sourceforge.net
  Type: 100 percent Java
  Multivalent is an interesting research web browser. Meant primarily
  for browsing documentation, its HTML features are a bit behind. It rendered
  Amazon pretty well, but showed only the unstyled version of the Zen Garden.
  It loaded Hamlet reasonably fast, but nothing spectacular. Strangely, I
  couldn't get it to load Slashdot. I kept getting GZip errors, but that
  may stem from some strange headers on Slashdot's front page. Multivalent
  supports complete visual and behavioral customization. Plus, since it's
  open source, you can always start banging on the code. It does have some
  interesting features, such as lenses for magnifying the screen, full text
  searching, on-screen annotation, PDF support, on-the-fly decompression, and
  a speed-reading mode.
  See Figures 10, 11, and 12 for a look at Multivalent.
  
  Figure 10. Amazon in Multivalent (You can click on the screen shot to open a full-size view.)
  
  Figure 11. Slashdot in Multivalent (You can click on the screen shot to open a full-size view.)
  
  Figure 12. CSS Zen Garden in Multivalent (You can click on the screen shot to open a full-size view.)
  Modern Compliance: Virtually none
  Legacy Web: Poor
  JavaScript: None
  Hackability: Good
  Speed: Good
  Jazilla
  Company: Matt McBride
  License: Open source (MPL)
  URL: jazilla.mcbridematt.dhs.org/
  Type: 100 percent Java
  Jazilla is a resurrection of the Javagator, Netscape's Navigator-in-Java project
  started before they open sourced Mozilla in 1998. Speed is poor, and the rendering for
  general web sites is almost unusable. Since it's based on so much legacy
  code, it will probably never support modern features such as CSS2. Still, it
  can be useful for certain things, especially where a small-footprint
  browser is required (such as a chat application).
  Figures 13, 14, and 15 show Jazilla in action (or, perhaps, Jazilla inaction).
  
  Figure 13. Amazon in Jazilla (You can click on the screen shot to open a full-size view.)
  
  Figure 14. Slashdot in Jazilla (You can click on the screen shot to open a full-size view.)
  
  Figure 15. CSS Zen Garden in Jazilla (You can click on the screen shot to open a full-size view.)
  Modern Compliance: Poor
  Legacy Web: Poor
  JavaScript: None
  Hackability: Some
  Speed: Slow
  CalPane
  Company: Andrew Moulden
  License: Free for non-commercial and some commercial apps.
  URL: www.netcomuk.co.uk/~offshore/index.html
  Type: 100 percent Java
  CalPane is an older browser without JavaScript support, but it can
  render legacy HTML fairly well. As you can see in the screenshots below,
  both Amazon and Slashdot render pretty well, though the lack of
  anti-aliasing is especially apparent on Slashdot. It doesn't support CSS at
  all, though it does degrade properly. This also highlights the principle of
  using CSS properly so that sites are still usable without it.
  As far as speed goes, it is pretty snappy on pages it supports. There is
  support for event callbacks, and you can override certain features such as how
  images are loaded, but there isn't too much hackability. In the long run,
  the lack of CSS and XHTML means that more and more sites will fail in CalPane.
  The greatest problem with CalPane is that its site doesn't appear to have been updated
  since 2002. I bent my own rule and included it in this roundup
  because the renderer is perfectly usable in its current state, as seen
  in Figures 16, 17, and 18.
  
  Figure 16. Amazon in CalPane (You can click on the screen shot to open a full-size view.)
  
  Figure 17. Slashdot in CalPane (You can click on the screen shot to open a full-size view.)
  
  Figure 18. CSS Zen Garden in CalPane (You can click on the screen shot to open a full-size view.)
  Modern Compliance: None
  Legacy Web: Decent
  JavaScript: None
  Hackability: Decent
  Speed: Good
  Conclusions
  Overall, our choices are a very mixed bag. JRex offers high compliance and
  speed, but requires integration with native code. The 100-percent-Java renderers have
  little support for modern standards, but some (CalPane and Multivalent) can at
  least render some popular pages accurately.
  In part two of this series, we'll take a look at what commercial HTML renderers can do, and we'll
  collect some other renderers that didn't make the cut for this survey but might yet
  find their place.
  Joshua Marinacci first tried Java in 1995 at the request of his favorite TA and has never look