Java and XML Basics, Part 3
Written by Liviu Tudor   
Wednesday, 21 March 2007

So far in this series of articles (part 1, part 2) we've looked at DOM and SAX, and I suppose most of you are wondering which of the two approaches is preferable. There is no general rule of thumb, but this article should help you make the right decision when the time comes.

NOTE: Before you get started, you'll probably want to download the support file (70KB), which contains the sample code for all articles in this series so far.



Performance Considerations

Those of you who read the previous article and managed not to fall asleep before the end will probably remember our little (silly) XML example, which we parsed using both the DOM and the SAX approach. As we saw, both achieved the same result, which raises two obvious questions:

  • Why have 2 ways of doing the same thing?
  • Which one is the better one to use?

Oh yes, there is always a third question, about the meaning of life, but we will leave that one aside for the time being. (It's more of a Linux programming question, anyway.)

The reason we have two ways of doing this is that the two standards come from different sources: as we mentioned before, DOM was produced by the W3C, while SAX came from David Megginson. The two standards are also built on different paradigms. DOM produces a document tree at the end of the parsing process, which the user can then manipulate or query; SAX is based on events and leaves all the data handling (including, if needed, building a document tree) to the user. (Not to mention, of course, that having just one way of doing this would be rather boring, a bit like having only one type of coffee and never being able to enjoy such delicacies as a Cappuccino or a Mocha Latte.)
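To make the difference between the two paradigms concrete, here is a minimal sketch (not taken from the series' support files) that counts the elements with a given name both ways, using the standard JAXP factories. The class name DomVersusSax and the sample document are hypothetical, made up for illustration. The DOM version builds the whole tree before asking it a question, while the SAX version keeps only a counter and updates it as startElement events arrive.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class DomVersusSax {

    // Hypothetical sample document, made up for this illustration.
    static final String SAMPLE =
            "<library><book>A</book><book>B</book><book>C</book></library>";

    private static InputStream stream(String xml) {
        return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    }

    /** DOM: build the complete tree first, then query it. */
    static int countWithDom(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(stream(xml));
            // The whole document is now in memory and can be queried freely.
            return doc.getElementsByTagName(tag).getLength();
        } catch (Exception e) {
            throw new IllegalStateException("DOM parsing failed", e);
        }
    }

    /** SAX: react to events as they arrive, keeping only the state we need. */
    static int countWithSax(String xml, final String tag) {
        final int[] count = {0}; // the only data this handler retains
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // Non-namespace-aware parser: the element name is in qName.
                if (qName.equals(tag)) {
                    count[0]++;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(stream(xml), handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println("DOM: " + countWithDom(SAMPLE, "book") + " book elements");
        System.out.println("SAX: " + countWithSax(SAMPLE, "book") + " book elements");
    }
}
```

Both calls report 3 book elements, but the DOM version could go on to ask the tree any number of further questions, whereas the SAX version has already discarded everything except the count. That trade-off is exactly what the measurements in this article try to quantify.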

The second question, though, is not that easy to answer. While for many programmers choosing between SAX and DOM comes down to a matter of taste, we are going to look at how the two compare in terms of speed and performance. There are sophisticated ways of evaluating a program's performance, but we will only look at two main factors:

    * Memory consumption
    * Execution speed

To do this, we modified the SimpleDOMParser3.java and SimpleSAXParser6.java files, adding a few lines to measure execution time and memory usage, as you can see below:

SimpleDOMParser4.java


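/**
 * Parse the document now
 */

// Ask the JVM to collect garbage first, so the "before" figures are not
// skewed by leftovers from earlier work (System.gc() is only a hint).
System.gc();
long tmStart = System.currentTimeMillis();
long memStart = Runtime.getRuntime().freeMemory();
try
{
 doc = builder.parse( isXML );
}
catch( IOException ioe )
{
 System.err.println( "I/O error while reading from file:" );
 ioe.printStackTrace();
 System.exit( 5 );
}
catch( org.xml.sax.SAXException saxe )
{
 System.err.println( "Parsing error:" );
 saxe.printStackTrace();
 System.exit( 6 );
}
// Read the clock BEFORE requesting garbage collection, so the time spent
// collecting is not counted as parsing time.
long tmEndParse = System.currentTimeMillis();
System.gc();
// Caveat: if the JVM grew the heap during parsing, comparing freeMemory()
// values can understate the memory occupied.
long memEnd = Runtime.getRuntime().freeMemory();
System.out.println( "Parsing took : " + (tmEndParse - tmStart) + " msec" );
System.out.println( "Memory occupied : " + (memStart - memEnd) + " bytes" );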

SimpleSAXParser7.java



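/**
 * Parse the document now
 */

// Ask the JVM to collect garbage first, so the "before" figures are not
// skewed by leftovers from earlier work (System.gc() is only a hint).
System.gc();
long tmStart = System.currentTimeMillis();
long memStart = Runtime.getRuntime().freeMemory();
try
{
 sax.parse( isXML, new SimpleSAXParser7() );
}
catch( IOException ioe )
{
 System.err.println( "I/O error while reading from file:" );
 ioe.printStackTrace();
 System.exit( 6 );
}
catch( org.xml.sax.SAXException saxe )
{
 System.err.println( "Parsing error:" );
 saxe.printStackTrace();
 System.exit( 7 );
}
// Read the clock right after parsing, before any extra output or
// garbage collection, so only the parsing is timed.
long tmEnd = System.currentTimeMillis();
System.out.println();
System.gc();
// Caveat: if the JVM grew the heap during parsing, comparing freeMemory()
// values can understate the memory occupied.
long memEnd = Runtime.getRuntime().freeMemory();
System.out.println( "Parsing took " + (tmEnd - tmStart) + " msec" );
System.out.println( "Memory occupied " + (memStart - memEnd) + " bytes" );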


As you can see, we don't do any fancy timing: we simply rely on the system clock and measure how many milliseconds the parsing takes. Of course, the measured interval covers a few other things besides the parsing code itself (exception handling and so on), but we consider the time spent on these extra instructions negligible; and even if it weren't, nearly identical code appears in both files, so it affects both measurements in the same way.
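If you want timings that are less sensitive to clock granularity and one-off effects, a common refinement is to use System.nanoTime(), which the Java API documentation describes as intended for measuring elapsed time (whereas the granularity of System.currentTimeMillis() depends on the operating system), and to repeat the measurement several times, discarding a few warm-up runs while the JIT compiler kicks in and reporting the median. The sketch below assumes Java 8 or later for the lambda; ParseTimer and medianMillis are hypothetical names, and the workload in main is only a placeholder for the builder.parse(...) or sax.parse(...) call.

```java
import java.util.Arrays;

final class ParseTimer {

    /**
     * Runs the task 'warmups' times without measuring it, then 'runs' times
     * with measurement, and returns the median duration in milliseconds.
     */
    static double medianMillis(Runnable task, int warmups, int runs) {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be at least 1");
        }
        for (int i = 0; i < warmups; i++) {
            task.run(); // unmeasured: lets the JIT compiler optimise the code
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        long median = (runs % 2 == 1)
                ? samples[runs / 2]
                : (samples[runs / 2 - 1] + samples[runs / 2]) / 2;
        return median / 1_000_000.0; // nanoseconds to milliseconds
    }

    public static void main(String[] args) {
        // Placeholder workload (hypothetical); in the article's programs this
        // would be the parse call, opening a fresh input on every run.
        Runnable work = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i);
            }
            if (sb.length() == 0) {
                throw new IllegalStateException("unexpected empty result");
            }
        };
        System.out.println("Median: " + medianMillis(work, 3, 11) + " msec");
    }
}
```

Because an InputSource wrapping a stream can only be read once, a task that parses the article's XML file would need to create a new input on each run.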

To estimate the memory occupied, we simply compare the free memory (in bytes) reported by Runtime.getRuntime().freeMemory() before and after parsing. Again, the difference reflects not only the variables and structures involved in parsing but also Java's own internal structures (buffers, stacks, memory blocks not yet reclaimed, etc.), so the figure is only an approximation.
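One caveat with comparing freeMemory() values alone: the JVM may enlarge the heap while the document is being parsed, in which case the free memory after parsing can be larger than before, even though more memory is in use. A slightly more robust figure is the memory actually in use, totalMemory() - freeMemory(). The sketch below shows this variant; HeapUsage and usedBytes are hypothetical names, and the int array merely stands in for whatever the parser leaves behind (a DOM tree, for instance).

```java
final class HeapUsage {

    /** Bytes currently in use: the heap the JVM has allocated, minus its free part. */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        System.gc(); // only a hint; the JVM may ignore it
        long before = usedBytes();

        // Stand-in for the parse result: keep a reference so the data is
        // still reachable when we take the second reading.
        int[] retained = new int[1_000_000];

        System.gc();
        long after = usedBytes();
        System.out.println("Approx. memory occupied: " + (after - before)
                + " bytes (holding " + retained.length + " ints)");
    }
}
```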

To get figures closer to reality, we ask the JVM to run the garbage collector just before and right after the parsing, which gives a more accurate figure for the free memory. Of course, System.gc() is only a suggestion and there is no way to force garbage collection; however, since we are running in a single-user, single-threaded, lightly loaded environment, we can assume that in most cases the collection will actually take place when we request it.
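Since System.gc() is only a request, a common trick in measurement code is to call it a few times and stop once the used-memory figure no longer drops. The helper below is a minimal sketch of that idea under the same assumption as the article (a quiet, single-threaded program); GcSettler and settledUsedBytes are hypothetical names, and the approach reduces, but cannot eliminate, the uncertainty described above.

```java
final class GcSettler {

    /**
     * Requests garbage collection up to maxRounds times and returns the used
     * heap (totalMemory - freeMemory) once a request no longer lowers it.
     * System.gc() is only a hint, so this is best effort, not a guarantee.
     */
    static long settledUsedBytes(int maxRounds) {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        for (int i = 0; i < maxRounds; i++) {
            System.gc();
            long now = rt.totalMemory() - rt.freeMemory();
            if (now >= used) {
                return now; // no further progress: treat the heap as settled
            }
            used = now;
        }
        return used;
    }

    public static void main(String[] args) {
        long before = settledUsedBytes(5);
        // ... the parsing code under test would run here ...
        long after = settledUsedBytes(5);
        System.out.println("Memory occupied: " + (after - before) + " bytes");
    }
}
```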

Last Updated ( Sunday, 08 July 2007 )