By Huang Chang Hao | January 2, 2009 07:15 AM EST
Figure 3 shows the CPU usage of our proposed new solution. The time span is 2 minutes and 56 seconds.
XmlDocument versus XPathDocument
The major differences between XmlDocument and XPathDocument are:
- XmlDocument is editable, while XPathDocument is read-only.
- They are based on different data models.
XmlDocument is based on the W3C XML DOM, an object model that covers essentially all XML syntax, including low-level syntactic details such as entities, CDATA sections, DTDs, and notations. It is a document-centric model that allows full fidelity when loading and saving XML documents.
XPathDocument is based on the XPath 1.0 data model, a read-only, data-centric object model that is compatible with the XML Infoset. It covers only the semantically significant parts of XML and leaves out insignificant syntax details: no DTD, no entities, no CDATA sections, no adjacent text nodes, only significant data expressed as a tree with seven types of nodes. It is simple and lightweight, which is why XPathDocument is the preferred data store for read-only scenarios, especially when XPath or XSLT is involved.
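To make the difference between the two data models concrete, here is a minimal sketch. The class name, method names, and the sample XML string are made up for illustration and are not part of the original test code. It loads the same small document into both classes and looks at how the text around a CDATA section is represented, and whether each navigator reports itself as editable.

```csharp
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class ModelComparison
{
    // Hypothetical sample input: text, a CDATA section, then more text.
    public const string Xml = "<root><item>a<![CDATA[b]]>c</item></root>";

    // Number of child nodes the W3C DOM keeps under <item>.
    // The DOM preserves the CDATA section as its own node.
    public static int DomChildCount()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Xml);
        return doc.SelectSingleNode("/root/item").ChildNodes.Count;
    }

    // Number of text nodes the XPath data model exposes under <item>.
    // Adjacent text and CDATA content are merged in this model.
    public static int XPathTextNodeCount()
    {
        XPathDocument doc = new XPathDocument(new StringReader(Xml));
        return doc.CreateNavigator().Select("/root/item/text()").Count;
    }

    // Whether a navigator created from XmlDocument allows editing.
    public static bool XmlDocumentCanEdit()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Xml);
        return doc.CreateNavigator().CanEdit;
    }

    // Whether a navigator created from XPathDocument allows editing.
    public static bool XPathDocumentCanEdit()
    {
        XPathDocument doc = new XPathDocument(new StringReader(Xml));
        return doc.CreateNavigator().CanEdit;
    }
}
```

Comparing DomChildCount() with XPathTextNodeCount() shows the "no CDATA, no adjacent text nodes" point from the text, and the two CanEdit methods show the editable versus read-only distinction.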
Converting XmlDocument into XPathDocument
There are many ways to convert an XmlDocument to an XPathDocument. I tried two of them, both very straightforward. Their general algorithms are:
Method 1:
- Save XmlDocument into a temporary disk file
- Create XPathDocument with the XML file above
- Delete that temporary file
Method 2:
- Save XmlDocument into a MemoryStream
- Create XPathDocument with the above MemoryStream
For implementation details refer to the source code in the Appendix.
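As a rough illustration of Method 1, the three steps might look like the sketch below. This is not the Appendix code; the class and method names are hypothetical, and using Path.GetTempFileName for the temporary file is an assumption made here for convenience.

```csharp
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class TempFileConversion
{
    // Hypothetical helper: converts an XmlDocument to an XPathDocument
    // by going through a temporary disk file (Method 1).
    public static XPathDocument ToXPathDocument(XmlDocument xmlDoc)
    {
        // Step 1: save the XmlDocument into a temporary disk file.
        string tempPath = Path.GetTempFileName();
        try
        {
            xmlDoc.Save(tempPath);
            // Step 2: create the XPathDocument from that file.
            // The constructor reads the whole file before returning.
            return new XPathDocument(tempPath);
        }
        finally
        {
            // Step 3: delete the temporary file, even if loading failed.
            File.Delete(tempPath);
        }
    }
}
```

The try/finally block is a design choice of this sketch: it keeps temporary files from accumulating if saving or loading throws.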
I tested these two methods with a dummy XML file of 11,267,545 bytes (roughly 11 MB). A file this large is unlikely to occur in real production. The results are shown in Table 2.
Memory consumption for both methods was almost equal.
In Figures 4 and 5, the red line shows the CPU and memory use of Method 1, and the blue line shows the CPU and memory use of Method 2.
Conclusion and Recommendations
As you can see, Solutions 2 and 3, which use XPathDocument, are about 30 times faster than the current solution, which uses XmlDocument to get the same result. XPathDocument is therefore recommended.
If the XML must be updated or modified, use XmlDocument first, before the program reaches the XPath query part. The XmlDocument can then be converted to an XPathDocument, and the program proceeds with the XPath query.
Below is a piece of sample code in C# showing how to convert XmlDocument to XPathDocument.
using System.IO;
using System.Xml;
using System.Xml.XPath;
// variables definition
XPathDocument xpathDoc = null;
XmlDocument xmlDoc = null;
// xml file path
const string FILE = "<xml file path goes here>";
// load xml file into a new XmlDocument
// (XmlDocument has no constructor that takes a file path; use Load)
xmlDoc = new XmlDocument();
xmlDoc.Load( FILE );
// save XmlDocument into a memory stream
MemoryStream memStream = new MemoryStream();
xmlDoc.Save( memStream );
// rewind the stream so XPathDocument reads from the beginning
memStream.Position = 0;
// create XPathDocument with memory stream
xpathDoc = new XPathDocument( memStream );
Solution 2 is our proposed solution. Compared to Solution 3, it uses the Evaluate method, which supports more features than Select, such as XPath function calls. See the Appendix for the implementation details of Solution 2.
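The difference between Evaluate and Select can be sketched as follows. The class name, method names, and sample XML are invented for illustration and do not come from the Appendix. Evaluate returns an object whose runtime type depends on the expression, so a function call such as count() or sum() yields a number, whereas Select is limited to expressions that produce a node set.

```csharp
using System.IO;
using System.Xml.XPath;

public static class EvaluateDemo
{
    // Hypothetical sample input used only for this sketch.
    public const string Xml = "<root><item>10</item><item>20</item></root>";

    private static XPathNavigator CreateNavigator()
    {
        return new XPathDocument(new StringReader(Xml)).CreateNavigator();
    }

    // Evaluate with an XPath function call: the result is a boxed double.
    public static double CountItems()
    {
        return (double) CreateNavigator().Evaluate("count(/root/item)");
    }

    // Another function call; sum() converts each node's text to a number.
    public static double SumItems()
    {
        return (double) CreateNavigator().Evaluate("sum(/root/item)");
    }

    // Select handles node-set expressions and returns an iterator.
    public static int SelectItems()
    {
        return CreateNavigator().Select("/root/item").Count;
    }
}
```

Because the return type of Evaluate varies with the expression, the caller has to know what the query yields before casting the result.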
Below is a piece of sample code in C# showing how to query XML with XPathDocument.
using System.Xml.XPath;
// variables definition
XPathDocument xpathDoc = null;
XPathNavigator nav = null;
XPathExpression expression = null;
XPathNodeIterator iterator = null;
// xml file and xpath query
const string FILE = "<xml file path goes here>";
const string XPATH = "<xpath string goes here>";
// load xml file into a read-only XPathDocument
xpathDoc = new XPathDocument( FILE );
nav = xpathDoc.CreateNavigator();
// compile the query once so it can be reused
expression = nav.Compile( XPATH );
// Evaluate returns object; this cast is valid only when XPATH selects a node set
iterator = (XPathNodeIterator) nav.Evaluate( expression );
while( iterator.MoveNext() )
{
// gets value here, e.g. iterator.Current.Value
}
Proof of Our Recommendation
To show that our recommended solution is practical, I repeated the test in a different hardware environment, one closer to a powerful server in a real production environment.
The hardware and software configuration was:
- Intel Xeon CPU 3.20GHz * 2
- 3.50GB of RAM
- Microsoft Windows Server 2003 Enterprise Edition, Service Pack 1
- Microsoft Visual C# 2008 Express Edition
- Microsoft .NET Framework v2.0.50727 / v3.0 / v3.5
This was a powerful, recently built server. Solutions 2 and 3 were tested in one run. No significant difference in performance was found between the different versions of the .NET Framework. The data in Table 3 shows the test results for .NET Framework 3.0.
- The time format is HH:MM:SS
A maximum CPU usage of 29% was observed by watching the CPU usage meter in Windows Task Manager. Figure 6 shows the CPU usage history during the whole test.
Compared to the previous implementation, this result is exciting and satisfying. I'd recommend this solution: a processing time of roughly 20 seconds and less than 30% CPU usage is practical.
How to Write Better XPath Queries
Avoiding the double slash "//" is an important factor, since it recursively searches the whole tree and returns matched elements no matter where they are in the document. That is very time-consuming.
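As a small sketch of what this means in practice, the two queries below return the same nodes for the hypothetical document shown, but the second one spells out the full path instead of asking for a search of every descendant. The class name, method names, and sample XML are assumptions for illustration only, and the sketch makes no timing claim of its own.

```csharp
using System.IO;
using System.Xml.XPath;

public static class PathDemo
{
    // Hypothetical sample input used only for this sketch.
    public const string Xml =
        "<catalog><books><book>A</book><book>B</book></books></catalog>";

    private static XPathNavigator CreateNavigator()
    {
        return new XPathDocument(new StringReader(Xml)).CreateNavigator();
    }

    // "//book" matches <book> elements anywhere in the document.
    public static int DescendantSearchCount()
    {
        return CreateNavigator().Select("//book").Count;
    }

    // The explicit path only walks catalog -> books -> book.
    public static int ExplicitPathCount()
    {
        return CreateNavigator().Select("/catalog/books/book").Count;
    }
}
```

The explicit form is only possible when the document structure is known in advance, which is usually the case for application-specific XML.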
Concerning the "[]" index, we discovered something interesting. Microsoft IE5 and later implement [0] as the first node, but according to the W3C standard the first node should be [1].
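The W3C behavior can be checked with XPathDocument, which follows the XPath 1.0 rule that positions start at 1. The sketch below uses a hypothetical class, method, and sample document; it returns the value of the item at a given position, or null when the position matches nothing.

```csharp
using System.IO;
using System.Xml.XPath;

public static class IndexDemo
{
    // Hypothetical sample input used only for this sketch.
    public const string Xml =
        "<root><item>first</item><item>second</item></root>";

    // Returns the text of /root/item[position], or null if there is no match.
    public static string ValueAt(int position)
    {
        XPathNavigator nav =
            new XPathDocument(new StringReader(Xml)).CreateNavigator();
        XPathNavigator node =
            nav.SelectSingleNode("/root/item[" + position + "]");
        return node == null ? null : node.Value;
    }
}
```

Here ValueAt(1) yields the first item, while ValueAt(0) matches nothing, because no node has position 0 in XPath 1.0.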
Writing better XPath queries is a big and relatively open topic, and I have only done some very preliminary studies. XPath is powerful and gives the user enough flexibility to construct a wide variety of queries. This is an area I need to keep working on.
Copyright © 2009 SYS-CON Media, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
More Stories By Huang Chang Hao
Huang Chang Hao is a senior software engineer working at Qimonda IT Suzhou Ltd., Co. His main expertise is semiconductor FAB automation software, Equipment Integration and Manufacturing Execution System.