A Study of XPath Performance in .NET Programming

Testing four different solutions

Figure 2 shows the CPU usage of the current solution. The time span is 1 hour 10 minutes and 18 seconds.

Figure 3 shows the CPU usage of our proposed new solution. The time span is 2 minutes and 56 seconds.

XmlDocument versus XPathDocument
The major differences between XmlDocument and XPathDocument are:

  1. XmlDocument is editable, while XPathDocument is read-only.
  2. They are based on different data models.

XmlDocument is based on the W3C XML DOM, an object model that covers essentially all XML syntax, including low-level syntactic sugar such as entities, CDATA sections, DTDs and notations. It is a document-centric model that preserves full fidelity when loading and saving XML documents.

XPathDocument is based on the XPath 1.0 data model: a read-only, data-centric object model compatible with the XML Infoset. It covers only the semantically significant parts of XML and leaves out insignificant syntax details: no DTD, no entities, no CDATA sections, no adjacent text nodes. What remains is the significant data, expressed as a tree with seven node types (root, element, attribute, namespace, processing instruction, comment and text). Because this model is simple and lightweight, XPathDocument is the preferred data store for read-only scenarios, especially those involving XPath or XSLT.
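To make the two differences concrete, here is a small C# sketch. The class name, method names and sample string are hypothetical. It loads the same XML into both classes, checks XPathNavigator.CanEdit, and counts the children of an element whose text is split by a CDATA section: the DOM keeps three separate nodes, while the XPath data model should expose a single merged text node.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class DataModelDemo
{
    // Hypothetical sample: one run of text split by a CDATA section.
    public const string Sample = "<a>x<![CDATA[y]]>z</a>";

    // XmlDocument (W3C DOM) keeps text, CDATA section and text as three nodes.
    public static int DomChildCount(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.ChildNodes.Count;
    }

    // XPathDocument (XPath 1.0 data model) has no CDATA nodes and no adjacent text nodes.
    public static int XPathChildCount(string xml)
    {
        var doc = new XPathDocument(new StringReader(xml));
        XPathNavigator nav = doc.CreateNavigator();
        // count() yields an XPath number, which Evaluate returns boxed as a double
        return (int)(double)nav.Evaluate("count(/a/node())");
    }

    // CanEdit reports whether the navigator supports editing methods such as AppendChild.
    public static bool DomIsEditable(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.CreateNavigator().CanEdit;
    }

    public static bool XPathDocumentIsEditable(string xml)
    {
        var doc = new XPathDocument(new StringReader(xml));
        return doc.CreateNavigator().CanEdit;
    }

    public static void Main()
    {
        Console.WriteLine("XmlDocument child nodes of <a>:   {0}", DomChildCount(Sample));
        Console.WriteLine("XPathDocument child nodes of <a>: {0}", XPathChildCount(Sample));
        Console.WriteLine("XmlDocument navigator editable:   {0}", DomIsEditable(Sample));
        Console.WriteLine("XPathDocument navigator editable: {0}", XPathDocumentIsEditable(Sample));
    }
}
```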

Converting XmlDocument into XPathDocument
There are many ways to convert an XmlDocument into an XPathDocument. I tried two straightforward ones. Their general algorithms are:

Method 1:

  • Save the XmlDocument into a temporary disk file
  • Create an XPathDocument from that file
  • Delete the temporary file

Method 2:

  • Save the XmlDocument into a MemoryStream
  • Rewind the MemoryStream to position 0 and create an XPathDocument from it
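A minimal C# sketch of both methods follows, assuming .NET Framework 2.0 or later. The class and method names are illustrative and are not taken from the Appendix. In both cases the XPathDocument constructor reads the whole input before returning, so the temporary file or stream is no longer needed afterward.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class XmlToXPathConversion
{
    // Method 1: save to a temporary file, load an XPathDocument from it, delete the file.
    public static XPathDocument ViaTempFile(XmlDocument xmlDoc)
    {
        // GetTempFileName creates an empty, uniquely named file and returns its path.
        string tempPath = Path.GetTempFileName();
        try
        {
            xmlDoc.Save(tempPath);
            return new XPathDocument(tempPath);
        }
        finally
        {
            // Runs even if Save or loading throws, so no temporary file is left behind.
            File.Delete(tempPath);
        }
    }

    // Method 2: save to a MemoryStream, rewind it, load an XPathDocument from it.
    public static XPathDocument ViaMemoryStream(XmlDocument xmlDoc)
    {
        using (var memStream = new MemoryStream())
        {
            xmlDoc.Save(memStream);
            memStream.Position = 0; // rewind so the reader starts at the first byte
            return new XPathDocument(memStream);
        }
    }

    public static void Main()
    {
        var xmlDoc = new XmlDocument();
        xmlDoc.LoadXml("<orders><order id=\"1\"/><order id=\"2\"/></orders>");

        XPathNavigator nav1 = ViaTempFile(xmlDoc).CreateNavigator();
        XPathNavigator nav2 = ViaMemoryStream(xmlDoc).CreateNavigator();

        // Both conversions should expose the same data to XPath.
        Console.WriteLine("Method 1: {0} orders", nav1.Evaluate("count(/orders/order)"));
        Console.WriteLine("Method 2: {0} orders", nav2.Evaluate("count(/orders/order)"));
    }
}
```

Method 2 avoids disk I/O and file-system permissions entirely, which is one reason to prefer it when the document fits comfortably in memory.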

For implementation details refer to the source code in the Appendix.

I tested both methods with a dummy XML file of 11,267,545 bytes. A file this large is unlikely in real production. The results are shown in Table 2.

Memory consumption was almost identical for both methods.

In Figures 4 and 5, the red line shows the CPU and memory use of Method 1, and the blue line shows the CPU and memory use of Method 2.

Conclusion and Recommendations
As you can see, Solutions 2 and 3, which both use XPathDocument, produce the same result about 30 times faster than the current solution, which uses XmlDocument. XPathDocument is therefore recommended.

If the XML must be updated or modified, use XmlDocument for the editing step before the program reaches the XPath query part. Then convert the XmlDocument to an XPathDocument and run the XPath queries against the XPathDocument.

Below is sample C# code showing how to convert an XmlDocument to an XPathDocument through a MemoryStream (Method 2).

using System.IO;
using System.Xml;
using System.Xml.XPath;

// variables definition
XPathDocument xpathDoc = null;
XmlDocument xmlDoc = null;

// xml file path
const string FILE = "<xml file path goes here>";

// load xml file, initialize XmlDocument
// (XmlDocument has no file-path constructor; use Load instead)
xmlDoc = new XmlDocument();
xmlDoc.Load( FILE );

// save XmlDocument into a memory stream
using( MemoryStream memStream = new MemoryStream() )
{
    xmlDoc.Save( memStream );

    // rewind the stream so XPathDocument reads from the beginning
    memStream.Position = 0;

    // create XPathDocument with memory stream
    xpathDoc = new XPathDocument( memStream );
}

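Solution 2 is our proposed solution. It queries with the XPathNavigator.Evaluate method, while Solution 3 uses Select. Evaluate supports more features: it can return a number, string or Boolean, so an expression whose top-level result is an XPath function call such as count() works, while Select accepts only expressions that return a node-set. See the Appendix for the implementation details of Solution 2.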
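The difference between Evaluate and Select can be sketched as follows. The lot data, class name and method names are hypothetical. Evaluate returns whatever type the expression produces, here a double from sum(). Select still allows function calls inside a predicate, but rejects a top-level count() because its result is not a node-set.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public static class EvaluateVersusSelect
{
    // Hypothetical lot data, for illustration only.
    public const string Sample =
        "<lots><lot id=\"A1\" qty=\"25\"/><lot id=\"A2\" qty=\"10\"/><lot id=\"B7\" qty=\"40\"/></lots>";

    public static XPathNavigator Load(string xml)
    {
        return new XPathDocument(new StringReader(xml)).CreateNavigator();
    }

    // Evaluate accepts any XPath 1.0 expression; sum() yields a number, boxed as a double.
    public static double TotalQuantity(XPathNavigator nav)
    {
        return (double)nav.Evaluate("sum(/lots/lot/@qty)");
    }

    // Select needs a node-set expression; a function call inside a predicate is fine.
    public static int CountLotsWithPrefixA(XPathNavigator nav)
    {
        return nav.Select("/lots/lot[starts-with(@id, 'A')]").Count;
    }

    // A top-level count() is not a node-set, so Select refuses it.
    // Both exception types are caught here as a precaution.
    public static bool SelectRejectsCount(XPathNavigator nav)
    {
        try
        {
            nav.Select("count(/lots/lot)");
            return false;
        }
        catch (XPathException)
        {
            return true;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }

    public static void Main()
    {
        XPathNavigator nav = Load(Sample);
        Console.WriteLine("Evaluate sum(): {0}", TotalQuantity(nav));
        Console.WriteLine("Select with starts-with() predicate: {0}", CountLotsWithPrefixA(nav));
        Console.WriteLine("Select rejects count(): {0}", SelectRejectsCount(nav));
    }
}
```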

Below is sample C# code showing how to query XML with XPathDocument.

using System.Xml.XPath;

// variables definition
XPathDocument xpathDoc = null;
XPathNavigator nav = null;
XPathExpression expression = null;
XPathNodeIterator iterator = null;

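// xml file and xpath query
const string FILE = "<xml file path goes here>";
const string XPATH = "<xpath string goes here>";

// load the file into a read-only XPathDocument and get a navigator over it
xpathDoc = new XPathDocument( FILE );
nav = xpathDoc.CreateNavigator();

// compile the query; for node-set expressions Evaluate returns an XPathNodeIterator
expression = nav.Compile( XPATH );
iterator = (XPathNodeIterator) nav.Evaluate( expression );
while( iterator.MoveNext() )
{
    // iterator.Current is positioned on the matched node
    string value = iterator.Current.Value;
}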

Proof of Our Recommendation
To verify that our recommended solution is practical, I repeated the test on different hardware that more closely resembles a powerful server in a real production environment.

The hardware and software configuration was:

  • 2 × Intel Xeon CPU, 3.20 GHz
  • 3.50 GB of RAM
  • Microsoft Windows Server 2003 Enterprise Edition, Service Pack 1
  • Microsoft C# 2008 Express Edition
  • Microsoft .NET Framework v2.0.50727 / v3.0 / v3.5

This was a powerful, recently built server. Solutions 2 and 3 were tested in a single run. No significant performance difference was found among the different versions of the .NET Framework. Table 3 shows the test results for .NET Framework 3.0.

Note: times in Table 3 are in HH:MM:SS format.

A peak CPU usage of 29% was observed by watching the CPU usage meter in Windows Task Manager. Figure 6 shows the CPU usage history during the whole test.

Compared with the previous implementation, this result is very satisfying, and I recommend this solution: a processing time of roughly 20 seconds with less than 30% CPU usage is practical.

How to Write Better XPath Queries
Avoiding the double slash "//" is important: it searches the whole tree recursively and returns matching elements wherever they occur in the document, which is very time-consuming on large documents.
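The following sketch runs a recursive query and an explicit path against the same small document. The element names, class name and iteration count are hypothetical. Both expressions select the same three nodes. On a document this small the timing difference is negligible, so the Stopwatch loop in Main is only a template for measuring your own data, not a benchmark result.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Xml.XPath;

public static class DescendantAxisDemo
{
    // Hypothetical document; element names are illustrative only.
    public const string Sample =
        "<plant>" +
        "<area name=\"etch\"><tool id=\"T1\"/><tool id=\"T2\"/></area>" +
        "<area name=\"litho\"><tool id=\"T3\"/></area>" +
        "</plant>";

    // "//tool" scans every descendant of the root node looking for <tool>.
    public const string Recursive = "//tool";

    // The explicit path only follows plant -> area -> tool.
    public const string Explicit = "/plant/area/tool";

    public static int Count(XPathNavigator nav, XPathExpression expr)
    {
        return nav.Select(expr).Count;
    }

    // Runs a compiled expression repeatedly and returns the elapsed milliseconds.
    public static long Time(XPathNavigator nav, XPathExpression expr, int iterations)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            XPathNodeIterator it = nav.Select(expr);
            while (it.MoveNext()) { } // walk all matches so the query is fully evaluated
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        XPathNavigator nav = new XPathDocument(new StringReader(Sample)).CreateNavigator();
        XPathExpression recursive = XPathExpression.Compile(Recursive);
        XPathExpression explicitPath = XPathExpression.Compile(Explicit);

        Console.WriteLine("{0}: {1} nodes, {2} ms", Recursive, Count(nav, recursive), Time(nav, recursive, 10000));
        Console.WriteLine("{0}: {1} nodes, {2} ms", Explicit, Count(nav, explicitPath), Time(nav, explicitPath, 10000));
    }
}
```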

Concerning the "[]" index, we discovered something interesting: Microsoft IE5 and later implement [0] as the first node, whereas according to the W3C XPath 1.0 Recommendation the first node is [1].
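System.Xml.XPath in .NET follows the W3C numbering. In XPath 1.0, [n] is shorthand for [position() = n], and positions start at 1, so item[0] matches nothing. A small sketch with a hypothetical list document and helper name:

```csharp
using System;
using System.IO;
using System.Xml.XPath;

public static class PositionDemo
{
    // Hypothetical sample document.
    public const string Sample =
        "<list><item>first</item><item>second</item><item>third</item></list>";

    // Returns the text of /list/item[position], or null when no node matches.
    public static string ItemAt(XPathNavigator nav, int position)
    {
        XPathNavigator node = nav.SelectSingleNode("/list/item[" + position + "]");
        return node == null ? null : node.Value;
    }

    public static void Main()
    {
        XPathNavigator nav = new XPathDocument(new StringReader(Sample)).CreateNavigator();
        Console.WriteLine("item[1] = {0}", ItemAt(nav, 1) ?? "(no match)"); // prints "item[1] = first"
        Console.WriteLine("item[0] = {0}", ItemAt(nav, 0) ?? "(no match)"); // prints "item[0] = (no match)"
    }
}
```

Code ported from 0-based MSXML queries therefore needs every positional predicate shifted by one.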

Writing better XPath queries is a large, open topic, and my study here is only preliminary. XPath's power also gives users a great deal of flexibility in how they construct queries; finding the most efficient forms is what I plan to continue working on.

About the Author

Huang Chang Hao is a senior software engineer at Qimonda IT Suzhou Ltd., Co. His main areas of expertise are semiconductor fab automation software, equipment integration and manufacturing execution systems (MES).
