Index XML Documents with VTD-XML

How to turn the indexing capability on in your application

Results
Absolute Latency
/*/*/*[position() mod 2 = 0]

File name        Jaxen (ms)   Xalan (ms)   VTD-XML (ms)
po_small.xml     0.401        1.521        0.028
po_medium.xml    16.255       25.131       0.449
po_big.xml       159.329      270.188      4.44

/purchaseOrder/items/item[USPrice<100]

File name        Jaxen (ms)   Xalan (ms)   VTD-XML (ms)
po_small.xml     0.441        1.612        0.0338
po_medium.xml    16.954       28.21        0.431
po_big.xml       174.201      288.18       4.499

/*/*/*/quantity/text()

File name        Jaxen (ms)   Xalan (ms)   VTD-XML (ms)
po_small.xml     0.47         1.534        0.0315
po_medium.xml    17.57        25.278       0.431
po_big.xml       190          272.958      4.412

//item/comment

File name        Jaxen (ms)   Xalan (ms)   VTD-XML (ms)
po_small.xml     0.805        1.689        0.0364
po_medium.xml    27.27        27.687       0.434
po_big.xml       398.57       304.103      4.43

//item/comment/../quantity

File name        Jaxen (ms)   Xalan (ms)   VTD-XML (ms)
po_small.xml     0.816        1.706        0.0372
po_medium.xml    28.367       28.338       0.435
po_big.xml       384.05       306.056      4.431

Observation
The benchmark results show that, once the parsing cost is removed by loading the index, VTD-XML consistently outperforms DOM with Jaxen and Xalan by roughly 13x to 90x, that is, by one to nearly two orders of magnitude, regardless of message size. If these results are read as the upper limit on how fast an XML content switch can make routing decisions based on XPath output, VTD-XML's processing throughput, calculated by dividing the XML message size (not including VTD) by the latency, is around 250 MB/sec, roughly double the 125 MB/sec maximum throughput of a gigabit Ethernet connection. This means that switching or routing VTD+XML payloads based on simple XPath expressions is I/O-bound.
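The arithmetic behind this observation can be checked with a short script. The sketch below is an illustration only: the latencies are copied from the tables above, while the helper names (`speedups`, `throughput_mb_per_s`) and the 1,000,000-byte message are assumptions made for the example, since the sizes of the test files are not listed here.

```python
# Latencies in milliseconds, copied from the result tables above.
# Each tuple is (Jaxen, Xalan, VTD-XML) for po_small, po_medium, po_big.
LATENCIES_MS = {
    "/*/*/*[position() mod 2 = 0]": [
        (0.401, 1.521, 0.028),
        (16.255, 25.131, 0.449),
        (159.329, 270.188, 4.44),
    ],
    "/purchaseOrder/items/item[USPrice<100]": [
        (0.441, 1.612, 0.0338),
        (16.954, 28.21, 0.431),
        (174.201, 288.18, 4.499),
    ],
    "/*/*/*/quantity/text()": [
        (0.47, 1.534, 0.0315),
        (17.57, 25.278, 0.431),
        (190, 272.958, 4.412),
    ],
    "//item/comment": [
        (0.805, 1.689, 0.0364),
        (27.27, 27.687, 0.434),
        (398.57, 304.103, 4.43),
    ],
    "//item/comment/../quantity": [
        (0.816, 1.706, 0.0372),
        (28.367, 28.338, 0.435),
        (384.05, 306.056, 4.431),
    ],
}


def speedups(latencies):
    """Return every Jaxen/VTD-XML and Xalan/VTD-XML latency ratio."""
    ratios = []
    for rows in latencies.values():
        for jaxen, xalan, vtd in rows:
            ratios.append(jaxen / vtd)
            ratios.append(xalan / vtd)
    return ratios


def throughput_mb_per_s(size_bytes, latency_ms):
    """Throughput as defined in the text: message size divided by latency."""
    return (size_bytes / 1e6) / (latency_ms / 1e3)


# 1 Gbit/s expressed in MB/s, ignoring framing and protocol overhead.
GIGABIT_ETHERNET_MB_PER_S = 1e9 / 8 / 1e6

ratios = speedups(LATENCIES_MS)
print(f"speedup range: {min(ratios):.1f}x to {max(ratios):.1f}x")

# Hypothetical message: 1,000,000 bytes queried in 4 ms (not a measured file size).
example = throughput_mb_per_s(1_000_000, 4.0)
print(f"{example:.0f} MB/s is {example / GIGABIT_ETHERNET_MB_PER_S:.1f}x a gigabit link")
```

Across the 30 measurements the ratios run from about 13x (Jaxen on po_small.xml) to about 90x (Jaxen on po_big.xml), and a 250 MB/sec rate is twice what a gigabit link can deliver.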

Conclusion
This article has introduced the latest indexing feature of VTD-XML, along with the latest benchmark numbers showcasing the efficiency it achieves. Prior to VTD-XML, an XML/SOA application written with DOM or SAX incurred the overhead of XML parsing, XPath evaluation and, optionally, content update. It's not uncommon for those overheads to account for 80%-90% or more of the total CPU cycles of running the application. VTD-XML cuts those overheads so deeply that there's not much left to optimize. Using VTD-XML as a parser reduces XML parsing overhead by 5x-10x. Next, VTD-XML's incremental update uniquely eliminates the roundtrip overhead of updating XML. Moreover, this article shows VTD-XML's innovative non-blocking, stateless XPath engine significantly outperforming Jaxen and Xalan. With the addition of the indexing capability, XML parsing has now become "optional."
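To get a feel for what those percentages mean for a whole application, here is a rough Amdahl's-law style estimate. It is a sketch under a simplifying assumption: the entire XML-related share of CPU time is sped up by a single factor, whereas the 5x-10x figure above applies to parsing alone. The function name `overall_speedup` is made up for this illustration.

```python
def overall_speedup(overhead_fraction, overhead_speedup):
    """Amdahl-style estimate: only the XML-related share of run time gets faster.

    overhead_fraction: share of total CPU time spent on XML work (0.0 to 1.0).
    overhead_speedup:  factor by which that share is reduced (assumed uniform).
    """
    remaining = (1.0 - overhead_fraction) + overhead_fraction / overhead_speedup
    return 1.0 / remaining


# Combine the 80%-90% overhead share with the 5x-10x reduction quoted in the text.
for fraction in (0.8, 0.9):
    for factor in (5, 10):
        print(f"{fraction:.0%} overhead, {factor}x faster: "
              f"{overall_speedup(fraction, factor):.2f}x overall")
```

Under these assumptions the whole application runs somewhere between about 2.8x and 5.3x faster, and the part that is not XML processing quickly becomes the dominant cost.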

In other words, obstacles standing in the path to successful SOA have quietly disappeared. But this is just another starting point. It probably isn't difficult to see that none of these benefits would exist if VTD-XML had stuck with excessive object allocation, as DOM does. In the context of XML processing, pure OO modeling of an XML infoset (e.g., string and node objects) just doesn't appear to be the right thing to do in the first place. Like anything else, OO has its weaknesses. The problems (e.g., those of DOM and SAX) arise when one chooses OO for the sake of choosing it and stops questioning whether it makes sense. To me, knowing when not to use objects is equally, if not more, important. Born of those weaknesses, constraints, and limitations, VTD-XML strives to be the simple, sensible answer to the problems.

And, in the context of SOA, there are more questions about OO programming worth reflecting on. Among them: is OOP's API-based public contract suitable for building loosely coupled, document-centric Web Services applications? The answers, again, are likely to be surprisingly simple.

More Stories By Jimmy Zhang

Jimmy Zhang is a cofounder of XimpleWare, a provider of high-performance XML processing solutions. He has worked in the fields of electronic design automation and Voice over IP for a number of Silicon Valley high-tech companies. He holds both a BS and an MS from the Department of EECS at U.C. Berkeley.
