By Jimmy Zhang | February 20, 2008 02:15 PM EST | Reads: 31,630
• Easy To Use - Usually adding a couple of lines (loadIndex(...) and writeIndex(...) as seen in the previous example) to your existing VTD-XML code is all that's needed to enable VTD+XML in your applications.
• Compact - The size of VTD+XML is usually about 30%-50% bigger than the size of the corresponding XML document. This is again consistent with the memory use of the VTD-XML processing model.
• Platform Neutral - Just like XML, VTD+XML is designed to be platform-neutral: the index explicitly records the byte endian-ness of the platform on which it was generated. Users of the C or C# version of VTD-XML can automatically recognize and make use of an index generated by the Java version.
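To see why recording the byte order is enough to make an index portable, consider the minimal sketch below. The layout here (one flag byte, a record count, then 64-bit records) is made up for illustration and is not the actual VTD+XML file format; the class and method names are hypothetical as well.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class EndianTaggedIndex {
    // Hypothetical layout (NOT the real VTD+XML format):
    // 1 flag byte (0 = big-endian, 1 = little-endian),
    // a 4-byte record count, then that many 8-byte records.
    static byte[] write(long[] records, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(1 + 4 + 8 * records.length);
        buf.put((byte) (order == ByteOrder.LITTLE_ENDIAN ? 1 : 0));
        buf.order(order); // everything after the flag uses the writer's byte order
        buf.putInt(records.length);
        for (long r : records) {
            buf.putLong(r);
        }
        return buf.array();
    }

    // The reader inspects the flag first, so it decodes correctly
    // no matter which platform produced the bytes.
    static long[] read(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        ByteOrder order = buf.get() == 1 ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN;
        buf.order(order);
        long[] records = new long[buf.getInt()];
        for (int i = 0; i < records.length; i++) {
            records[i] = buf.getLong();
        }
        return records;
    }

    public static void main(String[] args) {
        long[] records = {0x0123456789ABCDEFL, 42L};
        long[] fromLittle = read(write(records, ByteOrder.LITTLE_ENDIAN));
        long[] fromBig = read(write(records, ByteOrder.BIG_ENDIAN));
        System.out.println(Arrays.equals(records, fromLittle)); // true
        System.out.println(Arrays.equals(records, fromBig));    // true
    }
}
```

The two byte arrays differ on disk, yet both decode to the same records because the reader never has to guess the writer's byte order.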
At the same time, users of VTD+XML need to be aware of the following limitations:
• Upper Limit on Document Size - The maximum XML document size supported by VTD+XML is 2GB without namespace support. With namespace support, VTD+XML supports a maximum of 1GB.
• Lack of Support for External Entities - VTD-XML currently supports only the five built-in entity references (&lt;, &gt;, &amp;, &apos;, and &quot;) as defined in XML 1.0.
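If your application accepts documents of unknown size, you may want to check the size limit before attempting to index. The sketch below simply encodes the two limits stated above. The class name and the exact boundary values (2^31 and 2^30 bytes, exclusive) are assumptions made for illustration, not constants taken from the VTD-XML library.

```java
public class IndexSizeCheck {
    // Limits as stated in the text; the exact byte boundaries are an assumption.
    static final long MAX_BYTES_WITHOUT_NS = 1L << 31; // 2GB
    static final long MAX_BYTES_WITH_NS = 1L << 30;    // 1GB

    // Returns true if a document of the given size is within the stated limit.
    static boolean fitsIndex(long docSizeBytes, boolean namespaceAware) {
        long limit = namespaceAware ? MAX_BYTES_WITH_NS : MAX_BYTES_WITHOUT_NS;
        return docSizeBytes >= 0 && docSizeBytes < limit;
    }

    public static void main(String[] args) {
        long size = 1_500_000_000L; // a 1.5 GB document
        System.out.println(fitsIndex(size, false)); // true
        System.out.println(fitsIndex(size, true));  // false
    }
}
```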
The Case Involving XML Content Update
Some of you may wonder: What if the subsequent XML operations involve content updates that shift the offset values? In general, those use cases require the updated XML document to be re-indexed, and for large XML documents you may argue that the cost of re-indexing can be quite significant. However, there are several workarounds, all aimed at reducing, or even eliminating, the cost of re-indexing.
The first workaround: Instead of creating the VTD+XML index for a single big XML document, split the XML document into multiple smaller ones, each of which is then indexed using VTD+XML. From this point on, you only need to regenerate a VTD+XML index for those "updated" XML fragments that are usually a lot smaller and therefore cheaper to re-index.
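The bookkeeping behind this first workaround can be sketched as follows. This is only an illustration under simplifying assumptions: the class FragmentIndexCache is hypothetical, and buildIndex(...) is a placeholder that merely counts bytes where a real application would generate a VTD+XML index for the fragment.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class FragmentIndexCache {
    private final Map<String, byte[]> fragments = new LinkedHashMap<>();
    private final Map<String, Integer> indexes = new LinkedHashMap<>();
    private long bytesIndexed = 0; // total indexing work done so far

    // Adds or updates one fragment and (re)indexes only that fragment.
    void put(String name, String xml) {
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        fragments.put(name, bytes);
        indexes.put(name, buildIndex(bytes));
    }

    // Placeholder for real index generation; the cost is proportional
    // to the size of the fragment, not to the size of the whole data set.
    private int buildIndex(byte[] xml) {
        bytesIndexed += xml.length;
        return xml.length;
    }

    long bytesIndexed() {
        return bytesIndexed;
    }

    int fragmentCount() {
        return fragments.size();
    }

    public static void main(String[] args) {
        FragmentIndexCache cache = new FragmentIndexCache();
        cache.put("order-1", "<purchaseOrder><item>Lawnmower</item></purchaseOrder>");
        cache.put("order-2", "<purchaseOrder><item>Baby Monitor</item></purchaseOrder>");
        long afterInitialLoad = cache.bytesIndexed();

        // Updating order-2 re-indexes order-2 only; order-1 is untouched.
        cache.put("order-2", "<purchaseOrder><item>Baby Monitor Deluxe</item></purchaseOrder>");
        System.out.println("bytes indexed initially:   " + afterInitialLoad);
        System.out.println("bytes re-indexed on update: "
                + (cache.bytesIndexed() - afterInitialLoad));
    }
}
```

The smaller the fragments relative to the whole data set, the less work each update triggers.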
VTD-XML 2.0 also introduced the "overwrite" feature that lets you modify XML content without needing to regenerate the index. The code below makes use of the VTDNav class's new "overWrite(...)" method to change the text node of "<root>good</root>" from "good" to "bad." If the new content is shorter than or equal in length to the old content, "overWrite(...)" writes the new content, fills the remaining portion of the old text with white spaces, and returns true. Otherwise, the original content is left unchanged and "overWrite(...)" returns false.
import com.ximpleware.*;

class Overwrite {
    public static void main(String[] args) throws Exception {
        VTDGen vg = new VTDGen();
        vg.setDoc("<root>good</root>".getBytes());
        vg.parse(true); // true turns on namespace awareness
        VTDNav vn = vg.getNav();
        int i = vn.getText(); // index of the text node of <root>
        // print "good"
        System.out.println("text ---> " + vn.toString(i));
        if (vn.overWrite(i, "bad".getBytes())) {
            // overWrite, if successful, returns true
            // print "bad" here
            System.out.println("text ---> " + vn.toString(i));
        }
    }
}
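To make the padding rule concrete, here is a rough sketch of the same semantics on a plain byte array. It is not VTD-XML's implementation: the class InPlaceOverwrite and its overwrite(...) method are hypothetical, and the offset and length of the text node are hard-coded where VTD-XML would look them up in the index.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class InPlaceOverwrite {
    // Replaces doc[offset .. offset+length) with the replacement bytes,
    // padding the leftover bytes with spaces. If the replacement is
    // longer than the slot, nothing is changed and false is returned.
    static boolean overwrite(byte[] doc, int offset, int length, byte[] replacement) {
        if (replacement.length > length) {
            return false;
        }
        System.arraycopy(replacement, 0, doc, offset, replacement.length);
        Arrays.fill(doc, offset + replacement.length, offset + length, (byte) ' ');
        return true;
    }

    public static void main(String[] args) {
        byte[] doc = "<root>good</root>".getBytes(StandardCharsets.US_ASCII);
        int offset = 6; // "good" starts right after "<root>"
        int length = 4; // "good" is four bytes long

        boolean ok = overwrite(doc, offset, length,
                "bad".getBytes(StandardCharsets.US_ASCII));
        // prints: true <root>bad </root>
        System.out.println(ok + " " + new String(doc, StandardCharsets.US_ASCII));

        boolean tooLong = overwrite(doc, offset, length,
                "better".getBytes(StandardCharsets.US_ASCII));
        // prints: false <root>bad </root>
        System.out.println(tooLong + " " + new String(doc, StandardCharsets.US_ASCII));
    }
}
```

Because the document keeps its length, every offset recorded after the modified text remains valid, which is why no re-indexing is needed.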
The "overWrite" feature may look simple, but it actually has unexpected implications for the performance of XML. Consider the database table design in which you specify the column width. You can now borrow the same technique for XML composition: By pre-serializing some extra spaces into text nodes or attribute values, you can make "in situ" updates to those nodes and do so without regenerating the index. You can even pre-serialize, in an XML document, dummy elements containing text nodes or attribute values whose initial values are entirely white spaces. Those dummy elements serve as templates in anticipation of a future content update, as shown in the example below.
The template
<purchaseOrder orderDate="          ">
<item partNum="      " >
<productName>          </productName>
<quantity>  </quantity>
<USPrice>     </USPrice>
</item>
</purchaseOrder>
After "stamping" in the data
<purchaseOrder orderDate="1999-10-21">
<item partNum="872-AA" >
<productName>Lawnmower </productName>
<quantity>1 </quantity>
<USPrice> 100 </USPrice>
</item>
</purchaseOrder>
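A stamping routine along these lines might look like the sketch below. It is a simplified illustration, not part of VTD-XML: the class TemplateStamp is hypothetical, it handles ASCII content only, and it locates the slot with a plain string search where a real application would use the offsets stored in the index.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TemplateStamp {
    // Writes value into the text slot of the first <tag>...</tag> element,
    // padding the rest of the slot with spaces. Returns false, changing
    // nothing, if the element is missing or the value does not fit.
    static boolean stamp(byte[] doc, String tag, String value) {
        // ISO-8859-1 maps each byte to one char, so string indexes are byte offsets.
        String text = new String(doc, StandardCharsets.ISO_8859_1);
        String open = "<" + tag + ">";
        int start = text.indexOf(open);
        if (start < 0) {
            return false;
        }
        start += open.length();
        int end = text.indexOf("</" + tag + ">", start);
        if (end < 0) {
            return false;
        }
        byte[] bytes = value.getBytes(StandardCharsets.US_ASCII);
        if (bytes.length > end - start) {
            return false; // the pre-serialized slot is too narrow
        }
        System.arraycopy(bytes, 0, doc, start, bytes.length);
        Arrays.fill(doc, start + bytes.length, end, (byte) ' ');
        return true;
    }

    // Builds a run of n spaces for the template slots.
    static String spaces(int n) {
        char[] c = new char[n];
        Arrays.fill(c, ' ');
        return new String(c);
    }

    public static void main(String[] args) {
        byte[] doc = ("<item><productName>" + spaces(12) + "</productName>"
                + "<quantity>" + spaces(4) + "</quantity></item>")
                .getBytes(StandardCharsets.US_ASCII);
        int lengthBefore = doc.length;

        stamp(doc, "productName", "Lawnmower");
        stamp(doc, "quantity", "1");

        System.out.println(new String(doc, StandardCharsets.US_ASCII));
        System.out.println(doc.length == lengthBefore); // true: no offsets shifted
    }
}
```

As with column widths in a database table, the slot width you choose up front caps the length of the values you can stamp in later.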
By the same token, the concept of XML content deletion deserves a bit of rethinking as well. Instead of physically deleting an XML element, you can disable it by marking it as "invisible" to your applications, which achieves the same goal. The benefit: you again avoid the need to re-index. Notice that this plays to XML's strength as a loosely structured, extensible data format. Below is an example of setting the value of an element's "enable" attribute to make the element "invisible."
Before
<purchaseOrder orderDate="1999-10-21">
<item partNum="872-AA" enable="1">
<productName>Lawnmower</productName>
<quantity>1</quantity>
<USPrice>148.95</USPrice>
</item>
</purchaseOrder>
After
<purchaseOrder orderDate="1999-10-21">
<item partNum="872-AA" enable="0">
<productName>Lawnmower</productName>
<quantity>1</quantity>
<USPrice>148.95</USPrice>
</item>
</purchaseOrder>
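In code, the "soft delete" boils down to flipping a single byte and having the application honor the flag. The sketch below illustrates the idea on a raw byte array. The class SoftDelete is hypothetical, it assumes a one-character, double-quoted "enable" value, and it finds the attribute by string search where a real application would use the index.

```java
import java.nio.charset.StandardCharsets;

public class SoftDelete {
    private static final String MARKER = "enable=\"";

    // Returns the byte offset of the one-character enable value, or -1 if
    // there is no enable attribute with exactly one character in its value.
    static int flagOffset(byte[] doc) {
        // ISO-8859-1 maps each byte to one char, so string indexes are byte offsets.
        String text = new String(doc, StandardCharsets.ISO_8859_1);
        int pos = text.indexOf(MARKER);
        if (pos < 0) {
            return -1;
        }
        int value = pos + MARKER.length();
        if (value + 1 >= text.length() || text.charAt(value + 1) != '"') {
            return -1;
        }
        return value;
    }

    // Flips the flag in place; the document length never changes.
    static boolean setEnabled(byte[] doc, boolean enabled) {
        int offset = flagOffset(doc);
        if (offset < 0) {
            return false;
        }
        doc[offset] = (byte) (enabled ? '1' : '0');
        return true;
    }

    // What the application checks before treating the element as present.
    static boolean isEnabled(byte[] doc) {
        int offset = flagOffset(doc);
        return offset >= 0 && doc[offset] == '1';
    }

    public static void main(String[] args) {
        byte[] doc = "<item partNum=\"872-AA\" enable=\"1\"><quantity>1</quantity></item>"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(isEnabled(doc)); // true
        setEnabled(doc, false);
        System.out.println(isEnabled(doc)); // false
        System.out.println(new String(doc, StandardCharsets.US_ASCII));
    }
}
```

The element is still physically in the document, so the trade-off is that disabled content keeps occupying space until the document is eventually rewritten.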
Copyright © 2008 SYS-CON Media, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
More Stories By Jimmy Zhang
Jimmy Zhang is a cofounder of XimpleWare, a provider of high performance XML processing solutions. He has working experience in the fields of electronic design automation and Voice over IP for a number of Silicon Valley high-tech companies. He holds both a BS and MS from the department of EECS from U.C. Berkeley.