Working With Dynamic XML Documents

XML gets mentioned a lot as an interoperability "platform." By itself, of course, XML can't be a platform because it's a document format. It may be flexible, human-readable, dynamic, popular, and cool because it looks a lot like HTML, but it's still just a document format, and there are a lot of differences between a document format and an interoperability platform.

To interoperate using XML, you either have to build an infrastructure around it or incorporate it into an infrastructure that already exists. While other folks build yet another infrastructure around XML, we show in this column how XML has been incorporated into the enterprise-ready, mature CORBA infrastructure. This is CORBA Corner, after all.

The W3C has supplemented XML with the Document Object Model (DOM), defined in OMG IDL. OMG members used the DOM as the basis for their XML mapping, but made one change along the way: instead of keeping the representation of each node in the XML document tree as a full-blown CORBA object, OMG's version represents a node as a CORBA valuetype. Passable by value but not a first-class CORBA object, the valuetype is the CORBA multilanguage equivalent of the Java serializable. And valuetypes are tailor-made to represent an XML document's structure: graphs of valuetypes, sent over the wire by including their root node (or any node, if the node structure links both up and down the tree) in the argument list of a CORBA call, will be reconstructed properly, in their entirety, at the receiving end. Send an XML document to a remote application and suddenly all navigation up and down its tree is done with local invocations instead of dozens of network round-trips.

In this article we investigate OMG's XML/value mapping and the things it lets us do with our XML documents. More than just a bridge between XML and CORBA - although it certainly is all of that - the valuetypes and their structure provide such an elegant API into the XML document (structure and content alike) that (in our opinion, anyhow) this deserves to be the way everyone works with XML content from a program, even in a non-CORBA environment. Here's what you can do with XML documents using the mapping:

  • Create a new XML document from scratch.
  • Read in an existing XML document from storage or from the network.
  • Parse the document into a multiply linked list of CORBA valuetypes.
      - Parsing can be done dynamically if there's no DTD with structural information about the document.
      - Parsing can take advantage of a DTD if there is one.
      - If the document and DTD versions are out of synchronization, the parsing can take advantage of the DTD as far as it goes.
  • Edit the document, including adding or deleting elements; adding, deleting, or changing attributes; and editing text.
  • As a linked list of valuetypes, the document may be sent around the network in CORBA calls with its structure intact. This includes secure, transactional CORBA calls and asynchronous calls using CORBA messaging. This is a great way to send XML data in an invocation.
  • Serialize the in-memory representation, generating a revised version of the Unicode-based XML format document that you're used to.
We don't have space to demonstrate all of these, but we'll look at as many as we can in the form of a programming example. We haven't included the specification details here to save room for example code, which isn't available free off the Web as the specification is. To get the specification, download doc.omg.org/orbos/00-08-10 (the specification document) and doc.omg.org/orbos/00-11-01 (zipped IDL file). Where the two files disagree, the IDL in the zip file supersedes.

Listing 1 is an example XML document that we use throughout this article. If your XML knowledge is a little hazy, point your browser to www.w3.org/XML. And to learn about the DOM, surf to www.w3.org/DOM.

Initializing and Reading the Document
We're not going to list the code that gets us started in XML document processing mode. Instead, we'll just list what we did:

  • Representation of XML documents as strings: Even though both XML and Java use Unicode, CORBA represents XML as a special DOMString type (typedef'd to sequence<unsigned short>). Why? Because you can use CORBA to go from any language to any other. Pass a Java string to a C program and you (probably) end up with an array of 8-bit chars; pass it to COBOL (unlikely, we admit, but possible) and the system attempts to translate your Unicode into EBCDIC! Yucko. So we've created a convenience function makeDOMString that converts a Java string into the programming language-independent DOMString type.
  • Reading in the document: We read the document in as a Java string and converted it into a DOMString.
  • Parsing the document: The parser is defined by the specification and supplied with your implementation. After locating the parser (probably via a call to resolve_initial_references), you invoke, for example,

    Document PO_doc = parser.parse(PO_Stream);

  • Error checking: The XML specification requires a parser to return an error, with no partial results, if a document contains even one XML structure/format error. (It doesn't care if you had the price of the bolts wrong, though.) The OMG specification is well prepared for this, with its definition of exception XMLException and 38 specific parsing error codes (numbered 2 through 39, of course). You should definitely check for these errors on return from parse.
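
The makeDOMString helper from the first bullet isn't listed in this article, so here is a minimal sketch of what it might look like. It assumes DOMString stays a plain sequence<unsigned short>, which the standard IDL-to-Java mapping renders as short[]; your implementation's binding may wrap it differently. The class name DomStrings and the reverse helper fromDOMString are our own inventions for illustration.

```java
// Hypothetical helper class; assumes DOMString maps to short[] in Java.
final class DomStrings {
    private DomStrings() {}

    /** Copies each UTF-16 code unit of a Java string into a 16-bit slot. */
    static short[] makeDOMString(String s) {
        short[] units = new short[s.length()];
        for (int i = 0; i < s.length(); i++) {
            units[i] = (short) s.charAt(i);
        }
        return units;
    }

    /** Reverses makeDOMString; the mask keeps only the low 16 bits of each unit. */
    static String fromDOMString(short[] units) {
        char[] chars = new char[units.length];
        for (int i = 0; i < units.length; i++) {
            chars[i] = (char) (units[i] & 0xFFFF);
        }
        return new String(chars);
    }
}
```

If DOMString really is a bare array in your binding, compare two of them with java.util.Arrays.equals rather than ==.
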
On return from parse, if the routine found no errors during parsing, our document is stored in a multiply linked list of valuetypes starting at the root node PO_doc. Now let's do some things with it.
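
To make the all-or-nothing parsing rule concrete without a CORBA installation, the sketch below uses the W3C DOM parser that ships with the JDK as a stand-in. This is not the OMG parser: there is no XMLException here, and the JDK reports a well-formedness error as a SAXParseException instead. The class and method names are ours.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Stand-in sketch using the JDK's DOM parser, not the OMG XML/Value parser.
final class ParseSketch {
    private ParseSketch() {}

    /** Returns the parsed document, or null if the text is not well-formed XML. */
    static Document parseOrNull(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors; installing it replaces the
            // built-in handler that also prints them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            // One structural error is enough: no partial tree comes back.
            System.err.println("Not well-formed at line " + e.getLineNumber()
                    + ": " + e.getMessage());
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("parser setup or I/O failure", e);
        }
    }
}
```

Calling parseOrNull on a document with a missing end tag returns null, while a well-formed document comes back as a complete tree rooted at the Document node.
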

Editing the XML Document
If we're the company writing the PO, we need to edit it - adding or deleting items, changing quantities or POitem numbers or names, or whatever. To our programmer, the XML/value mapping structures the PO data to make it all easily available; using these program structures, the programmer will present the data to our clerk for editing via a GUI. The operation getElementsByTagName returns a list of Elements selected by Tag Name (duh!), so we'd probably start by retrieving all (that is, both) of the POitems this way:

    DOMString name = makeDOMString("POitem");
    // Retrieve items in Purchase Order:
    NodeList elms = PO_doc.getElementsByTagName(name);

Now elms, a sequence of Nodes, contains two elements - the two items in our Purchase Order. Each contains four child elements - the POitem_name, POitem_number, POitem_size, and POitem_quantity. We could easily display a POitem in a window for editing, or count the number of POitem nodes that we got back and display the number on the screen, or print it for confirmation when we print the PO.
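
If you'd like to try the retrieval without an ORB, the same navigation works against the JDK's built-in W3C DOM binding, which takes plain Java strings instead of DOMStrings. The sketch below makes that substitution. The purchase order text is a hypothetical stand-in with the structure described above, not the article's Listing 1; the second item's values are invented.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Stand-in sketch using the JDK's DOM binding, not the OMG valuetypes.
final class ItemLookup {
    private ItemLookup() {}

    // Hypothetical sample document: two POitems, four child elements each.
    static final String SAMPLE_PO =
          "<PO>"
        + "<POitem><POitem_name>Bolt</POitem_name><POitem_number>B01420</POitem_number>"
        + "<POitem_size>M8</POitem_size><POitem_quantity>100</POitem_quantity></POitem>"
        + "<POitem><POitem_name>Nut</POitem_name><POitem_number>N00233</POitem_number>"
        + "<POitem_size>M8</POitem_size><POitem_quantity>200</POitem_quantity></POitem>"
        + "</PO>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample", e);
        }
    }

    /** Returns every POitem element in document order. */
    static NodeList poItems(Document doc) {
        return doc.getElementsByTagName("POitem");
    }
}
```

Here poItems(parse(SAMPLE_PO)).getLength() gives the item count you might display or print with the PO. The sample has no whitespace between tags, so each POitem reports exactly four child nodes; a pretty-printed document would also contain whitespace Text nodes.
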

Changing the Text in an Element
The specification uses OMG IDL attributes, which aren't the same as XML attributes. Here's a quick review in case you forgot how IDL attributes work: if you declare a variable to be an IDL attribute, the IDL compiler generates a get and set operation for it automatically (unless you declare it readonly, which eliminates the set operation). The get and set operations are mapped to programming languages just like all other operations. The Java mapping overloads the operations on the name of the variable: if you include an input argument, it's a set; leave it out and it's a get.
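
Here is a small hand-written illustration of that overloading. It is not generated code, and the names CharacterDataLike and SimpleText are hypothetical; we also use a plain Java String where the real mapping would use DOMString.

```java
// Hand-written illustration (hypothetical names) of the Java signatures an
// IDL compiler would emit for these two declarations:
//     attribute string data;            // get and set
//     readonly attribute long length;   // get only
interface CharacterDataLike {
    String data();               // get: no argument
    void data(String newData);   // set: same name, one input argument
    int length();                // readonly: no set operation is generated
}

// Trivial implementation so the overloads can be exercised.
final class SimpleText implements CharacterDataLike {
    private String data;

    SimpleText(String initial) {
        this.data = initial;
    }

    @Override public String data() { return data; }
    @Override public void data(String newData) { this.data = newData; }
    @Override public int length() { return data.length(); }
}
```

With a SimpleText named t, the call t.data("150") is the set and t.data() is the get; there is simply no one-argument length to call.
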

To demonstrate how we can change the text associated with a particular element, let's change the quantity of Bolt POitem_number B01420 to 150 gross. Listing 2 contains the code in a single block, with a few comments. The rest of this section explains it in more detail.

After defining two DOMStrings for use later, we start our loop over poitems in elms. elms.item(i) returns the ith Node in NodeList elms. (The operation name item comes from the XML/Value specification and has nothing to do with the fact that we're retrieving a poitem.) elms.item returns a Node; we have to cast the return value to an Element in order to assign it to element poitem.

Each poitem element has four children, tagnamed (from the strings in our XML document) POitem_name, POitem_number, POitem_size, and POitem_quantity. getElementsByTagName returns a list, so we declare ino and iqty to be NodeLists even though we're certain that only one element is going to come back from each call here. After checking that we have a valid poitem (even though we didn't bother to check that we had a valid PO!), we're ready to check and change the number of items we want to buy.

One of these lines of code (at least!) needs a little explanation. It's this one:

if (((Text)(ino.item(0).firstChild())).data().equals(checker))

The four Element valuetype children of poitem that we're working with here don't contain text - they have children that contain the text. Here's how we burrow down to the text itself.

ino is a one-element NodeList containing our POitem_number. item is the operation defined by the specification on NodeList that returns an item in the list by index number. (Once again, the operation name item has nothing to do with its being an item on our PO.) So ino.item(0) returns the first Element in our (one-element!) list.

Fortunately for us, this Element (and its brothers and sisters) has only a single Text Node, so we can retrieve it using the get operation of the readonly attribute Node firstChild defined on the Element. In Java the get operation for an attribute maps to a method named after the attribute, so the operation firstChild gets that node.

The firstChild is a Text Node, so we have to cast it to (Text) in order to retrieve the text from it.

The text that it contains is in attribute data, so we can retrieve it using the get operation for data, which in Java maps to the operation name data. Fortunately it's a DOMString, the same type as checker, so we don't have to do any more casting to do the comparison. Phew!

Naturally, we've strung all of these fetch operations together in a single line of code to show you how elegantly you can program with this specification and Java!

In the next line of code (not counting the comments) we use the set operation of the attribute data of the Text Node of the POitem_quantity Element to set the new quantity. Except for this, the tricks in this line are the same ones as in the line above it.
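
For readers who want to run the walkthrough above, here is a sketch of the same loop written against the JDK's W3C DOM binding. It is a stand-in, not Listing 2: the JDK spells the accessors getFirstChild, getData, and setData where the OMG Java mapping uses firstChild and data, and it takes Java strings rather than DOMStrings. The sample document is hypothetical; only the Bolt item number comes from the text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// Stand-in sketch using the JDK's DOM binding, not the OMG valuetypes.
final class QuantityEdit {
    private QuantityEdit() {}

    // Hypothetical sample document; no whitespace between tags, so each
    // field element's first child is its Text node.
    static final String SAMPLE_PO =
          "<PO>"
        + "<POitem><POitem_name>Bolt</POitem_name><POitem_number>B01420</POitem_number>"
        + "<POitem_size>M8</POitem_size><POitem_quantity>100</POitem_quantity></POitem>"
        + "<POitem><POitem_name>Nut</POitem_name><POitem_number>N00233</POitem_number>"
        + "<POitem_size>M8</POitem_size><POitem_quantity>200</POitem_quantity></POitem>"
        + "</PO>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample", e);
        }
    }

    /** Sets the quantity of the POitem with the given number; true if one was found. */
    static boolean setQuantity(Document doc, String itemNumber, String newQuantity) {
        NodeList elms = doc.getElementsByTagName("POitem");
        for (int i = 0; i < elms.getLength(); i++) {
            Element poitem = (Element) elms.item(i);
            NodeList ino = poitem.getElementsByTagName("POitem_number");
            NodeList iqty = poitem.getElementsByTagName("POitem_quantity");
            if (ino.getLength() != 1 || iqty.getLength() != 1) {
                continue; // not a valid poitem
            }
            // Element -> first child (a Text node) -> its character data
            if (((Text) ino.item(0).getFirstChild()).getData().equals(itemNumber)) {
                ((Text) iqty.item(0).getFirstChild()).setData(newQuantity);
                return true;
            }
        }
        return false;
    }
}
```

A call such as setQuantity(doc, "B01420", "150") changes only the Bolt item. The casts assume every field element is non-empty; an empty element has no first child, and production code would need to check for that.
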

Adding a New Element
We can add a new element easily. Operations to create new Nodes of all types - that is, Node factories - are defined on our root Document node, so we invoke on PO_doc to create Elements and the Text Nodes. When you create an Element, you specify its tagName; when you create a Text Node, you pass in its text data.
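
The factory pattern is the same in the W3C DOM binding bundled with the JDK, so we can sketch it there without an ORB. The code below is a stand-in under that assumption: it builds a purchase order from scratch and appends one item. The helper names and the Washer values are invented for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Stand-in sketch using the JDK's DOM binding, not the OMG valuetypes.
final class AddItemSketch {
    private AddItemSketch() {}

    /** Creates a new document whose only content is an empty PO root element. */
    static Document newPurchaseOrder() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            doc.appendChild(doc.createElement("PO"));
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Builds one field: the Element gets its tag name, the Text Node its data. */
    static Element field(Document doc, String tag, String text) {
        Element element = doc.createElement(tag);
        element.appendChild(doc.createTextNode(text));
        return element;
    }

    /** Appends a complete POitem to the root element and returns it. */
    static Element addItem(Document doc, String name, String number,
                           String size, String quantity) {
        Element poitem = doc.createElement("POitem");
        poitem.appendChild(field(doc, "POitem_name", name));
        poitem.appendChild(field(doc, "POitem_number", number));
        poitem.appendChild(field(doc, "POitem_size", size));
        poitem.appendChild(field(doc, "POitem_quantity", quantity));
        doc.getDocumentElement().appendChild(poitem);
        return poitem;
    }
}
```

Note that every node is created by the Document and only then attached to its parent; creating a node does not place it in the tree.
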

Passing the Document in a CORBA Invocation
To pass our document as a tree of valuetypes, all we have to do is insert the root node as an argument in a CORBA call. For example, suppose our purchasing department runs a server that supports the operation PlaceOrder with this IDL:

    interface PurchasingServer {
        boolean PlaceOrder(in dom::Document ThisPO);
    };

In this operation ThisPO is a Document valuetype, and is an input argument to the CORBA invocation PlaceOrder. (We're not executing one of the Document methods.) When our client application invokes, in Java, the code in Listing 3, the entire purchase order tree gets sent over the wire to the server where it gets reconstructed exactly as it was in the client application, even though we've only included the root Document node of our purchase order in the argument list of PlaceOrder. This follows from the representation of the document node tree as a multiply linked list.
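
Since the valuetype is CORBA's counterpart of the Java serializable, plain Java serialization gives a feel for this behavior without an ORB. The sketch below is only an analogy, not the CORBA marshaling code: it writes the root of a small doubly linked tree to a byte array and reads it back, and the copy arrives with its up and down links intact. TreeNode and ByValueSketch are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical tree node linked both up (parent) and down (children).
final class TreeNode implements Serializable {
    private static final long serialVersionUID = 1L;

    final String name;
    TreeNode parent;
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String name) {
        this.name = name;
    }

    /** Links child under this node and returns the child. */
    TreeNode add(TreeNode child) {
        child.parent = this;
        children.add(child);
        return child;
    }
}

final class ByValueSketch {
    private ByValueSketch() {}

    /** Serializes the graph reachable from root and rebuilds an independent copy. */
    static TreeNode roundTrip(TreeNode root) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(root); // only the root is named explicitly
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (TreeNode) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After the round trip, copy.children.get(0).parent is the copied root itself, not the original and not a second duplicate, which is the property that makes local navigation on the receiving side work.
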

Writing Out the New or Revised XML Document
Once your user finishes editing the PO, you may want to write it out as an XML data file in Unicode. The operation to do this, serialize, is parallel in form to the parse operation discussed at the beginning of this article. Also, like the parse operation, serialize doesn't exist in the DOM at either Level 1 or Level 2. DOM Level 3 is supposed to introduce this functionality when it arrives.
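
The OMG serialize operation needs a CORBA implementation, but DOM Level 3 did eventually standardize the equivalent as the Load and Save module, and the JDK ships it. The sketch below uses that module as a stand-in, so it shows the shape of the step rather than the OMG API. The class and method names are ours.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

// Stand-in sketch using DOM Level 3 Load and Save from the JDK.
final class SerializeSketch {
    private SerializeSketch() {}

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    /** Writes the in-memory tree back out as XML text. */
    static String serialize(Document doc) {
        Object feature = doc.getImplementation().getFeature("LS", "3.0");
        if (feature == null) {
            throw new IllegalStateException("DOM Load and Save is not supported here");
        }
        LSSerializer serializer = ((DOMImplementationLS) feature).createLSSerializer();
        return serializer.writeToString(doc);
    }
}
```

Parsing a document, editing it, and passing it to serialize completes the read, edit, write cycle described in this article.
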

Flyweight Pattern
It's not much of an issue for short XML documents, but long ones that repeat elements many times (and some documents may have hundreds, thousands, or even more instances of a given element) use up many bytes repeating element name text. The XML/value mapping uses the flyweight pattern to conserve this space: one instance of each element name (and other types of repeated text) is saved in an indexed array, and only the index number is saved with each element. The array is another valuetype, included in the structure of the document, so it goes over the wire along with everything else when you ship your valuetype tree around.

What About Documents with DTDs?
The specification treats static documents - that is, documents defined by a DTD - very well indeed, generating not only the IDL for a set of document-specific valuetypes but also their implementation. All you have to do is program the editing operations around these elements tailored to your DTD. We think the static mapping will be used a lot more than the dynamic mapping, but we had to present this first because it's the foundation for the static, which is based on the dynamic valuetypes with DTD-specified names. We'll present the static mapping in an upcoming column, so watch for it.

I'd like to thank Alan Conway and Darach Ennis of IONA Technologies, who wrote the example code for our sample XML file and answered many questions about the specification as we wrote the book chapter from which this article is excerpted.

More Stories By Jon Siegel

Jon Siegel, the Object Management Group’s director of technology transfer, was an early practitioner of distributed computing and OO software development. Jon writes articles and presents tutorials and seminars about CORBA.
