Working With Dynamic XML Documents

XML gets mentioned a lot as an interoperability "platform." By itself, of course, XML can't be a platform because it's a document format. It may be flexible, human-readable, dynamic, popular, and cool because it looks a lot like HTML, but it's still just a document format, and there are a lot of differences between a document format and an interoperability platform.

To interoperate using XML, you either have to build an infrastructure around it or incorporate it into an infrastructure that already exists. While other folks build yet another infrastructure around XML, we show in this column how XML has been incorporated into the enterprise-ready, mature CORBA infrastructure. This is CORBA Corner, after all.

The W3C has supplemented XML with the Document Object Model (DOM), defined in OMG IDL. OMG members used the DOM as the basis for their XML mapping, but made one change along the way: instead of keeping the representation of each node in the XML document tree as a full-blown CORBA object, OMG's version represents a node as a CORBA valuetype. Passable by value but not a first-class CORBA object, the valuetype is the CORBA multilanguage equivalent of the Java serializable. And valuetypes are tailor-made to represent an XML document's structure: graphs of valuetypes, sent over the wire by including their root node (or any node, if the node structure links both up and down the tree) in the argument list of a CORBA call, will be reconstructed properly, in their entirety, at the receiving end. Send an XML document to a remote application and suddenly all navigation up and down its tree is done with local invocations instead of dozens of network round-trips.

In this article we investigate OMG's XML/value mapping and the things it lets us do with our XML documents. More than just a bridge between XML and CORBA - although it certainly is all of that - the valuetypes and their structure provide such an elegant API into the XML document (structure and content alike) that (in our opinion, anyhow) this deserves to be the way everyone works with XML content from a program, even in a non-CORBA environment. Here's what you can do with XML documents using the mapping:

  • Create a new XML document from scratch.
  • Read in an existing XML document from storage or from the network.
  • Parse the document into a multiply linked list of CORBA valuetypes.
      - Parsing can be done dynamically if there's no DTD with structural information about the document.
      - Parsing can take advantage of a DTD if there is one.
      - If the document and DTD versions are out of synchronization, the parsing can take advantage of the DTD as far as it goes.
  • Edit the document, including adding or deleting elements; adding, deleting, or changing attributes; and editing text.
  • Send the document around the network in CORBA calls, as a linked list of valuetypes with its structure intact. This includes secure, transactional CORBA calls and asynchronous calls using CORBA messaging. This is a great way to send XML data in an invocation.
  • Serialize the in-memory representation, generating a revised version of the Unicode-based XML format document that you're used to.
We don't have space to demonstrate all of these, but we'll look at as many as we can in the form of a programming example. To save room for example code, which isn't freely available on the Web the way the specification is, we've left out the specification details. To get the specification, download doc.omg.org/orbos/00-08-10 (the specification document) and doc.omg.org/orbos/00-11-01 (zipped IDL file). Where the two files disagree, the IDL in the zip file takes precedence.

Listing 1 is an example XML document that we use throughout this article. If your XML knowledge is a little hazy, point your browser to www.w3.org/XML. And to learn about the DOM, surf to www.w3.org/DOM.

Initializing and Reading the Document
We're not going to list the code that gets us started in XML document processing mode. Instead, we'll just list what we did:

  • Representation of XML documents as strings: Even though both XML and Java use Unicode, CORBA represents XML as a special DOMString type (typedef'd to sequence<unsigned short>). Why? Because you can use CORBA to go from any language to any other. Pass a Java string to a C program and you (probably) end up with an array of 8-bit chars; pass it to COBOL (unlikely, we admit, but possible) and the system attempts to translate your Unicode into EBCDIC! Yucko. So we've created a convenience function makeDOMString that converts a Java string into the programming language-independent DOMString type.
  • Reading in the document: We read the document in as a Java string and converted it into a DOMString.
  • Parsing the document: The parser is defined by the specification and supplied with your implementation. After locating the parser (probably via a call to resolve_initial_references), you invoke, for example,

    Document PO_doc = parser.parse(PO_Stream);

  • Error checking: The XML specification requires a parser to return an error, with no partial results, if a document contains even one XML structure/format error. (It doesn't care if you had the price of the bolts wrong, though.) The OMG specification is well prepared for this, with its definition of exception XMLException and 38 specific parsing error codes (numbered 2 through 39, of course). You should definitely check for these errors on return from parse.
On return from parse, if the routine found no errors during parsing, our document is stored in a multiply linked list of valuetypes starting at the root node PO_doc. Now let's do some things with it.
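
The steps above can be sketched in plain Java. This is a stand-in, not the OMG API: DOMString is modeled as a short[] of 16-bit code units, makeDOMString and toJavaString are our guess at what such helpers look like, and the parser is the JDK's DocumentBuilder, which reports a document that isn't well formed as a SAXParseException rather than the specification's XMLException. The PurchaseOrder root tag is also an assumption made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class ParseSketch {

    // Hypothetical makeDOMString: one 16-bit code unit per Java char.
    static short[] makeDOMString(String s) {
        short[] units = new short[s.length()];
        for (int i = 0; i < units.length; i++) {
            units[i] = (short) s.charAt(i);
        }
        return units;
    }

    // Inverse conversion, needed here only because the JDK parser wants Java text.
    static String toJavaString(short[] units) {
        char[] chars = new char[units.length];
        for (int i = 0; i < units.length; i++) {
            chars[i] = (char) units[i];
        }
        return new String(chars);
    }

    // Parses the document; throws SAXParseException if it is not well formed.
    static Document parse(short[] poStream) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors instead of printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        return builder.parse(new InputSource(new StringReader(toJavaString(poStream))));
    }

    public static void main(String[] args) throws Exception {
        short[] good = makeDOMString("<PurchaseOrder><POitem/></PurchaseOrder>");
        Document poDoc = parse(good);
        System.out.println("Root: " + poDoc.getDocumentElement().getTagName());

        short[] bad = makeDOMString("<PurchaseOrder><POitem></PurchaseOrder>");
        try {
            parse(bad);
        } catch (SAXParseException e) {
            // No partial document comes back; only the error is reported.
            System.out.println("Rejected at line " + e.getLineNumber());
        }
    }
}
```

The point of the second call is the all-or-nothing behavior: a single mismatched tag means no document at all, so the caller has to handle the error before touching the tree.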

Editing the XML Document
If we're the company writing the PO, we need to edit it - adding or deleting items, changing quantities or POitem numbers or names, or whatever. To our programmer, the XML/value mapping structures the PO data to make it all easily available; using these program structures, the programmer will present the data to our clerk for editing via a GUI. The operation getElementsByTagName returns a list of Elements selected by Tag Name (duh!), so we'd probably start by retrieving all (that is, both) of the POitems this way:

    DOMString name = makeDOMString("POitem");
    // Retrieve items in Purchase Order:
    NodeList elms = PO_doc.getElementsByTagName(name);

Now elms, a sequence of Nodes, contains two elements - the two items in our Purchase Order. Each contains four child elements - the POitem_name, POitem_number, POitem_size, and POitem_quantity. We could easily display a POitem in a window for editing, or count the number of POitem nodes that we got back and display the number on the screen, or print it for confirmation when we print the PO.
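
For readers without a CORBA ORB at hand, the same retrieval can be tried with the W3C DOM binding that ships in the JDK. This is only an approximation of the XML/value mapping: the JDK takes plain Java strings instead of DOMStrings and uses getX() accessor names. The purchase order text below is a hypothetical stand-in for Listing 1; only the tag names and the Bolt item number come from the article.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RetrieveSketch {

    // Hypothetical sample data; the root tag and the second item are invented.
    static final String PO_XML =
          "<PurchaseOrder>"
        + "<POitem><POitem_name>Bolt</POitem_name><POitem_number>B01420</POitem_number>"
        + "<POitem_size>1/4 inch</POitem_size><POitem_quantity>100 gross</POitem_quantity></POitem>"
        + "<POitem><POitem_name>Nut</POitem_name><POitem_number>N00830</POitem_number>"
        + "<POitem_size>1/4 inch</POitem_size><POitem_quantity>100 gross</POitem_quantity></POitem>"
        + "</PurchaseOrder>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Collects the tag names of the element children of one node, in document order.
    static List<String> childTagNames(Element parent) {
        List<String> names = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(((Element) child).getTagName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        Document poDoc = parse(PO_XML);
        NodeList elms = poDoc.getElementsByTagName("POitem");
        System.out.println("Items on this PO: " + elms.getLength());
        for (int i = 0; i < elms.getLength(); i++) {
            System.out.println(childTagNames((Element) elms.item(i)));
        }
    }
}
```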

Changing the Text in an Element
The specification uses OMG IDL attributes, which aren't the same as XML attributes. Here's a quick review in case you forgot how IDL attributes work: if you declare a variable to be an IDL attribute, the IDL compiler generates a get and set operation for it automatically (unless you declare it readonly, which eliminates the set operation). The get and set operations are mapped to programming languages just like all other operations. The Java mapping overloads the operations on the name of the variable: if you include an input argument, it's a set; leave it out and it's a get.
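
To make the overloading concrete, here is a hand-written sketch of the shape an IDL compiler produces in Java for one read-write and one readonly attribute. The names TextLike and SimpleText are ours, invented for illustration; real generated code for the specification's valuetypes will differ in detail.

```java
final class AttributeMappingSketch {

    // Roughly what the Java mapping yields for these two IDL declarations:
    //   readonly attribute string nodeName;
    //   attribute string data;
    interface TextLike {
        String nodeName();        // readonly: get operation only
        String data();            // no argument: get
        void data(String value);  // one input argument: set
    }

    // Minimal hypothetical implementation so the mapping can be exercised.
    static final class SimpleText implements TextLike {
        private final String nodeName;
        private String data;

        SimpleText(String nodeName, String data) {
            this.nodeName = nodeName;
            this.data = data;
        }

        @Override public String nodeName() { return nodeName; }
        @Override public String data() { return data; }
        @Override public void data(String value) { this.data = value; }
    }

    public static void main(String[] args) {
        TextLike quantity = new SimpleText("#text", "100 gross");
        quantity.data("150 gross");          // set
        System.out.println(quantity.data()); // get
    }
}
```

Java's own library binding for the DOM took the other route and named these operations getData and setData, which is worth remembering when moving code between the two.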

To demonstrate how we can change the text associated with a particular element, let's change the quantity of Bolt POitem_number B01420 to 150 gross. Listing 2 contains the code in a single block, with a few comments. The rest of this section explains it in more detail.

After defining two DOMStrings for use later, we start our loop over poitems in elms. elms.item(i) returns the ith Node in NodeList elms. (The operation name item comes from the XML/Value specification and has nothing to do with the fact that we're retrieving a poitem.) elms.item returns a Node; we have to cast the return value to an Element in order to assign it to element poitem.

Each poitem element has four children, tagnamed (from the strings in our XML document) POitem_name, POitem_number, POitem_size, and POitem_quantity. getElementsByTagName returns a list, so we declare ino and iqty to be NodeLists even though we're certain that only one element is going to come back from each call here. After checking that we have a valid poitem (even though we didn't bother to check that we had a valid PO!), we're ready to check and change the number of items we want to buy.

One of these lines of code (at least!) needs a little explanation. It's this one:

if (((Text)(ino.item(0).firstChild())).data().equals(checker))

The four Element valuetype children of poitem that we're working with here don't contain text - they have children that contain the text. Here's how we burrow down to the text itself.

ino is a one-element NodeList containing our POitem_number. item is the operation defined by the specification on NodeList that returns an item in the list by index number. (Once again, the operation name item has nothing to do with its being an item on our PO.) So ino.item(0) returns the first Element in our (one-element!) list.

Fortunately for us, this Element (and its brothers and sisters) has only a single Text Node, so we can retrieve it using the get operation of the readonly attribute Node firstChild defined on the Element. In Java the get operation for an attribute maps to an operation with the same name as the attribute, so the operation firstChild gets that node.

The firstChild is a Text Node, so we have to cast it to (Text) in order to retrieve the text from it.

The text that it contains is in attribute data, so we can retrieve it using the get operation for data, which in Java maps to the operation name data. Fortunately it's a DOMString, the same type as checker, so we don't have to do any more casting to do the comparison. Phew!

Naturally, we've strung all of these fetch operations together in a single line of code to show you how elegantly you can program with this specification and Java!
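
If the one-liner is hard to follow, it helps to unpack it one hop at a time. The sketch below does so with the JDK's W3C DOM binding, which is an assumption on our part: it stands in for the XML/value mapping, so the accessors are spelled getFirstChild and getData rather than firstChild and data, and the text is a Java String rather than a DOMString.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

final class BurrowSketch {

    // The chained expression, one step per line.
    static String firstText(NodeList ino) {
        Node element = ino.item(0);                // first Element in the list
        Node firstChild = element.getFirstChild(); // its single Text child
        Text text = (Text) firstChild;             // cast so the text accessor is visible
        return text.getData();                     // the characters themselves
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<POitem><POitem_number>B01420</POitem_number></POitem>");
        NodeList ino = doc.getElementsByTagName("POitem_number");

        String stepwise = firstText(ino);
        // Same navigation strung together, as in the article's line of code.
        String chained = ((Text) ino.item(0).getFirstChild()).getData();

        System.out.println(stepwise + " " + chained.equals(stepwise));
    }
}
```

Note that the cast only succeeds because the element holds a Text node first; an empty element would return null from getFirstChild, so production code should check before casting.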

In the next line of code (not counting the comments) we use the set operation of the attribute data of the Text Node of the POitem_quantity Element to set the new quantity. Except for this, the tricks in this line are the same ones as in the line above it.
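
The whole check-and-change loop can be approximated as follows. This is not the article's Listing 2: it is a sketch written against the JDK's W3C DOM binding (plain Strings, getX/setX accessor names), and the sample purchase order, the PurchaseOrder root tag, and the method name setQuantity are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

final class EditQuantitySketch {

    // Hypothetical sample data; only the tag names and B01420 come from the article.
    static final String PO_XML =
          "<PurchaseOrder>"
        + "<POitem><POitem_name>Bolt</POitem_name><POitem_number>B01420</POitem_number>"
        + "<POitem_size>1/4 inch</POitem_size><POitem_quantity>100 gross</POitem_quantity></POitem>"
        + "<POitem><POitem_name>Nut</POitem_name><POitem_number>N00830</POitem_number>"
        + "<POitem_size>1/4 inch</POitem_size><POitem_quantity>100 gross</POitem_quantity></POitem>"
        + "</PurchaseOrder>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Sets the quantity of every item whose number matches; returns how many changed.
    static int setQuantity(Document poDoc, String checker, String newQuantity) {
        NodeList elms = poDoc.getElementsByTagName("POitem");
        int changed = 0;
        for (int i = 0; i < elms.getLength(); i++) {
            Element poitem = (Element) elms.item(i); // item() returns a Node
            NodeList ino = poitem.getElementsByTagName("POitem_number");
            NodeList iqty = poitem.getElementsByTagName("POitem_quantity");
            if (ino.getLength() != 1 || iqty.getLength() != 1) {
                continue; // not a valid poitem
            }
            if (((Text) ino.item(0).getFirstChild()).getData().equals(checker)) {
                ((Text) iqty.item(0).getFirstChild()).setData(newQuantity);
                changed++;
            }
        }
        return changed;
    }

    public static void main(String[] args) throws Exception {
        Document poDoc = parse(PO_XML);
        int changed = setQuantity(poDoc, "B01420", "150 gross");
        System.out.println("Items changed: " + changed);
    }
}
```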

Adding a New Element
We can add a new element easily. Operations to create new Nodes of all types - that is, Node factories - are defined on our root Document node, so we invoke on PO_doc to create Elements and the Text Nodes. When you create an Element, you specify its tagName; when you create a Text Node, you pass in its text data.
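
As a rough illustration of the factory pattern just described, the sketch below builds a purchase order from scratch and adds one item. It uses the JDK's W3C DOM binding rather than the XML/value mapping, so the factory operations are createElement and createTextNode on the Document and take Java Strings. The helper names, the PurchaseOrder root tag, and the washer data are invented for the example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class AddItemSketch {

    // Creates an empty document with a hypothetical PurchaseOrder root element.
    static Document newPurchaseOrder() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        doc.appendChild(doc.createElement("PurchaseOrder"));
        return doc;
    }

    // Both node factories live on the Document: one for the Element, one for its Text.
    static Element textElement(Document doc, String tagName, String text) {
        Element element = doc.createElement(tagName);
        element.appendChild(doc.createTextNode(text));
        return element;
    }

    // Builds a complete POitem and links it under the root element.
    static Element addItem(Document doc, String name, String number, String size, String quantity) {
        Element poitem = doc.createElement("POitem");
        poitem.appendChild(textElement(doc, "POitem_name", name));
        poitem.appendChild(textElement(doc, "POitem_number", number));
        poitem.appendChild(textElement(doc, "POitem_size", size));
        poitem.appendChild(textElement(doc, "POitem_quantity", quantity));
        doc.getDocumentElement().appendChild(poitem);
        return poitem;
    }

    public static void main(String[] args) throws Exception {
        Document poDoc = newPurchaseOrder();
        addItem(poDoc, "Washer", "W00112", "1/4 inch", "12 gross");
        System.out.println("Items on this PO: " + poDoc.getElementsByTagName("POitem").getLength());
    }
}
```

Creating a node does not place it in the tree; nothing shows up in the document until appendChild links it under a parent.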

Passing the Document in a CORBA Invocation
To pass our document as a tree of valuetypes, all we have to do is insert the root node as an argument in a CORBA call. For example, suppose our purchasing department runs a server that supports the operation PlaceOrder with this IDL:

    interface PurchasingServer {
        boolean PlaceOrder(in dom::Document ThisPO);
    };

In this operation ThisPO is a Document valuetype, and is an input argument to the CORBA invocation PlaceOrder. (We're not executing one of the Document methods.) When our client application invokes, in Java, the code in Listing 3, the entire purchase order tree gets sent over the wire to the server where it gets reconstructed exactly as it was in the client application, even though we've only included the root Document node of our purchase order in the argument list of PlaceOrder. This follows from the representation of the document node tree as a multiply linked list.
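
Since the valuetype is the CORBA counterpart of the Java serializable, the reconstruction behavior can be demonstrated without an ORB. The sketch below is an analogy only, using Java serialization in place of a CORBA call: TreeNode and copyByValue are hypothetical names, and the byte array plays the role of the wire. Passing just the root carries the whole graph, and the up and down links are intact on the receiving side.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

final class GraphByValueSketch {

    // A node linked both up (parent) and down (children), like the document tree.
    static final class TreeNode implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        TreeNode parent;
        final List<TreeNode> children = new ArrayList<>();

        TreeNode(String name) { this.name = name; }

        TreeNode add(TreeNode child) {
            child.parent = this;
            children.add(child);
            return child;
        }
    }

    // "Sends" the root through a byte stream and rebuilds the graph on the other side.
    static TreeNode copyByValue(TreeNode root) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(wire)) {
            out.writeObject(root); // only the root is named; the rest follows the links
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(wire.toByteArray()))) {
            return (TreeNode) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        TreeNode po = new TreeNode("PurchaseOrder");
        po.add(new TreeNode("POitem")).add(new TreeNode("POitem_number"));

        TreeNode received = copyByValue(po);
        TreeNode item = received.children.get(0);
        System.out.println("Separate copy: " + (received != po));
        System.out.println("Parent link rebuilt: " + (item.parent == received));
    }
}
```

After the copy, every navigation on the received tree is a local call, which is the property the article relies on to avoid network round-trips.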

Writing Out the New or Revised XML Document
Once your user finishes editing the PO, you may want to write it out as an XML data file in Unicode. The operation to do this, serialize, is parallel in form to the parse operation discussed at the beginning of this article. Also, like the parse operation, serialize doesn't exist in the DOM at either Level 1 or Level 2. DOM Level 3 is supposed to introduce this functionality when it arrives.
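
For comparison, DOM Level 3 Load and Save did eventually standardize this step, and the JDK ships it. The sketch below uses that API (LSSerializer), not the specification's serialize operation, and the small document it writes out is invented for the example. The cast of the DOMImplementation to DOMImplementationLS is the common idiom and assumes the JDK's bundled DOM implementation.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

final class SerializeSketch {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Turns the in-memory tree back into XML text.
    static String serialize(Document doc) {
        DOMImplementationLS ls = (DOMImplementationLS) doc.getImplementation();
        LSSerializer serializer = ls.createLSSerializer();
        // Leave out the <?xml ...?> declaration to keep the output short.
        serializer.getDomConfig().setParameter("xml-declaration", false);
        return serializer.writeToString(doc);
    }

    public static void main(String[] args) throws Exception {
        Document poDoc = parse(
            "<PurchaseOrder><POitem><POitem_quantity>100 gross</POitem_quantity></POitem></PurchaseOrder>");
        // Edit in memory, then write the revised document out.
        poDoc.getElementsByTagName("POitem_quantity").item(0).setTextContent("150 gross");
        System.out.println(serialize(poDoc));
    }
}
```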

Flyweight Pattern
It's not much of an issue for short XML documents, but long ones that repeat elements many times (and some documents may have hundreds, thousands, or even more instances of a given element) use up many bytes repeating element name text. The XML/value mapping uses the flyweight pattern to conserve this space: one instance of each element name (and other types of repeated text) is saved in an indexed array, and only the index number is saved with each element. The array is another valuetype, included in the structure of the document, so it goes over the wire along with everything else when you ship your valuetype tree around.

What About Documents with DTDs?

The specification treats static documents - that is, documents defined by a DTD - very well indeed, generating not only the IDL for a set of document-specific valuetypes but also their implementation. All you have to do is program the editing operations around these elements tailored to your DTD. We think the static mapping will be used a lot more than the dynamic mapping, but we had to present this first because it's the foundation for the static, which is based on the dynamic valuetypes with DTD-specified names. We'll present the static mapping in an upcoming column, so watch for it.

I'd like to thank Alan Conway and Darach Ennis of IONA Technologies, who wrote the example code for our sample XML file and answered many questions about the specification as we wrote the book chapter from which this article is excerpted.

More Stories By Jon Siegel

Jon Siegel, the Object Management Group’s director of technology transfer, was an early practitioner of distributed computing and OO software development. Jon writes articles and presents tutorials and seminars about CORBA.
