Welcome!

Java Authors: Jerry Melnick, Liz McMillan, Esmeralda Swartz, Michelle Drolet, Kevin Benedict

Related Topics: Java

Java: Article

XML Serialization of Java Objects

XML Serialization of Java Objects

Java serialization was initially used to support remote method invocation (RMI), allowing argument objects to be passed between two virtual machines.

RMI works best when the two VMs contain compatible versions of the class being transmitted, and can reliably transmit a binary representation of the object based on its internal state. When an object is serialized, it must also serialize the objects to which its fields refer - resulting in what is commonly called an object graph of connected components. Although the transient keyword can be used to control the extent to which the serialization process penetrates the object graph, this level of control is seldom enough.

Many have tried to use Java's serialization to achieve the so-called "long-term persistence" of data - where the serialized form of a Java data structure is written to a file for later use. One such area is the development tools domain, in which designs must be saved for later use. Because the logic that saves and restores serialized objects is based on the internal structure of the constituent classes, any changes to those classes between the time that the object was saved and when it was retrieved may cause the deserialization process to fail outright; for example, a field was added or removed, existing fields were renamed or reordered, or the class's superclass or package was altered. Such changes are to be expected during the development process, and any mechanism that relies on the internal structure of all classes being identical between versions to work has the odds stacked against it. Over the last few years the "versioning issues" associated with Java's serialization mechanism have indeed proved to be insurmountable and have led to widespread abandonment of Java's serialization as a viable long-term persistence strategy in the development tools space.

To tackle Java serialization problems, a Java Specification Request (JSR 57) was created, titled "Long-Term Persistence for JavaBeans." JSR 57 is included in JRE 1.4 and is part of the "java.beans" package. This article describes the mechanism with which the JSR solved the problems of long-term persistence, and how you can take control of the way that the XMLEncoder generates archives to represent the data in your application.

We'll start our section by dispelling two popular myths that have grown up around XML serialization: that it can only be used for JavaBeans and that all JavaBeans are GUI widgets. In fact, the XMLEncoder can support any public Java class; these classes don't have to be JavaBeans and they certainly don't have to be GUI widgets. The only constraint that the encoder places on the classes it can archive is that there must be a means to create and configure each instance through public method calls. If the class implements the getter/setter paradigm of the JavaBeans specification, the encoder can acheive its goal automatically - even for a class it knows nothing about. On top of this default behavior, the XMLEncoder comes with a small but very powerful API that allows it to be "taught" how to save instances of any class - even if they don't use any of the JavaBeans design patterns. In fact, most of the Swing classes deviate from the JavaBeans specification in some way and yet the XMLEncoder handles them via a set of rules with which it comes preconfigured. The XMLEcoder is currently spec'ed to provide automatic support for all subclasses of Component in the SDK and all of their property types (recursively). This means that as well as being able to serialize all of AWT and Swing GUI widgets, the XMLEncoder can also serialize: primitive values (int, double, etc.), strings, dates, arrays, lists, hashtables (including all Collection classes), and many other classes that you might not think of as having anything to do with JavaBeans. The support for all these classes is not "hard-wired" into the XMLEncoder; instead it is provided to the Encoder through the API that it exposes for general use. The variety in the APIs among even the small subset of classes mentioned earlier should give some idea of the generality and scope of the persistence techniques we will cover in the next sections.

Background
When problems are encountered with an object stream, they're hard to correct because the format is binary. An XML document is human readable, and therefore easier for a user to examine and manipulate when problems arise. To serialize objects to an XML document, use the class java.beans.XMLEncoder; to read objects, use the class java.beans.XMLDecoder.

One reason object streams are brittle is that they rely on the internal shape of the class remaining unchanged between encoding and decoding. The XMLEncoder takes a completely different approach here: instead of storing a bit-wise representation of the field values that make up an object's state, the XMLEncoder stores the steps necessary to create the object through its public API. There are two key factors that make XML files written this way remarkably robust when compared with their serialized counterparts.

First, many changes to a class's internal implementation can be made while preserving backward compatibility in its public APIs. In public libraries, this is often a requirement of new releases - as breaking a committed public API would break all the third-party code that had used the library in its older form. As a result of this, many software vendors have internal policies that prevent its developers from knowingly "breaking" any of the public APIs in new releases. While exceptions inevitably arise, they are on a much, much smaller scale than the internal changes that are made to the private implementations of the classes within the library. In this way, the XMLDecoder derives much of its resilience to versioning by aligning its requirements with those of developers who program against APIs directly.

The second reason for the stability of the decoding process as implemented by the XMLDecoder is just as important. If you were to take an instance of any class, choose an arbitrary member variable, and set it to null - the behavior of that instance would be completely undefined in all subsequent operations - and a bug-free implementation would be entitled to fail catastrophically under these circumstances. This is exactly what happens when a field is added to a new version of a class and this causes people to cross their fingers when trying to deserialize an instance of a class that was written out with an older version. The XMLEncoder, by contrast, doesn't store a list of private fields but a program that represents the object's state. Here's an XML file representing a window with the title "Test":

<?xml version="1.0" encoding="UTF-8"?>
<java version="1.4.1" class="java.beans.XMLDecoder">
<object class="javax.swing.JFrame">
<void property="title">
<string>Test</string>
</void>
<void property="visible">
<boolean>true<boolean/>
</void>
</object>
</java>

XML archives, written by XMLEncoder, have exactly the same information as a Java program - they're just written using an XML encoding rather than a Java one. Here's what the above program would look like in Java:

JFrame f = new JFrame();
f.setTitle("Test");
f.setVisible(true);

When a backward compatibility issue arises in one of the classes in the archive, it may cause one of the earlier statements to fail. A new version of the class might, for example, choose not to define the "setTitle()" method. When this happens, the XMLDecoder detects that this method is now missing from the class and doesn't try to call it. Instead, it issues a warning, ignores the offending statement, and continues with the other statements in the file. The critical point is that not calling the "setTitle()" method does not violate the contract of the implementation (as deleting an instance variable would), and the resulting instance should be a valid and fully functional Java object. If the resulting Java object fails in any way, an ordinary Java program could be written against its API to demonstrate a genuine bug in its implementation.

The vendors of popular Java libraries tend to devote significant resources toward programs to manage demonstrable bugs of this kind and enlist the support of the development community to work toward their eradication - Sun's "BugParade" is a well-known example. As a result of these kinds of programs, bugs that can be demonstrated by simple "setup code" tend to be rare in mature libraries. Once again, the XMLDecoder benefits here as it's able to ride on the coattails of the Java developer by using the public APIs of the classes instead of relying on special privileges to circumvent them.

Encoding of JavaBeans
To illustrate the XMLEncoder, this article shows serialization based on a number of scenarios using an example Person class. These range from simple JavaBeans encoding through nondefault construction and custom initialization.

In the simplest scenario, the class Person has String fields for firstName and lastName, together with get and set methods.

public class Person {
private String firstName;
private String lastName;
public String getFirstName() { return firstName; }
public String getLastName() { return lastName; }
public void setFirstName(String str) { firstName = str; }
public void setLastName(String str) { lastName = str; }
}

The following code creates an encoder and serializes a Person.

FileOutputStream os = new FileOutputStream("C:/cust.xml");
XMLEncoder encoder = new XMLEncoder(os);
Person p = new Person();
p.setFirstName("John");
encoder.writeObject(p);
encoder.close();

The XML file created shows that Person class has been encoded, and that its firstName property is the string "John".

<?xml version="1.0" encoding="UTF-8"?>
<java version="1.4.1" class="java.beans.XMLDecoder">
<object class="Person">
<void property="firstName">
<string>John</string>
</void>
</object>
</java>

When the file is decoded with the XMLDecoder, the Person class will be instantiated with its default constructor, and the firstName property set by calling the method setFirstName("John").

FileInputStream os = new FileInputStream("C:/cust.xml");
XMLDecoder decoder = new XMLDecoder(os);
Person p = (Person)decoder.readObject();
decoder.close();

To understand how to leverage the encoder and decoder for custom serialization requires an understanding of the JavaBeans component model. This describes a class's interface in terms of a set of properties, each of which can have a get and set method. To determine the set of operations required to re-create an object, the XMLEncoder creates a prototype instance using its default constructor and then compares the value of each property between this and the object being serialized. If any of the values don't match, the encoder adds it to the graph of objects to be serialized, and so on until it has a complete set of the objects and properties required to re-create the original object being serialized. When the encoder reaches objects that can't be broken down any further, such as Java's strings, ints, or doubles, it writes these values directly to the XML document as tag values. For a complete list of these primitive values and their associated tags, see http://java.sun.com/products/jfc/tsc/ articles/persistence3/index.html.

To serialize an object, XMLEncoder uses the Strategy pattern, and delegates the logic to an instance of java.beans.PersistenceDelegate. The persistence delegate is given the object being serialized and is responsible for determining which API methods can be used to re-create the same instance in the VM in which it will be decoded. The XMLEncoder then executes the API to create the prototype instance that it gives to the delegate, together with the original object being serialized, so the delegate can determine the API methods to re-create the nondefault state.

The method XMLEncoder.setPersistenceDelegate(Class objectClass, PersistenceDelegate delegate) is used to set a customized delegate for an object class. To illustrate this we'll change the original Person class so that it no longer conforms to the standard JavaBeans model, and show how persistence delegates can be used to teach the XMLEncoder to successfully serialize each instance.

Constructor Arguments
One of the patterns that can be taught to the XMLEncoder is how to create an instance where there is no zero-argument constructor. The following is an example of this in which a Person must be constructed with its firstName and lastName as arguments.

public Person(String aFirstName, String aLastName){
firstName = aFirstName;
firstName = aLastName;
}

In the absence of any customized delegate, the XMLEncoder uses the class java.beans.DefaultPersistenceDelegate. This expects the instance to conform to the JavaBeans component model with a zero-argument constructor and JavaBeans properties controlling its state. For the Person whose property values are supplied as constructor arguments, an instance of DefaultPersistenceDelegate can be created with the list of property names that represent the constructor arguments.

XMLEncoder e = new XMLEncoder(os);
Person p = new Person("John","Smith");
e.setPersistenceDelegate(Person.class,
new DefaultPersistenceDelegate(
new String[] { "firstName","lastName"}
);
e.writeObject(person);

When the XMLEncoder creates the XML for the Person object, it uses the supplied instance of the DefaultPersistenceDelegate, queries the values of the firstName and lastProperties, and creates the following XML document.

<?xml version="1.0" encoding="UTF-8"?>
<java version="1.4.1" class="java.beans.XMLDecoder">
<object class="Person">
<string>John</string>
<string>Smith</string>
</object>
</java>

The result is a record of the Object's state but written in such a way that the XMLDecoder can locate and call the public constructor of the Person object just as a Java program would. In the previous XML document where the Person was a standard JavaBeans component, the nondefault properties were specified with named <void property="propertyName"> tags that contained the argument values.

Although custom encoding rules can be supplied to the XMLEncoder, this is not true of the XMLDecoder. The XML document represents the API steps to re-create the serialized objects in a target VM. One advantage of not having custom decoder rules is that only the environment that serializes the objects requires customization, whereas the target environment just requires the classes with unchanged APIs. This makes it ideal for the following scenario - serialization of an object graph within a development tool that has access to design-time customization, where the XML document will be read in a runtime environment that does not have access to the persistence delegates used during encoding.

Custom Instantiation
In addition to a class being constructed with property values as arguments, custom instantiation can include use of factory methods. An example of this would be if Person's constructor were package protected and instances of the Person class could only be created by calling a static createPerson() method defined in a PersonFactory class.

To write a persistence delegate requires a basic understanding of how the encoder creates its set of operations that will re-create the serialized objects when the stream is deserialized. The XMLEncoder uses the command pattern to record each of the required method calls as instances of the class java.beans.Statement. Each Statement represents an API call in which a method is sent to a target, together with any arguments. Commands that are responsible for the instantiation of objects are instances of java.beans.Expression. A subclass of Statement returns a value. Each object in the graph is represented by the Expression that creates it and a set of Statements that are used to initialize it.

For general control of instantiation, a subclass of the PersistenceDelegate class should be created with a specialized instantiate() method. The return value is the java.beans.Expression that indicates to the encoder which method or constructor should be used to create (or retrieve) the object. The returned Expression includes the object, the target (normally the class that defines the constructor), the method name (normally the fake name "new," which indicates a constructor call), and the argument values that the method or constructor takes.

The first argument of the instantiate() method is the instance of the Person object being serialized, and the second object is the encoder (see Listing 1).

When the XMLEncoder serializes the Person instance, instead of the DefaultPersistenceDelegate that uses standard JavaBeans rules for properties, it uses the anonymous inner class we registered as the persistence delegate of the Person.class. The resulting XML follows. In the <object> tag as well as the class name, the static method createPerson has also been included, and the arguments are specified as child tags.

<?xml version="1.0" encoding="UTF-8"?>
<java version="1.4.1" class="java.beans.XMLDecoder">
<object class="PersonFactory" method="createPerson">
<string>Smith</string>
<void property="firstName">
<string>John</string>
</void>
</object>
</java>

The inner class created for the Person persistence delegate subclasses from DefaultPersistenceDelegate, so the firstName property value of "John" is included in the XML document; however, no property tag is included for lastName. This is because the XMLEncoder compares the prototype instance of Person against the instance being serialized to determine which property values are not their default and need to be included in the XML document. The method that does this is protected void initialize(Class type, Object oldInstance, Object newInstance, Encoder out). The oldInstance argument is the object being serialized and the newInstance is the prototype. Because the prototype instance is created using the Expression returned by the persistence delegate's method protected Expression instantiate(Object oldInstance, Encoder encoder), the newInstance argument will already have the lastName set to be the same as the oldInstance so the encoder won't see their values as different and hence it does not serialize a property value for the lastName.

Custom State
The DefaultPersistenceDelegate assumes that the state of the oldInstance can be determined and restored by using the JavaBeans component model for properties. The list of properties for a class is retrieved using the method java.beans.Introspector.getBeanInfo(ClassaClass).getPropertyDescriptors(). Each property is an instance of java.beans.PropertyDescriptor and includes a get and set method. The Introspector uses a set of rules matching method name pairs to create properties, although these rules can be overridden by supplying a specific BeanInfo class. The BeanInfo class can use a different set of methods than those that the introspector would otherwise have determined as the property's get and set method. However, it can't deal with scenarios in which there is no get and set method, for example. For these the persistence delegate needs to be customized, and as an example we will have a property called nicknames that is multivalued.

private List nicknames = new ArrayList();
public void addNickname(String name){nicknames.add(name); }
public List getNicknames(){return nicknames; }

Nicknames are added to the class one at a time using the addNickname() method, and the complete list is retrieved using getNicknames(). The decoder needs to iterate through the nicknames and create an archive that uses the addNickname() method to re-create the Person.

The persistence delegate will subclass DefaultPersistenceDelegate that assumes construction of the class through a default Person, and will override the instantiate() method that's responsible for determining the expressions required to re-create the oldInstance (see Listing 2).

The persistence delegate iterates through the nicknames and for each one adds a statement to the encoder that specifies the API to re-create the nickname. For this the Statement includes the target of the method (the Person oldInstance), the method name (addNickname), and the arguments (the nickname) (see Listing 3).

Specifying Delegates in BeanInfo Classes
In the examples used so far the custom persistence delegate was set directly onto the XMLEncoder by calling the method setPersistenceDelegate(Class,PersistenceDelegate). This works if you're the author of the code that's responsible for performing the serialization, but in some scenarios another piece of software such as an IDE tool is responsible for encoding the JavaBeans. In this situation you must teach the tool about the delegate that it should use for your class; this is done by specifying the delegate class name in the BeanDescriptor for a string key of "persistenceDelegate". For example, if the Person class is going to be introduced into an IDE together with PersonBeanInfo, the getBeanDescriptor() method would be specialized.

public class PersonBeanInfo extends SimpleBeanInfo {
public BeanDescriptor getBeanDescriptor(){
BeanDescriptor result = new BeanDescriptor(Person.class);
result.setValue("persistenceDelegate", PersonPersistenceDelegate.class);
return result;
}
}

If the PersonBeanInfo is not in the same package as the Person class, the search path of the Introspector in the tool will need to be updated to include the BeanInfo's package.

Another way in which BeanInfo classes can be used to leverage persistence is by marking properties as transient. When DefaultPersistenceDelegate is responsible for encoding the JavaBean, it looks at all the available read/write properties and compares the existing values on the object being serialized against the values on the prototype instance. To flag a property so that it will be ignored, the key "transient" should be set to the value Boolean.TRUE. For example, if the "firstName" property should be considered transient, the getPropertyDescriptors() method on PersonBeanInfo could be specialized as shown in Listing 4.

Conclusion
This article explained how the design of the XMLEncoder avoids many of the fundamental pitfalls of binary serialization and makes the case that XML archives produced by the XMLEncoder can be trusted as a reliable means to store valuable data over the long term. Central to the design of the XMLEncoder is the java.beans.DefaultPersistenceDelegate class, which provides a default serialization strategy based on the idea of properties as laid out in a JavaBeans component model.

We show how custom delegates can be submitted to the encoder to teach it about idioms other than those of the JavaBeans component model, so classes that don't follow the JavaBeans conventions can be accommodated without changing their APIs. Because, in all cases, the decoder inflates object graphs using public API calls; deserialization is remarkably robust in the face of changes made to the classes referred to in the archives. If you need to save some critical data in your application to a file and are not interested in designing a new file format and coding the readers and writers for it - check out the XMLEncoder/XMLDecoder to see if they'll do it all for you.

References

  • Using XML Encoder on the Swing Connection: http://java.sun.com/products/jfc/tsc/ articles/persistence4/index.html
  • JavaBeans: http://java.sun.com/products/javabeans/
  • More Stories By Joe Winchester

    Joe Winchester, Editor-in-Chief of Java Developer's Journal, was formerly JDJ's longtime Desktop Technologies Editor and is a software developer working on development tools for IBM in Hursley, UK.

    More Stories By Philip Milne

    Philip Milne is a software developer who worked at Sun as part of the Swing development team, and now works as a consultant in London, UK. He can be contacted at Philip.Milne@drkw.com

    Comments (1) View Comments

    Share your thoughts on this story.

    Add your comment
    You must be signed in to add a comment. Sign-in | Register

    In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


    Most Recent Comments
    koundinya75 04/23/09 01:20:00 AM EDT

    This is nice presentation on XML serialization.

    I wonder how we can serialize the Composite Objects.

    For ex: If I have Department instance associated with Employee instance then Frameworks like JAXB or CASTOR are able to do right marshalling. But I am not seeing the same with XML serialization. Could you share some of your thoughts on this?