Zip Objects, Zap Wait Time

As the capabilities of our distributed applications increased, so did our consumption of bandwidth. In 1998, our server sent objects no larger than 50K to a group of users on a local network. By 2002, we were passing an average of 500K per object, with some as large as 1.5MB.

More important, our user base grew from 50 to over 1,500 users, some of them on the other side of the country from the server. Add a group of users roaming on modem connections, and the full scale of our bandwidth problem becomes clear. We faced a problem common to developers of distributed systems: reduce bandwidth usage and client wait time without removing any functionality. This article shares our solution, including the simple code that helped us eliminate over 80% of our network traffic.

The bandwidth question itself is simple: the developer can either get more bandwidth or use less of it. Given the magnitude and expense of expanding bandwidth for a nationally distributed application, it was clear we had to reduce the bandwidth our systems required. Note the wording: reduce bandwidth usage, not the amount of data passed over the network. To preserve the functionality of the systems, we needed all the data that was being sent over the line. That left one conclusion: the data needed to be compressed.

As I researched compression in Java, I was looking for a way to pass in an object and receive a compressed object back. I found a number of ways to compress socket streams or build zip files on disk, but not the object-level solution I was seeking. We needed an API that could be applied selectively to the largest data objects and most critical applications without affecting other parts of the system. We also wanted to compress an object once and reuse that compressed object for multiple downloads to client machines, essentially caching it.

During this research, I found an article on compression on Sun's Java developer Web site that laid out all the pieces of our solution (see Resources). Using just a few classes from the java.io and java.util.zip packages, we were able to build an API that compresses any serializable Java object. As a developer who prefers simplicity, I was impressed by the ease of use and performance of the underlying Java classes, as well as of the API we built. We developed and integrated our solution in just under two days, cutting network traffic by more than 80% and dramatically improving client wait times.

A Compression Factory for Serialized Objects
The Java compression functions are located in the java.util.zip package, where the Deflater class compresses byte arrays and the Inflater class decompresses byte arrays. As you may have noted, both of these classes perform compression routines on byte arrays. Therefore, to compress an object, the first step is to translate it into a representation of bytes, which begins with the Serializable interface.
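To see the two classes in isolation before any serialization is involved, here is a minimal sketch that deflates a byte array and inflates it back. The class name DeflateRoundTrip and the 1,024-byte working chunk are illustrative choices, not part of the article's API; the sketch drains output into a ByteArrayOutputStream so it never has to guess the output size.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper: minimal round trip through java.util.zip's Deflater and Inflater.
class DeflateRoundTrip {

    // Compress the whole input, draining the Deflater chunk by chunk.
    static byte[] deflate(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();                       // no more input will follow
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);     // bytes produced in this pass
            out.write(chunk, 0, n);
        }
        deflater.end();                          // release native zlib memory
        return out.toByteArray();
    }

    // Decompress until the Inflater reports the end of the deflate stream.
    static byte[] inflate(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        while (!inflater.finished()) {
            int n = inflater.inflate(chunk);
            if (n == 0 && !inflater.finished()
                    && (inflater.needsInput() || inflater.needsDictionary())) {
                throw new DataFormatException("truncated or unsupported deflate stream");
            }
            out.write(chunk, 0, n);
        }
        inflater.end();
        return out.toByteArray();
    }
}
```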

When an object implements the Serializable interface, it can be represented as a stream of bytes. This byte stream can be written using the ObjectOutputStream.writeObject() method and reconstituted using the ObjectInputStream.readObject() method, allowing for a simple translation of a byte stream to and from an object. This ability to serialize an object, capturing the resulting byte stream into a byte array, provides a usable input for the compression methods available in the java.util.zip classes.

Using this approach, we accept a Serializable object, write it into a byte array, and then compress the array. The array of compressed bytes, along with a few other key variables, is stored in a new object, cZipObject, which is shown in its entirety in Listing 1. The cZipObject encapsulates the compressed version of the input object and can itself be serialized for transfer across the network. On the receiving end, the byte array is extracted from the cZipObject, decompressed, wrapped in a byte stream, and reconstituted into an object. Strictly speaking, this process does not compress the object itself but the serialized representation of the object and its data.
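Listing 1 is not reproduced here, so the following is only a sketch of what such a wrapper might hold, assuming the two pieces of data the article relies on later: the compressed bytes and the original serialized size. The setData(byte[], int) signature matches the call used later in the article; the getters and getSavingsPercent are hypothetical additions for illustration.

```java
import java.io.Serializable;

// Sketch of a cZipObject-style wrapper (not the article's Listing 1).
// setData(byte[], int) mirrors the article's usage; the getters are hypothetical.
class cZipObject implements Serializable {
    private static final long serialVersionUID = 1L;

    private byte[] data;      // compressed form of the serialized object
    private int origSize;     // length of the serialized bytes before compression

    public void setData(byte[] compressed, int originalSize) {
        this.data = compressed;
        this.origSize = originalSize;
    }

    public byte[] getData() { return data; }

    public int getOrigSize() { return origSize; }

    // Hypothetical convenience: percentage saved by compression (call after setData).
    public double getSavingsPercent() {
        return origSize == 0 ? 0.0 : 100.0 * (origSize - data.length) / origSize;
    }
}
```

Because the wrapper implements Serializable and holds only a byte array and an int, it can be sent over RMI or cached like any other value object.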

To easily integrate these compression routines on both the server and client side, we'll create a cZipFactory class that will contain all the methods for compressing and decompressing objects. We'll create a number of methods along the way that can be of direct use, such as a byte compression method. By encapsulating both the compress and decompress functions into a single class, we can add the functionality to both the client and server by creating a single object. This will allow us to compress objects sent from the server to the client as well as from the client back up to the server.

The first step is to convert the Serializable object into a byte array, using an ObjectOutputStream layered over a ByteArrayOutputStream from the java.io package. First, we create a ByteArrayOutputStream to capture the byte stream as the object is written. We then create an ObjectOutputStream on top of it, write the serialized object, and extract a byte array from the ByteArrayOutputStream.

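// Requires imports of java.io.ByteArrayOutputStream, java.io.ObjectOutputStream and java.io.IOException
byte[] DataArray = null;   // declared outside the try block so later steps can use it
try {
    ByteArrayOutputStream byteOut = new ByteArrayOutputStream();
    ObjectOutputStream objOut = new ObjectOutputStream(byteOut);
    objOut.writeObject(inObj);   // serialize the object and everything it references
    objOut.close();              // flush any buffered bytes into byteOut
    DataArray = byteOut.toByteArray();
} catch (IOException e) {
    System.out.println(e.getMessage());
}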

With this code, we can translate any object that implements the Serializable interface into a byte array suitable for compression. The byte array holds the class descriptions and field values needed to reconstruct the object and everything it references. The next step is to compress the data in the byte array, thereby compressing the serialized representation of the object.

There are a few simple steps to compressing byte arrays using the Deflater class from the java.util.zip package. First, we'll create a new array for the compressed bytes. Without a method to accurately predict or estimate the size of the byte array resulting from compression, it's advisable to create an array of equal size to the noncompressed bytes and then shrink the array once the compression is complete and the true size can be determined.

The next step is to create a new instance of the Deflater class, passing in the desired compression level in the constructor. There are a few options for compression level, each with benefits and drawbacks. The best compression option provides the greatest reduction in byte size at the expense of increased processing time. The best speed option provides a good compression level, usually 80% or better, in the shortest possible time. I usually opt for best compression, finding the extra milliseconds in processing time worth the decreased object size. For more information on the available compression levels, refer to the JavaDocs for java.util.zip.Deflater.
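To weigh the trade-off between levels on your own data, a small experiment can compress the same input at several Deflater levels and report the sizes. This is a hypothetical harness: the class name and the newline-separated client-name text are illustrative, and the single-run timings are rough because JIT warm-up will distort them.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Hypothetical experiment: compressed size of the same input at several Deflater levels.
class LevelComparison {

    // Returns the number of compressed bytes produced at the given level.
    static int compressedSize(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[1024];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);   // output is discarded; only its size matters
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("Client # ").append(i).append('\n');
        }
        byte[] input = sb.toString().getBytes(StandardCharsets.UTF_8);
        int[] levels = { Deflater.NO_COMPRESSION, Deflater.BEST_SPEED,
                         Deflater.DEFAULT_COMPRESSION, Deflater.BEST_COMPRESSION };
        for (int level : levels) {
            long start = System.nanoTime();
            int size = compressedSize(input, level);
            long micros = (System.nanoTime() - start) / 1000;
            System.out.println("level " + level + ": " + input.length + " -> " + size
                               + " bytes in " + micros + " us");
        }
    }
}
```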

Once the Deflater object has been created, call the setInput(byte[]) method providing the byte array we extracted from the object serialization. Invoke the finish() method to inform the Deflater class that all inputs have been defined. Next, call the deflate(byte[]) method, providing the byte array to house the compressed data. When this method completes its execution, the data has been compressed and populated in the output byte array. The getTotalOut() method in the Deflater class will return the total number of bytes that were written in the output byte array. Using the new array size, we'll create a byte array to the exact size of the compressed output. We'll then use the System.arraycopy function to copy the bytes from the temporary array into the exact size array.

For ease of use, we'll encapsulate these steps into a single method named CompressBytes in the cZipFactory class (see Listing 2). Now, when we need to compress a byte array, we can invoke a single method:

byte[] bytesCompress = ZipFactory.CompressBytes(DataArray);
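A method following the steps above might look like the sketch below. It is not the article's Listing 2: the class name is hypothetical, and the compression level is passed as a parameter rather than stored in a factory instance. It keeps the described sequence (input-sized buffer, setInput, finish, deflate, getTotalOut, System.arraycopy), but adds a loop that grows the buffer when the output does not fit. Deflate output can exceed the input for very small or already-compressed data; even an empty array produces a few bytes of zlib header and checksum.

```java
import java.util.zip.Deflater;

// Hypothetical sketch of a CompressBytes-style method following the steps in the text.
class CompressBytesSketch {

    static byte[] compressBytes(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();                        // all input has been supplied

        // Start with a buffer the size of the input, as the article suggests.
        byte[] buffer = new byte[input.length];
        int count = 0;
        while (!deflater.finished()) {
            if (count == buffer.length) {
                // Output did not fit (tiny or incompressible input): grow the buffer.
                byte[] bigger = new byte[buffer.length * 2 + 16];
                System.arraycopy(buffer, 0, bigger, 0, count);
                buffer = bigger;
            }
            count += deflater.deflate(buffer, count, buffer.length - count);
        }

        // getTotalOut() reports how many compressed bytes were written.
        int size = deflater.getTotalOut();
        deflater.end();

        // Copy into an array of exactly the compressed size.
        byte[] result = new byte[size];
        System.arraycopy(buffer, 0, result, 0, size);
        return result;
    }
}
```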

There are two key pieces of data required to quickly and accurately decompress the object: the byte array containing the compressed data and the original size of the serialized byte array. When the byte array is decompressed, it will be written into another byte array. Knowing the size of the decompressed array will not only make the decompression more efficient, it will also ensure accuracy. To save the byte array and original size easily, we will encapsulate them in a new instance of the cZipObject class.

cZipObject cZipObj = new cZipObject();
int iOrigSize = DataArray.length;   // size of the serialized bytes before compression
cZipObj.setData(bytesCompress, iOrigSize);

By combining all these steps, we can now create a method that accepts any Serializable object and returns a cZipObject. This is the Compress method in the cZipFactory class, shown in its entirety in Listing 3. Using the new method in cZipFactory greatly simplifies the integration of object compression functions. First, we create an instance of the cZipFactory class, providing the desired compression level during object creation.

cZipFactory ZipFactory = new cZipFactory(java.util.zip.Deflater.BEST_COMPRESSION);

Using the new cZipFactory class, we can compress a serializable object using a single line of code:

cZipObject newZObject = ZipFactory.Compress(inObject);

When the client or receiving machine obtains the cZipObject, it needs to be decompressed and reconstituted into an object. To achieve this, we'll create another method in cZipFactory to handle the Decompress operation. This method will extract the byte array from the provided cZipObject, decompress the array, and then translate the bytes into an object. The Decompress method in cZipFactory will return a Serializable object, which can be cast into the original type of object.

Using the java.util.zip.Inflater class, we can decompress the byte array in a few lines of code, given the compressed bytes and the original size of the serialized array. Because this function could be useful in a variety of situations, we'll create a cZipFactory method named DecompressBytes that accepts a byte array of compressed bytes and a primitive int giving the size of the decompressed array (see Listing 4). Here it's very important to know the original size of the byte array: without it, the only way to size the output would be to inflate the data in a loop, growing the buffer as needed. Knowing the original size makes the decompression code simpler and more efficient.
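A DecompressBytes-style method can be sketched as follows, assuming originalSize is the exact serialized length recorded at compression time (what the wrapper object stores). The class name is hypothetical and this is not the article's Listing 4. Because the output array is allocated at its final size up front, no resizing is needed, and a mismatch between the recorded size and the inflated byte count is reported as an error rather than silently returning a truncated array.

```java
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Hypothetical sketch of a DecompressBytes-style method (not the article's Listing 4).
class DecompressBytesSketch {

    static byte[] decompressBytes(byte[] compressed, int originalSize) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] result = new byte[originalSize];   // exact size is known in advance
        int count = 0;
        while (count < originalSize && !inflater.finished()) {
            int n = inflater.inflate(result, count, originalSize - count);
            if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                break;                            // stream ended early; detected below
            }
            count += n;
        }
        inflater.end();
        if (count != originalSize) {
            throw new DataFormatException("expected " + originalSize + " bytes, got " + count);
        }
        return result;
    }
}
```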

With byte-array decompression in place, we turn to converting the bytes back into a usable object with an ObjectInputStream. First, we create a ByteArrayInputStream over the decompressed byte array. On top of that stream, we construct a new ObjectInputStream to reconstitute the object; invoking its readObject method translates the byte stream back into a usable object. To simplify our code, we'll place these steps in a method named ConvertByteToObject in the cZipFactory class (see Listing 5).
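The conversion step might be sketched like this; the class name is hypothetical and this is not the article's Listing 5. Note that readObject can only rebuild the object if the receiving JVM has the same classes on its classpath, which is why both client and server need the serialized types.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.Serializable;

// Hypothetical sketch of a ConvertByteToObject-style method (not the article's Listing 5).
class ConvertSketch {

    static Serializable convertByteToObject(byte[] data) throws IOException, ClassNotFoundException {
        ByteArrayInputStream byteIn = new ByteArrayInputStream(data);
        try (ObjectInputStream objIn = new ObjectInputStream(byteIn)) {
            // readObject rebuilds the object graph; its classes must be on the receiver's classpath.
            return (Serializable) objIn.readObject();
        }
    }
}
```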

The final step is to create a Decompress method in the cZipFactory class that will accept a cZipObject and return a Serializable object. The completed Decompress method is shown in the cZipFactory class in Listing 3.

Using the cZipFactory class, we can now decompress a cZipObject using a single line of code:

Serializable retObject = ZipFactory.Decompress(newZObject);

The returned Serializable object can then be cast to its original type, either separately or in the same line of code as the call to Decompress:

Vector vClientList = (Vector)ZipFactory.Decompress(newZObject);
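Pulling the pieces together, here is a compact, self-contained sketch of a factory that works in both directions. It is not the article's Listing 3: MiniZipFactory and MiniZipObject are hypothetical stand-ins for cZipFactory and cZipObject, the method names follow Java's lowercase convention, and decompression uses the known-size approach described for DecompressBytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical stand-in for cZipObject: compressed bytes plus the original serialized length.
class MiniZipObject implements Serializable {
    private static final long serialVersionUID = 1L;
    final byte[] data;
    final int origSize;

    MiniZipObject(byte[] data, int origSize) {
        this.data = data;
        this.origSize = origSize;
    }
}

// Hypothetical stand-in for cZipFactory: serialize + deflate, and inflate + deserialize.
class MiniZipFactory {
    private final int level;

    MiniZipFactory(int level) { this.level = level; }

    MiniZipObject compress(Serializable obj) throws IOException {
        ByteArrayOutputStream byteOut = new ByteArrayOutputStream();
        try (ObjectOutputStream objOut = new ObjectOutputStream(byteOut)) {
            objOut.writeObject(obj);
        }
        byte[] raw = byteOut.toByteArray();

        Deflater deflater = new Deflater(level);
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream packed = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) {
            packed.write(chunk, 0, deflater.deflate(chunk));
        }
        deflater.end();
        return new MiniZipObject(packed.toByteArray(), raw.length);
    }

    Serializable decompress(MiniZipObject zipped)
            throws IOException, ClassNotFoundException, DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(zipped.data);
        byte[] raw = new byte[zipped.origSize];     // exact size recorded at compression time
        int count = 0;
        while (count < raw.length && !inflater.finished()) {
            int n = inflater.inflate(raw, count, raw.length - count);
            if (n == 0 && inflater.needsInput()) {
                break;                              // truncated input; reported below
            }
            count += n;
        }
        inflater.end();
        if (count != raw.length) {
            throw new DataFormatException("truncated compressed data");
        }
        try (ObjectInputStream objIn = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (Serializable) objIn.readObject();
        }
    }
}
```

Usage mirrors the one-liners above, for example `Vector v = (Vector) factory.decompress(factory.compress(vClients));`.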

In the end, the cZipFactory provides easy-to-use methods that translate serializable objects to and from compressed representations. The entire compression API can be integrated with just a few lines of code. Another important feature is that compression can be applied selectively, rather than as a system-wide change such as compressing a socket. The resulting cZipObject can be extended to meet the requirements of an application, or treated like any other Java object. This also makes it possible to reuse a cZipObject: the developer can cache a compressed object and avoid performing the same compression repeatedly.
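The caching idea can be sketched as a map from a key to already-compressed bytes, so that the server compresses a shared object once and serves the same bytes to every client. CompressedCache, its key scheme, and its methods are hypothetical. For brevity it uses java.util.zip.DeflaterOutputStream, a standard stream wrapper around the same Deflater, instead of the byte-array steps above. Callers must treat the returned array as read-only because all clients share it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

// Hypothetical cache: compress an object once, then hand the same bytes to every client.
class CompressedCache {
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();
    final AtomicInteger compressions = new AtomicInteger();   // counts actual compress calls

    // Returns cached compressed bytes, compressing source.get() only on a cache miss.
    byte[] get(String key, Supplier<? extends Serializable> source) {
        return cache.computeIfAbsent(key, k -> compress(source.get()));
    }

    // Drop a stale entry when the underlying data changes.
    void invalidate(String key) {
        cache.remove(key);
    }

    private byte[] compress(Serializable obj) {
        compressions.incrementAndGet();
        ByteArrayOutputStream byteOut = new ByteArrayOutputStream();
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try (ObjectOutputStream objOut =
                 new ObjectOutputStream(new DeflaterOutputStream(byteOut, deflater))) {
            objOut.writeObject(obj);               // closing the stream finishes the deflate data
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            deflater.end();                        // a caller-supplied Deflater is not ended by close()
        }
        return byteOut.toByteArray();
    }
}
```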

A Simple Client List Example
Now that we've built the classes to compress Serializable objects, we'll work through an example using the new objects. To begin, let's create a vector of client names. For our example, we'll create a vector with generic content, but you could imagine this list of clients being derived from a database call, an XML document, or some other data source.

Vector vClients = new Vector(1000);
for (int i = 0; i < 1000; i++)
vClients.add("Client # " + i);

The resulting vector, vClients, contains 1,000 entries and is 14,046 bytes when serialized. A client connecting over a 28.8 Kbps modem retrieves data at approximately 3.33 KB/s, so downloading this list of 1,000 clients takes approximately 4,200 milliseconds. To add compression, we'd add this line of code on the server:

//Using a pre-existing cZipFactory class instance
cZipObject zoClients = ZipFactory.Compress(vClients);

On the client machine, we add this line of code to decompress the cZipObject:

//Using a pre-existing cZipFactory class instance
Vector vClients = (Vector)ZipFactory.Decompress(zoClients);

In this example, the Compress method executes in approximately 40 milliseconds. We then transmit the zoClients object to the client machine; serialized, it is 2,296 bytes. At 28.8 Kbps modem speed, the cZipObject instance reaches the client in approximately 690 milliseconds. The client then decompresses the cZipObject and casts the contents into a vector, which takes an additional 30 milliseconds. The total time with compression is 40 + 690 + 30 = 760 milliseconds. Compared with the original download time of 4,200 milliseconds, compression saved 3,440 milliseconds of client wait time and reduced the transmitted size by 11,750 bytes, about 83.7% less bandwidth. The result is more than five times faster and takes only a few simple lines of code on the server and client.
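The wait-time arithmetic above can be reproduced in a few lines, assuming the article's effective modem throughput of 3.33 KB/s (treated here as 3.33 bytes per millisecond) and its measured 40 ms and 30 ms processing times. The class name is hypothetical. Unrounded, the figures come out at about 4,218 ms without compression and about 759 ms with it, and the byte saving is roughly 83.65%.

```java
// Hypothetical helper reproducing the client wait-time arithmetic in the text.
// Assumes 3.33 KB/s effective throughput, i.e. 3.33 bytes per millisecond.
class WaitTime {
    static final double BYTES_PER_MS = 3.33;

    static double downloadMs(int bytes) {
        return bytes / BYTES_PER_MS;
    }

    public static void main(String[] args) {
        double plain = downloadMs(14046);                 // uncompressed serialized vector
        double zipped = 40 + downloadMs(2296) + 30;       // compress + transfer + decompress
        System.out.printf("plain %.0f ms, compressed %.0f ms, saved %.0f ms%n",
                          plain, zipped, plain - zipped);
        System.out.printf("bytes saved: %d (%.1f%%)%n",
                          14046 - 2296, 100.0 * (14046 - 2296) / 14046);
    }
}
```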

Listing 6 provides a simple testing class that was used for this example and the benchmarks quoted in this article. By using this simple testing class, you can see that when applied to larger data structures, the compression functions make a more profound impact on bandwidth reduction and client wait times.
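Listing 6 is not reproduced here, but a hypothetical harness in the same spirit can build the client vector, serialize it, and report the deflated size. The exact byte counts depend on the JDK's serialization details and will not necessarily match the 14,046 bytes quoted. The deflated figure also excludes the wrapper object's own serialization overhead, which the article's 2,296-byte figure includes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.Vector;
import java.util.zip.Deflater;

// Hypothetical measurement harness (not the article's Listing 6).
class ClientListSizes {

    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream byteOut = new ByteArrayOutputStream();
        try (ObjectOutputStream objOut = new ObjectOutputStream(byteOut)) {
            objOut.writeObject(obj);
        }
        return byteOut.toByteArray();
    }

    // Size of the data after deflating at BEST_COMPRESSION (output bytes are discarded).
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        byte[] chunk = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(chunk);
        }
        deflater.end();
        return total;
    }

    static Vector<String> buildClients(int n) {
        Vector<String> clients = new Vector<>(n);
        for (int i = 0; i < n; i++) {
            clients.add("Client # " + i);
        }
        return clients;
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = serialize(buildClients(1000));
        int packed = deflatedSize(raw);
        System.out.println("serialized: " + raw.length + " bytes");
        System.out.println("deflated:   " + packed + " bytes");
    }
}
```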

Expense of Compression
This compression technique has two primary costs: increased memory usage and CPU cycles. Because the approach compresses the serialized representation of an object, the object must first be serialized into an array, which is then compressed and wrapped in another serializable object; each of these copies takes memory. CPU utilization also rises, because deflation and inflation are computation-intensive. For larger installations, it would be reasonable to expect a noticeable increase in server CPU usage, which should be analyzed in terms of how often objects are compressed and how large they are. As a benchmark, one installation's server performed approximately 10,000 compressions an hour on objects ranging from 10K to 350K, and adding compression increased CPU usage by approximately 3%.

Another important factor to remember is that the client machines will also have increases in memory usage and CPU utilization to decompress the objects, or compress objects being sent to the server. The speed of these decompression routines will depend on the client machine hardware.

Conclusion
If you are writing distributed Java applications, whether EJB systems or custom RMI solutions, introducing compression routines can dramatically improve the response time and bandwidth consumption of your programs. A primary advantage of the approach presented here is its simplicity: developers keep working with ordinary objects and never have to handle the low-level compression classes directly. The cZipFactory also avoids socket-level operations and disk files, preserving the structure of existing programs and making it possible to apply compression selectively. Finally, because the cZipFactory uses only standard Java libraries, it works in both J2SE and J2EE applications.

For our applications, the performance of the compression routines has been excellent, with minimal server impact and network usage down by 85%. Today, across the approximately 3,000 client machines using the compression classes, there have been no reports of problems with CPU utilization or memory usage. Overall, introducing compression was the single largest performance improvement in our five-year development effort.

Resources

  • "Compressing and Decompressing Data Using Java APIs": http://developer.java.sun.com/developer/technicalArticles/Programming/compression/
  • Object Serialization in Java: http://java.sun.com/j2se/1.4.2/docs/guide/serialization/
  • Java Documentation for java.util.zip package: http://java.sun.com/j2se/1.4.2/docs/api/java/util/zip/package-summary.html
  • Java Documentation for java.io package: http://java.sun.com/j2se/1.4.2/docs/api/java/io/package-summary.html

    SIDEBAR
    Calculating the Benefits of Compression

    Serialized object compression has a number of benefits, most notably the reduction in the size of the serialized output. The performance gain depends directly on the average object size, the bandwidth of the client connections, and the processing power of the server and client machines. Before implementing compression, estimate these factors to confirm that it will produce a net gain. This simple equation can help determine whether compression routines would be beneficial:

    [(Object Size bytes) × 8] ÷ [Line Speed kbps] = Avg. Download Time (ms)

    [10000 × 8] ÷ 128 = 625 ms

    Now, reduce the average object size by 80% and recalculate the download time; this time add an additional 100 milliseconds for processing time.

    [(Object Size bytes × 8 × 0.2)] ÷ [Line Speed kbps] + 100 = Compressed Download Time (ms)

    {[(10000 × 8) × 0.2] ÷ 128 } + 100 = 225 ms

    In the chart in Figure 1 we see how the slight increase in processing time required for compression can create tremendous gains in download time.

    Regardless of the bandwidth from the server to the client, compression routines will have a definitive impact on network usage (see Figure 2).

    Remember that at some point the law of diminishing returns sets in. For example, if the average object size before compression is 5,000 bytes, compression could reduce it to as little as 1,000 bytes at a total cost of about 100 milliseconds of processing. If the client machines were on 28.8 Kbps modems, compression would have a positive impact, reducing client wait time by about 1,100 milliseconds. However, if the client machines were on 512 Kbps connections, downloading the original 5,000 bytes would take only about 90 milliseconds. Even though the 1,000 compressed bytes would take about 17 milliseconds, the added processing time for the compress and decompress operations could produce a net loss without significantly improving download time.

    The chart in Figure 3 illustrates how quickly the benefit of compression on client wait time shrinks in higher-bandwidth environments. Note that while client wait time may not be significantly reduced, network traffic will always be reduced: even when end users notice no improvement, the network still benefits from the lower traffic.
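The sidebar's two equations can be turned into a small calculator that reports the net saving in client wait time. The class and method names are hypothetical, and the 0.2 size ratio and 100 ms overhead are the sidebar's example assumptions. Line speed is the nominal rate in kilobits per second, so results differ somewhat from the throughput-based estimates in the text; for instance, the formula gives about 78 ms, not 90 ms, for 5,000 bytes at 512 Kbps, but still shows a net loss there.

```java
// Hypothetical break-even calculator built from the sidebar's two equations.
// lineKbps is the nominal line speed in kilobits per second (equivalently, bits per millisecond).
class CompressionPayoff {

    // [(bytes x 8) / kbps] = download time in ms
    static double downloadMs(double bytes, double lineKbps) {
        return bytes * 8 / lineKbps;
    }

    // Same transfer after compressing to `ratio` of the original size, plus fixed processing time.
    static double compressedMs(double bytes, double lineKbps, double ratio, double overheadMs) {
        return downloadMs(bytes * ratio, lineKbps) + overheadMs;
    }

    // Positive result: compression shortens client wait time; negative: it costs time.
    static double savingMs(double bytes, double lineKbps, double ratio, double overheadMs) {
        return downloadMs(bytes, lineKbps) - compressedMs(bytes, lineKbps, ratio, overheadMs);
    }

    public static void main(String[] args) {
        System.out.println(savingMs(10000, 128, 0.2, 100));  // sidebar example
        System.out.println(savingMs(5000, 512, 0.2, 100));   // small object, fast line
    }
}
```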

    Robert Beckett is the Chief Architect for The Software Development Cooperative. Robert is currently working on two Java products: an extensive API set for Java developers (www.thesdc.com/basesys/) and a high-performance Java RMI server (www.thesdc.com/symtier/).

    Most Recent Comments
    Robert Beckett 11/08/03 04:02:05 PM EST

    I would like to thank all of the developers for their supportive, positive, and constructive feedback. A number of developers have been using the zip classes from the article and are having great success; however, a few errors have been flushed out, namely with trying to zip small or empty objects. There are also some of us working on enhancing the cZipFactory to use streams, such as those from the java.util.zip package. We have created a web page with these corrections and ongoing updates. If anyone is having problems or has update suggestions, please check out the page at http://www.thesdc.com/basesys/zipzap_updates.html or e-mail me at [email protected]. Again, thanks for the kind words and great coding contributions!

    10/24/03 03:34:05 PM EDT

    I've been disappointed of late with the JDJ content (less technical, more fluff). However this article brings back memories of past technical, how-to JDJ articles that were the norm rather than the exception. Kudos to Robert Beckett for his excellent article and utility. Let's hope this article starts a trend at JDJ!