Zip Objects, Zap Wait Time


As the capabilities of our distributed applications increased, so did our consumption of bandwidth. In 1998, our server sent objects no larger than 50K to a group of users on a local network. By 2002, we were passing an average of 500K per object, with some as large as 1.5MB.

More important, the distribution of our user base grew from 50 to over 1,500, with some users based across the country from the server. Add in a group of users roaming on their modem connections and the full scale of our bandwidth issues becomes clear. We were presented with a problem faced by many developers of distributed systems: reduce bandwidth usage and client wait time without removing any functionality. This article shares our solution to this problem, providing you with the simple code that helped us eliminate over 80% of our network traffic.

Evaluating bandwidth is quite simple. The developer has two options: get more of it or use less of it. Given the magnitude and expense of expanding bandwidth on a nationally distributed application, it was clear we had to find ways to reduce the amount of bandwidth required by our systems. It's important to note the wording: reduce the bandwidth usage, not the amount of data passed over the network. To preserve the functionality of the systems, we needed all the data being passed over the line. In the end, there was one conclusion: the data needed to be compressed.

As I researched compression in Java, I was looking for a way to pass in an object and receive a compressed object back. I found that there are a number of ways to compress sockets or build zip files on the disk, but not the object-level solution I was seeking. We needed an API that could be selectively implemented and used for the largest data objects and most critical applications without impacting other parts of the system. We also wanted the ability to compress an object one time, and use that same object for multiple downloads to client machines, essentially caching a compressed object.

During this research, I found an article on compression on the Java developer's Web site that laid out all the pieces to our solution (see the Resources section). Using just a few of the classes in the java.io and java.util.zip packages, we were able to build an API to compress any serializable Java object. Being the kind of developer who prefers simplicity, I was excited at the ease of use and performance of the underlying Java classes as well as the API we built. We were able to develop and integrate our solution in just under two days, resulting in more than an 80% reduction in network traffic and astounding improvements in client wait times.

A Compression Factory for Serialized Objects
The Java compression functions are located in the java.util.zip package, where the Deflater class compresses byte arrays and the Inflater class decompresses byte arrays. As you may have noted, both of these classes perform compression routines on byte arrays. Therefore, to compress an object, the first step is to translate it into a representation of bytes, which begins with the Serializable interface.

When an object implements the Serializable interface, it can be represented as a stream of bytes. This byte stream can be written using the ObjectOutputStream.writeObject() method and reconstituted using the ObjectInputStream.readObject() method, allowing for a simple translation of a byte stream to and from an object. This ability to serialize an object, capturing the resulting byte stream into a byte array, provides a usable input for the compression methods available in the java.util.zip classes.

Using this approach, we will accept a serializable object, write the object into a byte array, and then compress the array. The array of compressed bytes, along with a few other key variables, will be stored in a new object, cZipObject, which is shown in its entirety in Listing 1. The cZipObject will encapsulate the compressed version of the input object. The cZipObject can then be serialized to transfer across the network. On the receiving end, the byte array will be extracted from the cZipObject, decompressed, input to a byte stream, and then reconstituted into an object. This process is not truly compressing the object, but compressing the serialized representation of the object and its data.
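Listing 1 is not reproduced in this text, so the following is only a minimal sketch of what such a holder class might look like. The class name ZipObjectSketch and the getter names are assumptions made for illustration; only the setData(byte[], int) call appears in the article itself.

```java
import java.io.Serializable;

// Hypothetical stand-in for the article's cZipObject. Only setData(byte[], int)
// is taken from the text; the class name and getters are assumptions.
final class ZipObjectSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    private byte[] data;   // compressed bytes of the serialized object
    private int origSize;  // length of the serialized form before compression

    public void setData(byte[] compressed, int originalSize) {
        this.data = compressed;
        this.origSize = originalSize;
    }

    public byte[] getData() {
        return data;
    }

    public int getOrigSize() {
        return origSize;
    }
}
```

Because the holder is itself Serializable, it can travel over RMI or any object stream like any other object, while the payload inside stays compressed.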

To easily integrate these compression routines on both the server and client side, we'll create a cZipFactory class that will contain all the methods for compressing and decompressing objects. We'll create a number of methods along the way that can be of direct use, such as a byte compression method. By encapsulating both the compress and decompress functions into a single class, we can add the functionality to both the client and server by creating a single object. This will allow us to compress objects sent from the server to the client as well as from the client back up to the server.

The first step is to convert the Serializable object into a byte array. This can be achieved by using the ObjectOutputStream with an underlying ByteArrayOutputStream from the java.io package. First, we'll create a new ByteArrayOutputStream that will capture the byte stream when the object is written. We'll then create a new ObjectOutputStream, write the serialized object, and then extract a byte array from the ByteArrayOutputStream.

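// Uses ByteArrayOutputStream, ObjectOutputStream, and IOException from java.io
byte[] DataArray = null;  // declared outside the try block so later code can use it
try {
    ByteArrayOutputStream byteOut = new ByteArrayOutputStream();
    ObjectOutputStream objOut = new ObjectOutputStream(byteOut);
    objOut.writeObject(inObj);
    objOut.flush();  // push any buffered bytes through to byteOut
    DataArray = byteOut.toByteArray();
} catch (IOException e) {
    System.out.println(e.getMessage());
}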

With this code, we now have the ability to translate any object that implements the Serializable interface into a byte array capable of compression. The resulting byte array contains the details of the object as well as the object's data. The array contains the essential structural and data attributes to replicate the object and all its content. The next step is to compress the data contained in the byte array, thereby compressing the serialized representation of the object.

There are a few simple steps to compressing byte arrays using the Deflater class from the java.util.zip package. First, we'll create a new array for the compressed bytes. Without a method to accurately predict or estimate the size of the byte array resulting from compression, it's advisable to create an array of equal size to the noncompressed bytes and then shrink the array once the compression is complete and the true size can be determined. Be aware that for very small or already-compressed inputs the deflated output can be slightly larger than the input, so an equal-sized array is not always sufficient.

The next step is to create a new instance of the Deflater class, passing in the desired compression level in the constructor. There are a few options for compression level, each with benefits and drawbacks. The best compression option provides the greatest reduction in byte size at the expense of increased processing time. The best speed option provides a good compression level, usually 80% or better, in the shortest possible time. I usually opt for best compression, finding the extra milliseconds in processing time worth the decreased object size. For more information on the available compression levels, refer to the JavaDocs for java.util.zip.Deflater.

Once the Deflater object has been created, call the setInput(byte[]) method providing the byte array we extracted from the object serialization. Invoke the finish() method to inform the Deflater class that all inputs have been defined. Next, call the deflate(byte[]) method, providing the byte array to house the compressed data. When this method completes its execution, the data has been compressed and populated in the output byte array. The getTotalOut() method in the Deflater class will return the total number of bytes that were written in the output byte array. Using the new array size, we'll create a byte array to the exact size of the compressed output. We'll then use the System.arraycopy function to copy the bytes from the temporary array into the exact size array.

For ease of use, we'll encapsulate these steps into a single method named CompressBytes in the cZipFactory object (see Listing 2). Now, when we need to compress a byte array, we can invoke a single method:

byte[] bytesCompress = ZipFactory.CompressBytes(DataArray);
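The compression steps described above can be sketched as a single method. Listing 2 is not reproduced here, so this is an illustrative version under stated assumptions, not the article's own code: the class name CompressBytesSketch, the lowercase method name, and passing the level as a parameter are all hypothetical. It also departs from the text in one respect: it adds a little headroom to the temporary array and loops until the Deflater reports it is finished, so that tiny or incompressible inputs do not overflow the buffer.

```java
import java.util.Arrays;
import java.util.zip.Deflater;

// Hypothetical sketch of a CompressBytes-style helper; names are assumptions.
final class CompressBytesSketch {

    static byte[] compressBytes(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(input);
            deflater.finish();  // no more input will be supplied

            // Temporary array: input size plus headroom for stream overhead.
            byte[] temp = new byte[input.length + 64];
            int written = 0;
            while (!deflater.finished()) {
                if (written == temp.length) {
                    // Rare case: output outgrew the estimate, so enlarge it.
                    temp = Arrays.copyOf(temp, temp.length * 2);
                }
                written += deflater.deflate(temp, written, temp.length - written);
            }

            // Copy into an array of exactly the compressed size.
            byte[] exact = new byte[written];
            System.arraycopy(temp, 0, exact, 0, written);
            return exact;
        } finally {
            deflater.end();  // release the native zlib resources
        }
    }

    public static void main(String[] args) {
        byte[] sample = new byte[10000];  // highly compressible: all zeros
        byte[] packed = compressBytes(sample, Deflater.BEST_COMPRESSION);
        System.out.println(sample.length + " bytes -> " + packed.length + " bytes");
    }
}
```

Calling end() in a finally block is a design choice rather than something the article mentions; it frees the native memory held by the Deflater promptly instead of waiting for garbage collection.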

There are two key pieces of data required to quickly and accurately decompress the object: the byte array containing the compressed data and the original size of the serialized byte array. When the byte array is decompressed, it will be written into another byte array. Knowing the size of the decompressed array will not only make the decompression more efficient, it will also ensure accuracy. To save the byte array and original size easily, we will encapsulate them in a new instance of the cZipObject class.

cZipObject cZipObj = new cZipObject();
cZipObj.setData(bytesCompress, iOrigSize);

By combining all these steps, we can now create a method that accepts any Serializable object and returns a cZipObject. This is the Compress method in the cZipFactory class, shown in its entirety in Listing 3. Using the new method in cZipFactory greatly simplifies the integration of object compression functions. First, we create an instance of the cZipFactory class, providing the desired compression level during object creation.

cZipFactory ZipFactory =
    new cZipFactory(java.util.zip.Deflater.BEST_COMPRESSION);

Using the new cZipFactory class, we can compress a serializable object using a single line of code:

cZipObject newZObject = ZipFactory.Compress(inObject);

When the client or receiving machine obtains the cZipObject, it needs to be decompressed and reconstituted into an object. To achieve this, we'll create another method in cZipFactory to handle the Decompress operation. This method will extract the byte array from the provided cZipObject, decompress the array, and then translate the bytes into an object. The Decompress method in cZipFactory will return a Serializable object, which can be cast into the original type of object.

Using the java.util.zip.Inflater class, we can easily decompress the byte array in a few lines of code. Given the compressed byte array and the original size of the byte array, the Inflater class can be used to decompress the byte array. As this function could be useful in a variety of situations, we'll create a method in the cZipFactory class named DecompressBytes. The method will accept a byte array containing the compressed bytes and a primitive integer for the size of the decompressed array. At this point, it's very important that we know the original size of the byte array (see Listing 4). Without this information, it wouldn't be possible to accurately predict the total size of the decompressed bytes without extracting the data in a loop. Knowing the original size of the byte array makes the decompression code easier and more efficient.
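The decompression step described above can be sketched as follows. Listing 4 is not reproduced here, so the class name DecompressBytesSketch, the lowercase method name, and the error handling are assumptions made for illustration. The sketch relies on the caller supplying the original size, as the article requires, and reports an error if the data inflates to fewer bytes than expected.

```java
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Hypothetical sketch of a DecompressBytes-style helper; names are assumptions.
final class DecompressBytesSketch {

    static byte[] decompressBytes(byte[] compressed, int originalSize)
            throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);

            // The known original size lets us allocate the result exactly once.
            byte[] result = new byte[originalSize];
            int read = 0;
            while (read < originalSize) {
                int n = inflater.inflate(result, read, originalSize - read);
                if (n == 0) {
                    // No progress: the stream ended or ran out of input early.
                    throw new DataFormatException(
                        "expected " + originalSize + " bytes, got " + read);
                }
                read += n;
            }
            return result;
        } finally {
            inflater.end();  // release the native zlib resources
        }
    }
}
```

A call might look like decompressBytes(zipObj.getData(), zipObj.getOrigSize()), where the two getters are hypothetical accessors on the holder object.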

With the ability to decompress a byte array in place, we then move to the process of converting the bytes back into a usable object using an instance of ObjectInputStream. First, we'll create a ByteArrayInputStream using the decompressed byte array. Using the byte stream, we'll construct a new ObjectInputStream to reconstitute the object. By invoking the readObject method, the ObjectInputStream will translate the byte stream into a usable object. To simplify our coding, we'll place this code in a method named ConvertByteToObject in the cZipFactory class (see Listing 5).
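The conversion from bytes back to an object might be sketched like this. Listing 5 is not reproduced here; the class name ConvertBytesSketch and the exact signature are assumptions, and the try-with-resources form assumes Java 7 or later, which postdates the article.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.Serializable;

// Hypothetical sketch of a ConvertByteToObject-style helper; names are assumptions.
final class ConvertBytesSketch {

    static Serializable convertByteToObject(byte[] serialized)
            throws IOException, ClassNotFoundException {
        // Wrap the decompressed bytes in a stream and let ObjectInputStream
        // rebuild the object graph from its serialized form.
        try (ObjectInputStream objIn =
                 new ObjectInputStream(new ByteArrayInputStream(serialized))) {
            return (Serializable) objIn.readObject();
        }
    }
}
```

As with any use of readObject, this should only be applied to bytes from a trusted source, since deserializing untrusted data is a well-known security risk.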

The final step is to create a Decompress method in the cZipFactory class that will accept a cZipObject and return a Serializable object. The completed Decompress method is shown in the cZipFactory class in Listing 3.

Using the cZipFactory class, we can now decompress a cZipObject using a single line of code:

Serializable retObject = ZipFactory.Decompress(newZObject);

The Serializable object can then be cast into its original type, either in a separate statement or in the same line of code as the call to Decompress:

Vector vClientList = (Vector)ZipFactory.Decompress(newZObject);

In the end, the cZipFactory provides easy-to-use methods that translate serializable objects to and from compressed representations of objects. The entire compression API can be quickly implemented in just a few lines of code. Another important feature is the ability to use the function selectively rather than a system-wide change, such as compressing a socket. The resulting cZipObject can be extended or expanded to meet the requirements of an application or can be treated like any other Java object. This also allows for the reuse of a cZipObject, allowing the developer to cache a compressed object, effectively eliminating the need to redundantly perform compressions.

A Simple Client List Example
Now that we've built the classes to compress Serializable objects, we'll work through an example using the new objects. To begin, let's create a vector of client names. For our example, we'll create a vector with generic content, but you could imagine this list of clients being derived from a database call, an XML document, or some other data source.

Vector vClients = new Vector(1000);
for (int i = 0; i < 1000; i++)
    vClients.add("Client # " + i);

The resulting vector, vClients, contains 1,000 entries and, when serialized, is 14,046 bytes. If the client machine connects using a 28.8 modem, it will retrieve this vector at approximately 3.33 KB per second. At this throughput rate, it'll take the client machine approximately 4,200 milliseconds to download this list of 1,000 clients. If we wanted to add in compression, we'd add this line of code on the server:

//Using a pre-existing cZipFactory class instance
cZipObject zoClients = ZipFactory.Compress(vClients);

On the client machine, we add this line of code to decompress the cZipObject:

//Using a pre-existing cZipFactory class instance
Vector vClients = (Vector)ZipFactory.Decompress(zoClients);

Using this example, the Compress method executes in approximately 40 milliseconds. We would then transmit the zoClients object to the client machine, which when serialized is 2,296 bytes. At 28.8 modem speed, the cZipObject instance is downloaded to the client in approximately 690 milliseconds. The client then decompresses the cZipObject, casting the contents into a vector. The Decompression operation on the client takes an additional 30 milliseconds. The total time using compression was 40 + 690 + 30 = 760 milliseconds. When compared to the original download time of 4,200 milliseconds, the compression technique saved 3,440 milliseconds of client wait time and reduced the total object size by 11,750 bytes, resulting in 83.6% less bandwidth consumption. This is more than five times faster and is achieved with a few simple lines of code on the server and client.

Listing 6 provides a simple testing class that was used for this example and the benchmarks quoted in this article. By using this simple testing class, you can see that when applied to larger data structures, the compression functions make a more profound impact on bandwidth reduction and client wait times.
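Listing 6 is not reproduced in this text. For readers who want to try the client list example, the following is a minimal stand-in written for illustration; it is not the author's test class. The class and method names are hypothetical, it works on raw byte arrays rather than a cZipObject, and it prints sizes only. The byte counts you see may differ from those quoted above depending on the Java version.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Vector;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical test harness for the client list example; names are assumptions.
final class ClientListBenchmarkSketch {

    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream byteOut = new ByteArrayOutputStream();
        try (ObjectOutputStream objOut = new ObjectOutputStream(byteOut)) {
            objOut.writeObject(obj);
        }
        return byteOut.toByteArray();
    }

    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            while (!deflater.finished()) {
                out.write(chunk, 0, deflater.deflate(chunk));
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    static byte[] inflate(byte[] compressed, int originalSize) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            byte[] result = new byte[originalSize];
            int read = 0;
            while (read < originalSize) {
                int n = inflater.inflate(result, read, originalSize - read);
                if (n == 0) {
                    throw new DataFormatException("compressed data ended early");
                }
                read += n;
            }
            return result;
        } finally {
            inflater.end();
        }
    }

    static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Vector<String> vClients = new Vector<>(1000);
        for (int i = 0; i < 1000; i++) {
            vClients.add("Client # " + i);
        }
        byte[] serialized = serialize(vClients);
        byte[] compressed = deflate(serialized);
        Object restored = deserialize(inflate(compressed, serialized.length));

        System.out.println("serialized bytes: " + serialized.length);
        System.out.println("compressed bytes: " + compressed.length);
        System.out.println("round trip equal: " + vClients.equals(restored));
    }
}
```

Raising the loop bound from 1,000 to a larger number is an easy way to see how the savings change with object size.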

Expense of Compression
There are two primary expenses to this compression technique: increased memory usage and CPU cycles. This approach is compressing the serialized representation of an object, which requires that the object be serialized into an array that's then compressed and included in another serializable object, so several copies of the data exist in memory at once. In addition to the increase in memory usage, there will be an increase in CPU utilization. The compression routines consist of arithmetic operations, which will result in increased CPU usage during deflation and inflation processing. For larger installations of these compression routines, it would be reasonable to expect notable increases in server CPU usage, which would need to be analyzed in terms of frequency and the size of the objects being compressed. As a benchmark, in one installation the server processed approximately 10,000 compressions an hour on objects ranging from 10K to 350K. The addition of compression functions resulted in approximately a 3% increase in CPU usage.

Another important factor to remember is that the client machines will also have increases in memory usage and CPU utilization to decompress the objects, or compress objects being sent to the server. The speed of these decompression routines will depend on the client machine hardware.

Conclusion
If you are writing distributed Java applications, whether they're EJB systems or custom RMI solutions, the introduction of compression routines can provide tremendous improvements to the response time and bandwidth consumption of your programs. One of the primary advantages to the approach presented here is its simplicity, allowing the developer to continually work with objects and avoid the compression functions. Using the cZipFactory also allows the developer to avoid socket-level operations or the creation of disk files, retaining the structure of existing programs and making it possible to selectively implement the functions. Another benefit of the cZipFactory is the use of standard Java libraries, making the compression function available in both J2SE and J2EE applications.

For our applications, the performance of the compression routines has been excellent, with minimal server impact and network usage down by 85%. Today, of the approximately 3,000 client machines using the compression classes, there have been no reports of problems with CPU utilization or memory usage. Overall, the introduction of compression was the single largest performance improvement made in our five-year development effort.

Resources

  • "Compressing and Decompressing Data Using Java APIs": http://developer.java.sun.com/developer/technicalArticles/Programming/compression/
  • Object Serialization in Java: http://java.sun.com/j2se/1.4.2/docs/guide/serialization/
  • Java Documentation for java.util.zip package: http://java.sun.com/j2se/1.4.2/docs/api/java/util/zip/package-summary.html
  • Java Documentation for java.io package: http://java.sun.com/j2se/1.4.2/docs/api/java/io/package-summary.html

    SIDEBAR
    Calculating the Benefits of Compression

    There are a number of benefits to using serialized object compression, most notably the reduction in the size of the serialized output. The performance gain is directly related to the average object size, the bandwidth of the client connections, and the CPU processing power of the server and client machines. When determining whether to implement a compression function, these factors should be projected in order to ensure a positive gain. Consider this simple equation to determine if compression routines would be beneficial:

    [(Object Size bytes) × 8] ÷ [Line Speed kbs] = Avg. Download Time (ms)

    [10000 × 8] ÷ 128 = 625 ms

    Now, reduce the average object size by 80% and recalculate the download time; this time add an additional 100 milliseconds for processing time.

    [(Object Size bytes × 8 × 0.2)] ÷ [Line Speed kbs] + 100 = Compress Download Time (ms)

    {[(10000 × 8) × 0.2] ÷ 128 } + 100 = 225 ms
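The two sidebar formulas can be sketched as small helper methods. The class and method names are hypothetical; the numbers in main are the ones used in the sidebar (a 10,000-byte object, a 128 kbps line, 20% of the size remaining after compression, and 100 ms of processing).

```java
// Hypothetical helpers for the sidebar formulas; names are assumptions.
final class DownloadTimeSketch {

    // bytes * 8 gives bits; bits divided by kilobits per second gives milliseconds.
    static double downloadMillis(double objectBytes, double lineSpeedKbps) {
        return objectBytes * 8.0 / lineSpeedKbps;
    }

    // Same formula on the reduced size, plus a fixed processing overhead.
    static double compressedDownloadMillis(double objectBytes, double lineSpeedKbps,
                                           double remainingFraction, double processingMillis) {
        return downloadMillis(objectBytes * remainingFraction, lineSpeedKbps) + processingMillis;
    }

    public static void main(String[] args) {
        System.out.printf("uncompressed: %.0f ms%n",
            downloadMillis(10000, 128));                      // prints 625 ms
        System.out.printf("compressed:   %.0f ms%n",
            compressedDownloadMillis(10000, 128, 0.2, 100));  // prints 225 ms
    }
}
```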

    In the chart in Figure 1 we see how the slight increase in processing time required for compression can create tremendous gains in download time.

    Regardless of the bandwidth from the server to the client, compression routines will have a definite impact on network usage (see Figure 2).

    It's important to remember that at some point the law of diminishing returns applies. For example, if the average size of the object before compression is 5,000 bytes, then compression could reduce this to as little as 1,000 bytes. The total expense of this compression would be about 100 milliseconds. If the client machines were on 28.8 modems, the compression would have a positive impact, reducing client wait time by about 1,100 milliseconds. However, if the client machines were on 512K connections, downloading the original 5,000 bytes would only take about 90 milliseconds. Even though the 1,000 bytes would take only about 17 milliseconds, we have now added processing time for the compress and decompress operations, potentially creating a negative return without significantly improving download time.
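One way to make the diminishing-returns point concrete is to solve the sidebar formula for the line speed at which compression stops paying off. Compression reduces wait time only while the transfer time saved exceeds the processing time added, that is, while (original bytes - compressed bytes) × 8 ÷ line speed is greater than the processing time. The sketch below computes that break-even speed; the class and method names are hypothetical, and the formula ignores protocol overhead, so it will not match the rounded figures in the paragraph above exactly.

```java
// Hypothetical break-even helper derived from the sidebar formula.
final class BreakEvenSketch {

    // Line speed (kbps) below which compression shortens total wait time.
    static double breakEvenKbps(double originalBytes, double compressedBytes,
                                double processingMillis) {
        return (originalBytes - compressedBytes) * 8.0 / processingMillis;
    }

    public static void main(String[] args) {
        // 5,000 bytes reduced to 1,000 bytes with 100 ms of processing.
        System.out.printf("break-even: %.0f kbps%n",
            breakEvenKbps(5000, 1000, 100));  // prints 320 kbps
    }
}
```

With these inputs the break-even point sits between the two cases in the text: a 28.8 modem is well below it, so compression helps, while a 512K connection is above it, so the processing overhead outweighs the transfer time saved.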

    The chart in Figure 3 helps to illustrate how the benefits of compression on client wait time can be quickly reduced in higher bandwidth environments. It's important to note that while client wait time may not be significantly reduced by compression, network traffic will always be reduced. Even though the end user may not notice improvements, the network will always benefit from the reduction in throughput.

  • More Stories By Robert Beckett

    Robert Beckett is the Chief Architect for The Software Development Cooperative. Robert is currently working on two Java products: an extensive API set for Java developers www.thesdc.com\basesys\ and a high-performance Java RMI server www.thesdc.com\symtier\.



    Most Recent Comments
    Robert Beckett 11/08/03 04:02:05 PM EST

    I would like to thank all of the developers for their supportive, positive, and constructive feedback. A number of developers have been using the zip classes from the article and are having great success; however, a few errors have been flushed out, namely with trying to zip small or empty objects. There are also some of us working on enhancing the cZipFactory to use streams, such as those from the java.util.zip package. We have created a web page with these corrections and ongoing updates. If anyone is having problems or has update suggestions, please check out the page at http://www.thesdc.com/basesys/zipzap_updates.html or e-mail me at [email protected]. Again, thanks for the kind words and great coding contributions!

    10/24/03 03:34:05 PM EDT

    I've been disappointed of late with the JDJ content (less technical, more fluff). However this article brings back memories of past technical, how-to JDJ articles that were the norm rather than the exception. Kudos to Robert Beckett for his excellent article and utility. Let's hope this article starts a trend at JDJ!
