Zip Objects, Zap Wait Time

As the capabilities of our distributed applications increased, so did our consumption of bandwidth. In 1998, our server sent objects no larger than 50K to a group of users on a local network. By 2002, we were passing an average of 500K per object, with some as large as 1.5MB.

More important, our user base grew from 50 to over 1,500, with some users based across the country from the server. Add in a group of users roaming on their modem connections and the full scale of our bandwidth issues becomes clear. We were presented with a problem faced by many developers of distributed systems: reduce bandwidth usage and client wait time without removing any functionality. This article shares our solution to this problem, providing you with the simple code that helped us eliminate over 80% of our network traffic.

Evaluating bandwidth is quite simple. The developer has two options: get more of it or use less of it. Given the magnitude and expense of expanding bandwidth on a nationally distributed application, it was clear we had to find ways to reduce the amount of bandwidth required by our systems. It's important to note the wording: reduce the bandwidth usage, not the amount of data passed over the network. To preserve the functionality of the systems, we needed all the data being passed over the line. In the end, there was one conclusion: the data needed to be compressed.

As I researched compression in Java, I was looking for a way to pass in an object and receive a compressed object back. I found that there are a number of ways to compress sockets or build zip files on the disk, but not the object-level solution I was seeking. We needed an API that could be selectively implemented and used for the largest data objects and most critical applications without impacting other parts of the system. We also wanted the ability to compress an object one time, and use that same object for multiple downloads to client machines, essentially caching a compressed object.

During this research, I found an article on compression on the Java developer's Web site that laid out all the pieces to our solution (see the Resources section). Using just a few of the classes in the java.io and java.util.zip packages, we were able to build an API to compress any serializable Java object. Being the kind of developer who prefers simplicity, I was excited at the ease of use and performance of the underlying Java classes as well as the API we built. We were able to develop and integrate our solution in just under two days, resulting in more than an 80% reduction in network traffic and astounding improvements in client wait times.

A Compression Factory for Serialized Objects
The Java compression functions are located in the java.util.zip package, where the Deflater class compresses byte arrays and the Inflater class decompresses byte arrays. As you may have noted, both of these classes perform compression routines on byte arrays. Therefore, to compress an object, the first step is to translate it into a representation of bytes, which begins with the Serializable interface.

When an object implements the Serializable interface, it can be represented as a stream of bytes. This byte stream can be written using the ObjectOutputStream.writeObject() method and reconstituted using the ObjectInputStream.readObject() method, allowing for a simple translation of a byte stream to and from an object. This ability to serialize an object, capturing the resulting byte stream into a byte array, provides a usable input for the compression methods available in the java.util.zip classes.

Using this approach, we will accept a serializable object, write the object into a byte array, and then compress the array. The array of compressed bytes, along with a few other key variables, will be stored in a new object, cZipObject, which is shown in its entirety in Listing 1. The cZipObject will encapsulate the compressed version of the input object. The cZipObject can then be serialized to transfer across the network. On the receiving end, the byte array will be extracted from the cZipObject, decompressed, input to a byte stream, and then reconstituted into an object. This process is not truly compressing the object, but compressing the serialized representation of the object and its data.

To easily integrate these compression routines on both the server and client side, we'll create a cZipFactory class that will contain all the methods for compressing and decompressing objects. We'll create a number of methods along the way that can be of direct use, such as a byte compression method. By encapsulating both the compress and decompress functions into a single class, we can add the functionality to both the client and server by creating a single object. This will allow us to compress objects sent from the server to the client as well as from the client back up to the server.

The first step is to convert the Serializable object into a byte array. This can be achieved by using an ObjectOutputStream with an underlying ByteArrayOutputStream from the java.io package. First, we'll create a new ByteArrayOutputStream that will capture the byte stream when the object is written. We'll then create a new ObjectOutputStream, write the serializable object, and then extract a byte array from the ByteArrayOutputStream.

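// Requires java.io.ByteArrayOutputStream, java.io.ObjectOutputStream, java.io.IOException
byte[] DataArray = null;  // declared outside the try block so later code can use it
try {
    // Capture the serialized form of inObj in memory
    ByteArrayOutputStream byteOut = new ByteArrayOutputStream();
    ObjectOutputStream objOut = new ObjectOutputStream(byteOut);
    objOut.writeObject(inObj);
    objOut.flush();  // push any buffered bytes into byteOut before copying
    DataArray = byteOut.toByteArray();
} catch (IOException e) {
    System.out.println(e.getMessage());
}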

With this code, we now have the ability to translate any object that implements the Serializable interface into a byte array capable of compression. The resulting byte array contains the details of the object as well as the object's data. The array contains the essential structural and data attributes to replicate the object and all its content. The next step is to compress the data contained in the byte array, thereby compressing the serialized representation of the object.

There are a few simple steps to compressing byte arrays using the Deflater class from the java.util.zip package. First, we'll create a new array for the compressed bytes. Because there is no method to accurately predict the size of the compressed output, it's advisable to create an array equal in size to the uncompressed bytes and then shrink it once compression is complete and the true size is known. Be aware that very small or already-compressed inputs can deflate to slightly more bytes than they started with, so a buffer of exactly the input size can be too small in those cases.

The next step is to create a new instance of the Deflater class, passing in the desired compression level in the constructor. There are a few options for compression level, each with benefits and drawbacks. The best compression option provides the greatest reduction in byte size at the expense of increased processing time. The best speed option provides a good compression level, usually 80% or better, in the shortest possible time. I usually opt for best compression, finding the extra milliseconds in processing time worth the decreased object size. For more information on the available compression levels, refer to the JavaDocs for java.util.zip.Deflater.

Once the Deflater object has been created, call the setInput(byte[]) method providing the byte array we extracted from the object serialization. Invoke the finish() method to inform the Deflater class that all inputs have been defined. Next, call the deflate(byte[]) method, providing the byte array to house the compressed data. When this method completes its execution, the data has been compressed and populated in the output byte array. The getTotalOut() method in the Deflater class will return the total number of bytes that were written in the output byte array. Using the new array size, we'll create a byte array to the exact size of the compressed output. We'll then use the System.arraycopy function to copy the bytes from the temporary array into the exact size array.

For ease of use, we'll encapsulate these steps into a single method named CompressBytes in the cZipFactory object (see Listing 2). Now, when we need to compress a byte array, we can invoke a single method:

byte[] bytesCompress = ZipFactory.CompressBytes(DataArray);
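
The steps above might be collected into a method along the following lines. This is a minimal sketch, not the article's Listing 2: the class name CompressBytesSketch, the lowercase method name, the explicit level parameter, and the 64 bytes of slack added to the temporary buffer are all assumptions made for illustration.

```java
import java.util.zip.Deflater;

// Hypothetical stand-in for the CompressBytes method described in the text.
final class CompressBytesSketch {

    // Compresses input at the given Deflater level and returns an exact-size array.
    static byte[] compressBytes(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(input);
            deflater.finish();  // signal that no further input will be supplied

            // Temporary buffer sized like the input, plus a little slack (an
            // assumption) so that tiny or empty inputs still fit.
            byte[] temp = new byte[input.length + 64];
            int written = deflater.deflate(temp);
            if (!deflater.finished()) {
                // Large incompressible input can still exceed the buffer.
                throw new IllegalStateException("output buffer too small");
            }

            // Shrink to the number of bytes actually written.
            byte[] exact = new byte[written];
            System.arraycopy(temp, 0, exact, 0, written);
            return exact;
        } finally {
            deflater.end();  // release the native zlib resources
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[10_000];  // all zeros, so highly compressible
        byte[] packed = compressBytes(data, Deflater.BEST_COMPRESSION);
        System.out.println(data.length + " bytes -> " + packed.length + " bytes");
    }
}
```

Passing Deflater.BEST_SPEED instead of Deflater.BEST_COMPRESSION trades some size reduction for less processing time, as discussed earlier.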

There are two key pieces of data required to quickly and accurately decompress the object: the byte array containing the compressed data and the original size of the serialized byte array. When the byte array is decompressed, it will be written into another byte array. Knowing the size of the decompressed array will not only make the decompression more efficient, it will also ensure accuracy. To save the byte array and original size easily, we will encapsulate them in a new instance of the cZipObject class.

cZipObject cZipObj = new cZipObject();
cZipObj.setData(bytesCompress, iOrigSize);
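
For readers without Listing 1 at hand, a holder of this kind could look roughly like the sketch below. Only the setData(bytes, size) call comes from the text; the class name ZipObjectSketch and the two getters are hypothetical names chosen for illustration.

```java
import java.io.Serializable;

// Hypothetical stand-in for cZipObject: carries the compressed bytes plus the
// size of the original serialized array, and is itself Serializable so it can
// be sent across the network.
final class ZipObjectSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    private byte[] data = new byte[0];
    private int origSize;

    // Stores a copy of the compressed bytes and the uncompressed length.
    void setData(byte[] compressed, int originalSize) {
        this.data = compressed.clone();
        this.origSize = originalSize;
    }

    // Returns a copy so callers cannot modify the stored bytes.
    byte[] getData() {
        return data.clone();
    }

    int getOrigSize() {
        return origSize;
    }
}
```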

By combining all these steps, we can now create a method that accepts any Serializable object and returns a cZipObject. This is the Compress method in the cZipFactory class, shown in its entirety in Listing 3. Using the new method in cZipFactory greatly simplifies the integration of object compression functions. First, we create an instance of the cZipFactory class, providing the desired compression level during object creation.

cZipFactory ZipFactory =
    new cZipFactory(java.util.zip.Deflater.BEST_COMPRESSION);

Using the new cZipFactory class, we can compress a serializable object using a single line of code:

cZipObject newZObject = ZipFactory.Compress(inObject);

When the client or receiving machine obtains the cZipObject, it needs to be decompressed and reconstituted into an object. To achieve this, we'll create another method in cZipFactory to handle the Decompress operation. This method will extract the byte array from the provided cZipObject, decompress the array, and then translate the bytes into an object. The Decompress method in cZipFactory will return a Serializable object, which can be cast into the original type of object.

Using the java.util.zip.Inflater class, we can easily decompress the byte array in a few lines of code. Given the compressed byte array and the original size of the byte array, the Inflater class can be used to decompress the byte array. As this function could be useful in a variety of situations, we'll create a method in the cZipFactory class named DecompressBytes. The method will accept a byte array containing the compressed bytes and a primitive integer for the size of the decompressed array. At this point, it's very important that we know the original size of the byte array (see Listing 4). Without this information, it wouldn't be possible to accurately predict the total size of the decompressed bytes without extracting the data in a loop. Knowing the original size of the byte array makes the decompression code easier and more efficient.
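
A method of the kind described here could be sketched as follows. It is not the article's Listing 4: the class and method names are hypothetical, and wrapping DataFormatException in an unchecked exception is a choice made only to keep the example short.

```java
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Hypothetical stand-in for the DecompressBytes method described in the text.
final class DecompressBytesSketch {

    // Inflates compressed into a new array of exactly originalSize bytes.
    static byte[] decompressBytes(byte[] compressed, int originalSize) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            byte[] result = new byte[originalSize];
            int offset = 0;
            // inflate() may return fewer bytes than requested, so keep
            // filling until the array is full or the stream ends.
            while (offset < originalSize && !inflater.finished()) {
                int n = inflater.inflate(result, offset, originalSize - offset);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break;  // truncated or unsupported input
                }
                offset += n;
            }
            if (offset != originalSize) {
                throw new IllegalArgumentException("decompressed size does not match original size");
            }
            return result;
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("input is not valid deflate data", e);
        } finally {
            inflater.end();  // release the native zlib resources
        }
    }
}
```

Because the target array is allocated once at the known size, no intermediate buffers or resizing are needed, which is the efficiency gain the text refers to.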

With the ability to decompress a byte array in place, we then move to the process of converting the bytes back into a usable object using an instance of ObjectInputStream. First, we'll create a ByteArrayInputStream using the decompressed byte array. Using the byte stream, we'll construct a new ObjectInputStream to reconstitute the object. By invoking the readObject method, the ObjectInputStream will translate the byte stream into a usable object. To simplify our coding, we'll place this code in a method named ConvertByteToObject in the cZipFactory class (see Listing 5).
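
The conversion in both directions can be illustrated with a short sketch. The class name ByteObjectSketch and the method names toBytes and toObject are hypothetical, and checked exceptions are rethrown as unchecked ones purely to keep the example compact; Listing 5 may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical helpers mirroring the object-to-bytes and bytes-to-object steps.
final class ByteObjectSketch {

    // Serializes obj into a byte array held in memory.
    static byte[] toBytes(Serializable obj) {
        ByteArrayOutputStream byteOut = new ByteArrayOutputStream();
        try (ObjectOutputStream objOut = new ObjectOutputStream(byteOut)) {
            objOut.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed (and therefore flushed) before the bytes are copied.
        return byteOut.toByteArray();
    }

    // Reconstitutes an object from bytes produced by toBytes.
    static Serializable toObject(byte[] bytes) {
        try (ObjectInputStream objIn =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Serializable) objIn.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            // The receiving side must have the object's class on its classpath.
            throw new IllegalStateException(e);
        }
    }
}
```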

The final step is to create a Decompress method in the cZipFactory class that will accept a cZipObject and return a Serializable object. The completed Decompress method is shown in the cZipFactory class in Listing 3.

Using the cZipFactory class, we can now decompress a cZipObject using a single line of code:

Serializable retObject = ZipFactory.Decompress(newZObject);

The Serializable object can then be cast to its original type, either in a separate statement or in the same line of code as the call to Decompress:

Vector vClientList = (Vector)ZipFactory.Decompress(newZObject);

In the end, the cZipFactory provides easy-to-use methods that translate serializable objects to and from compressed representations of objects. The entire compression API can be quickly implemented in just a few lines of code. Another important feature is the ability to use the function selectively rather than a system-wide change, such as compressing a socket. The resulting cZipObject can be extended or expanded to meet the requirements of an application or can be treated like any other Java object. This also allows for the reuse of a cZipObject, allowing the developer to cache a compressed object, effectively eliminating the need to redundantly perform compressions.
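
The caching idea can be sketched independently of the compression code itself. In the example below, the class name CompressedCacheSketch, the string key, and the Supplier that performs the compression are all assumptions for illustration; the point is only that the expensive step runs once per key and later requests reuse the stored bytes.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Hypothetical cache of already-compressed payloads, keyed by a name.
final class CompressedCacheSketch {
    private final ConcurrentMap<String, byte[]> cache = new ConcurrentHashMap<>();

    // Returns the cached bytes for key, invoking compressor only when the
    // key has no entry yet.
    byte[] get(String key, Supplier<byte[]> compressor) {
        return cache.computeIfAbsent(key, k -> compressor.get());
    }

    // Drops the entry so the next request recompresses fresh data.
    void invalidate(String key) {
        cache.remove(key);
    }
}
```

A server could call invalidate whenever the underlying data changes, so that clients never receive a stale compressed copy.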

A Simple Client List Example
Now that we've built the classes to compress Serializable objects, we'll work through an example using the new objects. To begin, let's create a vector of client names. For our example, we'll create a vector with generic content, but you could imagine this list of clients being derived from a database call, an XML document, or some other data source.

Vector vClients = new Vector(1000);
for (int i = 0; i < 1000; i++) {
    vClients.add("Client # " + i);
}

The resulting vector, vClients, contains 1,000 entries and is 14,046 bytes when serialized. A client machine connecting over a 28.8 modem will retrieve this vector at approximately 3.33 KB per second. At this throughput, it'll take the client machine approximately 4,200 milliseconds to download this list of 1,000 clients. To add compression, we'd add this line of code on the server:

//Using a pre-existing cZipFactory class instance
cZipObject zoClients = ZipFactory.Compress(vClients);

On the client machine, we add this line of code to decompress the cZipObject:

//Using a pre-existing cZipFactory class instance
Vector vClients = (Vector)ZipFactory.Decompress(zoClients);

Using this example, the Compress method executes in approximately 40 milliseconds. We would then transmit the zoClients object to the client machine, which when serialized is 2,296 bytes. At 28.8 modem speed, the cZipObject instance is downloaded to the client in approximately 690 milliseconds. The client then decompresses the cZipObject, casting the contents into a vector. The Decompression operation on the client takes an additional 30 milliseconds. The total time using compression was 40 + 690 + 30 = 760 milliseconds. When compared to the original download time of 4,200 milliseconds, the compression technique saved 3,440 milliseconds of client wait time and reduced the total object size by 11,750 bytes, resulting in 83.6% less bandwidth consumption. This is more than five times faster and is achieved with a few simple lines of code on the server and client.

Listing 6 provides a simple testing class that was used for this example and the benchmarks quoted in this article. By using this simple testing class, you can see that when applied to larger data structures, the compression functions make a more profound impact on bandwidth reduction and client wait times.
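
Readers who want to reproduce the size comparison without the article's listings could start from a sketch like the one below. Note that it swaps in a different technique from the cZipFactory: it wraps the object stream in the standard DeflaterOutputStream and InflaterInputStream classes from java.util.zip. The class and method names are hypothetical, and the sizes and timings it prints will vary with the JVM and hardware.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Vector;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical test harness for the client list example, using stream classes.
final class ClientListBenchmarkSketch {

    static Vector<String> buildClients(int count) {
        Vector<String> clients = new Vector<>(count);
        for (int i = 0; i < count; i++) {
            clients.add("Client # " + i);
        }
        return clients;
    }

    // Serializes obj, optionally deflating the stream at BEST_COMPRESSION.
    static byte[] serialize(Serializable obj, boolean compress) {
        ByteArrayOutputStream byteOut = new ByteArrayOutputStream();
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            OutputStream sink = compress ? new DeflaterOutputStream(byteOut, deflater) : byteOut;
            try (ObjectOutputStream objOut = new ObjectOutputStream(sink)) {
                objOut.writeObject(obj);
            }  // closing finishes the deflate stream
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            deflater.end();  // a caller-supplied Deflater is not ended by the stream
        }
        return byteOut.toByteArray();
    }

    // Inflates and deserializes bytes produced by serialize(obj, true).
    static Object deserializeCompressed(byte[] bytes) {
        try (ObjectInputStream objIn = new ObjectInputStream(
                 new InflaterInputStream(new ByteArrayInputStream(bytes)))) {
            return objIn.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Vector<String> clients = buildClients(1000);
        byte[] plain = serialize(clients, false);
        long start = System.nanoTime();
        byte[] packed = serialize(clients, true);
        long compressMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("serialized: " + plain.length + " bytes");
        System.out.println("compressed: " + packed.length + " bytes in " + compressMs + " ms");
        System.out.println("round trip ok: " + clients.equals(deserializeCompressed(packed)));
    }
}
```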

Expense of Compression
There are two primary expenses to this compression technique: increased memory usage and CPU cycles. This approach compresses the serialized representation of an object, which requires that the object be serialized into an array that's then compressed and included in another serializable object, so several copies of the data exist in memory at once. In addition to the increase in memory usage, there will be an increase in CPU utilization. The compression routines consist of arithmetic operations, which will result in increased CPU usage during deflation and inflation processing. For larger installations of these compression routines, it would be reasonable to expect notable increases in server CPU usage, which would need to be analyzed in terms of frequency and the size of the objects being compressed. As a benchmark, in one installation the server processed approximately 10,000 compressions an hour on objects ranging from 10K to 350K. The addition of compression functions resulted in approximately a 3% increase in CPU usage.

Another important factor to remember is that the client machines will also have increases in memory usage and CPU utilization to decompress the objects, or compress objects being sent to the server. The speed of these decompression routines will depend on the client machine hardware.

Conclusion
If you are writing distributed Java applications, whether they're EJB systems or custom RMI solutions, the introduction of compression routines can provide tremendous improvements to the response time and bandwidth consumption of your programs. One of the primary advantages to the approach presented here is its simplicity, allowing the developer to continually work with objects and avoid the compression functions. Using the cZipFactory also allows the developer to avoid socket-level operations or the creation of disk files, retaining the structure of existing programs and making it possible to selectively implement the functions. Another benefit of the cZipFactory is the use of standard Java libraries, making the compression function available in both J2SE and J2EE applications.

For our applications, the performance of the compression routines has been excellent, with minimal server impact and network usage down by 85%. Today, of the approximately 3,000 client machines using the compression classes, there have been no reports of problems with CPU utilization or memory usage. Overall, the introduction of compression was the single largest performance improvement made in our five-year development effort.

Resources

  • "Compressing and Decompressing Data Using Java APIs": http://developer.java.sun.com/developer/technicalArticles/Programming/compression/
  • Object Serialization in Java: http://java.sun.com/j2se/1.4.2/docs/guide/serialization/
  • Java Documentation for java.util.zip package: http://java.sun.com/j2se/1.4.2/docs/api/java/util/zip/package-summary.html
  • Java Documentation for java.io package: http://java.sun.com/j2se/1.4.2/docs/api/java/io/package-summary.html

    SIDEBAR
    Calculating the Benefits of Compression

    There are a number of benefits to using serialized object compression, most notably the reduction in the size of the serialized output. The performance gain is directly related to the average object size, the bandwidth of the client connections, and the CPU processing power of the server and client machines. When determining whether to implement a compression function, these factors should be projected in order to ensure a positive gain. Consider this simple equation to determine if compression routines would be beneficial:

    [(Object Size bytes) × 8] ÷ [Line Speed kbs] = Avg. Download Time (ms)

    [10000 × 8] ÷ 128 = 625 ms

    Now, reduce the average object size by 80% and recalculate the download time; this time add an additional 100 milliseconds for processing time.

    [(Object Size bytes × 8 × 0.2)] ÷ [Line Speed kbs] + 100 = Compress Download Time (ms)

    {[(10000 × 8) × 0.2] ÷ 128 } + 100 = 225 ms

    In the chart in Figure 1 we see how the slight increase in processing time required for compression can create tremendous gains in download time.

    Regardless of the bandwidth from the server to the client, compression routines will have a definitive impact on network usage (see Figure 2).

    It's important to remember that at some point the law of diminishing returns becomes prevalent. For example, if the average size of the object before compression is 5,000 bytes, then compression could reduce this to as little as 1,000 bytes. The total expense of this compression would be about 100 milliseconds. If the client machines were on 28.8 modems, the compression would have a positive impact, reducing client wait time by about 1,100 milliseconds. However, if the client machines were on 512K connections, downloading the original 5,000 bytes would only take about 90 milliseconds. Even though the 1,000 bytes would take 17 milliseconds, we have now added additional processing time for the compress and decompress operations, potentially creating a negative return, and not significantly impacting download time.

    The chart in Figure 3 helps to illustrate how the benefits of compression on client wait time can be quickly reduced in higher bandwidth environments. It's important to note that while client wait time may not be significantly reduced by compression, network traffic will always be reduced. Even though the end user may not notice improvements, the network will always benefit from the reduction in throughput.
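
The sidebar's two equations are easy to turn into a small calculator for projecting whether compression pays off on a given link. The sketch below is only an illustration: the class name WaitTimeSketch, the method names, and the parameter names are hypothetical, while the formulas and sample figures are the ones given above (line speed in kilobits per second, so bits divided by speed yields milliseconds).

```java
// Hypothetical calculator for the sidebar's download-time equations.
final class WaitTimeSketch {

    // (object size in bytes * 8) / line speed in kbit/s = download time in ms.
    static double downloadMs(double objectBytes, double lineSpeedKbps) {
        return objectBytes * 8.0 / lineSpeedKbps;
    }

    // Same transfer after shrinking the object to ratio of its size, plus a
    // fixed processing cost for the compress and decompress steps.
    static double compressedMs(double objectBytes, double lineSpeedKbps,
                               double ratio, double overheadMs) {
        return downloadMs(objectBytes * ratio, lineSpeedKbps) + overheadMs;
    }

    public static void main(String[] args) {
        // The sidebar's worked example: 10,000 bytes on a 128 kbit/s line.
        System.out.printf("plain:      %.0f ms%n", downloadMs(10_000, 128));
        System.out.printf("compressed: %.0f ms%n", compressedMs(10_000, 128, 0.2, 100));

        // The diminishing-returns case: 5,000 bytes on a 512 kbit/s line.
        System.out.printf("plain:      %.0f ms%n", downloadMs(5_000, 512));
        System.out.printf("compressed: %.0f ms%n", compressedMs(5_000, 512, 0.2, 100));
    }
}
```

When the compressed figure comes out higher than the plain one, as in the second case, the processing overhead outweighs the transfer savings for client wait time, although network traffic is still reduced.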

  • More Stories By Robert Beckett

    Robert Beckett is the Chief Architect for The Software Development Cooperative. Robert is currently working on two Java products: an extensive API set for Java developers www.thesdc.com\basesys\ and a high-performance Java RMI server www.thesdc.com\symtier\.

    Most Recent Comments
    Robert Beckett 11/08/03 04:02:05 PM EST

    I would like to thank all of the developers for their supportive, positive, and constructive feedback. A number of developers have been using the zip classes from the article and are having great success; however, a few errors have been flushed out, namely with trying to zip small or empty objects. There are also some of us working on enhancing the cZipFactory to use streams, such as those from the java.util.zip package. We have created a web page with these corrections and ongoing updates. If anyone is having problems or has update suggestions, please check out the page at http://www.thesdc.com/basesys/zipzap_updates.html or e-mail me at [email protected]. Again, thanks for the kind words and great coding contributions!

    10/24/03 03:34:05 PM EDT

    I've been disappointed of late with the JDJ content (less technical, more fluff). However this article brings back memories of past technical, how-to JDJ articles that were the norm rather than the exception. Kudos to Robert Beckett for his excellent article and utility. Let's hope this article starts a trend at JDJ!
