Distributed Garbage Collection

As any ex-C++ software developer will attest, the Java garbage collector greatly simplifies the task of cleaning up after your objects. With distributed applications, however, the garbage collector faces many new challenges, since objects may be used by applications running across the Internet. This article looks at some common solutions to distributed garbage collection in CORBA, RMI and DCOM. Finally, it shows how the RMI distributed garbage collector can be implemented on top of CORBA.

Back in the old days of software development, programmers had to carefully keep track of all the memory used in a program and clean up each of the unused bits. Failure to properly care for memory could lead to memory actually getting lost somewhere in the ether. On some older operating systems, lost memory could be recovered only by rebooting the computer. As a result, software developers had to be intimately familiar with the size of every piece of data used in their applications. Hours were spent tracing through code to determine when data was no longer required, and even more hours were spent writing procedures to properly remove the data. A whole niche in the software industry was built on marketing development tools to detect lost memory.

Garbage-collected languages such as Java have improved this situation dramatically. No longer do we have to worry about where our memory goes when we are done with it. The garbage collector will find it and clean it up for us. And the Java garbage collector is pretty good at its job. It runs as a low-priority thread, so you will probably never notice it cleaning up your mess.

However, distributed software brings a whole new set of challenges. Objects that previously were used only within one program on a single computer can now be used by many different programs running on many different computers. Now that Netscape has built-in CORBA support, it won't be long before you want to access your objects across the Web.

The garbage collector's job of finding out whether an object is still in use just became a whole lot more difficult. Within a single program, the garbage collector could easily determine which objects were still in use by literally looking at each object, marking those still in use and removing the leftovers. But with distributed objects, the whole Internet could be using your objects if you let it. The garbage collector can't very well look at every object on the entire Web to determine which are still in use.
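The single-program case described above is the classic mark-and-sweep collector. The sketch below is a toy model, not the JVM's actual collector; the Node and MarkSweep classes are hypothetical. It shows the two phases: everything reachable from the roots is marked, and everything else is swept. The distributed difficulty is that some of the roots may live in other programs on other machines, where the mark phase cannot see them.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical heap model: each node lists the nodes it references.
class Node {
    final String name;
    final List<Node> refs = new ArrayList<>();
    Node(String name) { this.name = name; }
}

class MarkSweep {
    // Mark phase: every node reachable from the roots is live.
    static Set<Node> mark(List<Node> roots) {
        Set<Node> live = new HashSet<>();
        Deque<Node> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Node n = work.pop();
            if (live.add(n)) {          // first visit: follow its references
                work.addAll(n.refs);
            }
        }
        return live;
    }

    // Sweep phase: remove every heap node that was not marked and report it.
    static List<Node> sweep(List<Node> heap, Set<Node> live) {
        List<Node> freed = new ArrayList<>();
        heap.removeIf(n -> {
            boolean dead = !live.contains(n);
            if (dead) freed.add(n);
            return dead;
        });
        return freed;
    }
}
```

If node a is the only root and a references b, then a and b survive and an unreferenced node c is swept.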

Now add to the equation the realities of the modern Internet. Network links fail all the time. Corrupt routing tables temporarily bring down whole branches of the Internet. Machines, both clients and servers, occasionally crash. Finally, how many of us have suffered through the occasional 50-bit-per-second connection to read our e-mail?

When a link fails or a computer crashes, the distributed garbage collector must be smart enough to do the right thing, whatever that may be. Suppose a distributed object is running on a Web server hosted by ACME Web Service, Inc. and is currently in use by your Web browser. First, say your computer suddenly crashes. In this case, you might want the distributed object to be cleaned up immediately. After all, you probably won't be able to log back into the Internet and start over where you left off (unless the software was written by a particularly talented developer). Now say your Internet connection temporarily drops out. This often happens for periods of just a few seconds, and you don't even notice it. In this situation, you don't want the garbage collector to go after your object; you'll be back in just a few moments. The distributed garbage collector has to walk a fine line to satisfy everyone.

The CORBA Approach
CORBA uses a combination of reference counting and network connection management to perform distributed garbage collection. Once a server object has been instantiated, its reference count is implicitly incremented whenever a new reference to it is created and implicitly decremented whenever a reference is destroyed. When the reference count reaches zero, the server object instance is cleaned up. This is enough for most situations and provides an effective means of distributed garbage collection even in languages such as C++ that don't normally have a garbage collector.

In addition to these implicit rules, CORBA provides explicit operations for adding and removing references, called duplicate and release. These operations are most useful when manipulating object references through pointers. When a pointer to a reference is copied, duplicate should be called to indicate that a new reference has been created. When a pointer to a reference is destroyed, release should be called for the opposite reason.
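The counting rules above can be modeled in a few lines. This is a minimal sketch, assuming a single in-process counter; the CountedServant class and its duplicate and release methods are hypothetical stand-ins for the C++ mapping's _duplicate and CORBA::release, not a real ORB API.

```java
// Hypothetical model of CORBA-style reference counting on one servant.
class CountedServant {
    private int refCount = 1;          // the reference handed out at creation
    private boolean destroyed = false;

    // "duplicate": a copied pointer now holds its own reference.
    CountedServant duplicate() {
        if (destroyed) throw new IllegalStateException("servant already destroyed");
        refCount++;
        return this;
    }

    // "release": when the count reaches zero the servant is cleaned up.
    void release() {
        if (destroyed) throw new IllegalStateException("servant already destroyed");
        if (--refCount == 0) {
            destroyed = true;
            cleanUp();
        }
    }

    boolean isDestroyed() { return destroyed; }
    int refCount() { return refCount; }

    // Subclasses would free whatever resources the servant holds here.
    protected void cleanUp() { }
}
```

Each duplicate must be balanced by a release; the servant survives until the last outstanding reference is released.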

CORBA also carefully manages network connections. When a client is disconnected, either because the client machine crashed or because of a complete network failure, any references held by that client are immediately released. This way of detecting client failure behaves correctly even when a temporary network slowdown causes the server to lose touch with the client: as long as the network connection remains open, the references are not released and the object is not garbage-collected.
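Tying references to connections can be sketched as a per-connection table on the server. This is a hypothetical model: the ConnectionRefTable class and the string connection IDs are illustrative assumptions, and a real ORB would drive connectionLost from its transport layer.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical server-side table: how many references each client connection
// holds to one object.
class ConnectionRefTable {
    private final Map<String, Integer> refsByConnection = new HashMap<>();

    void addRef(String connectionId) {
        refsByConnection.merge(connectionId, 1, Integer::sum);
    }

    // Returning null from the remapping function removes the entry at zero.
    void release(String connectionId) {
        refsByConnection.computeIfPresent(connectionId, (id, n) -> n > 1 ? n - 1 : null);
    }

    // Called when the transport reports that a connection has closed or failed:
    // every reference that client held is released at once.
    void connectionLost(String connectionId) {
        refsByConnection.remove(connectionId);
    }

    // The object becomes collectable only when no connection holds a reference.
    boolean collectable() {
        return refsByConnection.isEmpty();
    }
}
```

A slow but still open connection never reaches connectionLost, which is why a slowdown alone does not release references in this model.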

For the truly adventurous, additional mechanisms are provided to sever communication with an object and immediately cause garbage collection. The deactivate_obj call is an example of such a mechanism.

The RMI Approach
RMI uses a fairly straightforward mechanism for garbage collection. Any program which has a reference to an object must obtain a "lease" for the object. The lease, which is literally represented by a Lease object, entitles the program to use the object for a certain period of time, basically the same idea as leasing office equipment.

If the object continues to be used for an extended period of time, the lease must be renewed before it expires. The renewed lease again entitles the holder to use the object for a certain period of time. If the object is no longer in use, the lease is simply allowed to expire or can be explicitly terminated by the holder at any time. When all the leases have expired or terminated, the object can be garbage-collected.
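The lease life cycle just described (grant, renew, terminate, expire) can be sketched as a small table keyed by lease holder. This is a hypothetical model, not RMI's internal implementation; LeaseTable and the string holder IDs are assumptions, and the current time is passed in explicitly so the behavior is easy to follow.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lease table for one remote object. Times are milliseconds.
class LeaseTable {
    private final long leaseDurationMs;
    private final Map<String, Long> expiryByHolder = new HashMap<>();

    LeaseTable(long leaseDurationMs) { this.leaseDurationMs = leaseDurationMs; }

    // Grant a new lease or renew an existing one; returns the new expiry time.
    long grantOrRenew(String holder, long nowMs) {
        long expiry = nowMs + leaseDurationMs;
        expiryByHolder.put(holder, expiry);
        return expiry;
    }

    // The holder no longer needs the object and ends its lease early.
    void terminate(String holder) {
        expiryByHolder.remove(holder);
    }

    // Drop expired leases; the object is collectable when none remain.
    boolean collectable(long nowMs) {
        expiryByHolder.values().removeIf(expiry -> expiry <= nowMs);
        return expiryByHolder.isEmpty();
    }
}
```

The same model also shows the weakness discussed later: if a renewal is delayed past the expiry time, collectable returns true even though the holder is still using the object.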

This design easily solves most of the problems faced by a distributed garbage collector. When an object is no longer in use anywhere on the Internet, no leases are renewed so the object will eventually be garbage-collected. If the computer with an RMI program running on it suddenly crashes, the leases for any distributed objects will simply expire over time and can be garbage-collected.

The Lease object is obtained and renewed using the DGC interface (DGC presumably stands for Distributed Garbage Collector) which is provided by RMI. The main operations on the DGC interface, shown in Listing 1, are dirty and clean. Dirty is for obtaining a Lease object and clean is for terminating Lease objects. However, don't worry too much about learning the details. The software developer should never need to use the DGC interface since it is all taken care of by RMI itself.

The dirty method on the standard DGC interface accepts an array of ObjIDs, a sequence number and a Lease. The ObjIDs in the array identify the objects whose leases are being requested or renewed. The sequenceNum is used for nothing more than to guarantee proper network packet ordering, since RMI makes use of the unreliable protocol UDP to transmit garbage collection requests. The Lease is simply a data container holding a unique identification number for the client making the request and the desired length of the lease. Keep in mind that since the DGC interface is hidden under the covers, RMI itself chooses the "desired" length of the lease; the software developer has no part in this decision.
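For reference, java.rmi.dgc.DGC declares `Lease dirty(ObjID[] ids, long sequenceNum, Lease lease)` and `void clean(ObjID[] ids, long sequenceNum, VMID vmid, boolean strong)`, both throwing RemoteException. The sketch below only assembles the arguments a dirty call carries, using the real Lease, VMID and ObjID classes; it makes no remote call, and the DirtyCallArgs holder class and the ten-minute duration are illustrative assumptions.

```java
import java.rmi.dgc.Lease;
import java.rmi.dgc.VMID;
import java.rmi.server.ObjID;

// Hypothetical holder for the arguments of DGC.dirty(ObjID[], long, Lease).
// The RMI runtime normally builds and sends these itself.
class DirtyCallArgs {
    final ObjID[] ids;        // objects whose leases are requested or renewed
    final long sequenceNum;   // increases with each call from this client
    final Lease requested;    // just a container: client VMID plus desired duration

    DirtyCallArgs(ObjID[] ids, long sequenceNum, VMID clientVm, long desiredMs) {
        this.ids = ids;
        this.sequenceNum = sequenceNum;
        this.requested = new Lease(clientVm, desiredMs);
    }
}
```

The server answers dirty with its own Lease, whose getValue() reports the duration actually granted, which need not match the duration requested.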

One thing you should keep in mind: some network overhead is incurred every time an object renews its lease. A remote request must be sent across the network to the object's host. Thus, if you would like to deploy a system with several hundred clients or several thousand distributed objects, this overhead might become quite considerable. Currently, no means are provided for configuring the leasing period for objects and thus the time between requests to renew a lease, so keep this limitation in mind when architecting your system.
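A back-of-the-envelope estimate makes the concern concrete. This sketch assumes, as the paragraph does, one renewal request per leased object, sent once per renewal interval; RenewalLoad is a hypothetical helper and the figures in the usage note are illustrative.

```java
// Hypothetical estimate of lease-renewal traffic, assuming one renewal request
// per leased object every renewIntervalSec seconds.
class RenewalLoad {
    static double requestsPerSecond(long clients, long objectsPerClient, double renewIntervalSec) {
        if (renewIntervalSec <= 0) throw new IllegalArgumentException("interval must be positive");
        return (double) clients * objectsPerClient / renewIntervalSec;
    }
}
```

For example, 500 clients each holding leases on 100 objects and renewing every 300 seconds produce 50,000 / 300, or roughly 167 requests per second. Because dirty accepts an array of ObjIDs, renewals can be batched into fewer calls, so treat this as an upper bound on the number of calls rather than a measurement.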

In addition, any distributed architecture that relies on mechanisms like leases is subject to problems when network failures or even slowdowns occur. For example, if your Internet connection hangs just long enough for your leases to expire, all of your distributed objects will suddenly be garbage-collected even though you are still using them.

The DCOM Approach
DCOM, which stands for Distributed COM, is Microsoft's foray into the world of distributed computing. Since DCOM is backed by the world's second-largest software vendor, it deserves at least a brief mention here, even though it's unclear how well it will be supported for use with Java. DCOM is unlike any other distributed object technology when it comes to garbage collection. First, DCOM differentiates between interfaces and objects, and each has its own type of garbage collection support.

Garbage collection of interfaces is handled through a manual reference counting mechanism. RemAddRef and RemRelease respectively add and release references to remote objects. Both of these calls are sent across the network to the remote system, incurring some network overhead whenever additional references are made. Under the covers, DCOM tries to reduce this overhead by "multiplexing references". This means that a single reference can actually stand for many references within a single program. In addition, programs may optionally request "private references", which are references associated with a particular client identification. Normally, DCOM allows one client to issue more releases than the number of references it currently owns. Private references are a way of preventing this from occurring.
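The difference between pooled and private references can be modeled on the server side. This is a hypothetical sketch of the bookkeeping behind RemAddRef and RemRelease as the paragraph describes it, not DCOM's actual implementation; RemoteRefCounts and its string client IDs are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: public references are pooled, so any client may release
// them; private references are tracked per client and only that client may
// release them.
class RemoteRefCounts {
    private int publicRefs;
    private final Map<String, Integer> privateRefs = new HashMap<>();

    void remAddRef(String clientId, int count, boolean isPrivate) {
        if (isPrivate) {
            privateRefs.merge(clientId, count, Integer::sum);
        } else {
            publicRefs += count;
        }
    }

    // Returns false if the release was refused.
    boolean remRelease(String clientId, int count, boolean isPrivate) {
        if (isPrivate) {
            int owned = privateRefs.getOrDefault(clientId, 0);
            if (count > owned) return false;       // cannot release more than it owns
            if (owned == count) privateRefs.remove(clientId);
            else privateRefs.put(clientId, owned - count);
            return true;
        }
        if (count > publicRefs) return false;
        publicRefs -= count;                       // pooled: caller identity not checked
        return true;
    }

    boolean collectable() {
        return publicRefs == 0 && privateRefs.isEmpty();
    }
}
```

In this model, client B can release public references that client A added, but nobody except B can release B's private reference.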

An entirely different mechanism is used for objects. So-called keepalive messages are sent periodically to objects as a way of pinging them to let them know they are still needed. These keepalive messages are similar in some ways to RMI leases and have the same weakness: a temporary network failure may result in the garbage collection of objects that are still in use simply because keepalive messages were not received in time.
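A keepalive scheme can be sketched as a per-object timestamp plus a tolerance for missed pings. KeepAliveTracker is hypothetical, and the ping period and missed-ping limit are constructor parameters; the test uses two minutes and three missed pings, values usually associated with DCOM, but treat them here as illustrative assumptions.

```java
// Hypothetical keepalive tracker for one object. Times are milliseconds.
class KeepAliveTracker {
    private final long pingPeriodMs;
    private final int maxMissedPings;
    private long lastPingMs;

    KeepAliveTracker(long pingPeriodMs, int maxMissedPings, long createdAtMs) {
        this.pingPeriodMs = pingPeriodMs;
        this.maxMissedPings = maxMissedPings;
        this.lastPingMs = createdAtMs;
    }

    void ping(long nowMs) { lastPingMs = nowMs; }

    // The object is presumed abandoned once maxMissedPings whole periods pass
    // without a ping, even if the client is alive and its pings were merely delayed.
    boolean presumedDead(long nowMs) {
        return nowMs - lastPingMs >= pingPeriodMs * maxMissedPings;
    }
}
```

With a 120-second period and three missed pings, an object last pinged at 100 seconds is still considered alive at 400 seconds and presumed dead at 460 seconds.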

To add a few more variables to the equation, COM implementations may defer the release of references to an interface for an indefinite period of time. The DCOM specification recommends that the remote release of all interfaces be deferred until all local references to all interfaces on an object are released. It's not clear what sort of logic, if any, the user needs to match up interfaces with their objects for garbage collection to work as advertised. To top it off, garbage collection of interfaces is actually optional; some COM implementations may never perform it. Confused? Maybe that's what Microsoft intended.
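The deferred-release recommendation can be sketched as a client-side proxy that counts local references per interface and contacts the server only when the last local reference to any interface on the object goes away. DeferredReleaseProxy and the interface names are hypothetical; the remote call is simulated with a counter.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical client-side proxy for one remote object.
class DeferredReleaseProxy {
    private final Map<String, Integer> localRefsByInterface = new HashMap<>();
    private int remoteReleasesSent = 0;

    void addRef(String iface) {
        localRefsByInterface.merge(iface, 1, Integer::sum);
    }

    void release(String iface) {
        Integer n = localRefsByInterface.get(iface);
        if (n == null) throw new IllegalStateException("no reference to " + iface);
        if (n == 1) localRefsByInterface.remove(iface);
        else localRefsByInterface.put(iface, n - 1);
        // Defer the remote release until no interface on the object is referenced.
        if (localRefsByInterface.isEmpty()) {
            sendRemoteRelease();
        }
    }

    private void sendRemoteRelease() {
        // A real proxy would make one batched remote release call here.
        remoteReleasesSent++;
    }

    int remoteReleasesSent() { return remoteReleasesSent; }
}
```

Releasing one interface while another is still referenced sends nothing; only the final local release triggers a single remote call.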

Merging the Approaches
The RMI distributed garbage collector is rather simple and easy to implement using CORBA. First, the DGC interface is hidden from the developer. No means of directly invoking the DGC interface is available. Thus, I feel justified in redesigning the interface slightly in order to simplify the task of implementing it.

In the dirty method, I would like to pass just an identification number for the object to be leased and an identification number for the client requesting the lease. I'll simply return a number indicating the length of time for which the lease was granted rather than a whole object. This approach is less type-safe than returning an object; however, the interface will be used only by our own stub code, so efficiency matters more than type safety. I won't send or return a Lease object or allow the desired length of the lease to be configured, since this interface is not exposed to the software developer anyway. The sequenceNum, which was added only to guarantee a certain amount of packet ordering because of RMI's use of the unreliable UDP protocol, can simply be deleted since CORBA itself guarantees reliable delivery.

The sequenceNum on the clean method can be deleted for the same reasons. I also broke up the arrays of object identification numbers into a single identification number per method invocation. Although renewing leases in groups may prove useful later, manipulating the lease of one object at a time seems like the most natural way of handling a lease. The modified DGC interface is shown in Listing 2.
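Listing 2 is not reproduced here. As a rough illustration only, the simplified operations described above might look like the following Java interface; the name SimpleDGC and the parameter types are hypothetical, and in the CORBA version the interface would be declared in IDL with the Java form generated from it.

```java
// Hypothetical Java rendering of the simplified DGC operations.
interface SimpleDGC {
    // Lease one object for one client; returns the granted lease length.
    long dirty(int objId, int clientId);

    // End one client's lease on one object.
    void clean(int objId, int clientId);
}
```

Compared with java.rmi.dgc.DGC, the arrays, the sequence number and the Lease container are gone, leaving one object and one client per call.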

To implement the DGC interface, I added a class called DGCImpl whose main responsibilities are to keep track of the leases and to periodically clean up objects that no longer have any outstanding leases. To accomplish this, DGCImpl implements Runnable so that it has its own thread for periodically checking its leases. When an object no longer has any outstanding leases, CORBA's deactivate_obj is called to immediately remove the object and allow it to be garbage-collected. The full implementation is too long to reproduce here but is available for download at my Web site, mentioned at the end of this article.
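The following is not the downloadable DGCImpl, only a hypothetical sketch of the same idea. LeaseManager tracks leases per object and per client and runs a periodic sweep on its own thread. Because BOA-style deactivate_obj is ORB-specific, the sketch calls a caller-supplied IntConsumer in its place, and the clock is passed to dirty and sweep explicitly so the logic can be exercised without waiting.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.function.IntConsumer;

// Hypothetical DGCImpl-style lease manager.
class LeaseManager implements Runnable {
    private final long leaseMs;
    private final long sweepIntervalMs;
    private final IntConsumer deactivate;   // stands in for the ORB's deactivate_obj
    // objId -> (clientId -> lease expiry time in ms)
    private final Map<Integer, Map<Integer, Long>> leases = new HashMap<>();

    LeaseManager(long leaseMs, long sweepIntervalMs, IntConsumer deactivate) {
        this.leaseMs = leaseMs;
        this.sweepIntervalMs = sweepIntervalMs;
        this.deactivate = deactivate;
    }

    // Grant or renew a lease; returns the lease length granted.
    synchronized long dirty(int objId, int clientId, long nowMs) {
        leases.computeIfAbsent(objId, k -> new HashMap<>()).put(clientId, nowMs + leaseMs);
        return leaseMs;
    }

    synchronized void clean(int objId, int clientId) {
        Map<Integer, Long> holders = leases.get(objId);
        if (holders != null) holders.remove(clientId);
    }

    // Drop expired leases and deactivate objects that no longer have any.
    synchronized void sweep(long nowMs) {
        Iterator<Map.Entry<Integer, Map<Integer, Long>>> it = leases.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Integer, Map<Integer, Long>> e = it.next();
            e.getValue().values().removeIf(expiry -> expiry <= nowMs);
            if (e.getValue().isEmpty()) {
                int objId = e.getKey();
                it.remove();
                deactivate.accept(objId);
            }
        }
    }

    // Background thread body: sweep periodically until interrupted.
    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            sweep(System.currentTimeMillis());
            try {
                Thread.sleep(sweepIntervalMs);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();   // restore the flag and exit the loop
            }
        }
    }
}
```

An object whose only lease was cleaned is deactivated at the next sweep; an object whose lease simply lapses is deactivated at the first sweep after its expiry time.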

Two numbers are passed to the DGC interface: the object identification number and the client identification number. In RMI, these identification numbers are generated by the ObjID and VMID classes, respectively. For implementing the DGC on CORBA, I continue this tradition but simplify it slightly by extracting the integer contained in each object.

The usage of this DGC interface with CORBA is identical to its usage with RMI. When a new reference is created, a lease for the object should be obtained in order to prevent the object from being garbage collected. By performing this action in the client stubs, this can be entirely hidden from the software developer so they don't need to worry about it.
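Hiding leasing inside the client stub can be sketched as a reference wrapper that leases the object when it is created and cleans the lease when it is closed. LeasedRef and LeaseGranter are hypothetical names; LeaseGranter stands in for whatever remote DGC interface the stub talks to, and scheduling of renewals is left to the caller.

```java
// Hypothetical remote DGC interface the stub talks to.
interface LeaseGranter {
    long dirty(int objId, int clientId);
    void clean(int objId, int clientId);
}

// Hypothetical stub-side wrapper that hides leasing from application code.
class LeasedRef implements AutoCloseable {
    private final LeaseGranter dgc;
    private final int objId;
    private final int clientId;
    private boolean closed = false;

    LeasedRef(LeaseGranter dgc, int objId, int clientId) {
        this.dgc = dgc;
        this.objId = objId;
        this.clientId = clientId;
        dgc.dirty(objId, clientId);   // lease before handing out the reference
    }

    // A real stub would call this from a timer well before the lease expires.
    long renew() {
        if (closed) throw new IllegalStateException("reference closed");
        return dgc.dirty(objId, clientId);
    }

    // Ends the lease exactly once, even if close() is called repeatedly.
    @Override
    public void close() {
        if (!closed) {
            closed = true;
            dgc.clean(objId, clientId);
        }
    }
}
```

Used with try-with-resources, the application never sees dirty or clean: one lease is taken on creation, renewed as needed, and cleaned when the block exits.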

Garbage collection has greatly improved the way in which we write software, but garbage collection in distributed applications has many difficult problems to solve. We've looked briefly at how the three main distributed object systems tackle this problem and demonstrated that two of them aren't quite as different as you might expect at first glance.

Where To Go From Here
RMI and Java can be found at http://www.javasoft.com
CORBA standards can be found at http://www.omg.org
Visigenic, the makers of VisiBroker for Java, can be found at http://www.visigenic.com
More information on distributed GC may be found at http://www-sor.inria.fr

About the Author

Jeff Nelson is a distributed systems architect with DiaLogos Incorporated, experts in CORBA and Java Technologies (http://dialogosweb.com) and active participants in the Object Management Group. He has 8 years of experience in distributed computing and object technology. Jeff can be found on the Web at http://www.distributedobjects.com/
