Welcome!

Java Authors: Don MacVittie, Maureen O'Gara, Liz McMillan, Walter H. Pinson, III, Yakov Werde

Related Topics: Java

Java: Article

Top Cluster Considerations

Design your application to work in a cluster and be sure to test it carefully to avoid cluster gotchas

Maximize Local Object Accesses Within Tiers
When an application is running as a single instance all accesses are local. Moving to a clustered environment means, however, that accesses to EJBs, Web Services, and other "remotable" components may no longer be local. Remote object invocations are expensive compared to local calls, requiring request and return object serialization and de-serialization.

Care should be taken to ensure as much local object access as possible:

  1. Deploy all application components in the same tier, when feasible. For example, Web and EJB components in the same tier will ensure local access to EJBs when feasible. Sometimes, security or other requirements preclude this.
  2. Use the EJB "local" APIs to avoid a request and response object pass by value semantics (requiring serialization and de-serialization).
  3. Design coarse grain APIs between components to reduce application "chattiness", but at the same time ensure that only the minimum required data is sent and retrieved between components.
Leverage Messaging to Enable Background and/or Batch Processing
Performance is often the requirement that motivates moving from a single instance to a clustered environment. To further improve performance and reduce the load on contentious resources the application can be modified to perform some work asynchronously. For example, an order processing application may choose to process inventory updates in the background. When a new order is being processed, updating the inventory data during the order transaction isn't essential. Processing the inventory data synchronously actually increases the response time for the user and further burdens the database and other resources in the transaction.

In J2EE, JMS can be used in a transaction to create a durable and reliable message for background processing. MDBs are usually used for message processing and provide a high-performing mechanism for asynchronous updates. Often MDBs allow work to pile up and do updates and other processing in "batch" mode (many updates at once) to improve performance further. At times this can even be done when there's a reduced load on the application.

Design and Partition Caches for Cluster Awareness
Another important area to consider for clustered applications is caching. Caching heavily used data can significantly improve response time, as well as reduce processing time and the load on resources such as databases. Most application servers provide caching capabilities in various flavors such as EJB "read-only beans," Java object caching, Web Services caching, servlet/JSP/JSP tag caching, etc.

Some application servers support these caching facilities across a cluster. This requires complex mechanisms that provide key features such as invalidating cached data across systems when it's updated, sharing cached data across a cluster for data that's very expensive to retrieve, etc. Using the capabilities provided by your application server is the preferred approach. Leveraging these facilities can greatly improve performance and reduce interruption in processing due to failovers.

In addition, there are special considerations the designer must be aware of when leveraging application caching in a cluster. As data is updated and invalidated in one cache, there are policies governing when and how often the data is updated across the cluster of caches. Common examples include EOS (end of service) updates and TBW (time-based writes). With an EOS policy, the data is synchronized at the end of a request. With TBW multiple updates are synchronized in a batch mode at certain time intervals. This of course leads to potentially longer time frames where the data isn't synchronized. TBW provides a higher-performance alternative for application data, such as a product catalog that doesn't have high integrity requirements and isn't updated very frequently. Customer or Order data, on the other hand, would generally require a more timely approach such as EOS. So the application design must take into account the caching policies to achieve the desired behavior and performance.

Data partitioning mechanisms are provided by some application servers such as WebSphere Extended Deployment (XD). These facilities can be used to partition data access from HTTP, EJBs, JDBC, etc. to dedicated systems in the cluster. For example, in the order processing example given above, the inventory database can be partitioned and accessed by a dedicated set of EJBs. This type of partitioning can significantly reduce data concurrency and locking while providing extended options and performance for caching. Be aware that leveraging WebSphere XD data partitioning will require changes to your application.

Consider Cluster Impacts on Shared Resources
At deployment time, you have to make sure that the shared resources are configured appropriately to handle the additional load. For example, if you've configured a single application server instance to have 20 pooled connections to the back-end database for optimal performance, this number will be multiplied by each server in the cluster. Thus a cluster of five application servers would result in 100 pooled connections. This is probably too many. Be aware of this multiplicative effect for all shared resources and pools.

Another potential gotcha in a cluster with multiple physical machines is the server clock time. It's important to synchronize the system clocks to avoid unwanted behavior due to session timeouts on failover, data time-stamping, or other application-specific time processing.

Use Application Server-Provided Workload Management and Replication
Application servers like WebSphere provide capabilities for cluster scalability, including workload management and failover, HTTP session affinity, HTTP session and cache replication services, etc. As we discussed in the sections on HTTP session and caching, these services are both complex to implement and can have a significant impact on overall cluster performance. Our recommendation is to leverage the platform cluster capabilities over trying to roll-your-own solutions.

Additionally, conflicts may arise if you try to roll-your-own workload management and affinity techniques and then combine them with your application server capabilities. Except in highly unusual circumstances, use the platform capabilities.

TEST, TEST, TEST
To run a cluster successfully, an application needs to be tested in a clustered environment. This will help expose many of the subtle problems that can arise such as the ones discussed above. It's also important to test edge cases such as simultaneous logins from the same user, machine, or application server. Additionally, you'll want to test failover scenarios as described by David Purcell in "Moving to a Cluster."2

Summary
When designing your application, there is a set of best practices that should be followed so your application works and performs when moved to a cluster environment. Your application server platform may provide powerful capabilities such as workload management and session replication but if your application session data isn't serializable, or your application writes configuration data to the local file system, your application is going to fail in a cluster. So, design your application to work in a cluster and be sure to test it carefully to avoid the cluster gotchas.

Acknowledgements
The authors would like to thank Andrew Spyker, Tom Alcott, Matt Hogstrom, Ken Hygh, and Tony Tuel for sharing their first-hand experiences in how to design applications to work in a clustered environment.

References
1  Improving HttpSession Performance with Smart Serialization, Kyle Brown and Keys Botzum, IBM developerWorks, 11/23/03;
www-128.ibm.com/developerworks/websphere/library/ bestpractices/httpsession_performance_serialization.html

2  Moving to a Cluster, David Purcell, Java Developers Journal, December 2004,
http://jdj.sys-con.com/read/47354.htm

More Stories By Ruth Willenborg

Ruth Willenborg is a senior technical staff member in IBM's WebSphere Technology Institute working on virtualization. Prior to this assignment, Ruth was manager of the WebSphere performance team responsible for WebSphere Application Server performance analysis, performance benchmarking, and performance tool development. Ruth has over 20 years of experience in software development at IBM. She is co-author of Performance Analysis for Java Web Sites (Addison-Wesley, 2002).

More Stories By J. Stan Cox

J. Stan Cox is a senior engineer with IBM's WebSphere Application Server performance group. In this role, he has worked to improve WebSphere application performance for J2EE, Web 2.0, Web services, XML and more. His current focus is WebSphere multicore and parallel foundation performance. Stan holds a B.S.C.S from Appalachian State University (1990) and an MS in computer science from Clemson University (1992).

Comments (1) View Comments

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


Most Recent Comments
gilinachum 01/22/09 02:36:00 AM EST

My two bits are: Making sure you keep some degree of affinity in the system to combat the inherent cluster complexity.
For example: http session affinity, where a specific user session is always handled by the same specific server instance (as long as no fail-over occurred).

Or, server affinity, imagine a cluster with two presentation web containers (servers: Web1, Web2) talking to a and two business logic servers (Ejb1, Ejb2) cluster, server affinity might mean to configure the Web1 server to hit only on the the Ejb1 server. and have the server Web2 hitting only on the Ejb2 server. although you sacrifice some degree of load balancing and performance (if Web1 is out the game so is Ejb1), the overall layout still highly available (there are two non dependent chains), simpler to grasp by the system operators, and easier to troubleshooting in case of problems.