Turbo-Charging Applications with Mid-Tier Distributed Caching

Fast and predictable data access

Reliability Considerations
When operations span the cluster, the reliability of the architecture is paramount to the success of a project. In particular, when transactional activity changes data in the grid and that data is ultimately persisted to a back-end store, it's essential that the chosen solution survives failure gracefully.

We recommend that customers build their solution and test failure scenarios with four to six machines in their topology. When considering which data grid solution to use, the following capabilities should be given high importance in the selection criteria:

  • Dynamic scaling of the grid. You should be able to add nodes to it without program or configuration changes. The grid should scale dynamically once the baseline configuration of the application has been set.
  • Reliability of the grid. Basic tests such as unplugging machines from the network and forcefully killing specific nodes in the cluster will ensure that the architecture is robust and can be depended on for mission-critical applications.
  • Throughput performance of the grid. Adding nodes to the grid should give a near-linear and predictable growth in throughput. If the throughput doesn't grow in linear fashion, the solution may not be effective and may not provide the desired performance in a production situation.

A fundamental decision criterion is whether a solution works only on a single or preset number of nodes or, worse, requires specific changes to the program code when it scales to larger numbers of nodes.
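
The near-linear throughput criterion above can be checked with simple arithmetic once you have load-test results. The following is a minimal sketch, not part of any data grid product: the class name `ScalingCheck`, its methods, and the sample figures are all hypothetical, and the 90% threshold is an assumption you would choose for your own project.

```java
/** Hypothetical helper that compares measured throughput with ideal linear scaling. */
final class ScalingCheck {

    /**
     * Returns measured throughput divided by the throughput that perfect
     * linear scaling from the baseline run would have produced.
     */
    static double efficiency(int baseNodes, double baseThroughput, int nodes, double throughput) {
        if (baseNodes <= 0 || nodes <= 0 || baseThroughput <= 0) {
            throw new IllegalArgumentException("node counts and baseline throughput must be positive");
        }
        double ideal = baseThroughput * nodes / baseNodes;
        return throughput / ideal;
    }

    /** True when the measured run reaches at least minEfficiency of linear scaling. */
    static boolean isNearLinear(int baseNodes, double baseThroughput,
                                int nodes, double throughput, double minEfficiency) {
        return efficiency(baseNodes, baseThroughput, nodes, throughput) >= minEfficiency;
    }

    public static void main(String[] args) {
        // Illustrative numbers only; these are not measurements from a real grid.
        int baseNodes = 2;
        double baseOpsPerSec = 20_000;
        int[] nodeCounts = {4, 6};
        double[] opsPerSec = {38_000, 41_000};
        double minEfficiency = 0.90; // assumed acceptance threshold

        for (int i = 0; i < nodeCounts.length; i++) {
            double e = efficiency(baseNodes, baseOpsPerSec, nodeCounts[i], opsPerSec[i]);
            String verdict = e >= minEfficiency ? "near-linear" : "investigate";
            System.out.printf("%d nodes: %.0f%% of linear scaling (%s)%n",
                    nodeCounts[i], e * 100, verdict);
        }
    }
}
```

Feeding in results from each node count in your four-to-six machine test topology gives an early signal of whether adding nodes still pays off.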

These technologies aren't only being used in small grids of four to eight nodes; they're increasingly being deployed in large grids of 500 to 1,000 nodes. At these extreme levels, predictable and automated reliability, scalability, and performance are critical.

Deployment Considerations
When using and deploying a data grid, there are many things to consider in addition to the programming model and cache topologies that you want to use. Some issues to consider are as follows.

The Network
It's vital to ensure that when data is requested across the network due to client data requests, recovery requests within the grid, or other processing, your environment is optimally tuned and secured at all levels. Some areas to consider include:

  • High network throughput - bandwidth should be a minimum of 1 Gbps.
  • Redundant network interface cards (NICs) in each server for availability and performance, as well as physically separate network segments for non-cache traffic. For example, from within the cache you may have back-end data stores that need to be written to or read from. These should use separate NICs and therefore separate data paths so as not to interfere with cache traffic.
  • Optimal configuration of your network and switching devices. A data grid operating in full production mode has the potential to saturate the networking infrastructure.

Operating Systems
Typical development environments usually consist of either Windows or Mac operating systems. Test and production environments tend to be Linux, Solaris, AIX, or Windows. Depending on the operating systems you use, some of the issues to consider are:
  • Tune the TCP/UDP layers to be optimal for the operating system you're using.
  • Ensure that your NICs are working properly in full duplex mode.
  • Make sure that your grid server processes are never paged to disk since this will severely impact performance.
  • Ensure that you test your application thoroughly on the target platform so that any issues related to the subtle differences in operating systems or JVM versions/implementations are aired.

JVMs
JVM configurations and command line parameters vary slightly between the different vendor implementations. The following considerations are a good place to start:
  • Set the -Xms option and the -Xmx option to the same value so that the heap is sized once at startup and isn't resized at runtime, and ensure that you're not allocating too much memory to the JVMs. Keeping the heap size moderate should help keep garbage collection pauses manageable.
  • Use the -server option to get better performance.
  • Be aware of the capabilities and switches available in your JVM to achieve optimal JVM performance and GC optimization.
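
One way to guard against a misconfigured grid node is to have it inspect its own heap settings at startup. The sketch below uses only the standard `java.lang.management` API; the class name `HeapSettingsCheck` and its helper methods are hypothetical, and the size parsing assumes the common k/m/g/t suffixes, so check your own JVM vendor's documentation before relying on it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.List;

/** Hypothetical startup check that -Xms and -Xmx were given the same value. */
final class HeapSettingsCheck {

    /** Returns the value of the last argument starting with prefix, or null if absent. */
    static String findOption(List<String> jvmArgs, String prefix) {
        String value = null;
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix)) {
                value = arg.substring(prefix.length());
            }
        }
        return value;
    }

    /** Parses sizes such as "512m" or "2g" into bytes (assumes k/m/g/t suffixes). */
    static long parseSize(String text) {
        String t = text.trim().toLowerCase();
        if (t.isEmpty()) {
            throw new IllegalArgumentException("empty size");
        }
        long multiplier;
        switch (t.charAt(t.length() - 1)) {
            case 'k': multiplier = 1L << 10; break;
            case 'm': multiplier = 1L << 20; break;
            case 'g': multiplier = 1L << 30; break;
            case 't': multiplier = 1L << 40; break;
            default:  multiplier = 1L;       break;
        }
        String digits = multiplier == 1L ? t : t.substring(0, t.length() - 1);
        return Long.parseLong(digits) * multiplier;
    }

    /** True only when both options are present and describe the same number of bytes. */
    static boolean heapIsFixed(List<String> jvmArgs) {
        String xms = findOption(jvmArgs, "-Xms");
        String xmx = findOption(jvmArgs, "-Xmx");
        return xms != null && xmx != null && parseSize(xms) == parseSize(xmx);
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("Initial heap (bytes): " + heap.getInit());
        System.out.println("Maximum heap (bytes): " + heap.getMax());
        if (!heapIsFixed(jvmArgs)) {
            System.out.println("Warning: -Xms and -Xmx are missing or differ.");
        }
    }
}
```

Running it with, for example, `java -Xms1g -Xmx1g HeapSettingsCheck.java` (single-file launch, Java 11 or later) prints the arguments and heap sizes the JVM actually started with.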

Hardware Choices
Consider taking advantage of commodity-based, dual/quad-core x86 or x86-64 hardware. Solutions built on these platforms are very cost-effective and designed to scale efficiently.

Security
Ensure that you've considered encrypting data in transit and securing the data grid so that only authorized processes can access and manipulate data in the grid.
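
Data grid products generally expose their own security configuration, so the following is only a generic illustration of the two goals above using the standard Java JSSE API: encrypting traffic with TLS and requiring each connecting process to present a certificate. The class name `GridTlsSettings` is hypothetical, and the protocol list is an assumption (TLSv1.3 needs Java 11 or later when the parameters are applied to a socket or engine).

```java
import javax.net.ssl.SSLParameters;

/** Hypothetical holder for TLS settings that a grid node's sockets could apply. */
final class GridTlsSettings {

    /** Builds parameters that restrict protocols and demand a client certificate. */
    static SSLParameters mutualTlsParameters() {
        SSLParameters params = new SSLParameters();
        // Assumed protocol list; adjust to what your JVM and peers support.
        params.setProtocols(new String[] {"TLSv1.3", "TLSv1.2"});
        // Server side: reject peers that cannot present a trusted certificate.
        params.setNeedClientAuth(true);
        return params;
    }

    public static void main(String[] args) {
        SSLParameters params = mutualTlsParameters();
        System.out.println("Client certificate required: " + params.getNeedClientAuth());
        System.out.println("Protocols: " + String.join(", ", params.getProtocols()));
    }
}
```

These parameters would be passed to `SSLServerSocket.setSSLParameters` or `SSLEngine.setSSLParameters` on the listening side, with key and trust stores supplying the certificates that identify authorized processes.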

Conclusion
Keeping data cached in object form in a mid-tier data grid gives fast and predictable access to that data, and offers a scalable and reliable platform for supporting extreme transaction processing from Java, C#/.NET, and C++.

Data grids provide multiple caching topologies to support various data access requirements, including static reference data and massive volumes of volatile data, as well as integration with back-end data sources using technologies such as Hibernate and TopLink. By using data grid technology with commodity-based hardware, you can linearly scale your data and processing and provide predictable and reliable access to your data.

With many vendors providing reliable and scalable data grid solutions, engineers can spend their valuable time designing and writing code to solve business problems rather than building caching and data grid infrastructures from scratch.

More Stories By Tim Middleton

Tim Middleton is a solution architect with Oracle in Perth, Western Australia. He has over 17 years of experience in the IT industry. During this time he has been involved in the design and implementation of many large and leading-edge technology projects within the government and private sectors. His focus is on providing middleware solutions around SOA, with an emphasis on architectures that are highly available, scalable and reliable. Tim also has extensive development experience with J2EE and application server-based solutions, as well as many years of experience as a DBA.
