By Tim Middleton
February 21, 2008 12:00 PM EST
Caching Topologies
Because data usage patterns differ in volatility, frequency of update, and expiration requirements, a data grid must offer many different topologies or configurations. For example, for relatively small volumes of read-only or rarely updated data, a brute-force "replicate everywhere" topology may work. In contrast, large and possibly growing volumes of volatile data may require a topology that dynamically spreads the load over the members of the cluster and repartitions the data when new members are added. These topologies can also be combined to provide both the benefit of in-memory access and the ability to grow and load balance the data across the cluster.
The key here is that the developer shouldn't have to code the clustering, replication, data backup, or parallel processing logic required to support the different topology types. The developer should code to a standard API and concentrate on writing business logic. It should be possible to change the underlying topology declaratively, through configuration alone, without changing any code written against the API.
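The idea of coding once to a standard API while the topology is swapped by configuration can be illustrated with a small in-process simulation. This is a hypothetical sketch in plain Java, not how Oracle Coherence works internally: the classes `ReplicatedCache` and `PartitionedCache`, the `createCache` factory and the configuration values "replicated" and "partitioned" are all invented for illustration, and the cluster "members" are just HashMaps inside one JVM. The business method `giveRaise` is written against `java.util.Map` and runs unchanged under either topology.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TopologyDemo {

    // Hypothetical base class: N simulated cluster members, each with its own HashMap.
    static abstract class SimulatedCache extends AbstractMap<String, Double> {
        final List<Map<String, Double>> members = new ArrayList<>();

        SimulatedCache(int memberCount) {
            for (int i = 0; i < memberCount; i++) {
                members.add(new HashMap<>());
            }
        }
    }

    // "Replicate everywhere": every write is copied to every member.
    static class ReplicatedCache extends SimulatedCache {
        ReplicatedCache(int memberCount) { super(memberCount); }

        @Override public Double put(String key, Double value) {
            Double previous = members.get(0).get(key);
            for (Map<String, Double> member : members) {
                member.put(key, value);
            }
            return previous;
        }

        // Any member can answer a read; use the first one.
        @Override public Double get(Object key) { return members.get(0).get(key); }

        @Override public Set<Map.Entry<String, Double>> entrySet() {
            return members.get(0).entrySet();
        }
    }

    // Partitioned: each key is stored on exactly one member, chosen by its hash.
    static class PartitionedCache extends SimulatedCache {
        PartitionedCache(int memberCount) { super(memberCount); }

        private Map<String, Double> ownerOf(Object key) {
            return members.get(Math.floorMod(key.hashCode(), members.size()));
        }

        @Override public Double put(String key, Double value) { return ownerOf(key).put(key, value); }

        @Override public Double get(Object key) { return ownerOf(key).get(key); }

        @Override public Set<Map.Entry<String, Double>> entrySet() {
            Set<Map.Entry<String, Double>> all = new HashSet<>();
            for (Map<String, Double> member : members) {
                all.addAll(member.entrySet());
            }
            return all;
        }
    }

    // The only place the topology is chosen: a configuration value, not application code.
    static Map<String, Double> createCache(String topology, int memberCount) {
        switch (topology) {
            case "replicated":  return new ReplicatedCache(memberCount);
            case "partitioned": return new PartitionedCache(memberCount);
            default: throw new IllegalArgumentException("unknown topology: " + topology);
        }
    }

    // Business logic, written once against the standard Map API.
    static void giveRaise(Map<String, Double> salaries, double factor) {
        for (String name : new ArrayList<>(salaries.keySet())) {
            salaries.put(name, salaries.get(name) * factor);
        }
    }

    public static void main(String[] args) {
        String topology = args.length > 0 ? args[0] : "partitioned"; // stand-in for a config file
        Map<String, Double> salaries = createCache(topology, 3);
        salaries.put("alice", 1000.0);
        salaries.put("bob", 2000.0);
        giveRaise(salaries, 1.1);
        System.out.println(topology + ": alice=" + salaries.get("alice") + ", bob=" + salaries.get("bob"));
    }
}
```

Passing "replicated" or "partitioned" as the program argument switches the topology without touching `giveRaise`. In a real grid the members would be separate JVMs and the choice would come from the product's declarative configuration rather than a command-line argument.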
Data Source Integration
When a mid-tier data grid is used, data follows a number of usage patterns. Some data is populated directly by the applications themselves. However, for applications that cache data held elsewhere, there should be a consistent way of loading data from back-end data sources on a cache miss, that is, when the data being queried isn't in the grid but does exist in a back-end data store. The developer shouldn't have to write code to handle this case.
Vendors with robust solutions in this space frequently let the data source plug transparently into the grid. For example, with Oracle Coherence, loading directly from the database is configured declaratively by associating a CacheStore implementation with the cache in the deployment configuration. Developers can either implement a standard interface that calls the back-end data store for queries and updates, or use out-of-the-box integration with persistence solutions such as JBoss Hibernate or Oracle TopLink.
With either approach, if the requested data doesn't exist in the data grid, the solution automatically delegates the request to the CacheStore implementation, which retrieves it from the back-end store.
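The read-through behaviour described above can be sketched in plain Java. This is not Coherence code: `BackingStore` is a hypothetical interface loosely modelled on the idea of a CacheStore (load and store by key), and `ReadThroughCache` and `FakeDatabase` are invented classes showing how a miss is delegated to the store so that the calling code never contains any loading logic.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ReadThroughDemo {

    // Hypothetical plug-in point, loosely modelled on the idea of a CacheStore.
    // The cache calls it; application code never does.
    interface BackingStore<K, V> {
        V load(K key);               // returns null if the key does not exist
        void store(K key, V value);
    }

    // A cache that delegates misses (reads) and writes to its backing store.
    static class ReadThroughCache<K, V> {
        private final Map<K, V> entries = new ConcurrentHashMap<>();
        private final BackingStore<K, V> store;

        ReadThroughCache(BackingStore<K, V> store) {
            this.store = store;
        }

        V get(K key) {
            // On a miss, load from the back end and cache the result;
            // a null result (no such row) is not cached.
            return entries.computeIfAbsent(key, store::load);
        }

        void put(K key, V value) {
            entries.put(key, value);
            store.store(key, value);     // write-through to the back end
        }
    }

    // Stand-in for a database table; counts loads to show when the back end is hit.
    static class FakeDatabase implements BackingStore<Integer, String> {
        final Map<Integer, String> rows = new HashMap<>();
        int loads = 0;

        @Override public String load(Integer key) {
            loads++;
            return rows.get(key);
        }

        @Override public void store(Integer key, String value) {
            rows.put(key, value);
        }
    }

    public static void main(String[] args) {
        FakeDatabase db = new FakeDatabase();
        db.rows.put(42, "Alice");
        ReadThroughCache<Integer, String> cache = new ReadThroughCache<>(db);
        System.out.println(cache.get(42) + " (loads so far: " + db.loads + ")"); // miss: hits the database
        System.out.println(cache.get(42) + " (loads so far: " + db.loads + ")"); // hit: served from the cache
    }
}
```

The design choice worth noting is that the loading logic lives in one pluggable object owned by the cache, so swapping the "database" for another store changes no caller code.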
The ability to refresh data objects in the data grid based on time-triggered or other data expiry mechanisms is especially useful for those who use the data grid as a system of record, the official place for accessing data. With such a mechanism built into the solution, expiry and other eviction policies defined by an administrator are enforced by the infrastructure itself, which refreshes the data grid accordingly. Ideally the solution shouldn't require customers to poll their back-end systems for changes in data or to schedule jobs to refresh the data grid; these approaches are simply not scalable or manageable.
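A policy-driven refresh can be layered onto the same idea. The sketch below is again hypothetical plain Java rather than a product API, and it shows just one possible strategy, reload-on-read after expiry: each cached entry records when it was loaded, and once it is older than an administrator-set time-to-live the next read reloads it from the back end, so no polling or scheduled refresh job is needed. The class names are invented, and the clock is injected as a `LongSupplier` purely so the behaviour can be tested deterministically.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

public class ExpiringCacheDemo {

    // A cached value plus the time it was loaded.
    static final class Timestamped<V> {
        final V value;
        final long loadedAtMillis;

        Timestamped(V value, long loadedAtMillis) {
            this.value = value;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    // Hypothetical cache whose entries expire after ttlMillis and are then
    // reloaded from the back end on the next read.
    static class ExpiringCache<K, V> {
        private final Map<K, Timestamped<V>> entries = new ConcurrentHashMap<>();
        private final Function<K, V> loader;   // stands in for the back-end store
        private final long ttlMillis;          // expiry policy set by an administrator
        private final LongSupplier clock;      // injected so tests can control time

        ExpiringCache(Function<K, V> loader, long ttlMillis, LongSupplier clock) {
            this.loader = loader;
            this.ttlMillis = ttlMillis;
            this.clock = clock;
        }

        V get(K key) {
            long now = clock.getAsLong();
            Timestamped<V> cached = entries.get(key);
            if (cached == null || now - cached.loadedAtMillis >= ttlMillis) {
                V fresh = loader.apply(key);   // miss or expired: refresh from the source
                if (fresh == null) {
                    entries.remove(key);
                    return null;
                }
                entries.put(key, new Timestamped<>(fresh, now));
                return fresh;
            }
            return cached.value;               // still within its time-to-live
        }
    }

    public static void main(String[] args) {
        long[] fakeTime = {0};
        Map<String, Integer> prices = new ConcurrentHashMap<>();
        prices.put("widget", 10);
        ExpiringCache<String, Integer> cache =
                new ExpiringCache<>(prices::get, 60_000, () -> fakeTime[0]);

        System.out.println(cache.get("widget")); // loaded from the source
        prices.put("widget", 12);                // the back end changes
        System.out.println(cache.get("widget")); // still the cached value (not yet expired)
        fakeTime[0] = 60_000;                    // one time-to-live later
        System.out.println(cache.get("widget")); // expired, so reloaded from the source
    }
}
```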
Sending the Processing to the Data
The advantage of using a distributed cache or data grid topology is that processing, as well as data, can be scaled by adding more resources to the grid. In a traditional use case in which we need to read and process the data in a Map (for example, to give every employee a raise), we might have used something similar to the following (ignoring error handling, etc.):
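// Assumes map is a Map<Integer, Employee> keyed by employee ID
for (Map.Entry<Integer, Employee> entry : map.entrySet()) {
    Employee emp = entry.getValue();          // pulled to the client if the Map is remote
    emp.setSalary(emp.getSalary() * 1.1);
    map.put(entry.getKey(), emp);             // and sent back again
}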
This (which could be written dozens of ways, of course) would achieve the desired result. However, if the Map weren't local to the Java process but held on another server or distributed across several, every entry would travel across the network to the client and back again. The processing would also be serialized (that is, one entry processed at a time), and rewriting it to run in parallel over multiple JVMs, taking into account the coordination of the concurrent processing, would require a considerable amount of work.
To take advantage of grid processing and of the caching topology's ability to load balance and partition data across multiple servers, it makes sense to send the processing to where the data is, rather than bringing the data to the client for processing. A common approach (this example is specific to Oracle Coherence) is to deploy code in the grid that performs the logic locally on the nodes that hold the data, rather than requiring the programmer to bring all the data to the client.
The following example shows how this approach could be used to raise the salary of every employee. First, create a class to process the data:
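import com.tangosol.util.InvocableMap;
import com.tangosol.util.processor.AbstractProcessor;

// Runs on the grid member that owns each entry, so the Employee
// object never has to travel to the client
public class RaiseSalary extends AbstractProcessor {
    public RaiseSalary() {
    }

    public Object process(InvocableMap.Entry entry) {
        Employee emp = (Employee) entry.getValue();
        emp.setSalary(emp.getSalary() * 1.10);  // 10% raise
        entry.setValue(emp);                     // store the updated value in place
        return null;                             // no result to send back to the caller
    }
}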
Now invoke this across the Map (data grid):
empCache.invokeAll(AlwaysFilter.INSTANCE, new RaiseSalary());
Sending the processing to the data can dramatically improve the performance of tasks such as this, because the computation now runs in parallel across the entire grid, with each node processing the entries it already holds.
Figure 1 illustrates the benefits of sending the processing to the data.
With multiple nodes in the grid and the data partitioned across them, this processing model scales well and takes advantage of the processing capacity of each node. Because data doesn't need to be shipped back and forth between client and server, the scalability and performance of such a system also improve significantly. By contrast, the traditional client-side approach outlined earlier performs poorly and scales badly as the volume of data grows.
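The scaling argument can be illustrated within a single JVM. In the hypothetical sketch below, which is plain Java rather than Coherence, each simulated member owns one partition of the data and a thread pool stands in for the members' CPUs. The caller ships a small processor (a `UnaryOperator`, used here as an invented stand-in for an entry processor such as RaiseSalary) to every partition, and only a count of processed entries comes back, never the data. `SimulatedGrid` and its `invokeAll` method are invented names; the method merely echoes the Coherence call above and is not that API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.UnaryOperator;

public class PartitionedProcessingDemo {

    // Hypothetical grid: data is split into partitions, one per simulated member.
    static class SimulatedGrid<K, V> {
        private final List<Map<K, V>> partitions = new ArrayList<>();

        SimulatedGrid(int partitionCount) {
            for (int i = 0; i < partitionCount; i++) {
                partitions.add(new HashMap<>());
            }
        }

        void put(K key, V value) { ownerOf(key).put(key, value); }

        V get(K key) { return ownerOf(key).get(key); }

        private Map<K, V> ownerOf(K key) {
            return partitions.get(Math.floorMod(key.hashCode(), partitions.size()));
        }

        // Send the processing to the data: one task per partition, all running
        // in parallel, each updating only the entries its partition owns.
        // Returns the number of entries processed.
        int invokeAll(UnaryOperator<V> processor) {
            ExecutorService pool = Executors.newFixedThreadPool(partitions.size());
            try {
                List<Future<Integer>> pending = new ArrayList<>();
                for (Map<K, V> partition : partitions) {
                    pending.add(pool.submit(() -> {
                        partition.replaceAll((key, value) -> processor.apply(value));
                        return partition.size();
                    }));
                }
                int processed = 0;
                for (Future<Integer> result : pending) {
                    processed += result.get();   // wait for every partition to finish
                }
                return processed;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while processing", e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("processor failed", e.getCause());
            } finally {
                pool.shutdown();
            }
        }
    }

    public static void main(String[] args) {
        SimulatedGrid<Integer, Double> salaries = new SimulatedGrid<>(4);
        for (int id = 1; id <= 1000; id++) {
            salaries.put(id, 1000.0);
        }
        int processed = salaries.invokeAll(salary -> salary * 1.10); // 10% raise for everyone
        System.out.println(processed + " entries processed; employee 1 now earns " + salaries.get(1));
    }
}
```

Because each partition is touched by exactly one task, no locking is needed inside the processor; on a real grid the partitions would live in separate JVMs and the processor object, not the data, would cross the network.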
Copyright © 2008 SYS-CON Media, Inc. — All Rights Reserved.
About Tim Middleton
Tim Middleton is a solution architect with Oracle in Perth, Western Australia. He has over 17 years of experience in the IT industry. During this time he has been involved in the design and implementation of many large and leading-edge technology projects within the government and private sectors. His focus is on providing middleware solutions around SOA, with an emphasis on architectures that are highly available, scalable and reliable. Tim also has extensive development experience with J2EE and application server-based solutions, as well as many years of experience as a DBA.