Java Persistence on the Grid: Approaches to Integration

JPA - the enterprise standard for accessing relational data in Java

The Java Persistence API (JPA) is the enterprise standard for accessing relational data in Java. JPA provides support for mapping Java objects to a database schema and includes a simple programming API and an expressive query language for retrieving mapped entities from a database and writing back changes made to those entities. JPA offers developers productivity gains over writing and maintaining their own mapping code, and it provides a single API regardless of the platform, application server, or persistence provider implementation. Besides the productivity gains, the leading implementations offer developers valuable performance and scalability benefits through the inclusion of caching solutions. These caching solutions allow frequently accessed entities to be cached, which reduces both the number of queries going to the database and the processing time spent converting database query results into objects. Caching can have a significant positive effect on application performance.

JPA and Data Grids
A data grid is software that runs on a cluster of typically low-cost hardware to provide data storage and processing services. Data grid products aggregate the processing power and storage capacity of cluster servers and make it available to clients through APIs designed to shield them from the complexity of distributed computing. Data grids are commonly used as scalable distributed caches; however, distributed data processing is also a common feature. As a cache, a data grid provides a way to exceed the heap size of a single server by distributing data across all cluster servers.
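To make the idea of distributing cached data across cluster servers concrete, here is a minimal sketch in which each "server" is just an in-memory map and a key's hash code selects its owner. The class and method names are hypothetical, and real data grid products also handle backups, rebalancing, and network transport, none of which is modeled here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: each map stands in for the heap of one grid server.
class PartitionedCache<K, V> {
    private final List<Map<K, V>> nodes = new ArrayList<>();

    PartitionedCache(int nodeCount) {
        if (nodeCount <= 0) {
            throw new IllegalArgumentException("nodeCount must be positive");
        }
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new HashMap<>());
        }
    }

    // Choose the owning node from the key's hash code.
    int ownerOf(K key) {
        return Math.floorMod(key.hashCode(), nodes.size());
    }

    void put(K key, V value) {
        nodes.get(ownerOf(key)).put(key, value);
    }

    V get(K key) {
        return nodes.get(ownerOf(key)).get(key);
    }

    int sizeOfNode(int index) {
        return nodes.get(index).size();
    }

    // Total capacity used is the sum over all nodes.
    int totalSize() {
        int total = 0;
        for (Map<K, V> node : nodes) {
            total += node.size();
        }
        return total;
    }

    public static void main(String[] args) {
        PartitionedCache<Integer, String> cache = new PartitionedCache<>(4);
        for (int i = 0; i < 100; i++) {
            cache.put(i, "account-" + i);
        }
        System.out.println("entries stored: " + cache.totalSize());
        System.out.println("owner of key 42: node " + cache.ownerOf(42));
    }
}
```

Adding nodes in this sketch spreads the same entries more thinly, which is the property that lets a grid hold more data than a single heap.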

The relevance of data grids to today's enterprise applications is huge, yet their usage is still limited to technology specialists. Data grids are becoming mainstream and developers should consider grid architectures when developing applications and be aware that an application might be expected to scale up to a grid in the future.

Consider a banking system that processes incoming deposit and withdrawal requests by validating all fields before writing them to the database. Validation might include whether the account is valid, whether the requester is the account owner, whether the account contains sufficient funds for the request, etc. You can imagine there are many other validations that could be performed in such a system. The amount of data you have to read from the database to perform the validation of a single request can be significant and result in a large number of queries. Fortunately building such a database-centric application in JPA is straightforward. You map each of the classes in your domain to the database and write the necessary JP QL queries to retrieve the objects required for validation. The system may have to read large amounts of data from the database to process each request, but it works.

Now if we want to dramatically increase the throughput of this system we'd have to address its single greatest bottleneck: querying the database for validation data. Most JPA implementations either provide an L2 cache or support the integration of third-party L2 caches. But if we have to handle very large numbers of requests that arrive in a random order it's unlikely we'll have the required reference data in cache. Caches are useful when you're repeatedly accessing the same data. If your access pattern is random then it's unlikely your cache will contain what you need when you need it. Of course you can always increase your cache size to better your odds of a hit, but each server only has so much heap.
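The effect of a random access pattern on hit rate can be illustrated with a small simulation. The sketch below is hypothetical: it uses an LRU cache built on `LinkedHashMap`, uniformly random keys, and made-up sizes, so the numbers it prints are illustrative rather than measurements of any real JPA cache.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical simulation; key space, cache size, and request count are
// illustrative values, not figures from a real system.
class RandomAccessHitRate {

    // Fraction of requests served from an LRU cache of the given capacity
    // when keys are requested uniformly at random.
    static double hitRate(final int capacity, int keySpace, int requests, long seed) {
        // Access-ordered map that evicts its least recently used entry
        Map<Integer, Boolean> lru =
            new LinkedHashMap<Integer, Boolean>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Integer, Boolean> eldest) {
                    return size() > capacity;
                }
            };
        Random random = new Random(seed);
        int hits = 0;
        for (int i = 0; i < requests; i++) {
            Integer key = random.nextInt(keySpace);
            if (lru.get(key) != null) {
                hits++;                      // cache hit
            } else {
                lru.put(key, Boolean.TRUE);  // miss: load and cache the entry
            }
        }
        return (double) hits / requests;
    }

    public static void main(String[] args) {
        // A cache holding 1% of the keys versus one large enough for all of them
        System.out.println("small cache: " + hitRate(100, 10_000, 100_000, 42L));
        System.out.println("full cache:  " + hitRate(10_000, 10_000, 100_000, 42L));
    }
}
```

With uniformly random keys, the small cache can only hit on roughly the fraction of the key space it holds, while a cache sized to the whole key space misses each key at most once.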

Data grids provide a way to exceed the heap size of a single server and distribute your cached objects over a cluster of servers. The challenge is to integrate data grid technology with JPA to increase throughput without requiring complete application rewrites. Of course, as is typically the case with software systems, there's more than one approach to integration, each with its advantages and disadvantages. Let's look at different integration architectures and how we could use them.

Data Grid as Middle Tier Object Cache
As we mentioned, data grid products let you spread your cache across a cluster and can be used as a shared middle tier cache (see Figure 1). They provide a single logical heap that's physically spread over multiple servers with a total storage capacity that's the sum of the heaps of all the cluster servers. In the example, this would mean that by adding more servers to the grid its storage capacity could be increased to the point where all data required for validation could be pre-loaded (commonly referred to as "warming" the cache). Since validation data access is our bottleneck, caching all the required data effectively eliminates it.

For example, consider a simple validation method in our banking system:

public boolean isValidAccount(Request request) {
    // Primary key lookup; can be served from the L2 cache
    Account account = entityManager.find(
        Account.class, request.getAccountId());
    if (account == null) {
        return false;
    } else {
        return account.isValid();
    }
}

With the data grid integrated as the L2 cache, the find() will check the grid for the desired Account. If not found, it can then proceed to query the underlying database. However, if the grid is warmed with all the Accounts then there will be no need to query the database. Warming the appropriate caches can eliminate database access from the validation process entirely.
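The lookup order described above (grid first, database only on a miss, no database access once the cache is warmed) can be sketched outside of JPA as follows. This is a simplified, hypothetical model: `ReadThroughCache`, the `databaseLoader` function, and the read counter are assumptions for illustration, and a JPA provider would perform the equivalent steps internally.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch of a read-through lookup; not a JPA provider API.
class ReadThroughCache<K, V> {
    private final Map<K, V> grid = new ConcurrentHashMap<>();  // stands in for the data grid
    private final Function<K, V> databaseLoader;                // stands in for a SQL SELECT
    private final AtomicInteger databaseReads = new AtomicInteger();

    ReadThroughCache(Function<K, V> databaseLoader) {
        this.databaseLoader = databaseLoader;
    }

    // Pre-load ("warm") the cache so later finds never reach the database.
    void warm(Map<K, V> entries) {
        grid.putAll(entries);
    }

    V find(K key) {
        V cached = grid.get(key);
        if (cached != null) {
            return cached;                      // grid hit: no database access
        }
        databaseReads.incrementAndGet();        // grid miss: fall back to the database
        V loaded = databaseLoader.apply(key);
        if (loaded != null) {
            grid.put(key, loaded);              // cache for subsequent finds
        }
        return loaded;
    }

    int databaseReads() {
        return databaseReads.get();
    }

    public static void main(String[] args) {
        ReadThroughCache<Long, String> cache =
            new ReadThroughCache<>(id -> "account-" + id);
        cache.find(1L);
        cache.find(1L);
        System.out.println("database reads: " + cache.databaseReads());
    }
}
```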

Primary key finds are easily directed to the data grid but what about JP QL queries? Consider this method, which finds the Customer associated with a request using a non-primary key query:

public Customer getTxCustomer(Request request)
        throws NoResultException {
    // Non-primary-key lookup expressed as a JP QL query
    Customer customer = (Customer) entityManager
        .createQuery("select c from Customer c " +
                     "where c.masterAccountId = :id")
        .setParameter("id", request.getMasterAccountId())
        .getSingleResult();
    return customer;
}

Querying the data grid for an object that matches an arbitrary criterion is problematic. First it requires that the data grid provides some sort of query framework and second that the JPA/data grid integration can translate from JP QL into this framework. If both requirements are satisfied then it's possible that the query in our example could be directed to the grid and not the database.

One of the most valuable features of this approach is the possibility of parallel query execution. It stands to reason that the query in our example could be executed in parallel on all the servers in the grid to find the desired object. However, a query that returns many objects is much more interesting. Each grid server could execute the query in parallel to identify the objects it holds that match the given criteria. Running such a query in parallel on 10 servers that each hold 10,000 objects is going to be much faster than running it once over 100,000 objects. The more servers, the smaller the number of objects on each server and the faster the query executes!
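As a rough sketch of that scatter-and-gather idea, the code below filters several in-memory partitions concurrently and merges the partial results. It is a hypothetical, single-JVM model (each inner list stands in for the objects held by one grid server, and `CompletableFuture` tasks stand in for remote execution), not the query API of any particular grid product.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Predicate;

// Hypothetical single-JVM model of a query executed in parallel per partition.
class ParallelGridQuery {

    // Filter the objects held by one (simulated) grid server.
    private static <T> List<T> queryPartition(List<T> partition, Predicate<T> criteria) {
        List<T> matches = new ArrayList<>();
        for (T candidate : partition) {
            if (criteria.test(candidate)) {
                matches.add(candidate);
            }
        }
        return matches;
    }

    static <T> List<T> query(List<List<T>> partitions, Predicate<T> criteria) {
        // Scatter: start one asynchronous filtering task per partition
        List<CompletableFuture<List<T>>> tasks = new ArrayList<>();
        for (List<T> partition : partitions) {
            tasks.add(CompletableFuture.supplyAsync(
                () -> queryPartition(partition, criteria)));
        }
        // Gather: wait for each task and merge results in partition order
        List<T> results = new ArrayList<>();
        for (CompletableFuture<List<T>> task : tasks) {
            results.addAll(task.join());
        }
        return results;
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions = Arrays.asList(
            Arrays.asList(1, 2, 3), Arrays.asList(4, 5, 6), Arrays.asList(7, 8, 9));
        System.out.println(query(partitions, n -> n % 2 == 0));
    }
}
```

Each task scans only its own partition, so the work per task shrinks as partitions are added.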

Unfortunately there's one complication with queries that return multiple results. Unlike a primary key find(), in which a cache miss could automatically result in a database query, it's not clear whether the results obtained from the grid are complete. Perhaps only half of the objects you're looking for are in the grid, and a grid query wouldn't return the other half, which exists only in the database. Warming the cache solves this problem by ensuring all objects are in the grid, but that isn't always possible. However, for a given use case, you may know whether a particular query should be directed to the grid or to the database. The way you influence query execution in JPA is through query hints. Perhaps something like:

Customer customer = (Customer) entityManager
    .createQuery("select c from Customer c " +
                 "where c.masterAccountId = :id")
    .setParameter("id", request.getMasterAccountId())
    // Illustrative, implementation-specific hint name
    .setHint("my-jpa-implementation.dont-query-grid", true)
    .getSingleResult();

Of course there's no standard JPA hint for whether to direct a query to a data grid or not. This means that you'd have to introduce implementation-specific hints into your code. Fortunately the JPA specification requires that implementations ignore hints they don't understand so your code isn't tightly coupled to any particular one through hints.

Updating Objects
Naturally querying is the first thing you think of when looking at JPA on the grid but we also have to consider updates: persisting new objects, modifying existing objects, and deleting objects. When the grid is the L2 cache, it's important to ensure that the grid is only updated after a database transaction has successfully committed. Persisting a new object will result in a database INSERT and the new object will be placed into the grid. Modifying an object will result in a database UPDATE and the updated object being placed into the grid. And finally deleting an object will result in a database DELETE and the object being removed from the grid. The key thing is to update the data grid once the database transaction successfully commits.
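The ordering rule (database commit first, grid update second) can be sketched with a small unit-of-work model. Everything here is hypothetical: `CommitOrderedCache` and its nested `DatabaseTransaction` interface are stand-ins for what a JPA provider and its transaction integration would do, with plain maps in place of the grid.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: buffer changes and apply them to the grid only
// after the database transaction has committed successfully.
class CommitOrderedCache {

    /** Hypothetical stand-in for a database transaction. */
    interface DatabaseTransaction {
        void commit() throws Exception;
    }

    private final Map<Long, String> grid = new HashMap<>();
    private final Map<Long, String> pendingPuts = new HashMap<>();
    private final Set<Long> pendingRemoves = new HashSet<>();

    // Persist or modify: corresponds to a database INSERT or UPDATE.
    void persistOrUpdate(Long id, String entity) {
        pendingRemoves.remove(id);
        pendingPuts.put(id, entity);
    }

    // Delete: corresponds to a database DELETE.
    void remove(Long id) {
        pendingPuts.remove(id);
        pendingRemoves.add(id);
    }

    boolean commit(DatabaseTransaction tx) {
        boolean committed;
        try {
            tx.commit();
            committed = true;
        } catch (Exception e) {
            committed = false;   // database rejected the changes
        }
        if (committed) {
            // Only now is it safe to make the changes visible in the grid
            grid.putAll(pendingPuts);
            grid.keySet().removeAll(pendingRemoves);
        }
        pendingPuts.clear();
        pendingRemoves.clear();
        return committed;
    }

    String get(Long id) {
        return grid.get(id);
    }

    public static void main(String[] args) {
        CommitOrderedCache cache = new CommitOrderedCache();
        cache.persistOrUpdate(1L, "balance=100");
        System.out.println("before commit: " + cache.get(1L));
        cache.commit(() -> { });
        System.out.println("after commit:  " + cache.get(1L));
    }
}
```

If the commit fails, the pending changes are discarded and the grid keeps its previous contents, so readers never see data the database rejected.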

Data Grid as System of Record
When JPA uses the data grid as a distributed cache, the database is the "system of record." It's the ultimate source of truth and is kept up-to-date at all times. But what if the data grid were the system of record? This is often the case in many financial applications dealing with rapidly changing and transient data. What would JPA on the grid look like if there were no database or if the database were used more as a data archive or warehouse than as an online system? (see Figure 2)

In this architecture, all JPA operations that would normally have resulted in SQL directed to the database are directed instead to the data grid. This includes all queries and all updates. Essentially we replace the database entirely with the data grid. With JP QL translation support we can continue to use JPA as our programming API while working with data stored exclusively in the middle tier. For systems that don't need long-term persistent storage this is ideal. And if more storage or query performance is required you simply add servers to the grid.

Database-backed Data Grid
Even with all queries and updates being performed against the data grid, it's still possible to integrate a database for persistent storage. In this architecture, the grid is responsible for propagating the operations performed on the grid to the database. For example, putting an object into the grid would result in a database INSERT. The advantage of this configuration is that data continues to be highly available but updates are communicated back to the database for permanent storage, reporting purposes, etc. Ideally a grid operation wouldn't be propagated to the database synchronously since that would dramatically reduce throughput. Asynchronous writes of updates to a backing database keep the grid responsive and yet still support persistent storage requirements (see Figure 3).
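A minimal sketch of this asynchronous, write-behind style of propagation follows. It is hypothetical and deliberately simple: a single background thread drains queued writes in order, the `databaseWriter` callback stands in for the SQL INSERT or UPDATE, and concerns that real grid products address (batching, retries, failover of the queue) are left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

// Hypothetical write-behind sketch: the grid is updated immediately and the
// database write is queued for a background thread.
class WriteBehindGrid {
    private final Map<Long, String> grid = new ConcurrentHashMap<>();
    private final ExecutorService writer = Executors.newSingleThreadExecutor();
    private final BiConsumer<Long, String> databaseWriter;  // stands in for INSERT/UPDATE

    WriteBehindGrid(BiConsumer<Long, String> databaseWriter) {
        this.databaseWriter = databaseWriter;
    }

    // Returns as soon as the grid holds the value; the database catches up later.
    void put(Long id, String value) {
        grid.put(id, value);
        writer.execute(() -> databaseWriter.accept(id, value));
    }

    String get(Long id) {
        return grid.get(id);
    }

    // Stop accepting writes and wait for the queued ones to reach the database.
    boolean shutdownAndFlush(long timeoutSeconds) {
        writer.shutdown();
        try {
            return writer.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        Map<Long, String> database = new ConcurrentHashMap<>();
        WriteBehindGrid grid = new WriteBehindGrid(database::put);
        grid.put(1L, "deposit:100");
        System.out.println("grid sees: " + grid.get(1L));
        grid.shutdownAndFlush(5);
        System.out.println("database sees: " + database.get(1L));
    }
}
```

The trade-off in this design is that the database briefly lags behind the grid, so queued writes would be lost if the process failed before flushing them.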

Mix and Match - Heterogeneous Configuration
So far we've looked at a data grid as a cache for JPA and at using JPA as a standard API on top of a data grid. The difference between the two architectures is actually fairly minor. For new objects, it boils down to configuring whether JPA writes first to the database and then to the data grid or whether it just writes to the grid. The same logic applies to update and delete operations. Querying, as we've seen, is also similar.

If we can configure how to read/write/query on an Entity-by-Entity basis we can mix architectures. Consider a stock trading application. In such an application you've got "enduring" Entities like Companies, Stocks, and Bonds. But you've also got transient Entities like Bids and Asks. An Entity-level configuration would enable JPA to use the data grid as a cache for persistent Entities, like Company, and the data grid as the system of record for transient Entities like Bids.
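One way to picture an Entity-level configuration is a lookup from entity name to storage policy that decides where a write goes. The sketch below is purely illustrative: the `Policy` enum, the string entity names, and the two maps standing in for the grid and the database are assumptions, not the configuration mechanism of any JPA implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-entity configuration: the policy decides whether a write
// goes to the database and the grid, or to the grid alone.
class EntityStorageConfig {

    enum Policy { GRID_AS_CACHE, GRID_AS_SYSTEM_OF_RECORD }

    private final Map<String, Policy> policies = new HashMap<>();
    private final Map<String, Object> grid = new HashMap<>();      // stands in for the data grid
    private final Map<String, Object> database = new HashMap<>();  // stands in for the database

    void configure(String entityName, Policy policy) {
        policies.put(entityName, policy);
    }

    void persist(String entityName, String id, Object entity) {
        // Entities without an explicit policy default to the cache architecture
        Policy policy = policies.getOrDefault(entityName, Policy.GRID_AS_CACHE);
        String key = entityName + "#" + id;
        if (policy == Policy.GRID_AS_CACHE) {
            database.put(key, entity);  // enduring entity: write to the database first
        }
        grid.put(key, entity);          // every entity ends up in the grid
    }

    boolean inGrid(String entityName, String id) {
        return grid.containsKey(entityName + "#" + id);
    }

    boolean inDatabase(String entityName, String id) {
        return database.containsKey(entityName + "#" + id);
    }

    public static void main(String[] args) {
        EntityStorageConfig config = new EntityStorageConfig();
        config.configure("Company", Policy.GRID_AS_CACHE);
        config.configure("Bid", Policy.GRID_AS_SYSTEM_OF_RECORD);
        config.persist("Company", "ORCL", "Oracle");
        config.persist("Bid", "42", "bid for 100 shares");
        System.out.println("Company in database: " + config.inDatabase("Company", "ORCL"));
        System.out.println("Bid in database:     " + config.inDatabase("Bid", "42"));
    }
}
```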

Scaling JPA with a Data Grid
Hopefully it's clear by now that integrating JPA with data grids is possible and that they can increase system throughput by providing fast access to data managed in the middle tier. But they also offer significantly better scalability for JPA applications as compared with commonly used approaches.

Traditionally, scaling up a JPA application is done by increasing the number of servers in the application cluster and using a load balancer to distribute the work evenly. But as you increase the cluster size you are limited in what you can cache without introducing inter-process messaging and locking. Updates to shared data must be communicated to all cluster servers to ensure no JPA caches contain stale data. For a cluster with N servers this means each update will require N-1 messages. As you increase the number of servers in the cluster, the cost of processing a single concurrent update per server increases quadratically, as N(N-1), because each of the N servers must message the other N-1 servers for every update. Worse still, as the cluster grows each server will have to spend a significant amount of its available processing time dealing with incoming update messages. These non-linear communication and update processing costs mean that while traditional approaches to clustering JPA applications that employ caching do work well, they are limited to small-to-medium-size clusters.

A data grid solves this communication problem by having only one shared copy of an object accessible from all servers. An update doesn't require messaging to all servers because they'll each pick up the change the next time (if ever) they need the updated object. In a data grid with a scalable peer-to-peer communication architecture (i.e., one without a central message routing bottleneck) an update requires communicating to the server that stores the object and to the server or servers that store a backup copy in case of failover. In this case, the communication cost for processing a single concurrent update per server is described by the linear function C × N, where C is a constant reflecting the number of copies (primary and backups). This linear update cost means that it's possible to scale a JPA application using a data grid to large clusters and achieve much higher throughput than would typically be possible.
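The difference between the two growth rates is easy to see with a few numbers. The sketch below makes its counting assumptions explicit: with one concurrent update per server, replicated caches are charged N-1 messages for each of the N updates, while the grid is charged C messages per update (one per stored copy). The class and method names are hypothetical, and C = 2 (a primary and one backup) is only an example value.

```java
// Hypothetical arithmetic sketch comparing message counts for one
// concurrent update per server in a cluster of N servers.
class UpdateMessageCost {

    // Replicated caches: each of the N updates is sent to the other N-1 servers.
    static long replicatedMessages(int servers) {
        return (long) servers * (servers - 1);
    }

    // Data grid: each of the N updates goes to a fixed number of copies.
    static long gridMessages(int servers, int copies) {
        return (long) servers * copies;
    }

    public static void main(String[] args) {
        int copies = 2;  // assumed: one primary plus one backup
        for (int n : new int[] {2, 10, 100}) {
            System.out.printf("N=%d replicated=%d grid=%d%n",
                n, replicatedMessages(n), gridMessages(n, copies));
        }
    }
}
```

Under these assumptions, growing the cluster tenfold multiplies the grid's message count by ten but the replicated caches' count by roughly a hundred.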

Of course JPA on the grid isn't without its challenges. The first thing developers familiar with object relational mapping and JPA will undoubtedly be thinking about is cache staleness. This is the most common problem caches introduce. Staleness has two sources: third-party updates to the database, and updates performed by JPA applications running on other servers in the cluster. Dealing with third-party updates is no different with a data grid than it is with any other cache. Most JPA implementations offer a range of techniques to deal with this, including eviction policies, query refresh options, and for extremely volatile data, the ability to disable caching. This is well-worn territory that data grids don't particularly complicate.

As discussed earlier, staleness due to updates made in other cluster servers is traditionally solved by messaging, although it has its limitations. In high-transaction-rate systems where the messaging overhead is significant, JPA applications tend to minimize their cache usage and rely on the database to ensure they have the most recent version of the data. Ironically, as transaction rates increase and the value of caching increases, it's often disabled because the cost of maintaining cache coherence is too high. The use of a data grid to virtually eliminate the messaging and update processing overhead means that high-transaction-rate systems can take advantage of caching to achieve even higher throughput without having to manage staleness.

Querying is another challenge. JPA defines the general-purpose JP QL, which is in many ways similar to SQL and includes many of the same notions. The goal of JP QL is to provide an object-based query language that's easy to translate into SQL for execution on a relational database. Of course data grids aren't relational databases and each has its own query framework. The extent to which JP QL can be translated and executed on a particular grid depends on the expressiveness of the grid's query framework.

Another challenging area is object relationships. JPA supports a number of relationship types along with the notion of embedded objects. Relationship support varies by data grid product and each has its subtleties. Issues include: what kind of relationships are supported; whether objects can have relationships across the grid or must be co-located; and what query operators are supported on relationships. The answer to this last question obviously has a big impact on what kind of JP QL queries can be executed.

This list is definitely not exhaustive but it highlights the kinds of issues that have an impact on JPA/data grid integration.

Data grids are not relational databases and so we can't expect a perfect match between JPA and data grids. But even with some limitations, JPA on the grid is an exciting technology that provides a way to evolve JPA applications to leverage the power of data grids to build scalable high-performance systems.


More Stories By Shaun Smith

Shaun Smith is a Principal Product Manager for Oracle TopLink and an active member of the Eclipse community. He's Ecosystem Development Lead for the Eclipse Persistence Services Project (EclipseLink) and a committer on the Eclipse EMF Teneo and Dali Java Persistence Tools projects. He’s currently involved with the development of JPA persistence for OSGi and Oracle TopLink Grid, which integrates Oracle Coherence with Oracle TopLink to provide JPA on the grid.
