By Colin Hendricks
November 14, 2007 07:45 AM EST | Reads: 25,608
Enterprise software developers and corporate IT architects have established the Java Enterprise Edition (JEE) platform as a leading choice for building enterprise software applications. The platform is widely used for everything from eCommerce Websites to back-office data aggregation systems. Its versatility and reliability as an enterprise computing platform are well established.
But this wasn't always so. Sun initially trumpeted Java as a desktop platform that would bring rich content to Web applications in the form of Java applets that run locally in a user's Web browser. It was also touted as a thick-client desktop application development tool that would be widely used to build applications that could run on any computer (remember write once, run anywhere?).
Sometime in the late nineties, Java application development took a 90-degree turn, and most Java software ended up running on corporate servers instead of corporate workstations. Today, a substantial portion of Web applications are delivered on the JEE platform.
Despite the "Enterprise" in its name, the JEE platform was principally designed for handling HTTP requests from Web browsers and performing some business logic in response to each request. It now includes many other technologies, but most of them are related to this mission.
However, as the complexity and variety of Web applications have grown, users and designers of these systems have found many uses for JEE beyond just responding to requests from a browser. Many of these uses are common enterprise back-office tasks such as batch processing of large volumes of data, and while the JEE platform was not originally designed for such purposes, it is versatile enough to provide viable solutions to these problems.
What Is a Batch?
Batch scenarios arise often in business software applications because of a conflict between the enterprise's desire to respond immediately to customer requests and its need to analyze the resulting transactions. Resolving this conflict requires the speedy capture of the initial transaction with no analysis, followed by a later batch process that aggregates or optimizes the data for reporting, analysis, archiving, or some other large-volume task. It is a safe assumption that nearly every business does some kind of batch processing on its data.
The characteristics of the typical batch process include:
• The process is long-running and must occur on a regularly scheduled basis.
• The volume of data to be processed is high, usually on the order of thousands to millions of database rows.
• There may be complex logic or calculations to perform on the data.
• The process may depend on a large data set from some other system that is delivered in bulk at a specific time.
• The process is run asynchronously from user interactions. It's not part of a user session in an online system. A user does not start it and is not waiting on it to complete.
Why Do Batch Processing in JEE?
The JEE specification was designed for online Web applications and has several limitations with respect to batch processing. For instance, JEE containers are required to manage the life cycle of Enterprise JavaBeans (EJB) and as such might limit the ability to create threads from within these classes.
However, this limitation can be overcome in a couple of ways. First, while most JEE containers discourage developers from creating and managing their own threads, they do not prohibit the practice, especially outside the bounds of EJB classes. Therefore, the batch process can do its own threading using the java.util.concurrent package (available as of Java 5), and on most JEE platforms this causes no trouble. This package provides user-friendly thread pool classes and thread management facilities that make it easier than ever to create multi-threaded applications in Java.
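As a rough illustration of this first approach, the following sketch uses a fixed-size thread pool from java.util.concurrent to process a batch in chunks. The class name, the processChunk method, and the chunk and pool sizes are hypothetical placeholders, not anything prescribed here; a real job would read and update database rows where the sketch merely counts them. The lambda syntax assumes Java 8 or later; on Java 5 an anonymous Callable would serve the same purpose.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BatchThreadPoolSketch {

    /** Hypothetical unit of work: pretends to process rows [fromId, toId) and returns the row count. */
    static int processChunk(long fromId, long toId) {
        // A real batch job would read, transform, and write database rows here.
        return (int) (toId - fromId);
    }

    /** Splits [0, totalRows) into chunks and processes them on a fixed-size thread pool. */
    static long runBatch(long totalRows, int chunkSize, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (long start = 0; start < totalRows; start += chunkSize) {
                final long from = start;
                final long to = Math.min(start + chunkSize, totalRows);
                Callable<Integer> task = () -> processChunk(from, to);
                results.add(pool.submit(task));
            }
            long processed = 0;
            for (Future<Integer> result : results) {
                processed += result.get(); // waits for the chunk; surfaces worker failures
            }
            return processed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Batch interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Chunk failed", e.getCause());
        } finally {
            // On success every chunk has finished; on failure this cancels the remainder.
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(runBatch(10_500, 1_000, 4));
    }
}
```

Collecting the Future objects and calling get() on each is one simple way to wait for completion and to notice a failed chunk; the right pool size depends on the server and on how much of the work is spent waiting on the database.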
Second, a more spec-compliant approach to multithreading is to use Java Message Service (JMS) messages to create worker threads within the JEE context. This approach is a little more complex to implement but provides the benefits of complying with the JEE specification while also allowing the batch process to span multiple Java Virtual Machine (JVM) instances in a clustering situation. This will be discussed in more detail below.
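The message-driven idea can be sketched without a JEE container. In the example below an in-memory BlockingQueue stands in for the JMS queue and plain threads stand in for the container-managed consumers, so it shows only the shape of the dispatcher and worker roles, not real JMS code; unlike a JMS queue, it cannot span JVMs. All class, method, and field names are hypothetical. In a container, each worker would typically be a message-driven bean whose onMessage method receives one work message at a time.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

public class DispatcherWorkerSketch {

    /** Hypothetical message body: a range of row IDs [fromId, toId) for one worker to handle. */
    static final class WorkMessage {
        final long fromId;
        final long toId;

        WorkMessage(long fromId, long toId) {
            this.fromId = fromId;
            this.toId = toId;
        }
    }

    /** Sentinel message telling a worker that no more work is coming. */
    static final WorkMessage STOP = new WorkMessage(-1, -1);

    /** Dispatches [0, totalRows) as messages and returns how many rows the workers handled. */
    static long run(long totalRows, long rowsPerMessage, int workers) {
        BlockingQueue<WorkMessage> queue = new LinkedBlockingQueue<>(); // stand-in for a JMS queue
        AtomicLong processed = new AtomicLong();
        Thread[] consumers = new Thread[workers];
        try {
            // Workers: each takes messages until it sees the STOP sentinel.
            for (int i = 0; i < workers; i++) {
                consumers[i] = new Thread(() -> {
                    try {
                        for (WorkMessage m = queue.take(); m != STOP; m = queue.take()) {
                            // A real worker would process the rows in the range here.
                            processed.addAndGet(m.toId - m.fromId);
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                consumers[i].start();
            }
            // Dispatcher: partitions the batch into one message per range of rows.
            for (long start = 0; start < totalRows; start += rowsPerMessage) {
                queue.put(new WorkMessage(start, Math.min(start + rowsPerMessage, totalRows)));
            }
            for (int i = 0; i < workers; i++) {
                queue.put(STOP); // one sentinel per worker
            }
            for (Thread consumer : consumers) {
                consumer.join();
            }
            return processed.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Dispatcher interrupted", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(10_500, 1_000, 3));
    }
}
```

Keeping each message small (a range of keys rather than the rows themselves) is a design choice that keeps the queue light and lets each worker fetch its own data.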
Another issue with batch processing on the JEE platform is that by default the container manages transactions and session timeouts. The JEE container is inclined to limit how long resources such as database connections, transactions and beans can be monopolized. This is meant to guarantee a high level of service to all users within an online application, but can be problematic for a long-running batch process.
This issue can be addressed by correctly configuring a batch process not to require JEE transactions and to avoid the use of entity beans and stateful session beans that might have timeout or locking problems. Also, be sure to use the pooled resources such as database connections judiciously, releasing them back to the pool when not in use.
In addition to these limitations there is a performance question. Other methods can achieve higher performance than the JEE platform. Batch processing typically involves operations on large volumes of rows stored in a relational database, and a stored procedure implemented directly in the database might offer the fastest performance for most applications. However, there are legitimate reasons to implement the logic in JEE instead.
• Stored procedures are typically implemented in the version of SQL specific to the database platform and are not portable to other databases. This may not matter for a departmental application but is usually not acceptable for an enterprise software product that must be supported on many different databases.
• The JEE platform provides complementary technology such as JCA connections to other systems, Web service calls to other services, and other features that might be useful.
• Logic implemented in Java can reuse other application logic that is already present in the business tier of the application.
• Well-written Java code is usually easier to understand, maintain, and enhance than a collection of stored procedures.
• JEE servers usually include clustering capabilities that provide the ability to federate multiple, cheap, commodity servers to improve batch processing performance.
These benefits will often outweigh any performance gain that might be achieved using stored procedures. Furthermore, the difference in performance between a Java solution and a database stored procedure solution can be minimized using the techniques described below.
Techniques for High-Performance Batch Processing on JEE
Now that we've covered the limitations and the alternatives, let's discuss how to architect a batch process on the JEE platform for maximum performance. Batch problems are clear candidates for multi-threaded solutions because the objective is to complete as much work as possible in the shortest time possible and no human interaction is necessary. Parallel processing using multiple threads is necessary to bring all available computing resources to bear on the problem. Today's multi-core, multi-CPU servers are especially well suited to multi-threaded processing.
Copyright © 2007 SYS-CON Media, Inc. — All Rights Reserved.
More Stories By Colin Hendricks
Colin Hendricks is CTO of Rome Corp. He has worked as a software developer and consultant on high-performance, server-side Java systems for the past 10 years.
Snehal Antani 07/27/08 08:06:36 PM EDT
Kalyan, to answer your questions:
"what are the hiccups?": A key issue with batch processing using Java and application servers relates to JDBC cursors, transactions, and holding cursors across transactions. Checkpointing (committing work periodically so you can restart the job if needed) is important in batch. Checkpointing is achieved by using transactions, JTA transactions specifically. Unfortunately, if you use a Type-4 JDBC driver with XA, you're not able to keep cursors open across transactions, so you cannot easily run a "select account from table1" type of query that retrieves all of the accounts to process and then apply some checkpoint strategy as you process those records. There are a few approaches to getting around this: first, we've built a stateful session bean (SFSB) pattern where reads from the DB are done in a local transaction and writes to the database are done in the global transaction; second, execute smaller queries that are bounded by the checkpoint intervals instead of one very large query; third, if you are on z/OS and your data is in DB2 z/OS, use the Type-2 JDBC driver, which allows you to hold cursors across transactions; fourth, use Last Participant Support (LPS), which is the ability to use a single 1-PC resource in a 2-PC (XA) transaction. This problem will plague *every* Java batch solution and is a pain due to limitations in XA. The WebSphere XD Compute Grid (aka WebSphere Batch) forum has some posts on this topic; please feel free to ask more questions there: http://www-128.ibm.com/developerworks/forums/forum.jspa?forumID=1240&sta.... Within Compute Grid, we've built the SFSB pattern as part of our Batch Datastream Framework (BDS Framework) to make it simpler to leverage. Using LPS or Type-2 drivers is pretty straightforward in WebSphere.
Another important gotcha is workload management and ensuring your batch processing doesn't negatively impact your online transaction processing (OLTP) workloads (and vice versa). The only way to have a good solution in this area is to use a software stack that integrates with the database and the workload manager. Basically, you need an integrated batch and OLTP platform, not just a batch container.
"app's performance would depend on database specifics": Yes, of course, but this is business as usual. DB vendors have their own knobs and runtime behaviors that will differ, so each has to be optimized in its own way.
"what sort of frameworks have you worked with": I've found Hibernate to not be very good for batch processing. You can read more about why here: http://forum.hibernate.org/viewtopic.php?t=988575&view=next&sid=0aada757.... I've seen customers use IBatis, OpenJPA, raw JDBC, Pure Query, and SQLJ/Static SQL. As the article mentions, getting down to the raw SQL query for batch can be crucial for performance. I tend to stick to raw JDBC, and I use the Batch Data Stream Framework (BDS Framework) to manage the connections, prepared statements, restarting, etc. You can read more about this at: http://www-128.ibm.com/developerworks/forums/thread.jspa?threadID=190623...
Kalyan 11/13/07 04:06:33 PM EST
This article looks pretty good in its content. A couple of questions, though: (1) Have you used this architecture on any of the systems that you have implemented? If so, what are the hiccups that you have come across? (2) Though you discourage using stored procs for performance reasons, you suggest tweaking some database configuration to see if one can get better performance. Wouldn't this make the app's performance (though not its logic) dependent on database specifics? Interacting with databases is the most important part of any batch processing application that has to save data to the persistent store. It'd be interesting to see what sort of frameworks (Hibernate, iBatis, etc.) you have worked with in this kind of architecture.
Snehal Antani 08/13/07 04:06:11 PM EDT
Interesting article. I recently published an article describing your Dispatcher-Worker pattern for highly parallel batch jobs in the context of WebSphere XD Compute Grid: http://www.ibm.com/developerworks/websphere/techjournal/0707_antani/0707... An interesting extension to your description is depicted in figure 6 of my article: establishing endpoint affinity, which enables new caching opportunities. The downside of using straight JEE5 multi-threading packages versus building on an existing enterprise Java batch framework like Compute Grid is that the developer would have to manage threading, which, for enterprise adopters composed of large development teams, could be more trouble than it's worth.