Clustered Timers

For Robust Scalable Systems

Often, when someone asks how we are going to scale the Web application we're about to develop, we look at them, smile, and say, "Not a problem - we'll just cluster the application servers." Clustering our application across multiple servers provides us with the ability to handle large volumes of traffic and to scale systems by adding additional servers to the cluster.

In addition to providing scalability, application clusters make the system more robust by allowing for automatic system failover when a server fails. This way when one server goes down the application continues to run, albeit with slightly decreased performance. While it is true that the current generation of application servers makes it relatively pain-free to create a cluster, there are still several significant, if often overlooked, design issues that must be taken into account now that the system is clustered.

When we run our Web application in a cluster, we have the exact same software running on each machine in the cluster. While this eliminates a host of configuration management difficulties, it does create other problems. We don't have to write different code for every possible machine in the cluster, but there are times when this simplicity actually makes things more complex; the running of scheduled tasks is typically one of these areas. Scheduled tasks are used to execute procedures that need to run at certain fixed times or at fixed intervals. Typical examples of scheduled tasks within a Web application are report-generation tasks and tasks that send data to external systems that are only available within a certain time frame.

To understand why clustering affects how we design our application to handle scheduled tasks, let's consider a generic e-commerce Web application. To allow management to analyze sales trends, profits, inventory, etc., the system has been set up to periodically compile a set of reports and e-mail them to management. Clearly, management doesn't want to receive multiple e-mails containing the same reports, yet this is what we will get if we simply write a scheduled task and then cluster our system. When the appointed time to run the report comes up, all machines in the cluster will generate the same report and send it to management. This can be seen visually in Figure 1.

Perhaps the most straightforward way to solve this problem is to package the code that runs the scheduled tasks into a separate JAR file within the EAR file that contains the WAR file for the Web application. This EAR file is deployed to all the servers in the cluster; however, the JAR containing the scheduled tasks is configured to run on only one of the servers. This solves the problem by preventing the scheduled tasks from ever running on multiple machines. However, there are two significant downsides to this solution. First, you have now created additional configuration management problems. You need to carefully track which servers are set up to run the scheduled tasks and the exact deployment procedures that were used so that when additional servers are added to the cluster, the application is properly deployed on those servers.

The second problem is that you have effectively taken the scheduled tasks out of the cluster. Now, if the machine that is set up with the scheduled tasks fails, or its connection to the network fails, there is no backup or failover system. The tasks won't run. The remainder of this article investigates solutions to this problem that allow the scheduled tasks to remain part of the cluster and don't involve additional configuration management.

To stop every system in the cluster from performing the same scheduled task, report generation in this case, we have to utilize something outside of the application server cluster to track the state of our scheduled task. A perfect candidate for maintaining the state of our scheduled tasks is a shared database, and since nearly all applications already have access to a shared database, this is the resource we will use to solve this problem in our example (see Figure 2). It's worth mentioning that while a shared database is an ideal resource for solving this problem, it's not the only option. The solution presented here could be adapted to use flat files or some other shared resource external to the cluster.

Our external resource, the database in this case, will act as a mediator between competing machines in the cluster. We will create a table in the database that tracks scheduled tasks and their status. When a machine in the cluster wants to run one of the scheduled tasks, it first checks the status of that task in the database to see if some other machine is already running that task. If no other machine is running the task, the status of the task will be updated and that machine will run the task.

Another way of thinking about this solution is to think in terms of a concurrent method running on a single machine. If we see the scheduled task in these terms, it becomes clear that the best way to keep multiple threads from running the task at the same time is to use some sort of semaphore. Again, if this were a single method on one machine, we could easily do this by creating a synchronized block around the code that we wanted to protect. When a thread first attempts to enter the synchronized block, it has to attempt to get the lock. If it fails to get the lock, it can't run the protected code. In our distributed system, we are using the database as the lock.

We will call our database table "Tasks" and it will have three columns. The first column will be the name of the task, the second the status of the task, and the third the date and time that the task last changed status. The generic SQL script to produce this table is shown below.

CREATE TABLE Tasks (
    TaskName varchar(50) NOT NULL,
    Status varchar(25) NOT NULL,
    StatusTime datetime,
    PRIMARY KEY (TaskName)
);

Now that we have created our database table to serve as our mediator, we can create the class that accesses this table in order to determine if a particular instance of a Task can execute. We'll call this class TaskMonitor. (The source code for this article can be downloaded from www.sys-con.com/java/sourcec.cfm.) The class exposes two public methods: public static boolean acquireLock(String taskName) and public static void releaseLock(String taskName). Before a Task runs, it will need to call the acquireLock method of the TaskMonitor. If this method returns true, it's safe for the Task to run. If it returns false, then it's not safe for the Task to run, as some other instance of this Task in the cluster is already executing. The key to understanding the TaskMonitor class is to understand the ACQUIRE_LOCK SQL query on lines 5-7.

What needs to be done is to determine if the Task in question, as identified by the field TaskName, is currently Idle, and if so, change its Status to Active. The crucial aspect of this is that it needs to happen atomically, that is, it must all happen as one single step. That's why we use a single update statement instead of writing both a select statement to see if the Task is currently Idle and an update to change its Status. In the case where we use the select statement first, it would be possible for the same select statement to be run by the other machines in the cluster before the update is executed. This would result in multiple Tasks running since they would all see the Idle state. By performing the entire process in an update statement, we take advantage of the automatic exclusive row locking that takes place in the database whenever an update statement is executed.
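The effect of folding the check and the change into one atomic step can be tried out without a database. The following is only an in-memory sketch: the class AtomicClaimDemo and its methods are hypothetical names, and a ConcurrentHashMap stands in for the Tasks table. Its replace(key, oldValue, newValue) call plays the role that an update of the assumed form "set Status to Active where TaskName matches and Status is Idle" plays in the database: it succeeds for exactly one caller.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/** In-memory stand-in for the Tasks table; illustrative only, not the article's code. */
class AtomicClaimDemo {
    // Maps TaskName -> Status, playing the role of the shared table.
    static final ConcurrentHashMap<String, String> TASKS = new ConcurrentHashMap<>();

    /** Atomically flips Idle -> Active; true corresponds to "one row affected". */
    static boolean tryClaim(String taskName) {
        return TASKS.replace(taskName, "Idle", "Active");
    }

    /** Starts the given number of competing "machines" and returns how many won. */
    static int race(String taskName, int machines) throws InterruptedException {
        TASKS.put(taskName, "Idle");
        AtomicInteger winners = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] threads = new Thread[machines];
        for (int i = 0; i < machines; i++) {
            threads[i] = new Thread(() -> {
                try {
                    start.await(); // hold every thread until all are ready
                    if (tryClaim(taskName)) {
                        winners.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        start.countDown(); // release all threads at once
        for (Thread t : threads) {
            t.join();
        }
        return winners.get();
    }
}
```

Calling AtomicClaimDemo.race("ReportTask", 8) returns 1 however the threads interleave, because only the first replace finds the value still Idle. A version that first read the status and then wrote it in a separate step would offer no such guarantee.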

Now that we understand how the ACQUIRE_LOCK query works, the rest of the acquireLock method of the TaskMonitor is easy to follow. On line 23 the query is executed and the results are examined. The executeUpdate method returns the number of rows that were affected by the query. When the ACQUIRE_LOCK query successfully changes the Task from Idle to Active (as will be the case when this particular query is the first one in the cluster to run), one row will have been affected and the lockAcquired flag will be set to true. Otherwise, no rows will be affected and the lockAcquired flag will remain false.

The releaseLock method of TaskMonitor is meant to be called when a Task has finished executing. This method simply changes the status of the Task back to Idle. Both the releaseLock and the acquireLock methods also update the StatusTime field with the current date and time for record-keeping purposes.
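For readers without the downloadable listing, the two methods might look roughly like the sketch below. It is not the article's TaskMonitor: the class name TaskLockSketch, the exact SQL text, and the Connection parameter are assumptions made for illustration (the original methods take only the task name and obtain their own connection). The sketch also assumes the connection is in auto-commit mode, so each update takes effect immediately.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;

/** Sketch of a TaskMonitor-style lock; names and SQL text are assumptions. */
class TaskLockSketch {
    // Single-statement check-and-change: the row matches only while the task is Idle.
    static final String ACQUIRE_LOCK =
        "UPDATE Tasks SET Status = 'Active', StatusTime = ? "
      + "WHERE TaskName = ? AND Status = 'Idle'";

    static final String RELEASE_LOCK =
        "UPDATE Tasks SET Status = 'Idle', StatusTime = ? WHERE TaskName = ?";

    /** Returns true only if this caller changed the task from Idle to Active. */
    static boolean acquireLock(Connection con, String taskName) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(ACQUIRE_LOCK)) {
            ps.setTimestamp(1, new Timestamp(System.currentTimeMillis()));
            ps.setString(2, taskName);
            // executeUpdate reports the affected row count: 1 = lock won, 0 = lock held elsewhere
            return ps.executeUpdate() == 1;
        }
    }

    /** Marks the task Idle again and records when that happened. */
    static void releaseLock(Connection con, String taskName) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(RELEASE_LOCK)) {
            ps.setTimestamp(1, new Timestamp(System.currentTimeMillis()));
            ps.setString(2, taskName);
            ps.executeUpdate();
        }
    }
}
```

Passing the Connection in keeps the sketch independent of how connections are obtained, which makes it easy to switch to a connection pool later.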

One final note on the TaskMonitor class: the getConnection method shown in lines 75-85 should be upgraded before placing this class into production. As written, the method creates a connection to an instance of a MySQL database. A better practice in production would be to retrieve a connection from an existing connection pool.

Together the Tasks database table and the TaskMonitor class provide a framework for ensuring that only one instance of a given Task is running at a particular time, no matter how many instances of the application are running within the clustered system. At this point we're ready to create our report generating Task.

Because we're concerned with managing Tasks in a clustered environment, and not with creating reports or using the javax.mail APIs, we'll create a simple Task, called ReportTask, to illustrate the concept. Because we want this Task to execute automatically on a schedule, we need to extend java.util.TimerTask. TimerTask is an abstract class that has one method that we have to implement, public void run(). This is the method where all the Task's work is done. For our simple example, ReportTask, we'll output some text to show that the Task is running. The code for this class is shown below.

1) import java.util.TimerTask;
2) public class ReportTask extends TimerTask {
3)     public void run() {
4)         if (!TaskMonitor.acquireLock("ReportTask"))
5)             return;
6)         System.out.println("Creating report to be emailed...");
7)         TaskMonitor.releaseLock("ReportTask");
8)     }
9) }

The key thing to note here is that before the ReportTask actually performs its work, printing some text in this case, it first attempts to acquire the lock for this Task by making the call to acquireLock on line 4. If it fails to acquire the lock, it simply returns without performing its work. However, if it does successfully acquire the lock, then it's free to perform its work and it goes ahead and prints out its message on line 6. Once the Task is complete, it's vital that the lock be released. This is accomplished by calling releaseLock on line 7. If the lock is never released, this Task will never run again on any machine in the cluster. Ensuring that the lock is properly released is clearly not an issue with this simple example; however, in more complex tasks it can be tricky. Consider a Task where several different error conditions could cause the Task to terminate before running to completion. There are now potentially several places where the lock will have to be released.
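One common way to avoid releasing the lock in several places is to put the release in a finally block. The sketch below is one possible arrangement, not the article's code: TaskLock is a hypothetical interface standing in for TaskMonitor's two static methods, and GuardedTask is a hypothetical wrapper that runs any piece of work under the lock.

```java
import java.util.TimerTask;

/** Hypothetical abstraction standing in for TaskMonitor.acquireLock/releaseLock. */
interface TaskLock {
    boolean acquire(String taskName);
    void release(String taskName);
}

/** Runs the work only when the lock is obtained, and always gives the lock back. */
class GuardedTask extends TimerTask {
    private final TaskLock lock;
    private final String taskName;
    private final Runnable work;

    GuardedTask(TaskLock lock, String taskName, Runnable work) {
        this.lock = lock;
        this.taskName = taskName;
        this.work = work;
    }

    @Override
    public void run() {
        if (!lock.acquire(taskName)) {
            return; // another machine is already running this task
        }
        try {
            work.run();
        } finally {
            // Executes on normal completion and when work.run() throws.
            lock.release(taskName);
        }
    }
}
```

With this shape the lock is released on every exit path, including early returns inside the work. Note that an exception escaping run() still terminates the java.util.Timer thread, so production code would probably also catch and log failures inside run().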

At this point, you've probably noticed a serious problem with our Task: we never populated the Tasks table with any tasks. As things stand, the ACQUIRE_LOCK update will never match a row, so our ReportTask will never be able to acquire a lock and will never run. To rectify this situation we need to insert the ReportTask information into the Tasks table using the following SQL script; the same step is required for every Task that's going to be managed in this way:

insert into Tasks values ('ReportTask', 'Idle', null);

We've nearly finished setting up our system for managing clustered tasks. So far we've created an external resource and a TimerTask called ReportTask that will run in our cluster. All that remains to be done is to create a Timer for running our ReportTask. Because we want to start the Timer for our task as soon as the application starts, we'll create a servlet called StartupServlet that does the work of creating our Timer. We will ensure that StartupServlet is loaded immediately by adding the following lines to web.xml:

<servlet>
<servlet-name>StartupServlet</servlet-name>
<display-name>StartupServlet</display-name>
<description>Used to create the Timers</description>
<servlet-class>StartupServlet</servlet-class>
<load-on-startup>1</load-on-startup>
</servlet>

As our simple StartupServlet is not designed to handle requests, it doesn't need to override any method other than init(). When we create the Timer for running our ReportTask, it's important that we use one of the overloaded constructors to create the Timer as a daemon thread; the default no-argument constructor creates a non-daemon thread. A daemon thread does not keep the JVM alive, so the Timer will not prevent the application server from shutting down and will stop when the server process exits. Note that daemon status alone does not stop the Timer when only the Web application is stopped or redeployed inside a running server; for that, the Timer should also be cancelled, for example from the servlet's destroy() method. We don't want to try to generate reports if the application has been stopped for some reason.

After calculating how many milliseconds are in a day (we want our ReportTask to run once a day), we schedule the ReportTask to run daily, starting now. On line 12 we place the Timer that we created in the ServletContext. While this is not strictly necessary to keep the ReportTask running, by keeping a reference to the Timer available we are able to check easily on the status of the ReportTimer or cancel it entirely should the need arise.
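The timer setup described here can be sketched with java.util classes alone. The class DailyTimerSketch and its method startDaily are hypothetical names; in the article this logic lives in StartupServlet's init(), which is not reproduced here, and the choice of schedule() over scheduleAtFixedRate() is an assumption.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.TimeUnit;

/** Sketch of the timer setup that a startup servlet's init() might perform. */
class DailyTimerSketch {
    // 24 h * 60 min * 60 s * 1000 ms = 86,400,000 ms
    static final long MILLIS_PER_DAY = TimeUnit.DAYS.toMillis(1);

    /** Creates a daemon Timer and schedules the task to run now and then once a day. */
    static Timer startDaily(TimerTask task) {
        Timer timer = new Timer(true);            // true = the timer thread is a daemon
        timer.schedule(task, 0L, MILLIS_PER_DAY); // first run immediately, then every 24 hours
        return timer;                             // keep the reference so it can be cancelled
    }
}
```

The returned Timer is the object that would be stored in the ServletContext; calling cancel() on it stops all further executions.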

With the StartupServlet in place, we now have a very basic but workable system for running scheduled Tasks in a clustered environment, without having to worry about the same task running on all of the machines in the cluster simultaneously. It's important to note that if this scheme is used as presented and the tasks being executed complete in a very short period of time, you could still see duplicate executions of the same task if the clocks on all of the machines are not in sync with each other: a machine whose timer fires slightly later may find that the first machine has already finished and released the lock, see the Task as Idle, and run it again. While it is possible to extend this approach to address this problem, it's outside the scope of this article. With a little bit of effort, this system can also be extended to allow for such things as programmatic modification of the running tasks, robust error handling, and recovery of frozen tasks.

More Stories By Clark D. Richey Jr.

Clark is a principal consultant with the RABA Technologies RiSC group for advanced research and development. In his spare time, he teaches the Java platform to students at Loyola College, where as an associate professor, he shares his experiences with much enthusiasm. Clark is the founder of both JUGaccino, a Maryland-based JUG, and the StopLight and PermissionSniffer open source projects. He is also involved in implementing highly scalable, highly secure, service-oriented architectures using Jini.

Most Recent Comments
cbellonch 05/12/05 11:04:17 AM EDT

Hi,
Thanks for the article, it would be useful for our project. We've tried to download the code in:
· www.sys-con.com/java/sourcec.cfm
· http://www.sys-con.com/java/archives3/0903/Richey0903.zip

without success, are the links correct?

tbb 03/10/04 08:44:10 AM EST

I believe a class that implements ServletContextListener would be a better way to solve this problem than a servlet that loads on startup. (If your servlet container implements the servlet 2.3+ spec).
