Making Optimal Use of JMX in Custom Application Monitoring Systems

Some best practices and recommendations

Give Me a Break - Part 2
A monitoring application is often required to "slice and dice" data it's collected, typically in the form of charts and graphs showing data "grouped by" various dimensions.

In our sample monitoring application the data are associated with two sources: the Channel, and the Server on which that Channel is implemented. The same channel can exist on different servers.

A user may want to see the total messages sent/received across all channels on a server (aggregation), or see how the traffic for all channels is distributed across different servers (breakdown). Since the server, channel, and message counts are all available in the same MBean, this is not a problem. A Group By operation can be done on the data, and the appropriate chart used to present the results.

However, look at what happens when the user tries to correlate the total message counts and the number of connections on each channel using our sample MBeans. To do a Group By operation relating connections and total messages, the data have to be in the same table. It's necessary to perform a join operation on the two independent tables before the data can be used for this purpose.

Upon review, it can be seen that there's a one-to-one correspondence between the rows of data in the first table, the Channel table, and the rows of the second table, the Network table. Because the data structures inside the application were maintained separately, the JMX MBean was designed to expose the data in two different tables. It might have been better to combine these two tables ahead of time to minimize the work that has to be done on the client side (see Table 10).

This is not to suggest that you build analytics into your MBeans. Rather, collapse the data structures to their minimal form before exposing the data. One wouldn't say that the answer to an algebra problem was 2x + 3x + 5 + 9. Instead, it would be simplified by combining the common terms and providing 5x + 14 as the answer.

Recommendation: When there's a one-to-one relationship between available metrics, combine these into a single bean to avoid having to do joins and combines on the client side.
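
A minimal sketch of the combined form follows. The names `ChannelRow` and `totalsByServer` are hypothetical and are not taken from the sample MBeans. Once the message counts and the connection count share one row, a Group By by server reduces to a single pass over the rows, with no join step.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical combined row: channel metrics and network metrics side by side.
final class ChannelRow {
    final String server;
    final String channel;
    final long totalMessages;
    final int connections;

    ChannelRow(String server, String channel, long totalMessages, int connections) {
        this.server = server;
        this.channel = channel;
        this.totalMessages = totalMessages;
        this.connections = connections;
    }
}

final class CombinedTableSketch {
    // Groups by server; element 0 is total messages, element 1 is total connections.
    static Map<String, long[]> totalsByServer(List<ChannelRow> rows) {
        Map<String, long[]> totals = new LinkedHashMap<>();
        for (ChannelRow row : rows) {
            long[] t = totals.computeIfAbsent(row.server, k -> new long[2]);
            t[0] += row.totalMessages;
            t[1] += row.connections;
        }
        return totals;
    }

    public static void main(String[] args) {
        // Illustrative values only.
        List<ChannelRow> rows = Arrays.asList(
                new ChannelRow("server1", "orders", 1200L, 4),
                new ChannelRow("server1", "quotes", 800L, 2),
                new ChannelRow("server2", "orders", 500L, 1));
        for (Map.Entry<String, long[]> e : totalsByServer(rows).entrySet()) {
            System.out.println(e.getKey() + " messages=" + e.getValue()[0]
                    + " connections=" + e.getValue()[1]);
        }
    }
}
```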

Have Some Foresight
In the sample ChannelInfo MBean, memory metrics are stored in a 32-bit integer, assuming the common Java limitation of a 1GB heap space.

The integer representation works fine as long as you're only looking at one bean at a time as in the simple HTML interface to the beans. However, when aggregating this metric across multiple beans, we get Table 11.

The total for all the channels is well over 2,147,483,647, the largest positive value that can be represented in a signed 32-bit integer. The integer worked fine for one channel, but not for multiple channels. The data type should have been long from the beginning, anticipating aggregation.

Recommendation: Look ahead to the results of aggregations and use data types that are appropriate. Aggregate totals can quickly grow very large.
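
The effect is easy to reproduce. The following sketch uses made-up per-channel values (the class and method names are illustrative, not from the sample application) and sums them once into an int and once into a long.

```java
final class AggregationOverflowSketch {
    // Sums into an int: silently wraps around once the total passes 2,147,483,647.
    static int sumAsInt(int[] bytesUsed) {
        int total = 0;
        for (int b : bytesUsed) {
            total += b;
        }
        return total;
    }

    // Sums into a long: each int is widened before the addition, so no wraparound here.
    static long sumAsLong(int[] bytesUsed) {
        long total = 0L;
        for (int b : bytesUsed) {
            total += b;
        }
        return total;
    }

    public static void main(String[] args) {
        // Illustrative values: three channels, each reporting 1,000,000,000 bytes.
        int[] bytesUsedPerChannel = {1_000_000_000, 1_000_000_000, 1_000_000_000};
        System.out.println("int total:  " + sumAsInt(bytesUsedPerChannel));  // prints -1294967296
        System.out.println("long total: " + sumAsLong(bytesUsedPerChannel)); // prints 3000000000
    }
}
```

Each individual value fits comfortably in an int; only the aggregate does not, which is why the problem goes unnoticed when beans are viewed one at a time.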

Tracking History
A common requirement for monitoring systems is to maintain a historical record of activity. One solution is for an MBean to write to a log file and record its own history. However, in real-world systems, where data volumes may be huge, this isn't practical. Storing the data in a relational database provides the user many more options for reporting than would simple log files.

Often overlooked is the need to include a timestamp with the data. Adding one on the client side is easy to do, of course, but it suffers from a major problem: activity on the network can delay acquisition of the metrics data. If the timestamp is assigned when the data are received by the client, any computation of rates or averages that makes use of the delayed timestamp may be inaccurate.

Recommendation: Include a timestamp with all data intended for archival or time-based analysis. It should be assigned at the source at the time of data acquisition, not on receipt by the client, and be in millisecond resolution to provide the most precise calculations.
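
As a rough illustration, the sketch below pairs each value with a millisecond timestamp taken where the value is read and computes a rate from two such samples. `TimestampedSample` and its methods are hypothetical names, and the numbers in `main` are invented to show how a delayed client-side timestamp distorts the result.

```java
// Hypothetical value holder: the timestamp travels with the metric.
final class TimestampedSample {
    final long timestampMillis;
    final long value;

    TimestampedSample(long timestampMillis, long value) {
        this.timestampMillis = timestampMillis;
        this.value = value;
    }

    // Intended to be called at the source, at the moment the metric is read.
    static TimestampedSample capture(long value) {
        return new TimestampedSample(System.currentTimeMillis(), value);
    }

    // Change in value per second between two samples.
    static double ratePerSecond(TimestampedSample earlier, TimestampedSample later) {
        long elapsedMillis = later.timestampMillis - earlier.timestampMillis;
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("samples must be in increasing time order");
        }
        return (later.value - earlier.value) * 1000.0 / elapsedMillis;
    }

    public static void main(String[] args) {
        // 500 messages counted over 10 seconds, stamped at the source.
        TimestampedSample first = new TimestampedSample(1_000L, 100L);
        TimestampedSample second = new TimestampedSample(11_000L, 600L);
        System.out.println("source-stamped rate: " + ratePerSecond(first, second));

        // Same data, but the second sample was stamped on arrival, 2 seconds late.
        TimestampedSample delayed = new TimestampedSample(13_000L, 600L);
        System.out.println("client-stamped rate: " + ratePerSecond(first, delayed));
    }
}
```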

To Poll or Not To Poll
Given the large amount of real-time data that can be produced in a monitoring system, one question often comes up: Can the notification capabilities of JMX be used to minimize network traffic and the overall load on the system?

Typically, a monitoring system is initially developed in a polling mode. At a regular interval, e.g., every 10 seconds, a request is made to query various metrics and the data are transmitted back to a client system for analysis and display. This polled approach can be costly in terms of network bandwidth and processing overhead.
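
A polling loop of this kind might be sketched as follows. The example is an assumption-laden illustration: it reads the standard `ThreadCount` attribute of the JVM's own `java.lang:type=Threading` MBean rather than the article's sample MBeans, and `PollingSketch`, `pollOnce`, and `startPolling` are hypothetical names.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

final class PollingSketch {
    // Reads one attribute; JMX and I/O failures are rethrown unchecked to keep the sketch short.
    static Object pollOnce(MBeanServerConnection conn, String objectName, String attribute) {
        try {
            return conn.getAttribute(new ObjectName(objectName), attribute);
        } catch (Exception e) {
            throw new IllegalStateException("poll failed for " + objectName, e);
        }
    }

    // Polls at a fixed interval until the returned scheduler is shut down.
    static ScheduledExecutorService startPolling(MBeanServerConnection conn, String objectName,
                                                 String attribute, long intervalSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                System.out.println(attribute + " = " + pollOnce(conn, objectName, attribute));
            } catch (RuntimeException e) {
                // Swallow so that one failed poll does not cancel the schedule.
                System.err.println(e.getMessage());
            }
        }, 0, intervalSeconds, TimeUnit.SECONDS);
        return scheduler;
    }

    public static void main(String[] args) throws InterruptedException {
        // The local platform MBean server stands in for a remote connection.
        MBeanServerConnection conn = ManagementFactory.getPlatformMBeanServer();
        ScheduledExecutorService scheduler =
                startPolling(conn, "java.lang:type=Threading", "ThreadCount", 10);
        Thread.sleep(500); // long enough for the first poll to run
        scheduler.shutdownNow();
    }
}
```

Every cycle costs a request and a response whether or not the value has changed, which is the overhead the rest of this section weighs against notifications.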

However, looking to notifications to solve this problem may be somewhat fruitless. Notifications aren't a panacea for all that's wrong with a monitoring system. In fact, if used incorrectly, even more overhead could be introduced.

The requirement to store historical data means that metrics must be obtained on a regular basis, whether or not anyone is looking at them. For example, the Total Message counts and Bytes Used metrics are not candidates for notifications. These must be polled to maintain a consistent history of the values. There's nothing to be gained by using notifications.

On the other hand, the connections count may not change every 10 seconds. A connection could remain alive for hours or days. In this case, the use of notifications could reduce the network traffic by only sending data about the connection count when it changes.
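
One way to emit data only on change is the standard `AttributeChangeNotification`. The sketch below is illustrative: `ConnectionCounter` is a hypothetical stand-in for the real bean, and it is exercised directly through `NotificationBroadcasterSupport` rather than being registered with an MBean server.

```java
import java.util.ArrayList;
import java.util.List;
import javax.management.AttributeChangeNotification;
import javax.management.NotificationBroadcasterSupport;
import javax.management.NotificationListener;

// Hypothetical source object; the article's sample MBeans are not reproduced here.
final class ConnectionCounter extends NotificationBroadcasterSupport {
    private long sequence = 0;
    private int connectionCount = 0;

    synchronized int getConnectionCount() {
        return connectionCount;
    }

    // Sends a notification only when the value actually changes.
    synchronized void setConnectionCount(int newCount) {
        int oldCount = connectionCount;
        if (newCount == oldCount) {
            return; // unchanged: nothing is sent
        }
        connectionCount = newCount;
        sendNotification(new AttributeChangeNotification(
                this, ++sequence, System.currentTimeMillis(),
                "Connection count changed", "ConnectionCount", "int",
                oldCount, newCount));
    }
}

final class ChangeNotificationSketch {
    // Returns the new values a listener saw across four updates, two of them redundant.
    static List<Object> run() {
        final List<Object> seen = new ArrayList<>();
        ConnectionCounter counter = new ConnectionCounter();
        NotificationListener listener = (notification, handback) ->
                seen.add(((AttributeChangeNotification) notification).getNewValue());
        counter.addNotificationListener(listener, null, null);
        counter.setConnectionCount(1);
        counter.setConnectionCount(1); // no change, no notification
        counter.setConnectionCount(2);
        counter.setConnectionCount(2); // no change, no notification
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [1, 2]
    }
}
```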

A number of other issues should be considered when using notifications. These are often overlooked yet require development support to use notifications properly:

1)  The use of notifications must be combined with polling and/or a caching mechanism. A notification isn't issued until a data element changes. Without such a mechanism, a display page showing the current count of connections will be blank when it is first brought up and will fill with data only as connections are added or removed.

The client application has to populate a table or chart with the current set of values either by polling for that data on display activation or by using a cache that maintains current table values independent of the active displays. This functionality requires that equivalent attributes be provided for any values obtained via notifications.

2)  When used in conjunction with historical data obtained from an archival database, the use of notifications is similarly complex. A trend chart when first brought up must be populated with data obtained from the archive. As notifications are received, those values must be appended to the chart. The archived data must be requested only once.

Building the mechanisms to support the merging of notified data with current or historical data can involve a fair amount of development effort, which is often underestimated in early discussions about building a monitoring client against newly minted JMX data.

Recommendation: Use notifications where data aren't changing regularly and there may be some real reduction in overhead. Don't use notifications for everything just because they are available. Design MBeans to support the integration of notified data with current or historical data.
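
The client-side half of this recommendation can be sketched as a small cache that is seeded once from the equivalent attribute and then kept current by notifications. All class names here are hypothetical, the seed value stands in for a real `getAttribute` call, and a production client would also need to handle a change that arrives between subscribing and seeding.

```java
import java.util.concurrent.atomic.AtomicInteger;
import javax.management.AttributeChangeNotification;
import javax.management.Notification;
import javax.management.NotificationBroadcasterSupport;
import javax.management.NotificationListener;

// Hypothetical client-side cache for one notified attribute.
final class ConnectionCountCache implements NotificationListener {
    private final AtomicInteger current = new AtomicInteger();

    // Seed once from a polled attribute read so a new display is never blank.
    void seed(int polledValue) {
        current.set(polledValue);
    }

    // Apply each change as it arrives; unrelated notifications are ignored.
    @Override
    public void handleNotification(Notification notification, Object handback) {
        if (notification instanceof AttributeChangeNotification) {
            AttributeChangeNotification change = (AttributeChangeNotification) notification;
            if ("ConnectionCount".equals(change.getAttributeName())) {
                current.set((Integer) change.getNewValue());
            }
        }
    }

    int current() {
        return current.get();
    }
}

final class CacheSeedingSketch {
    static int run() {
        // A bare broadcaster stands in for the remote MBean in this sketch.
        NotificationBroadcasterSupport source = new NotificationBroadcasterSupport();
        ConnectionCountCache cache = new ConnectionCountCache();
        source.addNotificationListener(cache, null, null);
        cache.seed(5); // assumed result of reading the ConnectionCount attribute once
        source.sendNotification(new AttributeChangeNotification(
                source, 1L, System.currentTimeMillis(),
                "Connection count changed", "ConnectionCount", "int", 5, 6));
        return cache.current();
    }

    public static void main(String[] args) {
        System.out.println("cached connection count: " + run()); // prints 6
    }
}
```

Displays then read from the cache rather than from the bean, so opening a page costs no extra request and shows a value immediately.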

Scalability & Maintenance
Many more problems can occur when monitoring with JMX. The most important of these concern scalability and maintainability. Some systems simply produce too many beans, and this can result in terrible performance. In others, the complexity of the MBean names and key properties is overwhelming and can be a huge maintenance burden.

As usual, complexity has to be balanced against performance. Many systems return one row of data for each MBean, using the bean name to encode the source. This can get unwieldy if overused, and the number of beans can grow excessively. An alternative might be to design beans that return multiple rows of data, e.g., instead of an MBean for each channel, provide an MBean for the server and return a table containing a row for each channel.
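
One possible shape for such a multi-row attribute is a `TabularData` value built from the JMX open types. The sketch below only constructs the table; the type names, item names, and sample values are assumptions, and registering a server-level MBean that exposes it is left out.

```java
import javax.management.openmbean.CompositeDataSupport;
import javax.management.openmbean.CompositeType;
import javax.management.openmbean.OpenDataException;
import javax.management.openmbean.OpenType;
import javax.management.openmbean.SimpleType;
import javax.management.openmbean.TabularData;
import javax.management.openmbean.TabularDataSupport;
import javax.management.openmbean.TabularType;

final class ServerChannelTableSketch {
    // Hypothetical column names for one channel row.
    private static final String[] ITEMS = {"channel", "totalMessages", "connections"};

    // Builds one table for a server, with one row per channel, keyed by channel name.
    static TabularData buildTable(String[] channels, long[] totalMessages, int[] connections) {
        try {
            CompositeType rowType = new CompositeType(
                    "ChannelRow", "Metrics for one channel", ITEMS,
                    new String[] {"Channel name", "Total messages", "Open connections"},
                    new OpenType<?>[] {SimpleType.STRING, SimpleType.LONG, SimpleType.INTEGER});
            TabularType tableType = new TabularType(
                    "ChannelTable", "All channels on one server", rowType,
                    new String[] {"channel"});
            TabularDataSupport table = new TabularDataSupport(tableType);
            for (int i = 0; i < channels.length; i++) {
                table.put(new CompositeDataSupport(rowType, ITEMS,
                        new Object[] {channels[i], totalMessages[i], connections[i]}));
            }
            return table;
        } catch (OpenDataException e) {
            throw new IllegalStateException("invalid table definition", e);
        }
    }

    public static void main(String[] args) {
        // Illustrative values only.
        TabularData table = buildTable(
                new String[] {"orders", "quotes"}, new long[] {1200L, 800L}, new int[] {4, 2});
        System.out.println("rows: " + table.size());
        System.out.println("orders messages: "
                + table.get(new Object[] {"orders"}).get("totalMessages"));
    }
}
```

A single attribute read then returns every channel on the server, at the cost of a larger payload per request and a row format that clients must agree on.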

Other issues come into play when one looks at common use patterns of MBeans in a system being monitored. Are the data polled randomly or is there a predictable sequence that can be used to pre-fetch data and have it available on the next cycle?

For maintainability, it's important that data formats for the beans remain stable and are not changed without some sort of upward compatibility plan and/or deprecation plan.

For the most part, a combination of best practice knowledge and good common sense can help produce a quality monitoring system that performs well and has all the required functionality.

More Stories By Tom Lubinski

Tom Lubinski founded SL Corporation in 1984 and is currently the company's president and CEO. He has been instrumental in developing SL's Graphical Modeling System software and Enterprise RTView, a real-time monitoring, analytics and visualization platform. Since founding the company, he has been involved in thousands of successful customer deployments of real-time visibility solutions. Prior to starting SL Corporation, Tom attended the California Institute of Technology and developed a substantial consulting practice specializing in object-oriented programming and graphical visualization systems. He has over 30 years of experience in the development of computer hardware systems and software applications.
