Fix Memory Leaks in Java Production Applications

A memory diagnostics approach for production identifies and fixes the root cause of the problem

Adding more memory to your JVMs (Java Virtual Machines) might be a temporary workaround for memory leaks in Java applications, but it certainly won't fix the root cause of the issue. Instead of crashing once per day, the application may just crash every other day. "Preventive" restarts are another desperate measure to minimize downtime, but let's be frank: this is not how production issues should be solved.

One of our customers - a large online retail store - ran into such an issue. They run one of their online gift card self-service interfaces on two JVMs. During peak holiday seasons, when users are activating their gift cards or checking their balance, crashes due to OOM (Out Of Memory) became more frequent, which caused a bad user experience. The first "measure" they took was to double the JVM heap size. This didn't solve the problem, as the JVMs were still crashing, so they followed the memory diagnostics approach for production as explained in Java Memory Leaks to identify and fix the root cause of the problem.

Before we walk through the individual steps, let's look at the memory graph that shows the problems they had in December during the peak of the holiday season. The problem persisted even after increasing the memory. They could fix the problem after identifying the real root cause and applying specific configuration changes to a third-party software component.

Only after identifying the actual root cause and applying the necessary configuration changes did the memory leak issue go away. Increasing memory was not even a temporary solution that worked.

Step 1: Identify a Java Memory Leak
The first step is to monitor the JVM/CLR Memory Metrics such as Heap Space. This will tell us whether there is a potential memory leak. In this case we see memory usage constantly growing, resulting in an eventual runtime crash when the memory limit is reached.

Java Heap Size of both JVMs showed significant growth starting Dec 2nd and Dec 4th resulting in a crash on Dec 6th for both JVMs when the 512MB Max Heap Size was exceeded.
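
To make this first step concrete, here is a minimal sketch of how heap usage could be sampled from inside a JVM and reduced to a growth trend. It uses the standard java.lang.management API. The class name HeapTrend, the sample count and the interval are assumptions made for illustration; they are not part of the customer's setup or of any monitoring product. Heap usage naturally rises and falls with every garbage collection, so a real check would look at a window of hours or days rather than the few samples taken here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

final class HeapTrend {

    /** Least-squares slope of y over x (for example bytes per second); 0 with fewer than two samples. */
    static double slope(double[] x, double[] y) {
        int n = Math.min(x.length, y.length);
        if (n < 2) {
            return 0.0;
        }
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double num = 0, den = 0;
        for (int i = 0; i < n; i++) {
            num += (x[i] - meanX) * (y[i] - meanY);
            den += (x[i] - meanX) * (x[i] - meanX);
        }
        return den == 0 ? 0.0 : num / den;
    }

    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        int samples = 5;          // assumption: far too few for a real diagnosis
        long intervalMillis = 200; // assumption: production checks would use minutes
        double[] seconds = new double[samples];
        double[] usedBytes = new double[samples];
        long start = System.nanoTime();
        for (int i = 0; i < samples; i++) {
            seconds[i] = (System.nanoTime() - start) / 1e9;
            usedBytes[i] = memory.getHeapMemoryUsage().getUsed();
            Thread.sleep(intervalMillis);
        }
        System.out.printf("heap trend: %.0f bytes/second%n", slope(seconds, usedBytes));
    }
}
```

A slope that stays positive over several days, as in the graph above, is the signal to move on to the next step.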

Step 2: Identify problematic Java Objects
The out-of-memory exception automatically triggers a full memory dump, which allows us to analyze which objects consumed the heap and are most likely the root cause of the out-of-memory crash. The objects that consumed most of the heap turn out to be related to a third-party logging API used by the application.

Sorting by GC (Garbage Collection) Size and focusing on custom classes (instead of system classes) shows that 80% of the heap is consumed by classes of a third-party logging framework

A closer look at an instance of the VPReportEntry4 shows that it contains five strings, one of which consumes 23KB (compared to several bytes for the other string objects). This also explains the high GC Size of the String class in the overall Heap Dump.

Individual very large String objects as part of the ReportEntry object

Following the referrer chain further up reveals the complete picture. The EventQueue keeps LogEvents in an Array, which keeps VPReportEntrys in an Array. All of these objects seem to be kept in memory as the objects are being added to these arrays but never removed and therefore not garbage collected:

Following the referrer tree reveals that global EventQueue objects hold on to the LogEvent and VPReportEntry objects in array lists, and that these objects are never removed from the lists
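
The retention pattern described above can be sketched in a few lines of Java. This is not the vendor's code: the classes below are hypothetical stand-ins that only borrow the names seen in the heap dump, and the 23,000-character payload is merely a proxy for the 23KB string. The point is that everything reachable from a long-lived queue stays on the heap as long as nothing removes it.

```java
import java.util.ArrayList;
import java.util.List;

final class ReferrerChainSketch {

    /** Hypothetical report entry: a handful of strings, one of them large. */
    static final class ReportEntry {
        final String[] fields;

        ReportEntry(String... fields) {
            this.fields = fields;
        }
    }

    /** Hypothetical log event that holds its report entries in a list. */
    static final class LogEvent {
        final List<ReportEntry> entries = new ArrayList<>();
    }

    /** Hypothetical queue with an add method but no code path that removes events. */
    static final class EventQueue {
        private final List<LogEvent> events = new ArrayList<>();

        void add(LogEvent event) {
            events.add(event);
        }

        int size() {
            return events.size();
        }

        /** Total characters still strongly reachable through the queue. */
        long retainedChars() {
            long total = 0;
            for (LogEvent event : events) {
                for (ReportEntry entry : event.entries) {
                    for (String field : entry.fields) {
                        total += field.length();
                    }
                }
            }
            return total;
        }
    }

    /** A global (static) queue is a GC root, so its contents are never collected. */
    static final EventQueue GLOBAL_QUEUE = new EventQueue();

    static void report(EventQueue queue, String largePayload) {
        LogEvent event = new LogEvent();
        event.entries.add(new ReportEntry("a", "b", "c", "d", largePayload));
        queue.add(event);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) {
            report(GLOBAL_QUEUE, new String(new char[23_000]));
        }
        System.out.println(GLOBAL_QUEUE.size() + " events retain "
                + GLOBAL_QUEUE.retainedChars() + " characters");
    }
}
```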

Step 3: Who allocates these objects?
Analyzing object allocation allows us to figure out which part of the code is creating these objects and adding them to the queue. Creating what is called a "Selective Memory Dump" when the application reached 75% Heap Utilization showed the customer that the ReportWriter.report method allocated these entries and that they have been "living" on the heap for quite a while.

It is the report method that allocates the VPReportEntry objects that stay on the heap for quite a while
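
The "Selective Memory Dump" is a feature of the diagnostics tooling used here, so it cannot be reproduced in a few lines. What can be sketched is the trigger: checking heap utilization against a 75% threshold and running some action when it is reached. The class and method names below are assumptions, and the action is left as a plain Runnable hook. On a HotSpot JVM, one option for that hook would be a heap dump via HotSpotDiagnosticMXBean.dumpHeap, but that produces a full dump, not the allocation information shown above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

final class HeapThresholdTrigger {

    /** True when used/max has reached the threshold; false when max is undefined (reported as -1). */
    static boolean exceeds(long usedBytes, long maxBytes, double threshold) {
        if (maxBytes <= 0) {
            return false;
        }
        return (double) usedBytes / maxBytes >= threshold;
    }

    /** Reads current heap usage and runs the action if the threshold is reached. */
    static boolean checkAndRun(double threshold, Runnable action) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        boolean fire = exceeds(heap.getUsed(), heap.getMax(), threshold);
        if (fire) {
            action.run();
        }
        return fire;
    }

    public static void main(String[] args) {
        // Hypothetical action: a real setup would start a dump or an allocation capture here.
        boolean fired = checkAndRun(0.75, () -> System.out.println("75% heap utilization reached"));
        System.out.println("trigger fired: " + fired);
    }
}
```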

Step 4: Why are these objects not removed from the Heap?
The premise of the third-party logging framework is that log entries will be created by the application and written in batches at certain times by sending these log entries to a remote logging service using JMS. The memory behavior indicates that even though these log entries might be sent to the service, these objects are not always removed from the EventQueue leading to the out-of-memory exception.

Further analysis revealed that the background batch writer thread calls a logBatch method, which loops through the event queue (calling EventQueue.next) to send the log events currently in the queue. The question is whether as many messages were taken out of the queue (using next) as were put into it (using add), and whether the batch job is really called frequently enough to keep up with the incoming event entries. The following chart shows the method executions of add as well as the calls to logBatch. It highlights that logBatch is not called frequently enough, and therefore next is not called to remove messages from the queue:

The highlighted area shows that messages are put into the queue but not taken out because the background batch job is not executed. Once this leads to an OOM and the system restarts, it goes back to normal operation, but older log messages are lost.
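
The add-versus-next imbalance can be illustrated with a small simulation. This is a sketch under stated assumptions, not the logging framework's implementation: the EventQueue below is a hypothetical class that reuses the method names add, next and logBatch from the analysis, and time is modeled as abstract "ticks". It counts how many events go in and come out, which is the same comparison the chart makes.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class QueueBacklogSketch {

    /** Hypothetical queue that counts every add and every successful next. */
    static final class EventQueue {
        private final Deque<String> events = new ArrayDeque<>();
        private long added;
        private long removed;

        void add(String event) {
            events.addLast(event);
            added++;
        }

        /** Removes and returns the oldest event, or null if the queue is empty. */
        String next() {
            String event = events.pollFirst();
            if (event != null) {
                removed++;
            }
            return event;
        }

        long backlog() {
            return added - removed;
        }
    }

    /** Drains everything currently queued, as a batch writer would; returns the number sent. */
    static int logBatch(EventQueue queue) {
        int sent = 0;
        while (queue.next() != null) {
            sent++;
        }
        return sent;
    }

    /**
     * Adds eventsPerTick events on every tick and runs logBatch every batchEveryTicks ticks
     * (0 means the batch job never runs). Returns {peak backlog, final backlog}.
     */
    static long[] simulate(int ticks, int eventsPerTick, int batchEveryTicks) {
        EventQueue queue = new EventQueue();
        long peak = 0;
        for (int t = 1; t <= ticks; t++) {
            for (int i = 0; i < eventsPerTick; i++) {
                queue.add("event-" + t + "-" + i);
            }
            peak = Math.max(peak, queue.backlog());
            if (batchEveryTicks > 0 && t % batchEveryTicks == 0) {
                logBatch(queue);
            }
        }
        return new long[] {peak, queue.backlog()};
    }

    public static void main(String[] args) {
        long[] healthy = simulate(100, 10, 5);
        long[] stalled = simulate(100, 10, 0);
        System.out.println("batch every 5 ticks: peak=" + healthy[0] + ", final=" + healthy[1]);
        System.out.println("batch never runs:    peak=" + stalled[0] + ", final=" + stalled[1]);
    }
}
```

With a batch run every 5 ticks the backlog peaks at 50 events and returns to 0. When the batch job never runs, all 1,000 events are still queued at the end, which is the situation in the highlighted area of the chart.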

Step 5: Fixing the Java Memory Leak problem
After we provided this information to the third-party provider and discussed the number of log entries and the system environment with them, the conclusion was that our customer used a special logging mode that was not supposed to be used in high-load production environments. It's like running with the DEBUG log level in a high-load production environment. This overwhelmed the remote logging service, which is why the batch logging thread was stopped and log events remained in the EventQueue until the JVM ran out of memory.

After making the recommended changes the system could again run with the previous heap memory size without experiencing any out-of-memory exceptions.

The Memory Leak issue has been solved and the application now runs even with the initial 512MB Heap Space without any problem.

They still use the same dashboards they have built to troubleshoot this issue, and to monitor for any future excessive logging problems.

These dashboards allow them to verify that the logging framework can keep up with log messages after they applied the changes.

Conclusion
Adding memory to crashing JVMs is most often not even a temporary fix. If you have a real Java memory leak, it will just take longer until the Java runtime crashes. A larger heap can even incur more garbage collection overhead. The real answer is to use the simple approach explained here. Look at the memory metrics to identify whether you have a leak or not. Then identify which objects are causing the issue and why they are not collected by the GC. Working with engineers or third-party providers (as in this case) will help you find a permanent solution that allows you to run the system without impacting end users and without additional resource requirements.
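
The "it will just take longer" argument is simple arithmetic, sketched below. Only the 512MB heap size comes from this case; the 128MB baseline of live data and the leak rate of 16MB per hour are hypothetical numbers chosen to keep the calculation easy to follow, and the model assumes a constant leak rate.

```java
final class TimeToOom {

    /**
     * Hours until a steadily leaking application fills the heap.
     * Assumes a constant leak rate; returns infinity when nothing leaks.
     */
    static double hoursUntilFull(double maxHeapMb, double liveBaselineMb, double leakMbPerHour) {
        if (leakMbPerHour <= 0) {
            return Double.POSITIVE_INFINITY;
        }
        return Math.max(0, maxHeapMb - liveBaselineMb) / leakMbPerHour;
    }

    public static void main(String[] args) {
        // Hypothetical inputs: 128MB of legitimate live data, 16MB leaked per hour.
        System.out.println("512MB heap:  " + hoursUntilFull(512, 128, 16) + " hours");
        System.out.println("1024MB heap: " + hoursUntilFull(1024, 128, 16) + " hours");
    }
}
```

Under these assumptions the 512MB heap lasts 24 hours and the doubled heap lasts 56 hours. The crash is postponed, but as long as the leak rate is above zero it still happens.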

Next Steps
If you want to learn more about Java Memory Management or general Application Performance Best Practices check out our free online Java Enterprise Performance Book. Existing customers of our APM Solution may also want to check out additional best practices on our APM Community.

More Stories By Andreas Grabner

Andreas Grabner has been helping companies improve their application performance for 15+ years. He is a regular contributor within Web Performance and DevOps communities and a prolific speaker at user groups and conferences around the world. Reach him at @grabnerandi
