|By Patrick Yeh||
|April 12, 2007 03:00 PM EDT||
Thread pooling is a common technique that modern application servers adopted to run Java applications efficiently. Even application servers not implemented by Java share the concept of using system resources more compactly to maximize overall throughput. Besides the underlying programming mystery of native OS threads, a Java thread object encapsulates some hurdles to easy-to-use and flexible synchronization at the programming level. JDK 5.0 has built-in thread pooling classes in its 'java.util.concurrent' package to facilitate programming the thread pool quickly. If we're using a J2EE application server, the container inherently enforces thread synchronization from its runtime nature. That means we don't have to fight difficult threading issues day and night, but it doesn't mean we can dismiss them. Instead, we should attend to the thread issues inside the code and the architecture. If we don't, system performance will degrade. A once well-running system will gradually become slower and slower, then application throughput will be blocked and external requests start to queue up. There's some degree of denial of service. In most commercial production environments like telecom, e-commerce and banking, this situation impacts the business and can create unplanned system outages.
While the server operator calls for help, an experienced engineer often asks something outstanding of the application environment. During the incident, we may see either ultra-high or ultra-low CPU usage at the OS level along with applications hanging and threads sticking at the JVM level from a three-dimensional viewpoint. How does one disclose the bottleneck and abnormality at the JVM level? The answer is: When the problem is reproducible then a commercial productive profiling tool or remotely debugging the JVM is an option. But taking copies of the thread dumps is widely used because it's straightforward and instantaneous. And it involves the least overhead.
Thread dumps provide a snapshot of the JVM internals at a special point at a minimal cost. We may give the JVM hosting the applications a signal SIG-QUIT with the JVM process ID (PID) on a Unix-like system (e.g., kill -3 xxxxxx; where 'xxxxxx' is the JVM PID) or have a control-break on the Windows Java console ask the JVM to output its thread information in detail to a standard output when the JVM didn't start in company with a '-Xrs' option before. Due to the importance of the thread dump, it's best to redirect the standard output to a file or pipe the information to a utility that can store and rotate the standard output to log files (see Figure1).
A JVM has a complementary function that enables it to get the thread dump at the undocumented C API level. (We can look at the Java source code that Sun released recently under the GPL to see this feature.) We may utilize this API for a simple debugging framework to address many common issues inside the application. But it requires a JNI implementation in C because there's no pure Java API to force the JVM to generate the thread dump, though we may get similar thread stack traces in JDK 5.0 via the 'getAllStackTraces()' API. Despite this tricky function, we're interested in a snapshot of the thread dump while we have identified the stuck threads (see Figure 2).
With copies of the thread dump collected at intervals of seconds, we may identify the stuck threads from the running state of each thread in the thread pool. Fortunately, some application servers do an automatic health check on the application thread pools. In fact, it acts like a watchdog that periodically check the last running statistics on the threads in the thread pools. Once the threads have run for fixed long-running seconds, it will print out the execution information on the stuck threads either in standard output or log files. Second, some platform JDK vendors have out their diagnostic utilities in the public domain to aid us in detecting stuck threads (e.g., HP's JMeter and IBM's thread analyzer). However, once we isolate the stuck threads, we'll have to figure out why they got stuck from the information (i.e., the stack traces of these stuck threads) about what they were doing when they got stuck. This way we can improve code quality and tier architecture in the next iteration (see Figure 3).
A stuck thread means a thread is blocked and can't return to the thread pool smoothly in a given period of time. When an application thread is blocked unintentionally, it means it can't quickly complete its dispatch and be reused. In most of production situations, the root cause of these stuck threads is also the root cause of bad system performance because it interferes with regular task execution. [It's also a performance issue for producers and healthy consumers. < 1 ] (request frequency) < (healthy thread count for request execution/average measured request execution time per healthy thread.]
Blocking without specifying a network connect or read timeout is the most frequent reason we have seen. When we don't manually configure a timeout for each method call involving networking, it will have a potential blocking behavior by the underlying physical socket read/connect characteristic. While waiting infinitely for the response from the other side, the native OS networking layer probably throws an I/O exception. By default this behavior takes an unexpectedly long time (e.g., 240 seconds). Modern distributed systems need to factor in this situation (especially, Web Services invocations). Though we may set timeouts for well-known protocols via some system properties (e.g., sun.net.client.defaultConnectTimeout and sun.net.client.defaultReadTimeout), the newer version of JDK might provide a generic mechanism to explicitly configure each default timeout value for those whose methods call socket connect/read as a security policy file. For example, com.sun.jndi.ldap.read.timeout (http://java.sun.com/docs/books/tutorial/jndi/newstuff/readtimeout.html) wasn't available prior to JDK 6.0 for LDAP service provider read timeout. Otherwise, when the problematic code isn't under the control of end users, it usually needs to restart the application to temporarily reset the abnormal phenomenon propagated from the other side. In addition, we should take into account whether the service we called is idempotent while analyzing this kind of issue in the design phase because we don't know whether the service at the other end keeps executing when the thread has ended its invocation after a timeout (see Figure 4).
The unexpectedly long execution time of a SQL statement is a common condition that causes a stuck thread. In the thread dump we collected, we can see that the stuck thread was running a network socket read for a long time without changes and the thread's stack trace contains many JDBC driver classes. Under these conditions, we can also check the status of the database it connected with and set the query timeout for all application code using a JDBC statement setQueryTimeout method. (Most JDBC drivers support this feature but we'd have to read the JDBC driver's release note first.) According to the different nature of every SQL query, it would be better to segregate the programs that have a longer execution time in another thread pool and tune the database table with indices for faster access. We would also need to check whether the JDBC driver is certified with the connected database. A sub-issue is the accessed table locked by other processes so the threads for the JDBC query couldn't continue because of table locking.
Resource contention is an issue that's hard to find if we don't get the entire thread dump to analyze. Basically, it's an issue of producers and consumers. Any limited resources on the system (JDBC connections, socket connections, etc.) will impact this issue. The best thing to do is look at the thread dump, get the stuck thread name from the log, and find the bottleneck that's causing the stuck thread.
File descriptor leaking is an issue that causes this phenomenon (Note that a Unix socket implementation requires a file descriptor). So the JVM should have enough file descriptor numbers to host our applications. Generally, we can adjust the open file limit with the Unix shell 'ulimit' command for the current shell. And we can list the open files with the public domain 'lsof' tool. It's intensely interesting that many developers don't explicitly use the 'close()' method in the final block when an object inherently provides a 'close()' method and want JVM to release these unclosed objects when garbage is collected. We should keep firmly in mind that that act is bad without closing the system resource after use. A special case is when the socket connections in the application don't close properly while still being underdeployed and then the application begins to throw an IOException with a 'Too many open files' message after repeated application redeployment.
|Patrick 08/03/07 04:21:08 AM EDT|
|Omar 04/09/07 04:10:36 PM EDT|
First of all, excellent article!! Very informative and practical.
You make reference of a utility to monitor stack threads. Where can I download this utility? There seems to be a .jar file an a shared library.
Thanking you in advance,
Too often with compelling new technologies market participants become overly enamored with that attractiveness of the technology and neglect underlying business drivers. This tendency, what some call the “newest shiny object syndrome,” is understandable given that virtually all of us are heavily engaged in technology. But it is also mistaken. Without concrete business cases driving its deployment, IoT, like many other technologies before it, will fade into obscurity.
Oct. 9, 2015 07:45 PM EDT Reads: 147
Today air travel is a minefield of delays, hassles and customer disappointment. Airlines struggle to revitalize the experience. GE and M2Mi will demonstrate practical examples of how IoT solutions are helping airlines bring back personalization, reduce trip time and improve reliability. In their session at @ThingsExpo, Shyam Varan Nath, Principal Architect with GE, and Dr. Sarah Cooper, M2Mi's VP Business Development and Engineering, will explore the IoT cloud-based platform technologies driving this change including privacy controls, data transparency and integration of real time context w...
Oct. 9, 2015 07:30 PM EDT Reads: 124
Electric power utilities face relentless pressure on their financial performance, and reducing distribution grid losses is one of the last untapped opportunities to meet their business goals. Combining IoT-enabled sensors and cloud-based data analytics, utilities now are able to find, quantify and reduce losses faster – and with a smaller IT footprint. Solutions exist using Internet-enabled sensors deployed temporarily at strategic locations within the distribution grid to measure actual line loads.
Oct. 9, 2015 06:30 PM EDT Reads: 102
The buzz continues for cloud, data analytics and the Internet of Things (IoT) and their collective impact across all industries. But a new conversation is emerging - how do companies use industry disruption and technology enablers to lead in markets undergoing change, uncertainty and ambiguity? Organizations of all sizes need to evolve and transform, often under massive pressure, as industry lines blur and merge and traditional business models are assaulted and turned upside down. In this new data-driven world, marketplaces reign supreme while interoperability, APIs and applications deliver un...
Oct. 9, 2015 06:00 PM EDT Reads: 313
The Internet of Everything is re-shaping technology trends–moving away from “request/response” architecture to an “always-on” Streaming Web where data is in constant motion and secure, reliable communication is an absolute necessity. As more and more THINGS go online, the challenges that developers will need to address will only increase exponentially. In his session at @ThingsExpo, Todd Greene, Founder & CEO of PubNub, will explore the current state of IoT connectivity and review key trends and technology requirements that will drive the Internet of Things from hype to reality.
Oct. 9, 2015 05:30 PM EDT Reads: 109
The Internet of Things (IoT) is growing rapidly by extending current technologies, products and networks. By 2020, Cisco estimates there will be 50 billion connected devices. Gartner has forecast revenues of over $300 billion, just to IoT suppliers. Now is the time to figure out how you’ll make money – not just create innovative products. With hundreds of new products and companies jumping into the IoT fray every month, there’s no shortage of innovation. Despite this, McKinsey/VisionMobile data shows "less than 10 percent of IoT developers are making enough to support a reasonably sized team....
Oct. 9, 2015 04:00 PM EDT Reads: 245
You have your devices and your data, but what about the rest of your Internet of Things story? Two popular classes of technologies that nicely handle the Big Data analytics for Internet of Things are Apache Hadoop and NoSQL. Hadoop is designed for parallelizing analytical work across many servers and is ideal for the massive data volumes you create with IoT devices. NoSQL databases such as Apache HBase are ideal for storing and retrieving IoT data as “time series data.”
Oct. 9, 2015 03:45 PM EDT Reads: 512
The IoT market is on track to hit $7.1 trillion in 2020. The reality is that only a handful of companies are ready for this massive demand. There are a lot of barriers, paint points, traps, and hidden roadblocks. How can we deal with these issues and challenges? The paradigm has changed. Old-style ad-hoc trial-and-error ways will certainly lead you to the dead end. What is mandatory is an overarching and adaptive approach to effectively handle the rapid changes and exponential growth.
Oct. 9, 2015 03:00 PM EDT Reads: 214
Today’s connected world is moving from devices towards things, what this means is that by using increasingly low cost sensors embedded in devices we can create many new use cases. These span across use cases in cities, vehicles, home, offices, factories, retail environments, worksites, health, logistics, and health. These use cases rely on ubiquitous connectivity and generate massive amounts of data at scale. These technologies enable new business opportunities, ways to optimize and automate, along with new ways to engage with users.
Oct. 9, 2015 02:00 PM EDT Reads: 194
The IoT is upon us, but today’s databases, built on 30-year-old math, require multiple platforms to create a single solution. Data demands of the IoT require Big Data systems that can handle ingest, transactions and analytics concurrently adapting to varied situations as they occur, with speed at scale. In his session at @ThingsExpo, Chad Jones, chief strategy officer at Deep Information Sciences, will look differently at IoT data so enterprises can fully leverage their IoT potential. He’ll share tips on how to speed up business initiatives, harness Big Data and remain one step ahead by apply...
Oct. 9, 2015 01:45 PM EDT Reads: 567
There will be 20 billion IoT devices connected to the Internet soon. What if we could control these devices with our voice, mind, or gestures? What if we could teach these devices how to talk to each other? What if these devices could learn how to interact with us (and each other) to make our lives better? What if Jarvis was real? How can I gain these super powers? In his session at 17th Cloud Expo, Chris Matthieu, co-founder and CTO of Octoblu, will show you!
Oct. 9, 2015 01:15 PM EDT
SYS-CON Events announced today that ProfitBricks, the provider of painless cloud infrastructure, will exhibit at SYS-CON's 17th International Cloud Expo®, which will take place on November 3–5, 2015, at the Santa Clara Convention Center in Santa Clara, CA. ProfitBricks is the IaaS provider that offers a painless cloud experience for all IT users, with no learning curve. ProfitBricks boasts flexible cloud servers and networking, an integrated Data Center Designer tool for visual control over the cloud and the best price/performance value available. ProfitBricks was named one of the coolest Clo...
Oct. 9, 2015 01:00 PM EDT Reads: 804
As a company adopts a DevOps approach to software development, what are key things that both the Dev and Ops side of the business must keep in mind to ensure effective continuous delivery? In his session at DevOps Summit, Mark Hydar, Head of DevOps, Ericsson TV Platforms, will share best practices and provide helpful tips for Ops teams to adopt an open line of communication with the development side of the house to ensure success between the two sides.
Oct. 9, 2015 01:00 PM EDT Reads: 619
SYS-CON Events announced today that IBM Cloud Data Services has been named “Bronze Sponsor” of SYS-CON's 17th Cloud Expo, which will take place on November 3–5, 2015, at the Santa Clara Convention Center in Santa Clara, CA. IBM Cloud Data Services offers a portfolio of integrated, best-of-breed cloud data services for developers focused on mobile computing and analytics use cases.
Oct. 9, 2015 12:00 PM EDT Reads: 743
SYS-CON Events announced today that Sandy Carter, IBM General Manager Cloud Ecosystem and Developers, and a Social Business Evangelist, will keynote at the 17th International Cloud Expo®, which will take place on November 3–5, 2015, at the Santa Clara Convention Center in Santa Clara, CA.
Oct. 9, 2015 11:15 AM EDT
Developing software for the Internet of Things (IoT) comes with its own set of challenges. Security, privacy, and unified standards are a few key issues. In addition, each IoT product is comprised of at least three separate application components: the software embedded in the device, the backend big-data service, and the mobile application for the end user's controls. Each component is developed by a different team, using different technologies and practices, and deployed to a different stack/target - this makes the integration of these separate pipelines and the coordination of software upd...
Oct. 9, 2015 09:00 AM EDT Reads: 300
Mobile messaging has been a popular communication channel for more than 20 years. Finnish engineer Matti Makkonen invented the idea for SMS (Short Message Service) in 1984, making his vision a reality on December 3, 1992 by sending the first message ("Happy Christmas") from a PC to a cell phone. Since then, the technology has evolved immensely, from both a technology standpoint, and in our everyday uses for it. Originally used for person-to-person (P2P) communication, i.e., Sally sends a text message to Betty – mobile messaging now offers tremendous value to businesses for customer and empl...
Oct. 9, 2015 08:30 AM EDT Reads: 315
"Matrix is an ambitious open standard and implementation that's set up to break down the fragmentation problems that exist in IP messaging and VoIP communication," explained John Woolf, Technical Evangelist at Matrix, in this SYS-CON.tv interview at @ThingsExpo, held Nov 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA.
Oct. 9, 2015 07:00 AM EDT Reads: 5,891
WebRTC converts the entire network into a ubiquitous communications cloud thereby connecting anytime, anywhere through any point. In his session at WebRTC Summit,, Mark Castleman, EIR at Bell Labs and Head of Future X Labs, will discuss how the transformational nature of communications is achieved through the democratizing force of WebRTC. WebRTC is doing for voice what HTML did for web content.
Oct. 9, 2015 06:00 AM EDT Reads: 1,417
Nowadays, a large number of sensors and devices are connected to the network. Leading-edge IoT technologies integrate various types of sensor data to create a new value for several business decision scenarios. The transparent cloud is a model of a new IoT emergence service platform. Many service providers store and access various types of sensor data in order to create and find out new business values by integrating such data.
Oct. 9, 2015 04:00 AM EDT Reads: 579