By Patrick Yeh
April 12, 2007 03:00 PM EDT
Thread pooling is a common technique that modern application servers have adopted to run Java applications efficiently. Even application servers not implemented in Java share the idea of using system resources more economically to maximize overall throughput. Beyond the underlying complexity of native OS threads, the Java thread object itself presents some hurdles to easy and flexible synchronization at the programming level. JDK 5.0 has built-in thread pooling classes in its 'java.util.concurrent' package that make it quicker to program a thread pool. If we're using a J2EE application server, the container by its runtime nature handles thread synchronization for us. That means we don't have to fight difficult threading issues day and night, but it doesn't mean we can dismiss them. We still have to attend to the thread issues inside the code and the architecture. If we don't, system performance will degrade: a system that once ran well gradually becomes slower, then application throughput is blocked and external requests start to queue up. The result is, to some degree, a denial of service. In most commercial production environments, such as telecom, e-commerce, and banking, this situation impacts the business and can cause unplanned system outages.
When the server operator calls for help, an experienced engineer often asks what stands out in the application environment. Looking at the system from several angles during the incident, we may see either extremely high or extremely low CPU usage at the OS level, along with hanging applications and stuck threads at the JVM level. How do we expose the bottleneck and the abnormality at the JVM level? When the problem is reproducible, a commercial profiling tool or remote debugging of the JVM is an option. But taking thread dumps is more widely used because it's straightforward, instantaneous, and involves the least overhead.
Thread dumps provide a snapshot of the JVM internals at a specific point in time at minimal cost. On a Unix-like system, we send the JVM hosting the applications a SIGQUIT signal using its process ID (e.g., kill -3 xxxxxx, where 'xxxxxx' is the JVM PID); on Windows, we press Ctrl-Break in the Java console. Either way, the JVM writes detailed thread information to standard output, provided it wasn't started with the '-Xrs' option. Because the thread dump is so important, it's best to redirect standard output to a file or pipe it to a utility that can store and rotate the output in log files (see Figure 1).
A JVM also has a complementary function for obtaining the thread dump at the level of an undocumented C API. (We can see this feature in the Java source code that Sun recently released under the GPL.) We could build a simple debugging framework on this API to address many common issues inside the application. But it requires a JNI implementation in C because there's no pure Java API to force the JVM to generate the thread dump, though in JDK 5.0 we can get similar thread stack traces via the 'Thread.getAllStackTraces()' API. Tricky as this function is, what interests us is a snapshot of the thread dump taken at the moment we have identified the stuck threads (see Figure 2).
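As a rough illustration of the pure Java route, the sketch below formats the result of 'Thread.getAllStackTraces()' into text that resembles a thread dump. The class name StackTraceSnapshot and the output layout are assumptions made for this example; the output is not the JVM's native dump format and carries no lock or monitor details.

```java
import java.util.Map;

/** Hypothetical helper (not from the article): a text snapshot of all live threads. */
final class StackTraceSnapshot {

    /** Returns the name, priority, state, and stack frames of every live thread. */
    static String capture() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            Thread t = entry.getKey();
            out.append('"').append(t.getName()).append('"')
               .append(t.isDaemon() ? " daemon" : "")
               .append(" prio=").append(t.getPriority())
               .append(" state=").append(t.getState())
               .append('\n');
            // One line per frame, innermost call first.
            for (StackTraceElement frame : entry.getValue()) {
                out.append("\tat ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```

Calling StackTraceSnapshot.capture() from a servlet or an administrative hook and writing the result to a log would give a dump-like record without sending a signal to the process.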
With thread dumps collected a few seconds apart, we can identify the stuck threads from the running state of each thread in the thread pool. Fortunately, some application servers perform an automatic health check on the application thread pools. The check acts like a watchdog that periodically inspects the latest running statistics of the threads in the pools. Once a thread has been running longer than a configured number of seconds, the server prints the execution information of the stuck thread to standard output or to log files. In addition, some platform JDK vendors have made diagnostic utilities publicly available to help us detect stuck threads (e.g., HPjmeter from HP and IBM's thread analyzer). However, once we isolate the stuck threads, we still have to figure out why they got stuck from the information about what they were doing at that moment (i.e., their stack traces). This way we can improve code quality and the tier architecture in the next iteration (see Figure 3).
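The watchdog idea can be sketched in a few lines. This is only an illustration under stated assumptions, not how any particular application server implements its health check: the class StuckThreadWatchdog and its methods track, findStuck, and report are hypothetical names, and a real server would call report on a timer and write the result to its log.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical watchdog sketch; names and structure are illustrative only. */
final class StuckThreadWatchdog {
    // Start time (from System.nanoTime) of the task each worker is currently running.
    private final Map<Thread, Long> running = new ConcurrentHashMap<Thread, Long>();

    /** Wraps a task so that its start and end are recorded for the executing thread. */
    Runnable track(final Runnable task) {
        return () -> {
            Thread self = Thread.currentThread();
            running.put(self, System.nanoTime());
            try {
                task.run();
            } finally {
                running.remove(self); // the thread is back in the pool
            }
        };
    }

    /** Returns the threads whose current task has been running for at least maxMillis. */
    List<Thread> findStuck(long maxMillis) {
        List<Thread> stuck = new ArrayList<Thread>();
        for (Map.Entry<Thread, Long> entry : running.entrySet()) {
            long elapsedMillis = (System.nanoTime() - entry.getValue()) / 1000000L;
            if (elapsedMillis >= maxMillis) {
                stuck.add(entry.getKey());
            }
        }
        return stuck;
    }

    /** Formats the name and current stack trace of each stuck thread for logging. */
    String report(long maxMillis) {
        StringBuilder out = new StringBuilder();
        for (Thread t : findStuck(maxMillis)) {
            out.append("STUCK \"").append(t.getName()).append("\"\n");
            for (StackTraceElement frame : t.getStackTrace()) {
                out.append("\tat ").append(frame).append('\n');
            }
        }
        return out.toString();
    }
}
```

Tasks submitted to a pool would be passed through track first; the stack trace in the report shows what each flagged thread was doing at the moment of the check, which is the information needed for the root-cause analysis described above.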
A stuck thread is a thread that is blocked and can't return to the thread pool within a given period of time. When an application thread is blocked unintentionally, it can't complete its dispatched task quickly and be reused. In most production situations, the root cause of the stuck threads is also the root cause of poor system performance because it interferes with regular task execution. It's also a capacity issue between producers and healthy consumers: requests avoid queuing up only while (request frequency) < (healthy thread count for request execution) / (average measured request execution time per healthy thread). For example, 50 healthy threads that each take 0.5 seconds per request can keep up with fewer than 100 requests per second, and every stuck thread lowers that ceiling.
Blocking on a network call without a connect or read timeout is the most frequent cause we have seen. When we don't explicitly configure a timeout for each method call that involves networking, the call can block because of the way the underlying physical socket connect/read behaves. The thread waits indefinitely for a response from the other side until the native OS networking layer eventually throws an I/O exception, and by default that can take an unexpectedly long time (e.g., 240 seconds). Modern distributed systems need to factor in this situation, especially for Web Services invocations. Though we can set timeouts for well-known protocols via system properties (e.g., sun.net.client.defaultConnectTimeout and sun.net.client.defaultReadTimeout), it would help if a newer version of the JDK provided a generic mechanism, in the manner of a security policy file, to explicitly configure the default timeout of every method that performs a socket connect or read. For example, com.sun.jndi.ldap.read.timeout (http://java.sun.com/docs/books/tutorial/jndi/newstuff/readtimeout.html), the read timeout of the LDAP service provider, wasn't available prior to JDK 6.0. When the problematic code isn't under the end user's control, the application usually has to be restarted to temporarily clear the abnormal condition propagated from the other side. In addition, when analyzing this kind of issue in the design phase, we should take into account whether the service we call is idempotent, because we don't know whether the service at the other end keeps executing after our thread has abandoned its invocation on a timeout (see Figure 4).
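Where the networking code is under our control, the timeouts can be set explicitly on the socket. The following is a minimal sketch, assuming a plain TCP client; the class TimedSocketRead and its method are hypothetical names, while 'Socket.connect(SocketAddress, int)' and 'Socket.setSoTimeout(int)' are standard java.net calls. HttpURLConnection offers the analogous 'setConnectTimeout' and 'setReadTimeout' methods since JDK 5.0.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical example: a socket read that cannot block forever. */
final class TimedSocketRead {

    /**
     * Connects to host:port and reads one byte, giving up after the stated timeouts.
     * Throws java.net.SocketTimeoutException if the connect or the read takes too long.
     * Returns the byte read (0-255) or -1 at end of stream.
     */
    static int readFirstByte(String host, int port, int connectMillis, int readMillis)
            throws IOException {
        Socket socket = new Socket();
        try {
            // Bound the time spent establishing the connection.
            socket.connect(new InetSocketAddress(host, port), connectMillis);
            // Bound the time each blocking read may wait for data.
            socket.setSoTimeout(readMillis);
            InputStream in = socket.getInputStream();
            return in.read();
        } finally {
            socket.close(); // always release the descriptor
        }
    }
}
```

With a timeout in place, the thread gets a SocketTimeoutException after a bounded wait and returns to the pool, and the caller decides whether a retry is safe, which is where the idempotency question above comes in.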
An unexpectedly long-running SQL statement is another common cause of a stuck thread. In the thread dumps we collect, the stuck thread stays in a network socket read for a long time without change, and its stack trace contains many JDBC driver classes. Under these conditions, we can check the status of the database it is connected to and set a query timeout in all application code using the JDBC Statement 'setQueryTimeout' method. (Most JDBC drivers support this feature, but we should read the driver's release notes first.) Because every SQL query has a different nature, it's better to segregate the programs with longer execution times into another thread pool and to tune the database tables with indices for faster access. We also need to check whether the JDBC driver is certified for the connected database. A related issue is a table locked by other processes, so that the threads running the JDBC query can't continue.
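A query timeout might be applied as in the sketch below. The class TimedQuery and the method countRows are hypothetical names for this example; 'Statement.setQueryTimeout(int)' is the standard JDBC call and takes a value in seconds. Whether and how the timeout is enforced depends on the driver, so this assumes a driver that honors it.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical example: running a query with a driver-enforced time limit. */
final class TimedQuery {

    /**
     * Executes the query with a timeout and returns the number of rows it produced.
     * A driver that supports the feature throws SQLException when the limit is exceeded.
     */
    static int countRows(Connection conn, String sql, int timeoutSeconds)
            throws SQLException {
        Statement stmt = conn.createStatement();
        try {
            stmt.setQueryTimeout(timeoutSeconds); // seconds; 0 means no limit
            ResultSet rs = stmt.executeQuery(sql);
            try {
                int rows = 0;
                while (rs.next()) {
                    rows++;
                }
                return rows;
            } finally {
                rs.close();
            }
        } finally {
            stmt.close(); // release the statement even if the query timed out
        }
    }
}
```

The connection is left open on purpose so that the caller can return it to its connection pool.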
Resource contention is hard to find unless we have the entire thread dump to analyze. Basically, it's a producer/consumer issue. Any limited resource on the system (JDBC connections, socket connections, etc.) can be the point of contention. The best approach is to get the stuck thread's name from the log, look it up in the thread dump, and find the bottleneck that's causing the thread to get stuck.
File descriptor leaking is another issue that causes this phenomenon. (Note that a Unix socket implementation requires a file descriptor.) The JVM must have enough file descriptors available to host our applications. Generally, we can adjust the open file limit for the current shell with the Unix shell 'ulimit' command, and we can list the open files with the freely available 'lsof' tool. Surprisingly, many developers don't explicitly call the 'close()' method in a finally block when an object provides one, and instead expect the JVM to release the unclosed objects at garbage collection. We should keep firmly in mind that leaving a system resource unclosed after use is bad practice. A special case is when the socket connections of an application aren't closed properly when it is undeployed; after repeated redeployment, the application begins to throw an IOException with a 'Too many open files' message.
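The close-in-finally idiom looks like the sketch below. The class SafeClose and the method firstLine are hypothetical names for this example; the same pattern applies to sockets, streams, and JDBC objects. On Java 7 and later, a try-with-resources statement expresses the same guarantee more compactly.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

/** Hypothetical example: releasing a resource deterministically. */
final class SafeClose {

    /** Reads the first line and always closes the reader, even if reading fails. */
    static String firstLine(Reader source) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        try {
            return reader.readLine();
        } finally {
            // Runs on both the normal and the exceptional path, so the
            // underlying descriptor is not left for the garbage collector.
            reader.close();
        }
    }
}
```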
Omar 04/09/07 04:10:36 PM EDT
First of all, excellent article!! Very informative and practical.
You make reference to a utility to monitor stack threads. Where can I download this utility? There seems to be a .jar file and a shared library.
Thanking you in advance,