By Vishal Goenka
November 1, 2002
There are several textbooks and Internet articles that dwell on the performance and scalability benefits of using a thread pool versus creating new threads in a multithreaded Java application.
While some of them overstate the benefits, most fail to emphasize some of the caveats of Java thread pooling. Due to space constraints, this article provides only a brief summary of the benefits and emphasizes the drawbacks. A list of references that covers the benefits in more detail is provided at the end.
What Is Thread Pooling?
Thread pooling refers to a technique where a pool of worker threads is
created and managed by the application. When a new job arrives, instead of
creating a new thread to service it, it's queued by the thread-pool manager
and dispatched later to one of the available worker threads. The thread-pool
manager manages the number of active worker threads based on available
resources as well as load considerations, adding new threads to the pool or
freeing some worker threads in response to the number of outstanding
requests. The primary goals of thread pooling are managing the number of
active threads in the system and reducing the overhead of creating new
threads by reusing threads from a pool.
Why Pool Threads?
The primary argument in favor of managing the number of active threads
in the system is that each thread carries a memory overhead, since it needs a
certain amount of memory for its stack. Threads also add scheduling
overhead, since the scheduler's work increases as the number of threads
increases. Depending on the implementation of the Java Virtual Machine, each
Java thread on certain operating systems may correspond to an OS thread,
which makes Java threads extremely heavyweight and may limit the total number of
active threads that the JVM is allowed to create.
To be clear, I'm not saying you don't need to manage the number of active threads in a system. After all, the benefits of multithreading do have diminishing returns once the number of threads contending for the available CPUs increases. If a server can process only about 1,000 simultaneous requests, it doesn't help to dispatch each incoming request as it's made. Often the requests must be queued and processed at a controlled rate to maintain the number of active requests below the server threshold. A common mistake, however, is to assume that dispatching queued requests automatically calls for the reuse of threads from a thread pool. Dispatching a request to a new thread and letting the thread die once the request is serviced achieves the same effect on managing the number of active threads in the system.
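As a rough illustration of that last point, the sketch below caps the number of active threads while still creating a fresh thread for every request. It is only a sketch under stated assumptions: the class name BoundedThreadPerRequest and its methods are hypothetical, and it relies on java.util.concurrent.Semaphore and lambda syntax, both of which postdate this article.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical dispatcher: one new thread per request, at most maxActive running at once. */
final class BoundedThreadPerRequest {
    private final Semaphore slots;
    private final AtomicInteger active = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    BoundedThreadPerRequest(int maxActive) {
        this.slots = new Semaphore(maxActive);
    }

    /** Blocks the caller until a slot is free, then runs the job on a brand-new thread. */
    Thread dispatch(Runnable job) throws InterruptedException {
        slots.acquire();
        Thread worker = new Thread(() -> {
            int now = active.incrementAndGet();
            peak.accumulateAndGet(now, Math::max);   // record the highest concurrency seen
            try {
                job.run();
            } finally {
                active.decrementAndGet();
                slots.release();   // the thread dies after this; nothing is reused
            }
        });
        try {
            worker.start();
        } catch (RuntimeException | Error e) {
            slots.release();       // do not leak the slot if the thread could not start
            throw e;
        }
        return worker;
    }

    int peakActive() {
        return peak.get();
    }

    /** Runs the given number of short jobs and returns the peak concurrency observed. */
    static int demo(int requests, int maxActive) throws InterruptedException {
        BoundedThreadPerRequest dispatcher = new BoundedThreadPerRequest(maxActive);
        Thread[] workers = new Thread[requests];
        for (int i = 0; i < requests; i++) {
            workers[i] = dispatcher.dispatch(() -> {
                try {
                    Thread.sleep(10);   // stand-in for real request processing
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return dispatcher.peakActive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("peak active threads: " + demo(20, 4));
    }
}
```

The observed peak never exceeds the limit, yet no thread outlives its request, so no thread-local state can carry over from one request to the next.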
Thread creation also has an overhead that can be higher in many cases than the overhead of managing a thread pool. While this argument still applies, its relative performance impact has changed significantly over the years. The newer JVM implementations are optimized for creating threads; most use a combination of user-level threads (known as green threads) as well as system-level threads (or OS threads) to make creating threads much less expensive than in earlier implementations.
The Dichotomy of Pooling Threads
The reasons for pooling threads seem to make perfect sense, just as
connection pooling makes perfect sense in a server-side application. Used
inappropriately, however, thread pooling in Java can introduce serious
programming flaws, ranging from logic errors to potential deadlocks and even
performance bottlenecks.
I distinguish thread pooling in general from thread pooling in Java simply because many of the arguments that apply to thread pooling in Java do not apply to other programming environments. Perhaps a common source of misconception about the benefits of thread pooling in Java stems from our experiences in other environments where the cost-benefit equation tilts strongly in favor of thread pooling. In the following discussion, "thread pooling" implies "thread pooling in Java," unless stated otherwise.
Thread Pooling Breaks Usage of Thread-Local Variables
Thread pooling is not friendly to the java.lang.ThreadLocal and
java.lang.InheritableThreadLocal classes that were introduced in JDK 1.2 to provide
thread-local variables. These variables differ from other variables in that
each thread has its own independently initialized copy of the variable. The
typical usage of a thread-local variable in a multithreaded application is
to keep track of some application context associated with the request, such
as the identity of the user making the request. The get() and set() methods
in the ThreadLocal class return and set the value that corresponds to the
executing thread. Thus, each thread executing a get() on a given ThreadLocal
variable can potentially get a different object. The set() similarly allows
each executing thread to set a different value for the same ThreadLocal
variable.
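The per-thread behavior of get() and set() can be seen in a small sketch. The class name ThreadLocalDemo and the CURRENT_USER variable are hypothetical, and the generics and lambda syntax are from Java versions later than the JDK 1.2 API described here.

```java
/** Hypothetical demo: two threads read and write the same ThreadLocal variable. */
final class ThreadLocalDemo {
    // One declared variable; each thread sees its own independent value.
    private static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    /**
     * Returns { value the worker saw before setting, value the worker saw after
     * setting, value the calling thread still sees afterwards }.
     */
    static String[] demo() throws InterruptedException {
        String[] seen = new String[3];
        CURRENT_USER.set("alice");                  // value for the calling thread
        try {
            Thread worker = new Thread(() -> {
                seen[0] = CURRENT_USER.get();       // the caller's value is not visible here
                CURRENT_USER.set("bob");            // affects only the worker thread
                seen[1] = CURRENT_USER.get();
            });
            worker.start();
            worker.join();                          // join makes the worker's writes visible
            seen[2] = CURRENT_USER.get();           // unchanged by the worker's set()
            return seen;
        } finally {
            CURRENT_USER.remove();                  // leave the calling thread clean
        }
    }

    public static void main(String[] args) throws InterruptedException {
        String[] seen = demo();
        System.out.println("worker before set: " + seen[0]);
        System.out.println("worker after set:  " + seen[1]);
        System.out.println("caller afterwards: " + seen[2]);
    }
}
```

The worker starts with no value at all (null), and its set() leaves the calling thread's value untouched.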
Think of a ThreadLocal variable as a hashmap that stores one value per thread by using the thread as a key into the hashmap; however, these values are "associated" with the thread in a stronger and more intrusive way. Each thread maintains a reference to a private version of a hashmap (implemented as a package accessible class, ThreadLocalMap) that contains all the thread-local variables associated with that thread. Each thread uses the declared ThreadLocal variable as the key into the hashmap to store one value per ThreadLocal variable. When a thread dies and is garbage collected, all thread-local values referenced by it are subject to garbage collection (unless they're referenced elsewhere).
InheritableThreadLocal extends ThreadLocal to allow thread-local variables associated with a parent thread to be inherited by any new child thread created by the parent thread. This class is designed to replace the ThreadLocal in those cases where a per-thread attribute being maintained by the variable, such as UserId, TransactionId, etc., must be automatically transmitted to any child threads that are created. To achieve the inheritance, the Thread class maintains a separate private hashmap (ThreadLocalMap) for inheritable thread-local variables. The Thread constructor ensures that the inheritable thread-local variables of the executing thread (the parent thread) are copied onto itself (the child thread).
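A minimal sketch of that inheritance follows. The class name InheritableDemo and the TXN_ID variable are hypothetical. It also shows that the copy happens when the child Thread object is constructed, so a value the parent sets afterwards is not seen by the child.

```java
/** Hypothetical demo: a child thread inherits the parent's value at construction time. */
final class InheritableDemo {
    private static final InheritableThreadLocal<String> TXN_ID = new InheritableThreadLocal<>();

    /** Returns { value seen by the child thread, value seen by the parent afterwards }. */
    static String[] demo() throws InterruptedException {
        String[] seen = new String[2];
        TXN_ID.set("txn-1");
        try {
            // The Thread constructor copies the parent's inheritable values here.
            Thread child = new Thread(() -> seen[0] = TXN_ID.get());
            TXN_ID.set("txn-2");        // too late for the child: it already holds its own copy
            child.start();
            child.join();
            seen[1] = TXN_ID.get();
            return seen;
        } finally {
            TXN_ID.remove();            // leave the calling thread clean
        }
    }

    public static void main(String[] args) throws InterruptedException {
        String[] seen = demo();
        System.out.println("child saw:  " + seen[0]);
        System.out.println("parent saw: " + seen[1]);
    }
}
```

In a pooled setting the "parent" is whichever thread happened to create the worker, which is rarely the thread that submitted the request.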
Thus, each Thread object has explicit references to all the thread-local variables, which in turn are only accessible via the ThreadLocal or InheritableThreadLocal object. Like normal variables, private ThreadLocal or InheritableThreadLocal variables are only accessible to the declaring class and the threads associated with them. While it's possible to expose a method in the Thread class to "purge" all (inheritable) thread-local variables associated with the thread, it would require additional security checks to ensure that only privileged code can do so, the privilege being ascertained using the Java permission mechanism. Given the lack of such a construct even in the latest versions of the J2SE/J2EE APIs, there's no way for a thread-pool manager to purge or reset all the thread-local variables associated with a given thread when reusing the thread in a different request context without the explicit cooperation of all code that uses any thread-local variables.
Unless the declaring code "removes" a value assignment by explicitly setting the value to null, thread-local variables remain assigned and hence "associated" with the thread. As a result, any code that uses thread locals risks using stale/incorrect values of the variables that were created in an earlier request context when running in a pooled thread. Given that ThreadLocal and InheritableThreadLocal are standard J2SE/J2EE classes, they're quite likely being used in various pieces of library code, none of which is safe to be executed by a pooled thread without an explicit understanding of the usage details.
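The stale-value problem is easy to reproduce. The sketch below is an illustration only: it uses a single-threaded ExecutorService from java.util.concurrent as a stand-in for any hand-written pool, and ThreadLocal.remove(), both of which were added to Java after this article was written. The names StaleThreadLocalDemo and USER are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical demo: a value set during one request leaks into the next on a reused thread. */
final class StaleThreadLocalDemo {
    private static final ThreadLocal<String> USER = new ThreadLocal<>();

    /** Returns { value seen by the second request, value seen after explicit cleanup }. */
    static String[] demo() throws Exception {
        // A pool with exactly one worker, so every request reuses the same thread.
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable firstRequest = () -> USER.set("alice");   // sets context, never clears it
            Callable<String> laterRequest = USER::get;         // unrelated request, same thread
            Runnable cleanup = USER::remove;                   // explicit cooperation

            pool.submit(firstRequest).get();
            String leaked = pool.submit(laterRequest).get();   // sees the first request's user

            pool.submit(cleanup).get();
            String afterCleanup = pool.submit(laterRequest).get();
            return new String[] { leaked, afterCleanup };
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        String[] seen = demo();
        System.out.println("second request saw: " + seen[0]);
        System.out.println("after cleanup saw:  " + seen[1]);
    }
}
```

The second request sees "alice" even though it never set a user. Only after the declaring code clears the variable does the worker thread return null again.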
The only way to get around this is to avoid using a pooled thread to execute any code whose implementation details you don't know and control. An application that uses a thread pool to dispatch requests made in different contexts is likely to suffer intermittent logic errors whenever the code servicing a request relies on a thread-local variable.
Lack of a Standard Thread-Pooling Library
There are several reference/example implementations of a thread pool
manager in various texts that describe and prescribe them, but most
developers will choose to implement their own, since these reference
implementations are meant only for illustration and therefore are not
production quality, are copyrighted and nonstandard, and often won't meet your
specific requirements. Implementing a robust thread-pool library is a
complex task that requires extensive testing in a variety of situations,
including different operating systems, multiprocessor machines, heavy
load, various application usage scenarios, and thread-pool
management policies. While it seems simple on the surface, a robust
implementation must address such issues as pool-size determination based on
execution environment and application usage, request throttling, job
scheduling, and perhaps even priority scheduling.
When using a new thread per request, the JVM's scheduler ensures that every runnable thread gets a fair share of the CPU, even if the share happens to be really small, as in the case where there are simply too many threads for the given execution environment. Using a size-bounded thread pool can cause queued requests to be starved. If one of the queued requests happens to be a producer (in a typical producer-consumer paradigm), it can lead to a deadlock if all the dispatched requests happen to be consumers waiting for the producer. Such application dependencies may necessitate knowledge of the application logic in the thread-pool dispatching decision, requiring some kind of priority dispatching construct. Priority-based dispatching opens up another can of worms, exemplified by the Mars Pathfinder "reset" problem caused by overlooking the classic priority-inversion problem.
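The producer-consumer hazard can be sketched with a pool that is too small for the dependency between its jobs. This is an illustration under stated assumptions: it uses a fixed-size ExecutorService and a BlockingQueue from java.util.concurrent (added to Java after this article) in place of a hand-written pool, the class name PoolStarvationDemo is hypothetical, and a timeout stands in for what would otherwise be an indefinite hang.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical demo: a consumer occupies the only worker while its producer sits in the queue. */
final class PoolStarvationDemo {

    /** Returns true if the consumer received its item before the timeout expired. */
    static boolean consumerFinishes(int poolSize) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        BlockingQueue<String> channel = new LinkedBlockingQueue<>();
        try {
            Callable<String> consumer = () -> channel.take();   // blocks until an item arrives
            Runnable producer = () -> channel.add("item");

            Future<String> result = pool.submit(consumer);      // dispatched first
            pool.submit(producer);                              // queued if no worker is free

            try {
                result.get(1, TimeUnit.SECONDS);
                return true;
            } catch (TimeoutException e) {
                return false;   // the producer never ran: the pool is stuck
            }
        } finally {
            pool.shutdownNow();  // interrupts a stuck consumer so the demo can exit
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("pool of 1, consumer finished: " + consumerFinishes(1));
        System.out.println("pool of 2, consumer finished: " + consumerFinishes(2));
    }
}
```

With one worker the consumer waits on a producer that can never be dispatched. With a new thread per request, both would simply run.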
Addressing all the design issues that a robust thread-pool library must handle is a nontrivial task. This happens to be one area of the system that can have systemic effects and bring your application to a grinding halt unless it is tested for all potential race conditions and deadlocks, especially since the memory model in multiprocessor systems is often nonintuitive. This is no reflection on your abilities as a programmer, but rather a statement about the inherent complexity of the problem and the effort involved in getting a robust implementation.
Performance Benefit Myths of Thread Pooling
While the lack of a standard implementation of a thread-pool library
seems like a lame excuse not to use one, it's worth asking why even the
latest versions of J2SE and J2EE don't provide one if using a thread pool is
so critical to performance on server-side applications. The answer lies in
understanding the details of the Java threads implementation. As mentioned
earlier, newer JVMs are optimized for thread creation and destruction and
use a combination of user- and system-level threads to minimize the
overhead. Not that there aren't any potential benefits in using thread
pools, but these are insignificant unless the jobs to be run by pooled
threads are short and quick and have a runtime overhead that's comparable to
the overhead of thread creation and destruction. Determining the relative
overhead of thread creation for the job in question and comparing it with
the overhead of thread-pool management must be backed up with real tests in
load conditions. As with many performance-related exercises, the results
often defy common sense.
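A starting point for such a measurement might look like the sketch below. It is deliberately naive and makes several assumptions: the class name CreationVsPoolTiming is hypothetical, the pooled case uses a java.util.concurrent ExecutorService (which postdates this article) rather than your own pool, and jobs run one at a time so that only per-job dispatch overhead is compared. It does not account for JIT warm-up or concurrent load, so treat its numbers as a rough hint, not as evidence.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical timing harness: new thread per job versus one reused pooled thread. */
final class CreationVsPoolTiming {

    /** Runs each job on a fresh thread, one after another; returns elapsed nanoseconds. */
    static long timeNewThreads(int jobs, Runnable job) throws InterruptedException {
        long start = System.nanoTime();
        for (int i = 0; i < jobs; i++) {
            Thread t = new Thread(job);
            t.start();
            t.join();
        }
        return System.nanoTime() - start;
    }

    /** Runs each job on a single pooled worker, one after another; returns elapsed nanoseconds. */
    static long timePooled(int jobs, Runnable job) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            long start = System.nanoTime();
            for (int i = 0; i < jobs; i++) {
                pool.submit(job).get();
            }
            return System.nanoTime() - start;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Replace this stand-in with the real job you intend to dispatch.
        Runnable job = () -> Math.sqrt(System.nanoTime());
        int jobs = 10_000;
        System.out.println("new thread per job: " + timeNewThreads(jobs, job) / 1_000_000 + " ms");
        System.out.println("pooled thread:      " + timePooled(jobs, job) / 1_000_000 + " ms");
    }
}
```

Substituting the real job for the stand-in shows how much of the total time the dispatch overhead actually represents.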
To Pool or Not to Pool
Before deciding that you need a thread pool for your application because
that little timer thread you need to start for every request seems too much
of an overhead, or deciding that you can churn out a thread-pool library for
your particular usage in a day or so, here are a few things to consider.
- How critical is the performance of that portion of the application, and would you make the same decision if it turned out that you needed over a month to write a robust thread-pool library?
- Is it acceptable to risk an application deadlock due to a less-than-robust thread pool implemented in a few days?
- Do you have the time to validate and perhaps quantify the savings achieved when using a pooled thread versus creating a new thread?
- Do you have the time to validate correct behavior under heavy load on a multiprocessor machine, particularly when the boundary conditions on pool size are exercised?
- Will a pooled thread run any code whose implementation details, such as its usage of thread-local variables, you're not sure about?
In my own experience, a quick and dirty thread-pool implementation of the job at hand often comes back to bite you. A small perceived performance gain is probably not worth the risks introduced by a less-than-robust thread-pool implementation. Not that these concerns don't apply to other design decisions, but thread pooling falls in the category in which the risks are much higher and the benefits are often much lower than perceived.
Summary
Top reasons for pooling threads:
- Limiting the number of active threads in the system
- Performance benefits of reducing thread creation overhead
Top reasons for not pooling threads:
- Breaks usage of java.lang.ThreadLocal and java.lang.InheritableThreadLocal objects
- Lack of a standard and time-tested thread-pool library
- The myths of thread pooling performance benefits
Copyright © 2002 SYS-CON Media, Inc. — All Rights Reserved.
Vishal Goenka is a system architect for the core platform components at Campus Pipeline. He holds a BS in computer science from the Indian Institute of Technology, Kanpur (India).
David Levy 12/12/06 10:26:55 PM EST
Whilst this is a good explanation of Thread Pooling, it is yet another article which confuses the ThreadLocal issue. It is completely safe to use ThreadLocal in a managed thread environment provided: For example, the following method interceptor is completely acceptable to use thread local: try return method.proceed(); ThreadLocal does have conditions on its usage - just like any feature, but once those conditions are observed there is no reason not to use this as a solution.
Tejo 06/07/03 05:19:00 PM EDT
Yes I agree with Vishal's views.
VishalGoenka 11/05/02 10:46:00 AM EST
Juan, I'm well aware of Paul Hyde's book and have listed it as a reference. It does tell you when to use Thread Pools (like most other texts do), but does not tell you the risks, which is the point of this article. Regards,
Juan Valdez 11/05/02 09:00:00 AM EST
Mr Paul Hyde wrote the book entitled Nothing much new in this article.
Jens Schumann 11/04/02 08:13:00 PM EST
Just read the article briefly. I do agree that there is a need for good thread pool implementations - and it is already there. See optional link. You will find there JSR166 mentioned too. See jcp.org for further details. Jens