By Paul Bemowski
August 1, 2003 12:00 AM EDT
In early 2002 Intel became the first chip manufacturer to release a processor incorporating a new technology known as Simultaneous Multithreading, or SMT. Intel's SMT implementation (dubbed Hyper-Threading, or HT) has been available in its Xeon processor line for over a year with little fanfare. In April 2003, Intel announced that HT technology will be added to its desktop-focused Pentium 4 line of processors. With HT enabled on one of these new systems, the BIOS presents each physical processor to the operating system as two logical processors.
As Java developers, we should all be excited about this new feature of Intel processors. The java.lang.Thread class was one of the key factors driving Java to the strong position it enjoys in the server-side applications market. Both client and server applications written in Java often make heavy use of threads. Indeed, even if an application does not use threads explicitly, every JVM uses at least one background thread: the garbage collector. SMT promises to significantly increase Java's server-side performance by more fully utilizing existing processor cycles in multithreaded applications.
This article attempts to explain the concepts of Simultaneous Multithreading in layman's terms, presents the development of an n-thread benchmarking suite, and uses that suite to produce concrete results from multithreaded benchmarks on HT and non-HT systems. We'll investigate various operation types to determine the factors that affect Java performance on Hyper-Threaded processors. Finally, we draw a series of conclusions and speculations from the data collected.
Understanding Simultaneous Multithreading on Intel Processors
Intel processors with HT technology carry two copies of the processor's architectural state on the same chip. This second architectural state stores a second thread context. Conceptually, this type of processor architecture splits each physical processor into two or more logical processors. Physical SMT processors present themselves to the operating system as separate logical processors. As we'll see later, it can then become important for the operating system to be aware of and to differentiate between logical and physical processors. Figure 1 illustrates the difference between SMT and non-SMT processors.
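Java code sees the logical view as well. As a minimal sketch, the standard Runtime.availableProcessors() method (present since JDK 1.4) reports the number of processors available to the JVM. On an HT-enabled dual-processor system we would expect it to return 4 logical processors rather than 2 physical ones, although the exact value depends on the OS and on whether HT is enabled in the BIOS. The class name LogicalCpuCount is illustrative.

```java
// Minimal sketch: ask the JVM how many processors it can schedule threads on.
// With HT enabled, the OS exposes logical processors, so this count is expected
// to include both thread contexts of each physical processor.
class LogicalCpuCount {
    static int logicalProcessors() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("Processors visible to the JVM: " + logicalProcessors());
    }
}
```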
What is the benefit of SMT? As it turns out, expensive processor resources often sit underutilized while an active thread performs long-latency operations. A cache miss, for instance, requires the processor to make a request to main memory. The majority of the processor's resources remain idle during this time, yet the processor still appears busy to the operating system. SMT systems use this slice of time to execute the operations of another on-chip thread context.
SMT processors contain an onboard scheduler to interleave multiple threads operating on the physical processor. If one thread encounters a long-latency event, the processor can immediately execute instructions from the second on-chip thread context. When two threads compete for the same processor resources, the onboard scheduler interleaves them much as a software thread scheduler would. This interleaving has a small amount of overhead, which can decrease the efficiency of the processor in certain situations. In aggregate, however, processor throughput increases.
With SMT, then, performance can go up or down depending on the work each thread is doing on adjacent logical processors. Various papers (see references) studying multithreaded performance report generally positive results, with some research indicating perceived performance gains as high as 50%.
Intel Hyper-Threading requires support from three fundamental components of a system:
- The processor
- The chipset
- The operating system
Hyper-Threading was incorporated into the Xeon class processors in early 2002. Xeon is not to be confused with Pentium III Xeon. When Intel changed the Xeon's core to P4, it dropped the P4 designation, calling the processor simply Xeon. Recently, HT has found its way to the desktop P4 processor. Not all processors in each of these processor classes are capable of Hyper-Threading, however.
Table 1 indicates which processors support Hyper-Threading. The table also indicates factors that you can use to determine whether a given Intel processor supports HT.
With the release of the 3.06GHz Pentium 4, Intel changed the P4 logo, incorporating the letters H and T to indicate that it's a Hyper-Threading processor.
Most recent Xeon processors support Hyper-Threading, but again, watch out for the 256KB L2 cache version, which does not.
Chipset Support for HT
Not all chipsets support HT. Check with your chipset manufacturer to ensure that you can enable and disable HT support via the BIOS.
All HT chipsets interleave processor numbering to help less sophisticated thread schedulers make complete use of available physical processors. The chipset will present the logical processors to the OS as follows:
Logical CPU0 = Physical CPU0, Logical CPU0
Logical CPU1 = Physical CPU1, Logical CPU0
Logical CPU2 = Physical CPU0, Logical CPU1
Logical CPU3 = Physical CPU1, Logical CPU1
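The interleaved numbering above follows a simple rule, sketched here under the assumption that every physical package exposes the same number of logical processors: the OS-visible CPU number modulo the physical processor count gives the physical processor, and the integer quotient gives the logical processor within it. HtNumbering and its methods are hypothetical names. Because consecutive numbers alternate between physical packages, a scheduler that simply fills CPUs in numeric order spreads its first threads across different physical processors.

```java
// Sketch of interleaved logical-processor numbering, assuming each physical
// package exposes the same number of logical processors (thread contexts).
class HtNumbering {
    // Physical processor that owns a given OS-visible logical CPU number.
    static int physicalOf(int osCpu, int physicalCount) {
        return osCpu % physicalCount;
    }

    // Which thread context on that physical processor it is.
    static int contextOf(int osCpu, int physicalCount) {
        return osCpu / physicalCount;
    }

    public static void main(String[] args) {
        int physical = 2;            // dual-processor system
        int contextsPerPackage = 2;  // Hyper-Threading: two contexts per package
        for (int cpu = 0; cpu < physical * contextsPerPackage; cpu++) {
            System.out.println("Logical CPU" + cpu + " = Physical CPU"
                    + physicalOf(cpu, physical) + ", Logical CPU"
                    + contextOf(cpu, physical));
        }
    }
}
```

For two physical processors, main() prints the same four-line mapping listed above.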
Operating Systems Supporting HT
Given a processor and chipset that support Hyper-Threading, the operating system must also be HT aware. Table 2 shows the OS support for several currently available operating systems commonly run on Intel-based hardware.
The Windows 2000 operating systems do not differentiate between logical and physical processors. Therefore, a 32-processor HT system will use only 32 logical processors, not the 64 that Hyper-Threading presents. It will work; however, the additional processor resources will not be utilized.
Windows users should check their software licensing agreements to confirm how logical processors are counted. Generally, XP licenses on a per-physical-CPU basis, while Windows 2000 treats logical processors as physical processors for licensing purposes.
Figure 2 shows the Windows XP Pro Task Manager on a dual-processor HT system; note the four distinct "CPU Usage History" charts depicting the four logical processors.
The Linux 2.4 kernel began supporting Hyper-Threading on the Intel Xeon processor as of version 2.4.18. The 2.4 thread scheduler, however, does not distinguish between logical and physical processors and lacks many other SMT scheduler optimizations; in this respect it resembles the Windows 2000 family of products. This can degrade performance when two threads are scheduled concurrently on one physical processor while the other physical processor sits idle.
As of kernel version 2.5.32, the thread scheduler was updated with advanced features to support Hyper-Threading. The 2.5.x kernel is the development branch that will become the 2.6 kernel. The exact release schedule for 2.6 is unknown, but in a recent interview Linus Torvalds indicated that 2.6 would likely be released in Q4 2003.
Figure 3 shows a Red Hat 7.3 installation running the 2.4.18 kernel with Hyper-Threading enabled. Note the four CPU states labeled CPU0-CPU3 at the top. Also note that CPU0 is running at 100.1% utilization. Wow, Hyper-Threading is cool!
Threaded Benchmarking on HT and Non-HT Systems
Our goal here is to understand the effects of Hyper-Threading processors on the performance of multithreaded Java applications. To do this, we need a test bed that will allow us to execute heavily threaded operations and track performance variations against thread count in HT and non-HT systems.
Thread Bench Design
At a basic level, the test bed should be able to execute multiple operations across n threads, observing the total throughput of operations per unit of time for a run. On a dual-processor system, we should see nearly double the performance on a CPU-intensive operation using two threads instead of one. The performance of CPU-intensive threaded operations on HT systems will vary based on the operations and the level of concurrency possible on a single physical processor.
Our focus here is to explore which types of operations will and will not benefit from HT technology. Given this, we need to be able to quickly implement and test multiple types of operations.
There are several Java benchmarking systems available on the market. Many are older and focused on applet performance. Some newer benchmark systems like VolanoMark or SPECjbb2000 test the threaded performance of systems; however, they don't allow us to customize and focus on specific individual operations that could affect performance on an HT system.
These requirements drove the design and coding of an n-thread Java benchmark framework. The framework supports pluggable operation classes and produces plottable results for a range of thread counts from a single test suite execution.
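The article's own framework is in its downloadable source; what follows is only a minimal, hypothetical sketch of the same idea, not that code. It assumes a pluggable Operation interface with an untimed setup() (the article's operations do have a setup() method) and a timed execute(); the names Operation, execute(), ThreadBench, and opsPerSecond() are illustrative. Each worker thread gets its own Operation instance, all workers start together once setup is complete, and the result is throughput in operations per second for one thread count.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical pluggable operation: setup() is untimed, execute() is one timed unit.
interface Operation {
    void setup();
    void execute();
}

class ThreadBench {
    // Runs `threads` workers, each with its own Operation, for about `millis` ms
    // and returns the number of completed operations per second.
    static double opsPerSecond(Supplier<Operation> factory, int threads, long millis)
            throws InterruptedException {
        final AtomicLong completed = new AtomicLong();
        final AtomicBoolean stop = new AtomicBoolean(false);
        final CountDownLatch ready = new CountDownLatch(threads);
        final CountDownLatch go = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final Operation op = factory.get();
            workers[i] = new Thread(() -> {
                op.setup();              // excluded from timing
                ready.countDown();
                try {
                    go.await();          // all workers start together
                } catch (InterruptedException e) {
                    return;
                }
                long local = 0;          // per-thread count avoids a shared hot spot
                while (!stop.get()) {
                    op.execute();
                    local++;
                }
                completed.addAndGet(local);
            });
            workers[i].start();
        }
        ready.await();                   // wait until every worker has finished setup()
        long start = System.nanoTime();
        go.countDown();
        Thread.sleep(millis);
        stop.set(true);
        for (Thread t : workers) {
            t.join();
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return completed.get() / seconds;
    }

    // Sweeps thread counts and prints "threads opsPerSecond" rows for plotting.
    public static void main(String[] args) throws InterruptedException {
        Supplier<Operation> spin = () -> new Operation() {
            long acc;
            public void setup() { acc = 1; }
            public void execute() { acc = acc * 31 + 7; } // trivial integer work
        };
        int maxThreads = 2 * Runtime.getRuntime().availableProcessors();
        for (int n = 1; n <= maxThreads; n++) {
            System.out.println(n + " " + opsPerSecond(spin, n, 500));
        }
    }
}
```

Running main() prints one row per thread count, up to twice the number of logical processors. Counting in a thread-local variable and publishing once at the end keeps the shared counter from becoming a synchronization point that would itself distort the scaling curve.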
Figure 4 presents a functional/UML diagram for the system design.
The resulting benchmarking framework has the following features:
The code for this article can be downloaded from the JDJ Web site, www.sys-con.com/java/sourcec.cfm.
Factors Affecting Performance
Use of Threads
This seems obvious; however, it needs to be mentioned: single-threaded applications (often client applications) will see little performance gain. Server-side Java applications make extensive use of threads, making them excellent candidates for performance improvement from SMT.
Nonthreaded applications may still see some benefit. Java's garbage collection and background JIT compilers operate as daemon threads in the local JVM. In addition, concurrent processes could make use of the additional processor resources.
The Operating System's Thread Scheduler
In an HT system, a single physical processor is presented to the OS as two logical processors. This requires the OS to differentiate between physical and logical processors and make intelligent decisions about thread scheduling.
The thread scheduler on a dual-processor HT system will see four logical processors. A poor thread scheduler could schedule two CPU-intensive threads onto separate logical processors representing the same physical processor. This would result in a perceived performance decrease on an HT-based system.
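To make this pitfall concrete, here is a hypothetical sketch comparing a naive placement rule with an HT-aware one. It assumes the interleaved numbering listed in the chipset section (logical CPU n lives on physical processor n % physicalCount) and models each logical CPU as simply busy or idle; real OS schedulers weigh many more factors. The class and method names are illustrative.

```java
// Hypothetical placement rules for a new CPU-bound thread, assuming interleaved
// numbering: logical CPU n belongs to physical processor n % physicalCount.
class HtAwarePlacement {
    // Naive rule: the lowest-numbered idle logical CPU (-1 if none).
    static int firstIdle(boolean[] busy) {
        for (int cpu = 0; cpu < busy.length; cpu++) {
            if (!busy[cpu]) return cpu;
        }
        return -1;
    }

    // HT-aware rule: prefer an idle logical CPU whose entire physical processor
    // is idle; otherwise fall back to the naive rule.
    static int pick(boolean[] busy, int physicalCount) {
        for (int cpu = 0; cpu < busy.length; cpu++) {
            if (!busy[cpu] && packageIdle(busy, cpu % physicalCount, physicalCount)) {
                return cpu;
            }
        }
        return firstIdle(busy);
    }

    // True if no logical CPU on physical processor `pkg` is busy.
    static boolean packageIdle(boolean[] busy, int pkg, int physicalCount) {
        for (int cpu = pkg; cpu < busy.length; cpu += physicalCount) {
            if (busy[cpu]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Dual-processor HT system: one CPU-bound thread already runs on logical
        // CPU2, which is the second context of physical CPU0.
        boolean[] busy = {false, false, true, false};
        System.out.println("naive:    logical CPU" + firstIdle(busy)); // CPU0, sibling of CPU2
        System.out.println("HT-aware: logical CPU" + pick(busy, 2));   // CPU1, on idle physical CPU1
    }
}
```

The naive rule doubles up on physical CPU0 while physical CPU1 sits idle, which is exactly the perceived slowdown described above.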
CPU Resource Utilization
Hyper-Threaded processors do not duplicate all available resources. Two threads performing fundamentally similar operations on separate logical processors compete for the same execution units and will likely see little performance gain. For HT to pay off, the two threads coexisting on a physical CPU must perform a mix of operations, so that the processor can fill one thread's latency with the other thread's work.
Performance of Threaded Benchmarks on HT and Non-HT Systems
Tests were run on two HT-capable dual-processor systems (see Table 3).
Because Hyper-Threading is enabled through the BIOS, it was easy to turn the feature on and off in the boot setup program between runs.
Each test was run with the Sun JDK 1.4.1_02, using the -server flag, on the Linux and XP systems. Tests were also run on the Linux system with the IBM 1.4.0 JVM, with no command-line flags.
The tests devised are by no means comprehensive. The goal was to stress the processor, using different processor resources, to try to gain some insight into the effects of SMT processing. The series of tests was run on each of the above systems, with and without HT enabled. Each of the operation algorithms tested is briefly described, followed by results and some discussion and interpretation.
Note: To save space, the XP and Linux tests are shown on the same plots. The data should not be compared directly, however. The tests were run on different physical hardware; indeed, the processors in the XP machine were faster than those in the Linux machine.
Test 1: Gaussian Elimination, 500x500 matrix (Floating point intensive)
Gaussian elimination is a very common algorithm for solving systems of linear equations, a common task in finite element applications, weather simulation, coordinate transformations, and economic modeling, among other things. Algorithmic optimizations are often made for sparse or banded matrices; however, the core of the work is fundamentally the same: large numbers of floating point calculations are required.
To simulate this, a Gaussian elimination algorithm with scaled partial pivoting and back substitution is used (see Figure 5). A full matrix is constructed of random doubles using Math.random(). The population of the matrix is carried out in the setup() method and is not considered part of the operation.
This operation carries out large numbers of simple floating point operations on doubles. All calculations are done in the Java call stack, though it's highly likely that the code was optimized by the JIT before the tests were run.
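The benchmark's own Gaussian elimination source is not reproduced here, so the following is a hedged sketch of the textbook algorithm the test names: forward elimination with scaled partial pivoting (the pivot row is the one whose candidate pivot is largest relative to that row's largest coefficient), followed by back substitution. The class name Gauss is illustrative, and main() mirrors the described setup by filling a 500x500 matrix with Math.random().

```java
// Hedged sketch: Gaussian elimination with scaled partial pivoting and back
// substitution. Illustrative only; not the article's benchmark source.
class Gauss {
    // Solves A x = b. Inputs are copied, so the caller's arrays are not modified.
    static double[] solve(double[][] a, double[] b) {
        int n = b.length;
        double[][] m = new double[n][];
        double[] rhs = b.clone();
        double[] scale = new double[n];          // largest |coefficient| in each row
        for (int i = 0; i < n; i++) {
            m[i] = a[i].clone();
            for (int j = 0; j < n; j++) {
                scale[i] = Math.max(scale[i], Math.abs(m[i][j]));
            }
            if (scale[i] == 0) throw new ArithmeticException("singular matrix");
        }
        // Forward elimination to upper-triangular form.
        for (int k = 0; k < n - 1; k++) {
            // Scaled partial pivoting: pick the row whose pivot candidate is
            // largest relative to that row's scale factor.
            int p = k;
            double best = Math.abs(m[k][k]) / scale[k];
            for (int i = k + 1; i < n; i++) {
                double ratio = Math.abs(m[i][k]) / scale[i];
                if (ratio > best) { best = ratio; p = i; }
            }
            if (best == 0) throw new ArithmeticException("singular matrix");
            swap(m, k, p);
            swap(rhs, k, p);
            swap(scale, k, p);
            for (int i = k + 1; i < n; i++) {
                double f = m[i][k] / m[k][k];
                for (int j = k; j < n; j++) m[i][j] -= f * m[k][j];
                rhs[i] -= f * rhs[k];
            }
        }
        // Back substitution, from the last row up.
        double[] x = new double[n];
        for (int i = n - 1; i >= 0; i--) {
            double sum = rhs[i];
            for (int j = i + 1; j < n; j++) sum -= m[i][j] * x[j];
            x[i] = sum / m[i][i];
        }
        return x;
    }

    static void swap(double[][] a, int i, int j) { double[] t = a[i]; a[i] = a[j]; a[j] = t; }
    static void swap(double[] a, int i, int j) { double t = a[i]; a[i] = a[j]; a[j] = t; }

    public static void main(String[] args) {
        int n = 500;                              // matrix size used by Test 1
        double[][] a = new double[n][n];
        double[] b = new double[n];
        for (int i = 0; i < n; i++) {             // random full matrix, built before timing
            for (int j = 0; j < n; j++) a[i][j] = Math.random();
            b[i] = Math.random();
        }
        double[] x = solve(a, b);
        double worst = 0;                         // largest |(A x - b)_i|
        for (int i = 0; i < n; i++) {
            double r = -b[i];
            for (int j = 0; j < n; j++) r += a[i][j] * x[j];
            worst = Math.max(worst, Math.abs(r));
        }
        System.out.println("max residual = " + worst);
    }
}
```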
It seems that this operation does not scale well into threads on any JVM. The Sun VM on Windows XP with Hyper-Threading does significantly worse than the Linux JVMs with or without Hyper-Threading. There is no synchronization in the operation whatsoever. The poor scaling could be due to memory barriers or contention for the bus or main memory.
Test 2: Calculation of 2000! (Integer intensive)
The factorial (the ! operator) is often used in probability calculations, as part of the formulas for combinations and permutations. Factorial is defined as follows:
$$N! = 1 \times 2 \times 3 \times 4 \times \cdots \times N$$
Combinations come up in poker and illustrate a potential use of the factorial operator. To calculate the number of five-card combinations in a 52-card deck, we use the combinations formula:
$$\text{Possible poker hands} = {}_{52}C_{5} = \frac{52!}{5!\,(52-5)!}$$
Factorials grow rapidly even for small integers, quickly overflowing Java's primitive long type and requiring the java.math.BigInteger class. Calculating a factorial therefore results in a large number of big-integer multiplications.
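A minimal sketch of the factorial operation with java.math.BigInteger, together with the combinations formula from the poker example. The class and method names are illustrative, not taken from the article's source. Computing nCk by dividing full factorials is deliberately naive: it suits a multiplication-heavy benchmark, but it is not an efficient way to compute combinations.

```java
import java.math.BigInteger;

// Sketch: N! = 1 x 2 x ... x N in arbitrary precision, and nCk = n! / (k! (n-k)!).
class Factorial {
    static BigInteger factorial(int n) {
        BigInteger result = BigInteger.ONE;      // 0! = 1! = 1
        for (int i = 2; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    static BigInteger choose(int n, int k) {
        return factorial(n).divide(factorial(k).multiply(factorial(n - k)));
    }

    public static void main(String[] args) {
        System.out.println("52C5 = " + choose(52, 5));   // 52C5 = 2598960
        System.out.println("2000! has " + factorial(2000).toString().length() + " digits");
    }
}
```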
The factorial calculations shown in Figure 6 do show some consistent, limited benefit from Hyper-Threading. Indeed, for four threads the IBM JVM shows a 17% increase in performance using an HT-enabled system.
Incidentally, there are 2,598,960 five-card combinations in a 52-card deck.
Test 3: 150K calculations of Math.tan() (Floating point, mixed stack)
This test simply calculates the tangent of an angle 150,000 times in a tight loop (see Figure 7).
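A sketch of the Test 3 loop. The angle sequence is an assumption, since the article does not say which angles were used; cycling through [0, 1) radians keeps well away from tan's poles at ±π/2. Summing the results and returning the checksum keeps the JIT from eliminating the loop as dead code, which would otherwise make the timing meaningless. The class name TanLoop is illustrative.

```java
// Sketch of the Test 3 operation: 150,000 calls to Math.tan() in a tight loop.
class TanLoop {
    static final int ITERATIONS = 150_000;

    static double run() {
        double sum = 0.0;
        for (int i = 0; i < ITERATIONS; i++) {
            // Angles cycle through [0, 1) radians (an assumed input pattern).
            sum += Math.tan((i % 1000) / 1000.0);
        }
        return sum;  // returned so the loop cannot be optimized away
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        double checksum = run();
        long elapsedMicros = (System.nanoTime() - start) / 1000;
        System.out.println("checksum=" + checksum + " elapsed=" + elapsedMicros + "us");
    }
}
```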
All Java threads have two call stacks: one for Java calls, the other for C calls. The java.lang.Math.tan(double) function is native, calculating an approximation of tangent with a 27th order polynomial. It's likely that the reason this operation scales so well into Hyper-Threading is the constant call stack switching, giving the processor time to utilize its secondary thread context.
Test 4: Prime number search
A prime number search operation was created using the BigInteger class and a very simplistic direct-search factorization. The algorithm's inefficiency matters less than the type of calculation it performs: this class performs a large number of BigInteger divisions.
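A sketch of the kind of naive search this test describes: trial division by every candidate divisor up to the square root, done entirely in BigInteger arithmetic so that the cost is dominated by BigInteger division. The class and method names, and the use of BigInteger.mod() for the divisibility test, are assumptions rather than the article's code. BigInteger.TWO only appeared in Java 9, so the sketch defines its own constant.

```java
import java.math.BigInteger;

// Sketch: deliberately naive primality test by BigInteger trial division.
class NaivePrimes {
    private static final BigInteger TWO = BigInteger.valueOf(2);

    static boolean isPrime(BigInteger n) {
        if (n.compareTo(TWO) < 0) return false;          // 0 and 1 are not prime
        // Try every divisor d with d * d <= n.
        for (BigInteger d = TWO; d.multiply(d).compareTo(n) <= 0; d = d.add(BigInteger.ONE)) {
            if (n.mod(d).signum() == 0) return false;    // found a factor
        }
        return true;
    }

    // Counts primes in the closed range [from, to].
    static int countPrimes(long from, long to) {
        int count = 0;
        for (long v = from; v <= to; v++) {
            if (isPrime(BigInteger.valueOf(v))) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countPrimes(1, 100));  // 25 primes up to 100
    }
}
```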
It is difficult to tell what is going on in Figure 8, beyond the fact that the IBM JVM is beating Sun's. The IBM JVM scales well into threading this operation. It does even better when Hyper-Threading is enabled. The Sun VM scales poorly into threads, and it becomes worse with additional thread contexts. You could speculate that this behavior is characteristic of a low-level synchronization contention issue in the Sun JVM.
The plots above give some general idea of how these various operations scale into threads. In most cases, the HT performance gains are modest. The following is a summary of performance differences seen with Hyper-Threading enabled versus disabled for each of the tested JVMs.
IBM 1.4.0, Linux 2.4.18
Sun 1.4.1, Linux 2.4.18
Sun 1.4.1, Windows XP Pro
When I began this project, I fully expected to see marked performance gains using Hyper-Threading over identical hardware not using HT. In the course of testing, I've learned quite a bit about performance differences for Java on various platforms, hardware configurations, and virtual machines. Hyper-Threading is not the boon I had expected. In some situations, performance gains for HT reached the 75% mark, which is considerable. There was little significant performance degradation using HT, so there seems to be little downside to enabling it.
Perhaps the more important finding is that the IBM JVMs perform significantly better than the Sun JVMs. In addition, the IBM JVMs scaled far better with threads than did Sun's offering. If performance is of key concern, and you're not using some of the more esoteric features of the Sun JVM, IBM JVMs deserve serious consideration.
Most server-side Java applications are not doing computationally intensive tasks. The tasks focus more heavily on socket IO: communicating with databases, with clients via HTTP, RMI, Web services, and the like. Processors will be given plenty of socket IO wait time to schedule parallel tasks. For socket-IO-bound applications, be sure to consider the relative strength of your operating system's networking (TCP/IP) implementation.
The introduction of Hyper-Threading on desktop P4 systems is also exciting. Java developers often develop on Windows or Linux-based desktop systems and deploy onto larger SMP and potentially SMT systems. HT will allow a desktop developer and user to see some of the benefits of threaded applications long before deployment to the higher-end systems.
SMT technology is here to stay. Intel's Hyper-Threading implementation is sure to be the first of many. Chip industry watchers speculate that Simultaneous Multithreading and thread-level parallelism will spell the ultimate end of the "megahertz wars." A chip's performance will be tied less to its internal clock speed and more to the bells and whistles it incorporates. Other chip manufacturers are sure to follow suit, and all implementations will improve in quality over time.
Operating systems are also continually improving their support for Hyper-Threading. It does seem strange that the performance on an XP system, which should be HT optimized, was often less HT friendly than the 2.4.18 Linux kernel, which is HT ignorant. As more sophisticated support for HT is built into operating systems, we should see more significant performance gains using HT in the Java world.
The combination of Java and Linux in the datacenter is rapidly gaining ground on the Solaris/Java platform. The majority of these new Linux servers are running high-end Intel-based hardware. Hyper-Threading will give this trend a further push in the Linux direction.
For now, given a piece of hardware that's HT capable, the configuration that offers the best performance under most conditions is the IBM 1.4.0 JVM on Linux with Hyper-Threading enabled.