Multi-Core and Massively Parallel Processors

Coming soon to a theater near you...

As software developers we have enjoyed a long trend of consistent performance improvement from processor technology. In fact, for the last 20 years processor performance has consistently doubled about every two years or so. What would happen in a world where these performance improvements suddenly slowed dramatically or even stopped? Could we continue to build bigger and heavier, feature-rich software? Would it be time to pack up our compilers and go home?

The truth is, single-threaded performance improvement is likely to see a significant slowdown over the next one to three years. In some cases, single-thread performance may even drop. The long and sustained climb will slow dramatically. We call the cause behind this trend the CLIP level.

  • C - Clock frequency increases have hit a thermal wall
  • L - Latency of processor-to-memory requests continues as a key performance bottleneck
  • IP - Instruction-level Parallelism is already fully exploited by current processor and compiler technologies

To overcome these challenges the industry is looking to multi-core and multithreaded processor designs to continue the performance improvement trend. These designs don't look to improve the performance of single threads of execution, but instead to run many and sometimes massive numbers of threads in parallel. Wait just a minute though. Is concurrent programming that easy? Hasn't it been tried before?

This article will dive deeper into the current issues challenging processor performance improvement and include a high-level overview of the key microprocessor players: Intel, AMD, Sun, and IBM. Finally, we'll take a deep dive into the challenges, opportunities, and technologies available to Java programmers to take advantage of concurrent programming to leverage these new processor technologies. If you're not programming in parallel today, you will be soon.

Multi-Core Mania
Increases in processor clock frequency are slowing, and in many cases clock rates are being reduced to cut power consumption. One trend continues though. The industry continues to shrink the size of transistors, doubling the number of transistors on a chip about every two years or so. In 2007 most major chip manufacturers will begin the shift from a 65nm to a 45nm process. This will yield transistors about 1/2000th the width of a human hair! To provide a relative perspective, a silicon atom itself is about 1/4nm. Obviously continuing to halve the size of transistors will also reach a limit in the not too distant future. But that's a topic for another paper.

So, how will the industry use this new transistor budget to improve processor performance? Techniques such as superscalar execution, pipelining, and speculative processing with branch prediction have added significant complexity to microprocessor designs, but have also been successful at improving performance. Unfortunately, the latency to memory on cache misses and the high frequency of branches in most workloads are proving to be limiting factors. Building ever-larger caches is one way to mitigate the memory latency problem, but as cache size exceeds common working set size, there are rapidly diminishing returns for investing transistors in cache memory.

Instead, the industry is moving toward multi-core, multithreading, and specialization. Rather than improving the performance of a single thread on a single core, the transistor budget is being used to add multiple cores to a single chip. Further, in many cases each core is capable of running multiple threads to hide memory latency. When one thread is blocked by a long latency event, such as a cache miss, the processor simply switches to another thread to execute. Also, many chip designs now include special-purpose processing units that make effective use of transistors for specific tasks such as cryptography.

Taking a closer look at the processors themselves, the IBM Power is distinguished as being the first to introduce multiple cores on a chip in the Power 4 design in 2001. IBM recently introduced the Power 6 processor, which combines two high-performance cores on a chip with each core supporting two-way multithreading. Besides providing multiple cores, the Power 6 also achieved an amazing 4.7GHz clock rate showing that IBM remains serious about single-thread performance while keeping pace with the industry on multi-core. As Power 6 is destined to be included in high-end servers, IBM has also focused heavily on RAS (reliability, availability, serviceability) and virtualization.

In the x86 architecture camp, rivals AMD and Intel have both recently introduced multi-core processors. In 2006, Intel introduced chips with two cores, while chips with four cores, based on 45nm technology, should appear this year. As part of the move to multi-core, Intel removed support for its version of multithreading known as "hyperthreading," although multithreading is expected to return in future designs. Not to be outdone, AMD will introduce its first four-core chip, known as Barcelona, later this year. Both Intel and AMD continue to focus on single-thread performance as well, each introducing new innovations in instruction-level parallelism and caches. One key difference in their designs is the memory bus architecture. Intel is continuing with its symmetric front-side bus architecture. AMD, on the other hand, has introduced a NUMA-based design based on the open HyperTransport technology in hopes of alleviating the memory bus bottleneck.

Sun has adopted a more radical departure in design from prior generations of SPARC. At the end of 2005, Sun released the UltraSPARC T1 or Niagara processor. Niagara includes up to eight cores, significantly more than competing server processors. Sun was able to squeeze eight cores on the chip by shifting focus away from the best achievable single-thread performance toward high chip-level throughput. Niagara cores run at a relatively low clock rate and don't support out-of-order processing, branch prediction, or many other common ILP optimizations. Instead they depend on four-way multithreading to tolerate long waits for memory. The goal is to achieve high overall throughput through application concurrency. However, applications with lower concurrency may run significantly slower on Niagara relative to the other processors described here.

At this point, all of the key players are producing chips with multiple cores but diverging in core design, memory nest, and other important aspects. The key to success for processor designers over the next few years will be in the innovative use of their transistor budget. Architects will make strategic tradeoffs between single-thread performance, massive concurrency, cache sizes, power consumption, and specialized processing units. The companies that make tradeoffs in the most innovative ways to meet the demands of the market should emerge as the winners.

Parallel Programming
As a developer, it will be important for you to learn the skills necessary to develop applications that can run with high performance on these increasingly parallel processors. Since single-thread performance isn't likely to improve at historical rates, the developer will have to look to concurrency to improve performance for a given task. The goal of parallel programming is to reduce the time of a task by dividing it into a set of subtasks that can be processed concurrently. While this may seem simple enough, experience shows that developing correct and effective parallel programs is surprisingly difficult. To utilize parallelism in hardware effectively, software tasks must be decomposed into subtasks, code must be written to coordinate the subtasks and work must be balanced as much as possible. Still sound easy? Read on.
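To make the decomposition idea concrete, here is a minimal sketch that splits a summation into subtasks and combines the partial results. The class name ParallelSum, the chunking scheme, and the choice of a fixed thread pool are our own illustrative assumptions rather than a prescribed technique, and the lambda syntax requires a newer Java release than the one current when this article was written.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class: sums an array by dividing it into subtasks.
class ParallelSum {

    // Splits data into roughly equal chunks, sums each chunk in its own
    // subtask, then combines the partial sums on the calling thread.
    static long sum(int[] data, int subtasks) {
        ExecutorService pool = Executors.newFixedThreadPool(subtasks);
        try {
            int chunk = (data.length + subtasks - 1) / subtasks; // ceiling division
            List<Future<Long>> partials = new ArrayList<>();
            for (int t = 0; t < subtasks; t++) {
                final int from = Math.min(data.length, t * chunk);
                final int to = Math.min(data.length, from + chunk);
                Callable<Long> subtask = () -> {
                    long partial = 0;
                    for (int i = from; i < to; i++) {
                        partial += data[i];
                    }
                    return partial;
                };
                partials.add(pool.submit(subtask));
            }
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get(); // coordination: wait for each subtask
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("subtask failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] data = new int[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(sum(data, 4));
    }
}
```

The subtasks here only read shared data and write to their own local variables, which is why no further coordination is needed beyond waiting for the results.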

As you get started with parallel programming, the first rule to become familiar with is Amdahl's Law. Amdahl's Law says that speeding up your program is limited by the part that's not running in parallel. For example, if a profile reveals that 20% of the time is spent in code that can only run sequentially on one processor, then the best speed increase you can possibly get, even with perfect parallelization of the rest of your program, is 5x, no matter how many processors you throw at it. Load imbalance is a similar problem. If you've divided your code into N subtasks, the time taken to execute them is not necessarily 1/N of the original time. Rather, the time taken is the maximum of the execution times of the subtasks.
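The arithmetic behind both limits is easy to check. The sketch below uses the usual statement of Amdahl's Law, speedup = 1 / (s + (1 - s) / n), where s is the sequential fraction and n is the number of processors, and models load imbalance as the maximum subtask time. The class and method names are hypothetical, chosen only for this illustration.

```java
// Hypothetical helper class illustrating Amdahl's Law and load imbalance.
class Amdahl {

    // Best-case speedup with the given sequential fraction on n processors:
    // 1 / (s + (1 - s) / n).
    static double speedup(double serialFraction, int processors) {
        return 1.0 / (serialFraction + (1.0 - serialFraction) / processors);
    }

    // Elapsed time of a set of subtasks that run fully in parallel:
    // everyone waits for the slowest subtask.
    static double parallelTime(double[] subtaskTimes) {
        double max = 0.0;
        for (double time : subtaskTimes) {
            max = Math.max(max, time);
        }
        return max;
    }

    public static void main(String[] args) {
        // With 20% sequential code the speedup approaches, but never reaches, 5x.
        for (int n : new int[] {1, 2, 4, 8, 64, 1024}) {
            System.out.printf("%d processors: %.3fx%n", n, speedup(0.2, n));
        }
        // Eight units of work split unevenly across four subtasks take
        // as long as the largest piece, not 8 / 4 = 2 units.
        System.out.println(parallelTime(new double[] {1.0, 1.0, 1.0, 5.0}));
    }
}
```

With four processors and 20% sequential code, for instance, the formula gives 1 / (0.2 + 0.8 / 4) = 2.5x, only half of the theoretical ceiling.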

If getting your code divided into subtasks and ensuring that work is well balanced sounded hard, then let us introduce you to the coordination problem. Unfortunately, very few programs can be parallelized so simply. The reason is that those subtasks are likely to want to operate on the same data and some of the subtasks may have to wait for others to do their thing before proceeding. It's okay if two subtasks want to read the same memory location in parallel, but if one of them wants to write to the location, you've got trouble because you can't predict which subtask will get to it first.

For example, operations to insert and remove objects from a linked list must be executed so that updates to the data structures happen sequentially and don't corrupt each other. An incorrect ordering of accesses to a memory location is called a data race and it can be one of the most difficult bugs to find because your code might behave differently on each run and might even change once you decide to start debugging. To deal with this problem, most programming environments include mechanisms to ensure that a subtask has exclusive access to specific memory, commonly called locks. Unfortunately locks bring their own unique problems when multiple subtasks compete for access and, if used indiscriminately, can reverse all of your hard work in parallelizing your code by making subtasks wait too often or too long for exclusive access to shared memory.
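As a rough illustration of the linked list case, the sketch below guards a java.util.LinkedList with a java.util.concurrent.locks.ReentrantLock so that only one thread at a time can update it. The LockedList class and its fill helper are hypothetical names invented for this example, and a single coarse lock is just one possible design, chosen for clarity rather than performance.

```java
import java.util.LinkedList;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical example class: a linked list whose updates are serialized by a lock.
class LockedList {
    private final LinkedList<Integer> items = new LinkedList<>();
    private final ReentrantLock lock = new ReentrantLock();

    void insert(int value) {
        lock.lock();
        try {
            items.addLast(value); // only one thread at a time reaches this line
        } finally {
            lock.unlock(); // always release, even if the update throws
        }
    }

    // Removes and returns the first element, or null if the list is empty.
    Integer remove() {
        lock.lock();
        try {
            return items.pollFirst();
        } finally {
            lock.unlock();
        }
    }

    int size() {
        lock.lock();
        try {
            return items.size();
        } finally {
            lock.unlock();
        }
    }

    // Starts the given number of threads, each inserting perThread values,
    // waits for all of them, and returns the final list size.
    static int fill(int threads, int perThread) {
        LockedList list = new LockedList();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    list.insert(i);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return list.size();
    }

    public static void main(String[] args) {
        System.out.println(fill(8, 10000));
    }
}
```

Because every insert takes the same lock, the threads in fill spend much of their time waiting on one another, which is exactly the cost described above: the list stays consistent, but the parallel speedup largely disappears.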


More Stories By J. Stan Cox

J. Stan Cox is a senior engineer with IBM's WebSphere Application Server performance group. In this role, he has worked to improve WebSphere application performance for J2EE, Web 2.0, Web services, XML and more. His current focus is WebSphere multicore and parallel foundation performance. Stan holds a B.S.C.S from Appalachian State University (1990) and an MS in computer science from Clemson University (1992).

More Stories By Bob Blainey

Bob Blainey is a Distinguished Engineer in the IBM Software Group, responsible for the technical roadmap for software in the era of multi-core and related next-generation systems innovations. Bob is an expert in programming languages and compilers having spent much of his career at IBM driving ever-greater performance and parallelism through program analyses and transformations. Immediately prior to his current position, Bob was CTO for Java at IBM. He is a member of the IBM Academy of Technology, an IBM Master Inventor, and, most impressive of all, manages to remain sane with two pre-teen daughters in the house.

More Stories By Vijay Saraswat

Vijay Saraswat joined IBM Research in 2003 after a year as a professor at Penn State, a couple of years at start-ups, and 13 years at Xerox PARC and AT&T Research. His main interests are in programming languages, constraints, logic, and concurrency. At IBM, he leads the work on the design and implementation of X10, a modern object-oriented programming language intended for scalable concurrent computing. Over the last 20 years he has lectured at most major universities and research labs in the U.S.A. and Europe. Vijay got a B Tech degree from the Indian Institute of Technology, Kanpur, and an MS and PhD from Carnegie Mellon University. His thesis on concurrent constraint programming won the ACM Doctoral Dissertation Award in 1989, and a related paper won a best-paper-in-20-years award in its area.

Most Recent Comments
Jim Falgout 11/26/07 09:37:55 AM EST

The X10 site does not seem very active. The news feed on the last release is dated December of 2006. Do you have any information on the current state of X10?
