Welcome!

Java IoT Authors: Elizabeth White, Pat Romanski, Liz McMillan, Tim Hinds, Jonathan Fries

Related Topics: Java IoT

Java IoT: Article

Multi-Core and Massively Parallel Processors

Coming soon to a theater near you...

As software developers we have enjoyed a long trend of consistent performance improvement from processor technology. In fact, for the last 20 years processor performance has consistently doubled about every two years or so. What would happen in a world where these performance improvements suddenly slowed dramatically or even stopped? Could we continue to build bigger and heavier, feature-rich software? Would it be time to pack up our compilers and go home?

The truth is, single threaded performance improvement is likely to see a significant slowdown over the next one to three years. In some cases, single-thread performance may even drop. The long and sustained climb will slow dramatically. We call the cause behind this trend the CLIP level.

  • C - Clock frequency increases have hit a thermal wall
  • L - Latency of processor-to-memory requests continues as a key performance bottleneck
  • IP - Instruction-level Parallelism is already fully exploited by current processor and compiler technologies.
To overcome these challenges the industry is looking to multi-core and multithreaded processor designs to continue the performance improvement trend. These designs don't look to improve the performance of single threads of execution, but instead to run many and sometimes massive numbers of threads in parallel. Wait just a minute though. Is concurrent programming that easy? Hasn't it been tried before?

This article will dive deeper into the current issues challenging processor performance improvement and include a high-level overview of the key microprocessor players: Intel, AMD, Sun, and IBM. Finally, we'll take a deep dive into the challenges, opportunities, and technologies available to Java programmers to take advantage of concurrent programming to leverage these new processor technologies. If you're not programming in parallel today, you will be soon.

Multi-Core Mania
Increases in processor clock frequency are slowing and in many cases are being decreased to reduce power consumption. One trend continues though. The industry continues to shrink the size of transistors, doubling the number of transistors on a chip about every two years or so. In 2007 most major chip manufacturers will begin the shift from a 60nm to a 45nm process. This will yield transistors about 1/2000th the width of a human hair! To provide a relative perspective, a silicon atom itself is about 1/4nm. Obviously continuing to halve the size of transistors will also reach a limit in the not too distant future. But that's a topic for another paper.

So, how will the industry use this new transistor budget to improve processor performance? Techniques such as superscalar execution, pipelining, and speculative processing with branch prediction have added significant complexity to microprocessor designs, but have also been successful at improving performance. Unfortunately, the latency to memory on cache misses and the high frequency of branches in most workloads is proving to be a limiting factor. Building ever-larger caches is one way to mitigate the memory latency problem but as cache size exceeds common working set size, there are rapidly diminishing returns for investing transistors in cache memory.

Instead, the industry is moving toward multi-core, multithreading, and specialization. Instead of improving the performance of a single thread on a single core, the transistor budget is being used to add multiple cores to a single chip. Further, in many cases each core is capable of running multiple threads to hide memory latency. When one thread is blocked by a long latency event, such as a cache miss, the processor simply switches to another thread to execute. Also, many chip designs now include special-purpose processing units that make effective use of transistors for specific tasks such as cryptography.

Taking a closer look at the processors themselves, the IBM Power is distinguished as being the first to introduce multiple cores on a chip in the Power 4 design in 2001. IBM recently introduced the Power 6 processor, which combines two high-performance cores on a chip with each core supporting two-way multithreading. Besides providing multiple cores, the Power 6 also achieved an amazing 4.7GHz clock rate showing that IBM remains serious about single-thread performance while keeping pace with the industry on multi-core. As Power 6 is destined to be included in high-end servers, IBM has also focused heavily on RAS (reliability, availability, serviceability) and virtualization.

In the x86 architecture camp, rivals AMD and Intel have both recently introduced multi-core processors. In 2006, Intel introduced chips with two cores while chips with four cores, based on 45nm technology, shouldappear this year. As part of the move to multi-core, Intel removed support for its version of multithreading known as "hyperthreading," although multithreading is expected to return in future designs. Not to be outdone, AMD later this year, will introduce its first four-core chip known as Barcelona. Both Intel and AMD continue to focus on single-thread performance as well, each introducing new innovations in instruction-level parallelism and caches. One key difference in their designs is the memory bus architecture. Intel is continuing with its symmetric front-side bus architecture. AMD, on the other hand, has introduced a NUMA-based design based on the open Hypertransport technology in hopes of alleviating the memory bus bottleneck.

Sun has adopted a more radical departure in design from prior generations of SPARC. At the end of 2005, Sun released the UltraSPARC T1 or Niagara processor. Niagara includes up to eight cores, significantly more than competing server processors. Sun was able to squeeze eight cores on the chip by shifting focus away from the best achievable single-thread performance toward high chip-level throughput. Niagara cores run at a relatively low clock rate and don't support out-of-order processing, branch prediction, or many other common ILP optimizations. Instead they depend on four-way multithreading to tolerate long waits for memory. The goal is to achieve high overall throughput through application concurrency. However, applications with lower concurrency may run significantly slower on Niagara relative to the other processors described here.

At this point, all of the key players are producing chips with multiple cores but diverging in core design, memory nest, and other important aspects. The key to success for processor designers over the next few years will be in the innovative use of their transistor budget. Architects will make strategic tradeoffs between single-thread performance, massive concurrency, cache sizes, power consumption, and specialized processing units. The companies that make tradeoffs in the most innovative ways to meet the demands of the market should emerge as the winners.

Parallel Programming
As a developer, it will be important for you to learn the skills necessary to develop applications that can run with high performance on these increasingly parallel processors. Since single-thread performance isn't likely to improve at historical rates, the developer will have to look to concurrency to improve performance for a given task. The goal of parallel programming is to reduce the time of a task by dividing it into a set of subtasks that can be processed concurrently. While this may seem simple enough, experience shows that developing correct and effective parallel programs is surprisingly difficult. To utilize parallelism in hardware effectively, software tasks must be decomposed into subtasks, code must be written to coordinate the subtasks and work must be balanced as much as possible. Still sound easy? Read on.

As you get started with parallel programming, the first rule to become familiar with is Amdahl's Law. Amdahl's Law says that speeding up your program is limited by the part that's not running in parallel. For example, if a profile reveals that 20% of the time is spent in code that can only run sequentially on one processor, then the best speed increase you can possibly get, even with perfect parallelization of the rest of your program is 5x, no matter how many processors you throw at it. Load imbalance is a similar problem. If you've divided your code into N subtasks, the time taken to execute them is not 1/N. Rather the time taken is the maximum of the execution times of the subtasks.

If getting your code divided into subtasks and ensuring that work is well balanced sounded hard, then let us introduce you to the coordination problem. Unfortunately, very few programs can be parallelized so simply. The reason is that those subtasks are likely to want to operate on the same data and some of the subtasks may have to wait for others to do their thing before proceeding. It's okay if two subtasks want to read the same memory location in parallel, but if one of them wants to write to the location, you've got trouble because you can't predict which subtask will get to it first.

For example, operations to insert and remove objects from a linked list must be executed so that updates to the data structures happen sequentially and don't corrupt each other. An incorrect ordering of accesses to a memory location is called a data race and it can be one of the most difficult bugs to find because your code might behave differently on each run and might even change once you decide to start debugging. To deal with this problem, most programming environments include mechanisms to ensure that a subtask has exclusive access to specific memory, commonly called locks. Unfortunately locks bring their own unique problems when multiple subtasks compete for access and, if used indiscriminately, can reverse all of your hard work in parallelizing your code by making subtasks wait too often or too long for exclusive access to shared memory.


More Stories By J. Stan Cox

J. Stan Cox is a senior engineer with IBM's WebSphere Application Server performance group. In this role, he has worked to improve WebSphere application performance for J2EE, Web 2.0, Web services, XML and more. His current focus is WebSphere multicore and parallel foundation performance. Stan holds a B.S.C.S from Appalachian State University (1990) and an MS in computer science from Clemson University (1992).

More Stories By Bob Blainey

Bob Blainey is a Distinguished Engineer in the IBM Software Group, responsible for the technical roadmap for software in the era of multi-core and related next-generation systems innovations. Bob is an expert in programming languages and compilers having spent much of his career at IBM driving ever-greater performance and parallelism through program analyses and transformations. Immediately prior to his current position, Bob was CTO for Java at IBM. He is a member of the IBM Academy of Technology, an IBM Master Inventor, and, most impressive of all, manages to remain sane with two pre-teen daughters in the house.

More Stories By Vijay Saraswat

Vijay Saraswat joined IBM Research in 2003 after a year as a professor at Penn State, a couple of years at start-ups, and 13 years at Xerox PARC and AT&T Research. His main interests are in programming languages, constraints, logic, and concurrency. At IBM, he leads the work on the design and implementation of X10, a modern object-oriented programming language intended for scalable concurrent computing. Over the last 20 years he has lectured at most major universities and research labs in U.S.A. and Europe. Vijay got a B Tech degree from the Indian Institute of Technology, Kanpur, and an MS and PhD from Carnegie-Mellon University. His thesis on concurrent constraint programming won the ACM Doctoral Dissertation Award in 1989, and a related paper won a best-paper-in-20-years award in its area.

Comments (1) View Comments

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


Most Recent Comments
Jim Falgout 11/26/07 09:37:55 AM EST

The X10 site does not seem very active. The news feed on the last release is dated December of 2006. Do you have any information on the current state of X10?

@ThingsExpo Stories
SYS-CON Events announced today that Ericsson has been named “Gold Sponsor” of SYS-CON's @ThingsExpo, which will take place on June 7-9, 2016, at the Javits Center in New York, New York. Ericsson is a world leader in the rapidly changing environment of communications technology – providing equipment, software and services to enable transformation through mobility. Some 40 percent of global mobile traffic runs through networks we have supplied. More than 1 billion subscribers around the world re...
There is an ever-growing explosion of new devices that are connected to the Internet using “cloud” solutions. This rapid growth is creating a massive new demand for efficient access to data. And it’s not just about connecting to that data anymore. This new demand is bringing new issues and challenges and it is important for companies to scale for the coming growth. And with that scaling comes the need for greater security, gathering and data analysis, storage, connectivity and, of course, the...
We’ve worked with dozens of early adopters across numerous industries and will debunk common misperceptions, which starts with understanding that many of the connected products we’ll use over the next 5 years are already products, they’re just not yet connected. With an IoT product, time-in-market provides much more essential feedback than ever before. Innovation comes from what you do with the data that the connected product provides in order to enhance the customer experience and optimize busi...
SYS-CON Events announced today that Peak 10, Inc., a national IT infrastructure and cloud services provider, will exhibit at SYS-CON's 18th International Cloud Expo®, which will take place on June 7-9, 2016, at the Javits Center in New York City, NY. Peak 10 provides reliable, tailored data center and network services, cloud and managed services. Its solutions are designed to scale and adapt to customers’ changing business needs, enabling them to lower costs, improve performance and focus inter...
Increasing IoT connectivity is forcing enterprises to find elegant solutions to organize and visualize all incoming data from these connected devices with re-configurable dashboard widgets to effectively allow rapid decision-making for everything from immediate actions in tactical situations to strategic analysis and reporting. In his session at 18th Cloud Expo, Shikhir Singh, Senior Developer Relations Manager at Sencha, will discuss how to create HTML5 dashboards that interact with IoT devic...
So, you bought into the current machine learning craze and went on to collect millions/billions of records from this promising new data source. Now, what do you do with them? Too often, the abundance of data quickly turns into an abundance of problems. How do you extract that "magic essence" from your data without falling into the common pitfalls? In her session at @ThingsExpo, Natalia Ponomareva, Software Engineer at Google, will provide tips on how to be successful in large scale machine lear...
The increasing popularity of the Internet of Things necessitates that our physical and cognitive relationship with wearable technology will change rapidly in the near future. This advent means logging has become a thing of the past. Before, it was on us to track our own data, but now that data is automatically available. What does this mean for mHealth and the "connected" body? In her session at @ThingsExpo, Lisa Calkins, CEO and co-founder of Amadeus Consulting, will discuss the impact of wea...
A critical component of any IoT project is the back-end systems that capture data from remote IoT devices and structure it in a way to answer useful questions. Traditional data warehouse and analytical systems are mature technologies that can be used to handle large data sets, but they are not well suited to many IoT-scale products and the need for real-time insights. At Fuze, we have developed a backend platform as part of our mobility-oriented cloud service that uses Big Data-based approache...
trust and privacy in their ecosystem. Assurance and protection of device identity, secure data encryption and authentication are the key security challenges organizations are trying to address when integrating IoT devices. This holds true for IoT applications in a wide range of industries, for example, healthcare, consumer devices, and manufacturing. In his session at @ThingsExpo, Lancen LaChance, vice president of product management, IoT solutions at GlobalSign, will teach IoT developers how t...
We're entering the post-smartphone era, where wearable gadgets from watches and fitness bands to glasses and health aids will power the next technological revolution. With mass adoption of wearable devices comes a new data ecosystem that must be protected. Wearables open new pathways that facilitate the tracking, sharing and storing of consumers’ personal health, location and daily activity data. Consumers have some idea of the data these devices capture, but most don’t realize how revealing and...
Digital payments using wearable devices such as smart watches, fitness trackers, and payment wristbands are an increasing area of focus for industry participants, and consumer acceptance from early trials and deployments has encouraged some of the biggest names in technology and banking to continue their push to drive growth in this nascent market. Wearable payment systems may utilize near field communication (NFC), radio frequency identification (RFID), or quick response (QR) codes and barcodes...
The IETF draft standard for M2M certificates is a security solution specifically designed for the demanding needs of IoT/M2M applications. In his session at @ThingsExpo, Brian Romansky, VP of Strategic Technology at TrustPoint Innovation, will explain how M2M certificates can efficiently enable confidentiality, integrity, and authenticity on highly constrained devices.
Artificial Intelligence has the potential to massively disrupt IoT. In his session at 18th Cloud Expo, AJ Abdallat, CEO of Beyond AI, will discuss what the five main drivers are in Artificial Intelligence that could shape the future of the Internet of Things. AJ Abdallat is CEO of Beyond AI. He has over 20 years of management experience in the fields of artificial intelligence, sensors, instruments, devices and software for telecommunications, life sciences, environmental monitoring, process...
You think you know what’s in your data. But do you? Most organizations are now aware of the business intelligence represented by their data. Data science stands to take this to a level you never thought of – literally. The techniques of data science, when used with the capabilities of Big Data technologies, can make connections you had not yet imagined, helping you discover new insights and ask new questions of your data. In his session at @ThingsExpo, Sarbjit Sarkaria, data science team lead ...
The IoTs will challenge the status quo of how IT and development organizations operate. Or will it? Certainly the fog layer of IoT requires special insights about data ontology, security and transactional integrity. But the developmental challenges are the same: People, Process and Platform. In his session at @ThingsExpo, Craig Sproule, CEO of Metavine, will demonstrate how to move beyond today's coding paradigm and share the must-have mindsets for removing complexity from the development proc...
Manufacturers are embracing the Industrial Internet the same way consumers are leveraging Fitbits – to improve overall health and wellness. Both can provide consistent measurement, visibility, and suggest performance improvements customized to help reach goals. Fitbit users can view real-time data and make adjustments to increase their activity. In his session at @ThingsExpo, Mark Bernardo Professional Services Leader, Americas, at GE Digital, will discuss how leveraging the Industrial Interne...
In his session at @ThingsExpo, Chris Klein, CEO and Co-founder of Rachio, will discuss next generation communities that are using IoT to create more sustainable, intelligent communities. One example is Sterling Ranch, a 10,000 home development that – with the help of Siemens – will integrate IoT technology into the community to provide residents with energy and water savings as well as intelligent security. Everything from stop lights to sprinkler systems to building infrastructures will run ef...
Whether your IoT service is connecting cars, homes, appliances, wearable, cameras or other devices, one question hangs in the balance – how do you actually make money from this service? The ability to turn your IoT service into profit requires the ability to create a monetization strategy that is flexible, scalable and working for you in real-time. It must be a transparent, smoothly implemented strategy that all stakeholders – from customers to the board – will be able to understand and comprehe...
You deployed your app with the Bluemix PaaS and it's gaining some serious traction, so it's time to make some tweaks. Did you design your application in a way that it can scale in the cloud? Were you even thinking about the cloud when you built the app? If not, chances are your app is going to break. Check out this webcast to learn various techniques for designing applications that will scale successfully in Bluemix, for the confidence you need to take your apps to the next level and beyond.
SYS-CON Events announced today that Fusion, a leading provider of cloud services, will exhibit at SYS-CON's 18th International Cloud Expo®, which will take place on June 7-9, 2016, at the Javits Center in New York City, NY. Fusion, a leading provider of integrated cloud solutions to small, medium and large businesses, is the industry's single source for the cloud. Fusion's advanced, proprietary cloud service platform enables the integration of leading edge solutions in the cloud, including cloud...