By Glenn Coates, Carl Barratt
January 1, 2002
It could be argued that the clock speed of a given processing platform enables you to estimate the execution time of a user application running on that platform.
However, quoting figures such as MIPS (millions of instructions per second) is somewhat futile, since executing a specific number of instructions on one processor will not necessarily accomplish the same end result as executing that same number of instructions on a different processor. It's the execution speed of a given set of instructions that's of greater concern when selecting an appropriate platform to run application code.
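To make the point concrete, here is a minimal sketch with invented numbers. Execution time is the instruction count divided by the instruction rate, so a processor with a lower MIPS rating can still finish first if its instruction set does the same work in fewer instructions. The class name, both processors, and every figure are hypothetical.

```java
// Hypothetical illustration: a MIPS rating alone does not predict execution time.
class MipsComparison {

    // Seconds needed to execute `instructions` at `mips` million instructions per second.
    static double seconds(long instructions, double mips) {
        return instructions / (mips * 1_000_000.0);
    }

    public static void main(String[] args) {
        // Assumed figures: the same task compiled for two different instruction sets.
        double a = seconds(400_000_000L, 200.0); // processor A: 200 MIPS, 400M instructions
        double b = seconds(150_000_000L, 100.0); // processor B: 100 MIPS, 150M instructions
        System.out.println("A: " + a + " s, B: " + b + " s");
    }
}
```

With these assumed figures, processor B completes the task in 1.5 seconds against 2 seconds for processor A, despite having half the MIPS rating.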
Clearly some platforms will be more proficient than others in this regard, though this is a difficult parameter to quantify since it's dependent to a large extent upon the application code in question. Benchmarking is the technique used to measure the speed at which a particular platform is able to execute code. Indeed, this is evident in the abundance of benchmarks available. Numerous examples of Java benchmarking are listed at http://www.epcc.ed.ac.uk/javagrande/links.html.
Benchmarks vary significantly in their complexity, but invariably they comprise a number of lines of code that, when executed on the platform being tested, generate a discrete value to use in its appraisal. This facilitates a comparison of execution speed with similar platforms. Typically there are three types of benchmarks, named according to their origin:
- User
- Manufacturer
- Industry
Market incentives have driven the introduction of manufacturer benchmarks; invariably these are written to benefit the platform in question and so can be disregarded unless used to compare the relative performance of platforms offered by that particular vendor.
Finally, the financial significance of benchmarking has resulted in the development of industry benchmarks, which are usually considered to be of high integrity. Such benchmarks are defined by an independent organization, typically composed of a panel of industry specialists.
Why Write a Paper on Java Benchmarking?
Results are published for multiple benchmarks and the primary
issues can be clouded by hype; as a consequence the selections
available to the end user are somewhat overwhelming. The crucial
point is how well your code performs on the chosen system, so the
question is: How do you identify a benchmark that best models your
application? An understanding of benchmarks is vital to enable the
user to select an accurate measurement tool for the platform in
question and not be misled by the results.
The purpose of this article is to educate device manufacturers, OEMs, and, more specifically, J2ME development engineers, while at the same time resolving any remaining anomalies in a discipline that's commonly misunderstood.
What Is a Benchmark?
Fundamentally, a benchmark should incorporate programs that,
when invoked methodically, exhaustively exercise the platform being
tested. Implicit in this process is the generation of a runtime
figure corresponding to the execution speed of the platform.
Benchmarks can be simplistic, comprising a sequence of simple routines executed successively to check the platform's response to standard functions (e.g., method invocation). Typically, both the overall elapsed time and the time for each routine in isolation are considered; for the overall figure it's usual to assign each routine a weighting coefficient that reflects its relevance in the wider context. Each routine should run for a reasonable amount of time, so that the performance statistics are not lost within start-up overheads. Benchmarks can also be more substantive; for example, processor-intensive applications can check multithreading by running several other routines simultaneously to evaluate context switching. Essentially there's no substitute for running the user's own application code on the platform in question. However, while this argument is laudable, it's beyond reasonable expectation that the platform manufacturer can implement this. To facilitate an accurate appraisal, it's vital that any standard benchmark utilized by competing manufacturers should mimic as closely as possible the way the platform will ultimately be used.
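The routine-based approach described above can be sketched as follows. This is an illustration only, written against standard desktop Java rather than CLDC (a CLDC device would typically fall back on System.currentTimeMillis() and may lack floating point). The class, the two sample routines, the 200 ms minimum run time, and the weights are all assumptions made for the example.

```java
// A minimal sketch of a routine-based benchmark harness (standard Java SE, not CLDC).
class SimpleBenchmark {

    static volatile int sink; // results are stored here so the JIT cannot discard the work

    // Runs `routine` repeatedly until at least `minMillis` have elapsed and returns
    // the average time per run in nanoseconds. Enforcing a minimum duration keeps
    // start-up overheads from dominating the figure.
    static double averageNanos(Runnable routine, long minMillis) {
        routine.run(); // one untimed warm-up run (class loading, first-call costs)
        long minNanos = minMillis * 1_000_000L;
        long start = System.nanoTime();
        long runs = 0;
        long elapsed;
        do {
            routine.run();
            runs++;
            elapsed = System.nanoTime() - start;
        } while (elapsed < minNanos);
        return (double) elapsed / runs;
    }

    // Combines per-routine times into one figure using weighting coefficients.
    static double weightedTotal(double[] times, double[] weights) {
        if (times.length != weights.length) {
            throw new IllegalArgumentException("times and weights must have equal length");
        }
        double total = 0.0;
        for (int i = 0; i < times.length; i++) {
            total += times[i] * weights[i];
        }
        return total;
    }

    // Sample routine: recursive method invocation.
    static int fib(int n) {
        return n < 2 ? n : fib(n - 1) + fib(n - 2);
    }

    public static void main(String[] args) {
        double invoke = averageNanos(() -> sink = fib(15), 200);
        double strings = averageNanos(() -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100; i++) {
                sb.append(i);
            }
            sink = sb.length();
        }, 200);
        // Hypothetical weights: method calls count three times as much as string work.
        double score = weightedTotal(new double[] {invoke, strings}, new double[] {0.75, 0.25});
        System.out.println("method: " + invoke + " ns, string: " + strings
                + " ns, weighted: " + score + " ns");
    }
}
```

Both the per-routine averages and the weighted figure are reported, since the weighted figure alone hides which routine is responsible for a poor result.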
The Advantages and Limitations of Benchmarking
Industry benchmarks are useful for providing a general
insight into the performance of a machine. Still, it's important not
to rely on these benchmarks since such a preoccupation distracts from
the bigger picture. While they can be employed generally to realize
the efficient comparison of different platforms, they have
shortcomings when applied specifically. For example, one function may
be heavily used in the application code when compared to another, or
certain functions may run concurrently on a regular basis. There are
inherent benefits in developing your own benchmark as this
facilitates the tailoring of routines to imitate the end application
or to expose specific inadequacies in peripheral support.
Manufacturers' benchmarks can be written to aid the cause of specific
vendors and so can easily be tailored to mislead.
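A small worked example shows why the mix of functions matters. The timings and call shares below are invented, and the class and method names are hypothetical. The only point is that the same per-function measurements can rank two platforms differently once they are weighted by how a particular application uses each function.

```java
// Hypothetical illustration: identical measurements, different rankings,
// depending on how heavily the application uses each function.
class WorkloadMix {

    // Expected time per operation: the sum of (share of calls * time per call).
    static double expectedTime(double[] timesPerCall, double[] callShare) {
        double total = 0.0;
        for (int i = 0; i < timesPerCall.length; i++) {
            total += timesPerCall[i] * callShare[i];
        }
        return total;
    }

    public static void main(String[] args) {
        // Invented timings in microseconds for {string handling, floating point}.
        double[] platformX = {2.0, 10.0};
        double[] platformY = {6.0, 4.0};

        double[] evenMix = {0.5, 0.5};     // a generic benchmark with equal weighting
        double[] stringHeavy = {0.9, 0.1}; // an application dominated by string handling

        System.out.println("even mix:     X=" + expectedTime(platformX, evenMix)
                + " Y=" + expectedTime(platformY, evenMix));
        System.out.println("string heavy: X=" + expectedTime(platformX, stringHeavy)
                + " Y=" + expectedTime(platformY, stringHeavy));
    }
}
```

Under equal weighting platform Y looks faster (5 against 6 microseconds), yet for the string-heavy application platform X is roughly twice as fast (about 2.8 against 5.8 microseconds).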
When considering more restrictive embedded environments, such as those used by J2ME-compliant devices, it becomes apparent that the application developer must consider the risks inherent in the hardware implementation of a virtual machine prior to making a purchasing decision.
Speed is a primary consideration when adopting a JVM within restricted environments; implementations of the J2ME vary significantly in this respect, from JVMs that employ software interpretation and JIT compilers that compile the bytecode to target machine code while the application is being executed, to native Java processors offering much greater performance.
Other factors to consider include the response time of the user interface, implementation of the garbage collector, and memory issues since consumer devices don't have access to the abundant resources available to desktop machines. While this may seem a tangential point as far as benchmarking is concerned, it's one worth making since it's imperative that these areas in particular are comprehensively exercised. Subject to these caveats, benchmarking is a valuable technique that aids in the evaluation of processing platforms, and, more specifically, J2ME platforms.
Java-Specific Benchmarks
As with other platforms, numerous Java benchmarks have
appeared (see Fig 2).
CaffeineMark is a pertinent instance of a benchmark since its results are among those most frequently cited by the Java community. On this basis we chose it as an example for further discussion.
CaffeineMark encompasses a series of nine tests of similar length designed to measure disparate aspects of a Java Virtual Machine's performance. The product of these scores is then used to generate an overall CaffeineMark. The tests are:
- Loop: Employs a sort routine and sequence generation to quantify the compiler optimization of loops
- Sieve: Utilizes the classic sieve of Eratosthenes to extract prime numbers from a sequence
- Logic: Establishes the speed at which decision-making instructions are executed
- Method: Executes recursive function calls
- Float: Simulates a 3D rotation of objects around a point
- String: Executes various string-based operations
- Graphics: Draws random rectangles and lines
- Image: Draws a sequence of three graphics repeatedly
- Dialog: Writes a set of values into labels and boxes on a form
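As for how the product of the scores becomes a single figure: the CaffeineMark documentation describes the overall score as the geometric mean of the individual scores, that is, the ninth root of their product. The sketch below shows that calculation in general form. The class name and the nine sample scores are invented for illustration and are not measurements.

```java
// Sketch: combining individual test scores into one figure via the geometric mean.
class OverallScore {

    // Geometric mean: the n-th root of the product of n positive scores,
    // computed through logarithms so that the product cannot overflow.
    static double geometricMean(double[] scores) {
        double logSum = 0.0;
        for (double s : scores) {
            if (s <= 0.0) {
                throw new IllegalArgumentException("scores must be positive");
            }
            logSum += Math.log(s);
        }
        return Math.exp(logSum / scores.length);
    }

    public static void main(String[] args) {
        // Invented scores for the nine tests, in the order listed above.
        double[] scores = {1200, 900, 1500, 1100, 800, 950, 400, 300, 250};
        System.out.println("overall: " + geometricMean(scores));
    }
}
```

One consequence of this choice is that doubling any single test score raises the overall figure by the same factor (about 8 percent for nine tests), whichever test it is, so the overall number says nothing about which subsystems are strong or weak.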
Given the high take-up of CaffeineMark in the industry, it's unfortunate that it's unsuitable for embedded environments such as J2ME. It cannot benchmark the interaction of Java subsystems, and consequently fails to imitate the typical real-world applications that such devices run. More specifically, it doesn't take into account situations in which a platform has to cope with a heavily used heap, a garbage collector that runs all the time, multithreading, or intensive user interface activity.
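As a rough idea of what such a test might look like, the sketch below keeps the heap busy from several threads at once, so that allocation, garbage collection, and thread scheduling interact. It is written against standard desktop Java for illustration, it is not part of CaffeineMark or of any EEMBC suite, and every name and parameter value in it is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a workload that churns the heap from several threads at once,
// the kind of subsystem interaction that isolated micro-tests do not exercise.
class HeapThreadStress {

    // Allocates `allocations` byte arrays of `blockSize` bytes, retaining only the
    // most recent `window` of them so that older blocks become garbage.
    // Returns the total number of bytes allocated.
    static long churn(int allocations, int blockSize, int window) {
        List<byte[]> live = new ArrayList<>();
        long bytes = 0;
        for (int i = 0; i < allocations; i++) {
            live.add(new byte[blockSize]);
            bytes += blockSize;
            if (live.size() > window) {
                live.remove(0); // drop the oldest block, making it collectable
            }
        }
        return bytes;
    }

    // Runs `threads` workers concurrently and returns the elapsed wall-clock milliseconds.
    static long run(int threads, int allocations, int blockSize, int window) {
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> churn(allocations, blockSize, window));
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        // Parameter values are arbitrary; scale them to the heap of the target device.
        System.out.println("1 thread:  " + run(1, 200_000, 1024, 64) + " ms");
        System.out.println("4 threads: " + run(4, 200_000, 1024, 64) + " ms");
    }
}
```

Comparing the one-thread and four-thread timings gives a crude indication of how well the virtual machine copes when allocation and scheduling compete, which a score built from isolated tests does not reveal.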
To address some of these issues, representatives of leading companies in the field have recently formed a committee under the banner of the Embedded Microprocessor Benchmark Consortium (EEMBC) to discuss the introduction of an industry benchmark for J2ME devices.
What Is EEMBC?
EEMBC (www.eembc.org) is an independent industry
benchmarking consortium that develops and certifies real-world
benchmarks for embedded microprocessors; the consortium is
established among manufacturers as a yardstick for benchmarking in
this context. A principal concern of the committee is to produce
dependable metrics, enabling system designers to evaluate the
performance of competing devices and consequently select the most
appropriate embedded processor for their needs. The industry-wide
nature of such committees intrinsically helps to combat the now
prevalent practice among some vendors of artificially improving
their ratings via special compiler optimizations.
A subcommittee was recently formed under the umbrella of this organization to develop similar benchmarks for hardware-based virtual machines. Founding companies within the consortium include Vulcan Machines Ltd, ARM, Infineon, and TriMedia. Primarily the committee aims to identify the limitations of existing Java benchmarks, and to develop new ones in which "real-world" applications are afforded a higher priority than low-level functions.
An example benchmark conceived on this basis could be a Web browser. Since this is a very intensive end application in almost every respect, a figure relating to the proficiency of the device running low-level code in isolation wouldn't prove particularly representative of its functionality.
Consequently, the EEMBC consortium solution is expected to employ a series of applications reflecting typical real-world scenarios in which CDC- and CLDC-compliant devices can be employed. Further examples of such benchmarks include a generic game or organizer that exercises intensive garbage collection, scheduling, high memory usage, user interface, and dynamic class loading. This way system designers are able to evaluate potential devices for inclusion in their end application by the appraisal of a benchmark derived in an environment that's analogous to that application.
Other Considerations?
When applied prudently, benchmarks are an invaluable asset
that aids in the selection of hardware to suit a particular
application. However, they shouldn't be regarded as the sole
criterion. It's imperative that designers of J2ME embedded systems
don't rely upon benchmarks exclusively, since the issue is
clouded by many other factors.
In the context of J2ME, systems extend beyond the virtual machine to its interaction with peripheral devices such as a memory interface; clearly such peripherals and the interfaces to them must be considered when measuring the time it takes to execute an application. In the case of memory, limitations will be imposed on a J2ME-optimized device; this raises numerous issues that may impact the performance of the device, for example, garbage collection.
Also, implicitly, batteries are employed to power hardware that's compliant with the CLDC specification. Consequently, power consumption of the virtual machine is of primary concern and, accordingly, the clock speed must be kept to a minimum. For example, it's pertinent here that while software accelerators may post acceptable benchmark scores, they may also, as a consequence of their reliance upon a host processor, consume excessive power compared to a processor that executes Java as its native language.
Another significant factor is the device upon which the virtual machine is implemented. The FPGA or ASIC process used will clearly affect the speed at which the processor runs, and variations in benchmark scores are a natural corollary of this. Furthermore, the silicon cost of the entire solution that's required to execute Java bytecode must be considered, particularly where embedded System-on-Chip implementations of the JVM are concerned. Similarly, the designer should be aware of fundamental issues such as the "quality" of the JVM in terms of compliance with the J2ME specification, reliability, licensing costs, and the reputation of the hardware vendor for technical support. All these factors must be considered in tandem with the benchmark score of the virtual machine prior to making a purchasing decision.
Conclusion
No benchmark can replace the actual user application. At the
earliest possible stage in the design process, application developers
must run their own code on the proposed hardware, since similar
applications may post a significant disparity in terms of performance
on the same implementation of the virtual machine. However, since
designers are often focused on using their time more productively,
they frequently rely upon industry benchmarks for such data. While
there's no panacea, industry benchmarks such as that proposed by
EEMBC are a useful tool to aid in the evaluation of performance,
provided you're aware of their limitations in a J2ME environment.
Resources
- Coates, G. "Java Thick Clients with J2ME." Java Developer's Journal. Vol. 6, issue 6.
- Coates, G. "JVMs for Embedded Environments." Java Developer's Journal. Vol. 6, issue 9.
- Cataldo, A. (April, 2001). "Java Accelerator Vendors Mull Improved Benchmark." Electronic Engineering Times.
Copyright © 2002 SYS-CON Media, Inc. — All Rights Reserved.
More Stories By Glenn Coates
Glenn Coates has been a software engineer for nine years. For the last four years he has worked with mobile devices and Java, developing products such as smart phones, micro-browsers, and digital set-top boxes. Glenn holds a degree in computer science and is also a Sun-certified architect for Java Technologies. He works for Vulcan Machines as a VM Architect developing a Java native processor called Moon. See www.KvmGuru.com.
More Stories By Carl Barratt
Carl Barratt works in applications support for Vulcan Machines. He has over seven years of experience in various hardware and software development roles. Carl holds a BEng (Hons) degree in electronic engineering and has undertaken PhD research with the University of Nottingham.