Welcome!

Java Authors: Liz McMillan, Walter H. Pinson, III, Maureen O'Gara, Yakov Werde, Tony Bishop

Related Topics: Java

Java: Article

Maximizing Java Performance with Bespoke Programming

When automatic optimization, code analysis, and dynamic recompilation don't work

The bespoke code consists of a variant of the Pigeon-Hole Sort, the code for which is shown in Listing 4.

In a way similar to the counting sort, this routine starts by finding the maximum key value in the input sequence, which it uses to size an array of counts. The input sequence is then examined again and the number of occurrences of each key value is accumulated at the corresponding offset in the counts array. Next, the totals in the counts array are used in a loop to size a ragged array-of-arrays called slots, and as each sub-array is allocated the corresponding count is reset back to zero. This is illustrated in Figure 1.

Then the code spins through the input sequence and inserts a reference to each input element directly into the slots array, using the key value as the first index, and the value in the counts array as the second index. In this loop the counts array has been "re-purposed" and holds the "next index." This is why each of the counts is reset to zero. This stage is illustrated in Figure 2.

Finally the ragged two-dimensional slots array is navigated and the references are read out in sorted order. So the original input sequence (3,6)(1,2)(4,8)(7,1)(1,5) (4,9)(0,4) is sorted into (0,4)(1,2)(1,5)(3,6)(4,8)(4,9)(7,1).

This implementation of the pigeon-hole sort algorithm passes through the input sequence four times, each of which takes a time proportional to O(n). Once to find the maximum value (the value k), a second time to accumulate the counts of individual values, a third time inserting references directly into the sub-arrays, and finally it iterates across all the sub-arrays copying the references back into the input sequence in sorted order. In terms of space, this algorithm allocates two arrays, counts and slots, of size k, and a set of sub-arrays with one slot for each element in the input sequence, a space proportional to O(n). This gives an overall time complexity and memory usage proportional to O(n+k). Like the counting sort described previously, this pigeon-hole sort is a stable sort and sensitive not only to the number of elements sorted, but also to the range of values in the input sequence.

Now we'll show graphs comparing the performance of the three sorting routines for input sequences of ordered pairs of integers with different numbers of elements (n) and different ranges of values (k).

Graph 4 shows the time taken to sort ordered pairs with a range of key values between 0 and 100 (i.e., k=100) for with the number of elements, n between 20,000 and one million.

Graph 5 shows results for the same test but this time with the range of key values between 0 and 10,000 (i.e., k=10,000).

Last, the results with k=1 million. In Graph 6, results show the pigeon-hole sort is consistently faster than both algorithms used in the Java libraries, over all the ranges of n between 20,000 and one million, and all ranges of k between 100 and one million.

The performance of the Java Arrays sort with ordered pairs is significantly worse than when sorting integers, whereas the Collections sort is remarkably consistent, and is actually faster than the Arrays implementation. However, both are much slower than the pigeon-hole sort, particularly with low values of k, where the degree of improvement over the Arrays sort is a between 15 and 17 times faster, and over the Collections sort by a factor of 12 or 13. When the value of k gets larger, in the range of one million, this improvement drops to a factor of three or four, probably because the time taken to allocate memory is becoming significant.

Further efficiency improvements could probably be made by pre-allocating the memory necessary to execute the sort and reusing it between invocations, but the feasibility of this might depend on additional context-specific knowledge.

Conclusion
There are provably an infinite number of ways to implement any given program. If we assume a performance requirement for the shortest execution time, and assume that each instruction executes in a given amount of time, the program with the fewest instructions will execute the quickest. There may be more than one optimal program, and many near-optimal programs.

These optimal programs can either be written entirely by hand, an impractical proposition, or automatic optimization technologies, such as optimizing compilers, can be employed to analyze the program and generate efficient code. Automatic optimization technologies have traditionally analyzed the static structure of the code but modern Java environments are also starting to account for dynamic program behavior by collecting statistics during execution, using techniques such as Escape Analysis, and then dynamically recompiling executing code on-the-fly. In either case, only information that is explicit in the program, or such as may be logically deduced from its behavior, can be utilized by the optimization decision procedure. Some optimizations are impossible or impractical to make automatically, because knowledge specific to the context of use is required, and this information is not explicitly encoded in the program or reliably derivable from its behavior.

For example, take a program that is reading event objects from a socket in batches, and sorting them on a particular integer key field, and then inserting these sorted batches into another in-memory index structure. If this code is written so the key field is read as an integer data type, which is necessary to produce the correct sort order, but these objects are passed to sort routines that are designed to work with arbitrary data types, including strings and floats, then the sort processing is sub-optimal. Conversely, if the events are processed by a sort routine specifically designed to handle integers efficiently, and for a particular range of values, the sort will run significantly faster. If the sort processing constitutes a significant proportion of the overall processing, then this optimization will have a considerable impact on overall system performance and scalability.

In this example, the fact that we are sorting integer sequences with particular values of n and k comprises the specific context of use. We have then designed and written sort routines that utilize this context-specific information to improve the performance compared to generic library code. In our case, it would be very easy to supply this code in a Java library as a sort function overloaded to process integer sequences for third parties to reuse under circumstances similar to our own, but this does not detract from the point. No amount of automatic optimization, code analysis, or dynamic recompilation could produce the same results given the choice of sort routines available in the Java library. Genuinely new code was required, using a novel algorithm and new data structures that exploited knowledge specific to the context of use.

References

More Stories By Adrian Marriott

Adrian Marriott is principal consultant at Progress Software focused on DataXtend and ObjectStore. He has 15 years of industrial experience implementing large, fast, distributed, object-oriented systems using the C++, Java and LISP languages. Adrian has a BA in philosophy from Warwick University and an MSc in artificial intelligence and computer programming from Southbank Polytechnic in London.

Comments (2) View Comments

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


Most Recent Comments
jimdiggerson 09/08/09 06:05:00 AM EDT

It is a great step in programming that the tests were run on an isolated system, with no other users or applications operating, to minimize skewing the results.
Programming Assignment