| By Adrian Marriott | Article Rating: |
|
| March 3, 2009 08:45 AM EST | Reads: |
6,843 |
We demonstrate this here with an empirical comparison of the standard Java sort routines with a bespoke implementation designed with a specific context in mind. The first use-case is simply to sort n integers each of which has a positive value between zero and a maximum value k. In Java this can be implemented easily using either the Collections or the Arrays class.
/**
* Sorts the input sequence using Collections sort
*/
public ArrayList<Integer> sortInts(ArrayList<Integer> inputSequence)
{
Collections.sort(inputSequence);
return inputSequence;
} /**
* Sorts an array of integers using the Arrays class
*/
public int[] sortInts(int [] inputSequence)
{
Arrays.sort(inputSequence);
return inputSequence;
}
The Java documentation describes the Collections.sort routine:
"This sort is guaranteed to be stable: equal elements will not be reordered as a result of the sort...The sorting algorithm is a modified mergesort (in which the merge is omitted if the highest element in the low sublist is less than the lowest element in the high sublist). This algorithm offers guaranteed n log(n) performance. This implementation dumps the specified list into an array, sorts the array, and iterates over the list resetting each element from the corresponding position in the array. This avoids the n2 log(n) performance that would result from attempting to sort a linked list in place."
The Collections.sort routine has been designed generically to support any element type, and therefore the sort algorithm used cannot exploit the particular features of our case, specifically to sort positive integers. The documentation for the Arrays.sort routine describes a more integer-specific algorithm:
"Sorts the specified array of ints into ascending numerical order. The sorting algorithm is a tuned quicksort, adapted from Jon L. Bentley and M. Douglas McIlroy's "Engineering a Sort Function," Software-Practice and Experience, vol. 23(11) pp. 1249-1265 (November 1993). This algorithm offers n*log(n) performance on many data sets that cause other quicksorts to degrade to quadratic performance."
The Arrays.sort algorithm has been designed to sort integers, and is much closer to our specific requirements, so we would expect this to run relatively efficiently. However, the O(n log(n)) performance mentioned is higher than the best possible for non-comparative sort routines, so there may be a way to improve on this.
Now, if we consider a sort routine optimized for positive integers, then we can note the following facts true of integers but not necessarily true of any arbitrary element type:
- Integers are not dense on the number line, unlike real or floating point numbers. Given two integers (a, b) that differ by one (a+1 = b), it is impossible to find a third integer (x) between them so that a < x < b.
- There is no associated data with these integers; they are not tuples. So the actual identity of the input elements can be eliminated in the sort process if this leads to an efficiency saving.
The points above act as constraints on our input sequence, and it's possible to exploit them to write a very different sort routine, a variant of the Counting Sort, as shown in Listing 1.
This routine starts by finding the maximum value in the input sequence, which it uses to size an array of counts. The input sequence is then examined again and the number of occurrences of each value is accumulated at the corresponding offset in the counts array. For example, an input sequence of [3,1,4,7,1,4,0] would result in a counts array of length 8, containing the following values [1,2,0,1,2,0,0,1]. Finally the code spins through the counts array and overwrites the inputSequence with the values in sorted order. In our example this gives [0,1,1,3,4,4,7].
This clarifies why this algorithm cannot possibly sort real or floating point numbers; because finding the maximum in an input sequence of floats cannot tell us how large to allocate the counts array, such that there is a slot for each possible float value in the input sequence; nor is there any way of mapping successive floating point numbers to the correct index in the counts array. It's also clear that this counting sort cannot work with tuples with integer keys because the original identity of the elements is destroyed when the last step in the algorithm overwrites the original input elements.
This variant of the counting sort algorithm passes through the input sequence three times: once to find the maximum value (i.e., the value k) and this operation is O(n). It allocates an array of size k and even though this is always a single array allocation this operation is proportional to the size allocated, O(k). It then passes through the input sequence a second time as it accumulates the counts of individual values. Finally it spins through the input sequence overwriting the contents with the sorted values, proportional to O(n). This gives an overall time complexity of O(n+k) and memory usage proportional to O(k). So this counting sort is sensitive not only to the number of elements sorted, but also the range of values in the input sequence.
We show graphs comparing the three sorting routines for input sequences of different sizes and different ranges of values. Note that in every test run all the algorithms had identical pseudo-random number sequences to sort, so all the routines had identical amounts of work to do.
Graph 1 shows the time taken to sort integers with a range of values between 0 and 100 (i.e. k=100) for with the number of elements, n between 20,000 and one million.
Graph 2 show results for the same test but this time with the range of values between 0 and 10,000 (i.e., k=10,000).
Last, the results with k=1 million. These results shown in Graph 3 the counting sort is consistently faster than both the algorithms used in the Java libraries, over all the ranges of n between 20 thousand and one million, and all ranges of n between 100 and one million. The degree of improvement increases as n and k increase, up to a factor of 4.7 faster than the Arrays sort and a factor of 23.8 faster than the Collections sort. However, simply sorting integers isn't particularly useful. Next we expand our example by sorting integer tuples.
Generic Sort versus Pigeon-Hole Sort
Here we relax one of the constraints on our input sequence by allowing arrays of tuples, which means we have to preserve the identity of the elements in the input sequence. In this scenario we need to sort n ordered pairs of integers the first of which acts as a key value between zero and a maximum value k. The second integer in the pair plays the role of ‘associated data' and only serves to require our routine to preserve the identity of each input element.
In Listing 2 we show a Java implementation using the Collections and the Arrays classes, where we see the OrderedPair class, containing two integers, one of which, k, is the key on which sequences of these objects will be sorted. The interface expected by our sort routines is PigeonHoleable.
In Listing 3 we have functions that will sort these ordered pairs using the Collections and Arrays sort methods as before, but this time using a comparator that compares the keys.
Published March 3, 2009 Reads 6,843
Copyright © 2009 SYS-CON Media, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
More Stories By Adrian Marriott
Adrian Marriott is principal consultant at Progress Software focused on DataXtend and ObjectStore. He has 15 years of industrial experience implementing large, fast, distributed, object-oriented systems using the C++, Java and LISP languages. Adrian has a BA in philosophy from Warwick University and an MSc in artificial intelligence and computer programming from Southbank Polytechnic in London.
![]() |
jimdiggerson 09/08/09 06:05:00 AM EDT | |||
It is a great step in programming that the tests were run on an isolated system, with no other users or applications operating, to minimize skewing the results. |
||||
- Kindle 2 vs Nook
- Why IBM’s Server Chief Got Busted
- Is Cloud Computing Like Teenage Sex?
- Industry Experts Discuss the State of Cloud Computing
- Performance Tuning Essentials for Java
- Confessions of a Ulitzer Addict
- Tactical Cloud Computing Panel at 1st Annual GovIT Expo
- It's the Java vs. C++ Shootout Revisited!
- Cloud Computing Can Revitalize Your Career as Software Developer
- IBM Could "Reinvent" Java: Mills
- Oracle & Cloud Computing: Exclusive Q&A with SVP Richard Sarwal
- A Brief History of Cloud Computing
- Kindle 2 vs Nook
- Cloud CEOs, CTOs & SVPs to Speak at 4th International Cloud Computing Expo
- Why IBM’s Server Chief Got Busted
- Is Cloud Computing Like Teenage Sex?
- Industry Experts Discuss the State of Cloud Computing
- Performance Tuning Essentials for Java
- The Difference Between Web Hosting and Cloud Computing
- Cloud Computing Expo: Exclusive Q&A with Yahoo! SVP Cloud Computing
- Ajax in RichFaces 3.3, JSF 2 and RichFaces 4
- Confessions of a Ulitzer Addict
- My Thoughts on Ulitzer
- Tactical Cloud Computing Panel at 1st Annual GovIT Expo
- A Cup of AJAX? Nay, Just Regular Java Please
- Java Developer's Journal Exclusive: 2006 "JDJ Editors' Choice" Awards
- The i-Technology Right Stuff
- JavaServer Faces (JSF) vs Struts
- Rich Internet Applications with Adobe Flex 2 and Java
- Java vs C++ "Shootout" Revisited
- Bean-Managed Persistence Using a Proxy List
- Reporting Made Easy with JasperReports and Hibernate
- Creating a Pet Store Application with JavaServer Faces, Spring, and Hibernate
- What's New in Eclipse?
- Why Do 'Cool Kids' Choose Ruby or PHP to Build Websites Instead of Java?
- i-Technology Predictions for 2007: Where's It All Headed?









































