Several algorithms are presented, including insertion sort, shell sort,
and quicksort. Sorting by insertion is the simplest method, and doesn't require
any additional storage. Shell sort is a simple modification that improves
performance significantly. Probably the most efficient and popular method
is quicksort, which is the method of choice for large arrays. Additional
sorting algorithms include bucket sort, heap sort, and radix sort.
Insertion Sort
One of the simplest methods to sort an array is an insertion sort. An example of an insertion sort occurs in everyday life while playing cards. To sort the cards in your hand you extract a card, shift the remaining cards, and then insert the extracted card in the correct place. This process is repeated until all the cards are in the correct sequence. Both average and worst-case time is O(n^2).
The basic step in this algorithm is to insert a data value into a sequence of ordered (increasing) data values in such a way that the resulting sequence of data values is also ordered. Thus if the data value D is inserted in position i, then the data values in positions j less than i will be less than or equal to D, and the data values in positions j greater than i will be greater than or equal to D.
We will be manipulating the data values in an array. We need to compare the data values and move them once the ordering relationship between two values is determined. Below is pseudo-code of the algorithm.
Insertion Sort (Sorting the array A[size])
For index i = 2 up to i = size
Assuming there are n elements in the array, we must index through n - 1
entries. For each entry, we may need to examine and shift up to n - 1
other entries, resulting in an O(n^2) algorithm. The insertion sort is
an in-place sort. That is, we sort the array in-place. No extra memory
is required. The insertion sort is also a stable sort. Stable sorts retain
the original ordering when identical elements are present in the input
data.
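The insertion step described above can be sketched in Python. This is a minimal sketch, assuming a Python list and 0-based indexing (the pseudo-code counts from 1); the name `insertion_sort` is illustrative and not part of the original text.

```python
def insertion_sort(a):
    """Sort the list `a` in place in increasing order and return it."""
    for i in range(1, len(a)):      # a[0:i] is already in order
        value = a[i]                # extract the next value
        j = i - 1
        # Shift every larger value one position to the right.
        while j >= 0 and a[j] > value:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = value            # insert the value into the gap
    return a
```

Because the comparison is a strict `>`, equal values are never moved past each other, which is what keeps the sort stable.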
Shell Sort
Shell sort, developed by Donald L. Shell, is a non-stable in-place sort. It is inspired by Insertion Sort's ability to work very fast on an array that is almost in order. It is also called diminishing increment sort. Shell sort improves on the efficiency of insertion sort by quickly shifting values to their destination. Average sort time is O(n^1.25), while worst-case time is O(n^1.5). For further reading, consult Knuth [1998].
Unlike Insertion Sort, Shell Sort does not sort the entire array at once. Instead, it divides the array into noncontiguous segments, which are separately sorted by using Insertion Sort. Once all of the segments are sorted, Shell Sort redivides the array into fewer segments and repeats the algorithm until the number of segments equals one and that single segment (the entire array) is sorted.
We begin by doing an insertion sort using a large element spacing.
Next an insertion sort with a smaller spacing is done, and so on, for
example down to a spacing of two. Finally, a pass is made with a spacing
of one. By using an initial spacing larger than one, we are able to quickly
shift values to their proper destination.
When a swap occurs within a noncontiguous segment, it moves the item over a greater distance within the overall array, whereas Insertion Sort only moves an item one position at a time. This means that in Shell Sort, the items being swapped are more likely to end up closer to their final positions than in Insertion Sort. Since the items are more likely to be close to their final positions, the array itself becomes partially sorted. Thus when the number of segments equals one, and Shell Sort is essentially performing Insertion Sort, it will be able to work very fast, since Insertion Sort is fast when the array is almost in order.
Various spacings may be used to implement a shell sort. Typically the
array is sorted with a large spacing, the spacing reduced, and the array
sorted again. On the final sort, spacing is one. Although the shell sort
is easy to comprehend, formal analysis is difficult. In particular,
optimal spacing values elude theoreticians. Knuth has experimented with
several values and recommends that spacing h for an array of size N be
based on the following formula:
Let h_1 = 1, h_{s+1} = 3h_s + 1, and stop with h_t when h_{t+2} >= N.
Thus, values of h are computed as follows:
h_1 = 1
h_2 = (3 x 1) + 1 = 4
h_3 = (3 x 4) + 1 = 13
h_4 = (3 x 13) + 1 = 40
h_5 = (3 x 40) + 1 = 121
To sort 100 items we first find an h_s such that h_s >= 100. For 100 items,
h_5 is selected. Our final value (h_t) is two steps lower, or h_3.
Therefore our sequence of h values will be 13-4-1.
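The spacing rule above can be checked with a few lines of Python. This is a sketch only: the function name `knuth_gaps` is hypothetical, and falling back to a single spacing of 1 for very small arrays is an assumption, since the rule does not cover that case.

```python
def knuth_gaps(n):
    """Return the spacings for n items, largest first, ending with 1."""
    hs = [1]                         # h_1 = 1
    # Generate h_{s+1} = 3 * h_s + 1 until a value >= n appears.
    while hs[-1] < n:
        hs.append(3 * hs[-1] + 1)
    # The first value >= n plays the role of h_{t+2}, so the largest
    # spacing actually used (h_t) lies two steps lower.
    if len(hs) > 2:
        hs = hs[:-2]
    else:
        hs = [1]                     # assumption: tiny arrays use spacing 1 only
    return hs[::-1]


print(knuth_gaps(100))  # prints [13, 4, 1]
```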
There are variations of Shell Sort depending on the method of arranging segments. The method that we will be studying here is called "2X". This method determines the number of segments by dividing the number of cells by two (integer division), so that in the first round each segment will have mostly two, and maybe three, cells. After each round, we decrease the number of segments by dividing it by two again. This is repeated until there is one segment left (the entire array).
Below is pseudo-code for the "2X" Shell Sort.
Shell Sort (Sorting the array A[size])
Determine the number of segments by dividing the number
of cells by two.
While the number of segments is greater than zero
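The "2X" scheme described above might look as follows in Python. This is a minimal sketch under stated assumptions: the name `shell_sort_2x` is illustrative, and each segment is sorted with a gapped insertion sort, which is one common way to implement the per-segment sorting.

```python
def shell_sort_2x(a):
    """Sort the list `a` in place with "2X" spacings and return it."""
    segments = len(a) // 2           # number of segments, also the spacing
    while segments > 0:
        # Gapped insertion sort: the cells a[k], a[k + segments], ...
        # form one segment, and every segment ends up sorted.
        for i in range(segments, len(a)):
            value = a[i]
            j = i - segments
            while j >= 0 and a[j] > value:
                a[j + segments] = a[j]
                j -= segments
            a[j + segments] = value
        segments //= 2               # halve the number of segments
    return a
```

When `segments` reaches one, the inner loops are exactly an ordinary insertion sort over the whole, by then partially sorted, array.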
Heap Sort
By viewing the array as a complete binary tree, Heap Sort transforms such a binary tree into a heap. This algorithm does not require extra storage and is not recursive. The algorithm basically follows these steps:
1. Rearrange the complete binary tree (actually an array) so that it becomes a max-heap; the first element is then always the biggest element.
2. Since what we want is exactly the opposite (the biggest element should be last), we swap the first element and the last element.
3. Now we have to re-heapify the array (except the last element), so that the first element is again the biggest of those remaining.
4. Then we repeat the second step, so that the first element is swapped with the current last element.
5. We repeat steps 3 and 4 until all the elements are sorted.
Such a strategy takes advantage of the binary tree structure. Every time we move an element, we move it to a child of its current position. Thus it moves a greater distance within the array than it would in Insertion Sort.
Below is the pseudo-code of Heap Sort.
Heap Sort(Sorting array A[size])
For each parent node,
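The five steps listed above can be sketched in Python. This is a minimal, non-recursive sketch assuming a 0-based list in which the children of index i sit at 2i + 1 and 2i + 2; the helper name `sift_down` is illustrative and not from the original text.

```python
def sift_down(a, root, end):
    """Restore the max-heap property below `root`, looking only at a[:end]."""
    while True:
        child = 2 * root + 1                 # left child
        if child >= end:
            return                           # root has no children
        if child + 1 < end and a[child + 1] > a[child]:
            child += 1                       # pick the larger child
        if a[root] >= a[child]:
            return                           # heap property already holds
        a[root], a[child] = a[child], a[root]
        root = child                         # continue from the child


def heap_sort(a):
    """Sort the list `a` in place in increasing order and return it."""
    n = len(a)
    # Step 1: build a max-heap, visiting each parent node from the bottom up.
    for parent in range(n // 2 - 1, -1, -1):
        sift_down(a, parent, n)
    # Steps 2 to 5: move the biggest element to the end, then re-heapify
    # the part of the array that is not yet in its final place.
    for last in range(n - 1, 0, -1):
        a[0], a[last] = a[last], a[0]
        sift_down(a, 0, last)
    return a
```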
Radix Sort
Unlike most other sorting algorithms, Radix Sort does not involve comparison between the items being sorted. Instead, Radix Sort shuffles the items into small bins, then recollects the bins and repeats the process until the array is sorted.
The magic of Radix Sort lies in finding the key used to shuffle the items. For integer data, the keys are the individual digits. Since a decimal digit can take ten values (0 - 9), there can be up to ten bins for each digit position. Thus we isolate one digit of each data value and place the value into the corresponding bin. We start with the least significant digit and work our way up to the most significant digit.
Below is pseudo-code of the Radix Sort.
Radix Sort (Sorting array A[size])
Create all of the bins.
From the least significant digit to the most significant
digit
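The digit-by-digit shuffling described above can be sketched in Python. This is a minimal sketch assuming non-negative decimal integers; the name `radix_sort` is illustrative, and unlike the in-place sorts earlier it returns a new list.

```python
def radix_sort(a):
    """Return a sorted list of the non-negative integers in `a`."""
    if not a:
        return a
    largest = max(a)
    place = 1                                 # 1, 10, 100, ...
    # From the least significant digit to the most significant digit.
    while largest // place > 0:
        bins = [[] for _ in range(10)]        # one bin per digit 0-9
        for value in a:
            digit = (value // place) % 10     # isolate the current digit
            bins[digit].append(value)
        # Recollect the bins in order, keeping each bin's internal order.
        a = [value for b in bins for value in b]
        place *= 10
    return a
```

Keeping each bin's internal order when recollecting is what lets the later, more significant passes preserve the work done by the earlier ones.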
Bucket Sort
The bucket sort is an extreme version of the radix sort, where the number
of bins is equal to the number of possible data values. The bucket
sort is an O(n) algorithm, provided the number of possible data values
is no more than a constant multiple of n.
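One bin per possible data value can be sketched in Python as follows. This is a minimal sketch assuming the data are integers in the range 0 to `num_values - 1`; the function name and the `num_values` parameter are hypothetical.

```python
def bucket_sort(a, num_values):
    """Return a sorted list of the integers in `a`, each in range(num_values)."""
    buckets = [[] for _ in range(num_values)]   # one bin per possible value
    for value in a:
        buckets[value].append(value)            # a single pass over the data
    result = []
    for bucket in buckets:                      # recollect the bins in order
        result.extend(bucket)
    return result
```

The work is one pass over the n items plus one pass over the bins, so the running time grows with both n and the number of possible values.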
Quicksort
Although the shell sort algorithm is significantly better than insertion
sort, there is still room for improvement. One of the most popular sorting
algorithms is the quicksort. Quicksort executes in O(n log n) on average,
and O(n^2) in the worst-case. However, with proper precautions, worst-case
behavior is very unlikely. Quicksort is a non-stable sort. It is not an
in-place sort as stack space is required. For further reading, consult
Cormen [1990].
The quicksort algorithm works by partitioning the array to be sorted,
then recursively sorting each
partition. In the Partition function, one of the array elements
is selected as a pivot value. Values smaller
than the pivot value are placed to the left of the pivot, while larger
values are placed to the right.
int function Partition (Array A, int Lb, int Ub);
begin
    select a pivot from A[Lb]...A[Ub];
    reorder A[Lb]...A[Ub] such that:
        all values to the left of the pivot are <= pivot
        all values to the right of the pivot are >= pivot
    return (pivot position);
end;
procedure QuickSort (Array A, int Lb, int Ub);
begin
    if Lb < Ub then
        M = Partition (A, Lb, Ub);
        QuickSort (A, Lb, M - 1);
        QuickSort (A, M + 1, Ub);
end;
One index starts on the left and selects an element that is larger than
the pivot, while another index starts on the right and selects an element
that is smaller than the pivot. These elements are then exchanged. This
process repeats until all elements to the left of the pivot are <= the
pivot, and all elements to the right of the pivot are >= the pivot.
QuickSort then recursively sorts the two sub-arrays.
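The two-index partitioning just described can be sketched in Python. This is a minimal sketch, not a definitive implementation: it assumes the first element A[Lb] is taken as the pivot, and the lowercase names `partition`, `quick_sort`, `lb` and `ub` simply mirror the pseudo-code.

```python
def partition(a, lb, ub):
    """Partition a[lb..ub] around a[lb] and return the pivot's final index."""
    pivot = a[lb]                    # assumption: first element is the pivot
    i = lb + 1                       # index scanning from the left
    j = ub                           # index scanning from the right
    while True:
        while i <= j and a[i] <= pivot:
            i += 1                   # stop at an element larger than the pivot
        while i <= j and a[j] >= pivot:
            j -= 1                   # stop at an element smaller than the pivot
        if i > j:
            break                    # the two indexes have crossed
        a[i], a[j] = a[j], a[i]      # exchange the two selected elements
    a[lb], a[j] = a[j], a[lb]        # move the pivot between the two parts
    return j


def quick_sort(a, lb=0, ub=None):
    """Sort a[lb..ub] in place (the whole list by default) and return it."""
    if ub is None:
        ub = len(a) - 1
    if lb < ub:
        m = partition(a, lb, ub)
        quick_sort(a, lb, m - 1)
        quick_sort(a, m + 1, ub)
    return a
```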
As the process proceeds, it may be necessary to move the pivot so that correct ordering is maintained. In this manner, QuickSort succeeds in sorting the array. If we're lucky the pivot selected will be the median of all values, equally dividing the array. Let's assume that this is the case. Since the array is split in half at each step, there are about log n levels of recursion, and Partition must examine all n elements at each level, so the run time is O(n log n).
To find a pivot value, Partition could simply select the first element
(A[Lb]). All other values would be compared to the pivot value, and placed
either to the left or right of the pivot as appropriate. However, there
is one case that fails miserably. Suppose the array was originally in order.
Partition would always select the lowest value as a pivot and split the
array with one element in the left partition, and Ub - Lb elements in the
other. Each recursive call to quicksort would only diminish the size of
the array to be sorted by one. Therefore n recursive calls would be required
to do the sort, resulting in an O(n^2) run
time. One solution to this problem is to randomly select an item as
a pivot. This would make it extremely unlikely that worst-case behavior
would occur.
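Random pivot selection can be sketched in Python as below. This is a sketch under stated assumptions: the partition uses the single-scan scheme usually credited to Lomuto rather than a two-index scan, and the function names are hypothetical.

```python
import random


def randomized_partition(a, lb, ub):
    """Partition a[lb..ub] around a randomly chosen pivot; return its index."""
    r = random.randint(lb, ub)       # random index, both ends inclusive
    a[r], a[ub] = a[ub], a[r]        # park the pivot at the right end
    pivot = a[ub]
    m = lb                           # a[lb..m-1] holds values < pivot
    for k in range(lb, ub):
        if a[k] < pivot:
            a[k], a[m] = a[m], a[k]
            m += 1
    a[m], a[ub] = a[ub], a[m]        # put the pivot in its final position
    return m


def randomized_quick_sort(a, lb=0, ub=None):
    """Sort a[lb..ub] in place (the whole list by default) and return it."""
    if ub is None:
        ub = len(a) - 1
    if lb < ub:
        m = randomized_partition(a, lb, ub)
        randomized_quick_sort(a, lb, m - 1)
        randomized_quick_sort(a, m + 1, ub)
    return a
```

With a random pivot, an array that is already in order no longer triggers the one-element split on every call, although an unlucky run of pivots remains possible in principle.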
Below is another pseudo-code version of Quick Sort.
Quick Sort (Sorting array A[size])
While Low is less than High
There are several factors that influence the choice of a sorting algorithm:
Stable sort
Recall that a stable sort will leave identical keys in the same relative
position in the sorted output. Of insertion sort, shell sort, and
quicksort, only insertion sort is stable.
Space
An in-place sort does not require any extra space to accomplish its
task. Both insertion sort and shell sort are in-place sorts. Quicksort
requires stack space for recursion, and therefore is not an in-place sort.
Time
The time required to sort a data set can easily become astronomical.
The table below shows the relative timings for each method. The time required
to sort a randomly ordered data set is also shown below.
Simplicity
Simpler algorithms can result in fewer programming errors.