Sorting is the process of ordering the elements in an array so that they are in ascending order with respect to the elements' keys.
Selection sort orders an array's elements by repeatedly finding the least element in the unsorted segment of the array and exchanging that element with the leftmost element of the unsorted segment. See Section 8.7.3 for a detailed example.
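The selection-sort loop just described can be sketched in Java as follows. This is a minimal sketch, not the exact code from Section 8.7.3; the class and method names are illustrative.

```java
public class SelectionSort {
    // Repeatedly find the least element in the unsorted segment
    // and swap it with the leftmost element of that segment.
    public static void sort(int[] a) {
        for (int i = 0; i < a.length - 1; i++) {
            int least = i;                       // index of smallest seen so far
            for (int j = i + 1; j < a.length; j++) {
                if (a[j] < a[least]) least = j;
            }
            int tmp = a[i];                      // exchange into the sorted left end
            a[i] = a[least];
            a[least] = tmp;
        }
    }
}
```

After each pass of the outer loop, the first i+1 elements are in their final sorted positions.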
Insertion sort moves each array element to the left until it finds its proper place in the sorted left end of the array. Again, Section 8.7.3 gives details and examples.
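Insertion sort can be sketched the same way; again the names are illustrative, and Section 8.7.3 holds the worked examples.

```java
public class InsertionSort {
    // Slide each element leftwards until it reaches its proper
    // place in the sorted left end of the array.
    public static void sort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int v = a[i];                        // element to insert
            int j = i - 1;
            while (j >= 0 && a[j] > v) {
                a[j + 1] = a[j];                 // shift larger elements right
                j--;
            }
            a[j + 1] = v;                        // drop v into its slot
        }
    }
}
```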
An unsorted array is searched by a naive linear search that scans the array elements one by one until the desired element is found. If the array is sorted, we can employ binary search, which brilliantly halves the size of the search space each time it examines one array element. Binary search is described and illustrated in Section 8.7.4.
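Both searches can be written in a few lines of Java. The sketch below assumes the standard formulations (the versions in Section 8.7.4 may differ in detail); both methods return the index of the key, or -1 if it is absent.

```java
public class Search {
    // Linear search: scan the elements one by one.
    public static int linear(int[] a, int key) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == key) return i;
        }
        return -1;
    }

    // Binary search (array must be sorted): each examination of the
    // middle element halves the remaining search space.
    public static int binary(int[] a, int key) {
        int lo = 0, hi = a.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) / 2;
            if (a[mid] == key)      return mid;
            else if (a[mid] < key)  lo = mid + 1;   // search right half
            else                    hi = mid - 1;   // search left half
        }
        return -1;
    }
}
```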
Binary search is much faster than linear search. Here are some comparisons:
NUMBER OF ARRAY ELEMENTS EXAMINED:

array size | linear search | binary search
           |  (avg. case)  | (worst case)
-----------+---------------+--------------
         8 |          4    |       4
       128 |         64    |       8
       256 |        128    |       9
      1000 |        500    |      11
   100,000 |     50,000    |      18
The preceding numbers are so startling that they demand closer scrutiny. The equation that defines the elements examined for linear search is of course:
L(N) = N / 2

Read this equation as saying, ``for an array of length N, the number of array lookups needed, L(N), equals N / 2.'' On average, the desired element will be found in about the middle of the array, so N/2 elements will be examined.
Binary search has a markedly different behaviour: in the worst case, the number of lookups, B(N), for an array of length N is defined as
B(N) = 1 + B(N/2)

because searching an array of size N requires one examination of the element in the middle, followed by a binary search of either the left half or the right half of the array, each of which has N/2 elements.
Finally, we note that
B(1) = 1

because a one-element array requires just one element to be examined.
The calculations in the earlier lecture, in Section 8.7.4, show that the above equations define this quantity. That is, pretending that the array's length is a power of 2, say N = 2^M for some positive M, we have

B(2^M) = 1 + B(2^(M-1)), for M > 0
B(2^0) = 1

Using a proof by mathematical induction, we can show that these equations simplify to merely
B(N) = log2 N + 1

(Recall that log2 N is the base-2 logarithm of N. For example, log2 8 is 3, because 2^3 equals 8.)
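We can check the closed form against the recurrence with a few lines of Java. The method names here are our own, not from the lecture; b computes the recurrence directly, and closedForm computes log2 N + 1 (for N a power of 2) with repeated halving.

```java
public class LookupCount {
    // Worst-case lookups for binary search, straight from the recurrence:
    // B(1) = 1;  B(N) = 1 + B(N/2).
    public static int b(int n) {
        return (n == 1) ? 1 : 1 + b(n / 2);
    }

    // Closed form, B(N) = log2 N + 1, computed by counting halvings.
    public static int closedForm(int n) {
        int m = 0;
        while (n > 1) { n /= 2; m++; }   // m = log2 N when N is a power of 2
        return m + 1;
    }
}
```

For the power-of-2 sizes in the table above, the two methods agree: 4 lookups for 8 elements, 8 for 128, 9 for 256.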
The above analysis shows that binary search can handle huge sorted arrays remarkably well. This behaviour is typical of divide-and-conquer algorithms, which repeatedly halve the search space.
Here is a table that provides some intuition about the running speeds of algorithms that belong to these classes:
array size | Logarithmic: | Linear: | Quadratic: | Exponential:
           |    log2 N    |    N    |    N^2     |     2^N
-----------+--------------+---------+------------+--------------
         8 |       3      |       8 |        64  |  256
       128 |       7      |     128 |    16,384  |  3.4*10^38
       256 |       8      |     256 |    65,536  |  1.15*10^77
      1000 |      10      |    1000 |  1 million |  1.07*10^301
   100,000 |      17      | 100,000 | 10 billion |  ....
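A row of this table is easy to regenerate; the sketch below (class and method names are ours) computes the four columns with doubles, which is why 2^N for very large N overflows to infinity, just as the table trails off into dots.

```java
public class Growth {
    // Return { log2 N, N, N^2, 2^N } for a given array size N.
    public static double[] row(int n) {
        double log2 = Math.log(n) / Math.log(2);
        return new double[]{ log2, n, (double) n * n, Math.pow(2, n) };
    }
}
```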
Binary search and other divide-and-conquer algorithms have log N time complexity; we say that they have ``order log N'' and sometimes write this as O(log N) (the ``O'' means ``order of'').
Linear search and the Java compiler (and most compilers) have linear time complexity: O(N).
Selection sort and insertion sort have quadratic time complexity: O(N^2).
Exhaustive searching and naive game-playing programs (e.g., chess) have exponential time complexity: O(2^N). (To get a rough intuition for this time-complexity class, consider a naive sorting algorithm that generates every possible ordering of the array's elements and keeps only those that turn out sorted. An N-element array has N! orderings, which grows even faster than 2^N.)
In practice, an end user becomes impatient waiting for results from an algorithm with quadratic (or slower) time complexity. This poses a problem for sorting algorithms. Fortunately, there are clever sorting algorithms that employ divide-and-conquer; they run in O(N*log2 N) time (``order N log N''). Merge sort and quicksort are two elegant examples.
See Section 8.7.6 for the details of merge sort and quicksort. But the basic idea of both is divide-and-conquer: split the array into pieces, sort the pieces recursively, and combine the results.
The complexity class of merge sort and quicksort, O(N*log2 N), is slower than linear time but much faster than quadratic time. In practice, quicksort is the algorithm of choice for sorting arrays whose elements are in random order.
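To make the divide-and-conquer idea concrete, here is one standard formulation of merge sort (Section 8.7.6 gives the lecture's own version, which may differ): split the array in half, sort each half recursively, then merge the two sorted halves. Each merge level does O(N) work, and there are about log2 N levels, giving O(N*log2 N) overall.

```java
import java.util.Arrays;

public class MergeSort {
    // Divide: sort each half recursively; conquer: merge the halves.
    public static int[] sort(int[] a) {
        if (a.length <= 1) return a;                  // base case: already sorted
        int mid = a.length / 2;
        int[] left  = sort(Arrays.copyOfRange(a, 0, mid));
        int[] right = sort(Arrays.copyOfRange(a, mid, a.length));
        return merge(left, right);
    }

    // Merge two sorted arrays into one sorted array.
    private static int[] merge(int[] l, int[] r) {
        int[] out = new int[l.length + r.length];
        int i = 0, j = 0, k = 0;
        while (i < l.length && j < r.length) {
            out[k++] = (l[i] <= r[j]) ? l[i++] : r[j++];
        }
        while (i < l.length) out[k++] = l[i++];       // drain leftovers
        while (j < r.length) out[k++] = r[j++];
        return out;
    }
}
```

Quicksort divides differently (it partitions around a pivot element rather than at the midpoint), but the recursive halve-and-recombine shape is the same.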