COT 5410-01 Fall 2004
Algorithms
Chris Lacher
Notes 3: Sorting and Related Topics
Sorting Problem
- Input: Sequence of n numbers or keys (a_1, a_2, ..., a_n)
- Output: Permutation (a'_1, a'_2, ..., a'_n) such that a'_1 <= a'_2 <= ... <= a'_n
- Satellite data associated with keys may be considerable
- Implementation "details":
- Assume that (!(key1 < key2) and !(key2 < key1)) implies (key1 == key2), but satellite data may not be the same
- May use indirection to avoid many re-assignments of satellite data (see the sketch after this list)
- Data assumed in random access storage (e.g., array); sorting algorithm may require copy into such a structure
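The indirection remark above can be made concrete. Below is a minimal sketch of my own (not from the notes); std::sort is used only for brevity, and the point is that the Record objects never move, only an array of indices is permuted.

#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

struct Record {
    int key;
    std::string satellite;   // potentially large payload that we prefer not to copy
};

int main() {
    std::vector<Record> data = { {3, "c"}, {1, "a"}, {2, "b"} };

    // perm[i] names the record that belongs in position i of the sorted order.
    std::vector<std::size_t> perm(data.size());
    for (std::size_t i = 0; i < perm.size(); ++i) perm[i] = i;

    // Only the indices move; no Record is ever re-assigned.
    std::sort(perm.begin(), perm.end(),
              [&](std::size_t a, std::size_t b) { return data[a].key < data[b].key; });

    // data[perm[0]], data[perm[1]], ... is now the sorted sequence.
    return 0;
}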
Insertion Sort
Insert-Sort ( array of numbers A )
{
for (j = 2; j <= length(A); ++j)
{
// Loop Invariant: A[1..j-1] is sorted
key = A[j];
i = j - 1;
while (i > 0 and A[i] > key)
{
A[i+1] = A[i];
i = i-1;
}
A[i+1] = key;
}
return;
}
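For concreteness, a short trace on a made-up input (not from the notes): starting from A = [5, 2, 4, 6, 1, 3], successive passes produce [2, 5, 4, 6, 1, 3], [2, 4, 5, 6, 1, 3], [2, 4, 5, 6, 1, 3] (key 6 needs no shifts), [1, 2, 4, 5, 6, 3], and finally [1, 2, 3, 4, 5, 6]. After the pass with j = k, the prefix A[1..k] is sorted, which is exactly the loop invariant.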
Proof of Halting (done)
Proof of Correctness (done)
Runtime Analysis (done), Result: Worst Case = Average Case = Θ(n^2)
Runspace Analysis (done), Result: in-place
Simple Sort
inline Swap (key& x, key& y)
{
key z = x;
x = y;
y = z;
}
SimpleSort (array A[1..n])
// pre:
// post: A[1..n] is sorted
{
for (i = 1; i <= n; ++i)
{
k = i;
for (j = i + 1; j <= n; ++j)
{
if (A[j] < A[k])
k = j;
}
Swap(A[k], A[i]);
}
}
Proof of Halting
Proof of Correctness
Runtime Analysis
Runspace Analysis
Is this sort stable? If not, modify the algorithm so it is stable.
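A worked example for the stability question (mine, not from the notes): on the keyed records (2, a), (2, b), (1, c), the first pass selects the minimum key 1 at position 3 and performs Swap(A[3], A[1]), giving (1, c), (2, b), (2, a); the two records with equal key 2 have exchanged their relative order, so the swap-based version is not stable. One standard repair, sketched only: find the minimum position k as before, but instead of swapping, shift A[i..k-1] one slot to the right and place the minimum at position i, so records with equal keys keep their original order.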
Merge Sort
Merge(array A, index p, index q, index r)
// Pre: A[p..q] and A[q+1..r] are sorted ranges
// Post: A[p..r] equals merge of A[p..q] and A[q+1..r]
{
n1 = q - p + 1;
n2 = r - q;
new array L[1 .. n1 + 1]
new array R[1 .. n2 + 1]
for (i = 1; i <= n1; ++i)
L[i] = A[p + i - 1];
for (j = 1; j <= n2; ++j)
R[j] = A[q + j];
L[n1 + 1] = infinity;   // sentinels: neither copy can appear exhausted,
R[n2 + 1] = infinity;   // so the merge loop below needs no end-of-range tests
i = 1;
j = 1;
for (k = p; k <= r; ++k)
{
if (L[i] <= R[j])
{
A[k] = L[i];
++i;
}
else
{
A[k] = R[j];
++j;
}
} // end for
}
MergeSort (array A, index p, index r)
// Pre: p and r are in the range of A, p <= r
// Post: A[p..r] is sorted
{
if (p < r)
{
q = (p + r)/2;
MergeSort (A, p, q);
MergeSort (A, q + 1, r);
Merge (A, p, q, r);
}
}
Proof of Halting
Proof of Correctness
Runtime Analysis
Runspace Analysis
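A hedged summary of the standard results (the details are worked in class, not in these notes): MergeSort halts because each recursive call operates on a strictly shorter range; correctness follows by induction on the range length from the pre/post conditions of Merge; the runtime satisfies the recurrence T(n) = 2 T(n/2) + Θ(n), which solves to Θ(n log n) in every case; and Merge allocates Θ(n) extra space for L and R, so the sort is not in-place.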
HeapSort
- Binary tree model from array index structure:
  - Parent of A[k] is A[k/2]
  - Left child of A[k] is A[2k]
  - Right child of A[k] is A[2k+1]
- A max heap is an array in which every parent is greater than or equal to its children, using the tree model described above. (This is also called the partially ordered tree (POT) property; a small checker sketch follows.)
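The POT property is easy to state as code. The following checker is my own illustration (not part of the notes); it leaves slot 0 of the vector unused so the 1-based formulas above apply verbatim.

#include <cstddef>
#include <vector>

// POT check for a max heap stored in A[1..n], with parent(k) = k/2 as above.
// Slot 0 of the vector is deliberately unused.
bool is_max_heap(const std::vector<int>& A) {
    if (A.size() < 2) return true;          // nothing beyond the unused slot 0
    std::size_t n = A.size() - 1;           // heap occupies A[1..n]
    for (std::size_t k = 2; k <= n; ++k)    // every node except the root
        if (A[k / 2] < A[k])                // parent must be >= child
            return false;
    return true;
}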
The Push Heap Algorithm
- Add new data at next leaf
- Repair upward:
  - Repeat
    - locate parent
    - if POT not satisfied: swap
    - else: stop
  - Until POT holds
push_heap (array A, index p, index r)
// pre: A[p..r-1] is a max heap
// r is a valid index for A
// post: Elements of A[p..r] are permuted
// A[p..r] is a max heap
{
}
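The body above is left for class; the following is a minimal C++ sketch of my own showing the repair-upward ("sift up") idea. It is not the notes' version: it assumes a 0-based array whose heap occupies A[0..r], so the parent of index k is (k - 1)/2, the 0-based analogue of the 1-based rule parent(k) = k/2.

#include <cstddef>
#include <utility>
#include <vector>

// Assuming A[0..r-1] is already a max heap and A[r] is the newly added leaf,
// restore the max-heap (POT) property of A[0..r] by repairing upward.
void sift_up(std::vector<int>& A, std::size_t r) {
    while (r > 0) {
        std::size_t parent = (r - 1) / 2;   // 0-based parent index
        if (A[parent] < A[r]) {             // POT violated: swap and continue upward
            std::swap(A[parent], A[r]);
            r = parent;
        } else {
            break;                          // POT satisfied: stop
        }
    }
}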
The Pop Heap Algorithm
- Swap last leaf and root
- "Remove" last leaf
- Repair downward:
  - Repeat
    - identify children
    - find larger child
    - if POT not satisfied: swap
    - else: stop
  - Until POT holds
pop_heap (array A, index p, index r)
// pre: A[p..r] is a max heap
// post: A[p], A[r] have swapped values
// Elements of A[p..r] are permuted
// A[p..r-1] is a max heap
{
}
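Again, the body is left for class; here is a companion 0-based sketch of my own for the repair-downward ("sift down") step.

#include <cstddef>
#include <utility>
#include <vector>

// Assuming A[0..r] is a max heap: swap the root with the last leaf A[r],
// then repair A[0..r-1] downward so it is a max heap again.
void sift_down_pop(std::vector<int>& A, std::size_t r) {
    std::swap(A[0], A[r]);                       // swap last leaf and root
    std::size_t i = 0;                           // repair the heap A[0..r-1]
    for (;;) {
        std::size_t left = 2 * i + 1, right = 2 * i + 2, largest = i;
        if (left < r && A[largest] < A[left])    // find the larger child
            largest = left;
        if (right < r && A[largest] < A[right])
            largest = right;
        if (largest == i)                        // POT satisfied: stop
            break;
        std::swap(A[i], A[largest]);             // POT violated: swap and continue
        i = largest;
    }
}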
heap_sort (array A, index p, index r)
// pre: p and r are valid index values for A
// post: A[p .. r] is sorted
{
for (i = p+1; i <= r; ++i)
push_heap(A,p,i)
for (i = r; i > p; --i)
pop_heap(A,p,i)
}
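Combining the two sketches above (sift_up and sift_down_pop, both mine) reproduces the same two-phase structure as heap_sort: build the heap by successive pushes, then repeatedly pop the maximum to the end of the shrinking range.

#include <cstddef>
#include <vector>

// Hypothetical 0-based driver, assuming sift_up and sift_down_pop from the sketches above.
void heap_sort_sketch(std::vector<int>& A) {
    for (std::size_t i = 1; i < A.size(); ++i)
        sift_up(A, i);                // now A[0..i] is a max heap
    for (std::size_t i = A.size(); i-- > 1; )
        sift_down_pop(A, i);          // max of A[0..i] moves to position i
}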
Proof of Halting
Proof of Correctness
Runtime Analysis
Runspace Analysis
Recursive version?
QuickSort
QuickSort (array A, index p, index r)
{
if (p < r)
{
q = Partition(A, p, r)
QuickSort(A, p, q-1)
QuickSort(A, q+1, r)
}
}
Partition (array A, index p, index r)
{
x = A[r]
i = p - 1
for (j = p .. r-1)
{
if (A[j] <= x)
{
i = i+1
Swap(A[i], A[j])
}
}
Swap(A[i+1], A[r])
return i + 1
}
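A worked example of Partition (my own, not from the notes): with A = [2, 8, 7, 1, 3, 5, 6, 4], p = 1, r = 8, the pivot is x = 4. The loop over j leaves A = [2, 1, 3, 8, 7, 5, 6, 4] with i = 3, and the final Swap(A[4], A[8]) produces [2, 1, 3, 4, 7, 5, 6, 8] with return value 4: every element <= 4 precedes the pivot and every element > 4 follows it.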
Proof of Halting
Proof of Correctness
Worst Case Runtime O(n^2) (reached when every partition is maximally unbalanced, e.g., on input that is already sorted)
Average Case Runtime Θ(n log n)
- At most n calls to Partition during the entire run, because each call removes one element (its pivot) from all further recursive calls
- Each iteration of the loop in Partition makes exactly one comparison between elements; everything else in a call is constant work
- Therefore: run time is O(n + X), where X = number of comparisons made during the entire execution of QuickSort
- Let z_1, ..., z_n be the input keys in sorted order; any pair of elements is compared at most once, since elements are compared only to a pivot and a pivot appears in no later call
- Therefore: X = Σ_{i=1}^{n-1} Σ_{j=i+1}^{n} X_ij, where X_ij = I{ z_i is compared to z_j }
- E[X] = Σ_{i=1}^{n-1} Σ_{j=i+1}^{n} Pr{ z_i is compared to z_j }
- z_i and z_j are compared if and only if one of them is the first element of {z_i, ..., z_j} chosen as a pivot; each of the j - i + 1 candidates is equally likely to be first, so Pr{ z_i is compared to z_j } = 2/(j - i + 1)
- E[X] = Σ_{i=1}^{n-1} Σ_{j=i+1}^{n} 2/(j - i + 1) <= 2 Σ_{i=1}^{n-1} Σ_{k=1}^{n} 1/k = O(n log n)
- Therefore: run time is O(n + n log n) = O(n log n)
Runspace Analysis
Theoretical Limits
- Theorem: Any comparison sort algorithm has worst case runtime Ω(n log n).
- Proof idea: model the algorithm as a binary decision tree whose internal nodes are comparisons and whose leaves are output permutations; let H be the height of this tree
- A path in the decision tree from root to leaf represents one particular sort instance (one sequence of comparison outcomes)
- Therefore: worst case runtime = Ω(H)
- The decision tree must have each possible permutation of the input as a reachable leaf
- Therefore: the decision tree has L >= n! leaves
- A binary tree of height H has at most 2^H leaves, so n! <= L <= 2^H
- Taking logs: log(n!) <= H
- Therefore: H >= log(n!) = Ω(n log n) (by Stirling's approximation; see the bound below)
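The last step can be justified without quoting Stirling's formula in full (my addition): the largest n/2 factors of n! are each at least n/2, so

  n! >= (n/2)^(n/2), and hence log(n!) >= (n/2) log(n/2) = Ω(n log n).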