Advanced Heaps, Heap Sort, and Priority Queues
Improving Heap Construction, Heap_Sort Performance, and Priority Queue Algorithms

1 Heap Algorithms

This section discusses the advanced heap algorithms implemented in gheap_advanced.h. The basic heap algorithms push_heap, pop_heap, and the vanilla version of heap_sort are discussed in the COP4530 lecture notes and are implemented in the COP4530 LIB. This discussion assumes that the reader is familiar with those materials.

1.1 Heap Algorithms

Begin with a review of the pop_heap algorithm and its implementing code (default order version):

template <class I>
void g_pop_heap (I beg, I end)
{
  if (end - beg < 2) return;
  size_t n = end - beg - 1;
  size_t i = 0, left, right;
  bool finished = 0;
  g_XC(beg[0],beg[n]);
  do
  {
    left = 2*i + 1; right = left + 1;  // left and right child nodes
    if (right < n)                     // both child nodes exist
    {
      if (beg[left] < beg[right])      // ==> follow right subtree
      {
        if (beg[i] < beg[right])
        {
          g_XC(beg[right], beg[i]);
          i = right;
        }
        else
        {
          finished = 1;
        }
      }
      else  // !(beg[left] < beg[right]) ==> follow left subtree
      {
        if (beg[i] < beg[left])
        {
          g_XC(beg[left], beg[i]);
          i = left;
        }
        else
        {
          finished = 1;
        }
      }
    }
    else if (left < n)                 // only the left child node exists
    {
      if (beg[i] < beg[left])
      {
        g_XC(beg[left], beg[i]);
      }
      finished = 1;                    // no grandchild nodes exist
    }
    else                               // no child nodes exist
    {
      finished = 1;
    }
  } while (!finished);
}

This algorithm consists of a swap of the first and last elements of a presumed heap in the range [0..n] followed by a repair of the smaller heap in the range [0..n-1]. The repair works because the two children of the root are heaps, so the only place where the heap conditions might be violated is at the root. The repair portion of this code is everything after the initial swap.
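The pop-then-repair idea can be exercised on its own. Below is a condensed, self-contained sketch of the same operation for plain int pointers, using std::swap in place of the library's g_XC; the function name pop_heap_sketch is illustrative, not part of the LIB.

```cpp
#include <cstddef>
#include <utility>  // std::swap

// A condensed analogue of g_pop_heap for int pointers: swap the root
// (the maximum) to the back, then sift the new root down to restore
// the heap order property on the shortened range.
void pop_heap_sketch(int* beg, int* end)
{
  if (end - beg < 2) return;
  std::size_t n = end - beg - 1;     // index of last element
  std::swap(beg[0], beg[n]);         // move the max out of the heap
  std::size_t i = 0;                 // repair downward from the root
  while (true)
  {
    std::size_t left = 2*i + 1, right = left + 1, largest = i;
    if (left  < n && beg[largest] < beg[left])  largest = left;
    if (right < n && beg[largest] < beg[right]) largest = right;
    if (largest == i) break;         // order property holds; done
    std::swap(beg[i], beg[largest]);
    i = largest;
  }
}
```

After the call, the old maximum sits just past the shrunken heap - exactly the property that heap_sort's second loop exploits.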
We can create another function called "repair" using that code:

template <class I>
void repair (I beg, I end)
{
  if (end - beg < 2) return;
  size_t n = end - beg;
  size_t i = 0, left, right;
  bool finished = 0;
  do
  {
    left = 2*i + 1; right = left + 1;  // left and right child nodes
    if (right < n)                     // both child nodes exist
    {
      if (beg[left] < beg[right])      // ==> follow right subtree
      {
        if (beg[i] < beg[right])
        {
          g_XC(beg[right], beg[i]);
          i = right;
        }
        else
        {
          finished = 1;
        }
      }
      else  // !(beg[left] < beg[right]) ==> follow left subtree
      {
        if (beg[i] < beg[left])
        {
          g_XC(beg[left], beg[i]);
          i = left;
        }
        else
        {
          finished = 1;
        }
      }
    }
    else if (left < n)                 // only the left child node exists
    {
      if (beg[i] < beg[left])
      {
        g_XC(beg[left], beg[i]);
      }
      finished = 1;                    // no grandchild nodes exist
    }
    else                               // no child nodes exist
    {
      finished = 1;
    }
  } while (!finished);
}

(Note that here n = end - beg: unlike in pop_heap, repair operates on the entire range.) We can refactor this code into a more compact form as:

template <class I>
void repair (I beg, I end)
{
  if (end - beg < 2) return;
  size_t n, i, left, right, largest;
  n = end - beg;
  i = 0;
  bool finished = 0;
  do
  {
    left = 2*i + 1; right = left + 1;
    largest = ((left < n && beg[i] < beg[left]) ? left : i);
    if (right < n && beg[largest] < beg[right])
      largest = right;
    // test order property at i; if bad, swap and repeat
    if (largest != i)
    {
      fsu::g_XC(beg[i], beg[largest]);
      i = largest;
    }
    else finished = 1;
  } while (!finished);
}

Be sure to convince yourself that these two blocks of code implement "repair" in exactly the same way.

We can take "repair" one step further, to repair any node in the tree under the assumption that its child nodes are heaps. The place where repair is needed is passed in as a third iterator:

template <class I>
void repair (I beg, I loc, I end)
{
  if (end - beg < 2) return;
  size_t n, i, left, right, largest;
  n = end - beg;
  i = loc - beg;   // the only change from the two-iterator version
  bool finished = 0;
  do
  {
    left = 2*i + 1; right = left + 1;
    largest = ((left < n && beg[i] < beg[left]) ? left : i);
    if (right < n && beg[largest] < beg[right])
      largest = right;
    // test order property at i; if bad, swap and repeat
    if (largest != i)
    {
      fsu::g_XC(beg[i], beg[largest]);
      i = largest;
    }
    else finished = 1;
  } while (!finished);
}

This code defines our new generic algorithm g_heap_repair(I beg, I loc, I end). We can immediately refactor g_pop_heap as follows:

template <class I>
void g_pop_heap (I beg, I end)
{
  if (end - beg < 2) return;
  g_XC(*beg, *(end - 1));
  g_heap_repair(beg, beg, end - 1);
}

Moreover, we can use heap_repair as another way to create a heap from an arbitrary array (or other range):

template <class I>
void fsu::g_build_heap (I beg, I end)
{
  size_t size = end - beg;
  if (size < 2) return;
  for (size_t i = size/2; i > 0; --i)
  {
    g_heap_repair(beg, beg + (i - 1), end);
  }
}

The other way to build a heap from scratch is embodied in the first loop in the vanilla version of heap sort:

template <class I>
void alt::g_build_heap (I beg, I end)
{
  size_t size = end - beg;
  if (size < 2) return;
  for (size_t i = 1; i < size; ++i)
  {
    g_push_heap(beg, beg + (i + 1));
  }
}

The remarkable facts are:

1. The worst-case runtime of fsu::g_build_heap is O(n).
2. The worst-case runtime of alt::g_build_heap is Ω(n log n).
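To make the two construction strategies concrete, here is a self-contained sketch of both builds for int arrays, together with a heap-order checker. This is simplified, non-generic code using std::swap, not the library's templates.

```cpp
#include <cstddef>
#include <utility>  // std::swap

// Sift-down repair of node loc, assuming both of its subtrees are heaps.
void heap_repair(int* beg, std::size_t loc, std::size_t n)
{
  std::size_t i = loc;
  while (true)
  {
    std::size_t left = 2*i + 1, right = left + 1, largest = i;
    if (left  < n && beg[largest] < beg[left])  largest = left;
    if (right < n && beg[largest] < beg[right]) largest = right;
    if (largest == i) return;
    std::swap(beg[i], beg[largest]);
    i = largest;
  }
}

// fsu-style build: repair every internal node, smallest subtrees first.
void build_heap_bottom_up(int* beg, std::size_t n)
{
  for (std::size_t i = n/2; i > 0; --i)
    heap_repair(beg, i - 1, n);
}

// alt-style build: push each element onto the heap prefix before it
// (sift-up, the work done by g_push_heap).
void build_heap_push(int* beg, std::size_t n)
{
  for (std::size_t k = 1; k < n; ++k)
    for (std::size_t i = k; i > 0 && beg[(i-1)/2] < beg[i]; i = (i-1)/2)
      std::swap(beg[(i-1)/2], beg[i]);
}

// Check the max-heap order property over [0, n).
bool is_max_heap(const int* beg, std::size_t n)
{
  for (std::size_t i = 1; i < n; ++i)
    if (beg[(i-1)/2] < beg[i]) return false;
  return true;
}
```

Both builds produce valid heaps from the same input, though not necessarily the same heap.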
These facts are all the more remarkable when you consider that the worst-case runtimes of both push_heap and heap_repair are Θ(log n) and the loop of calls is Θ(n) in both cases. We will come back to explaining why these are true later. For now, just contemplate the subtlety. And realize that this provides an opportunity to improve the performance of heap_sort, by substituting fsu::build_heap for the first loop of calls to push_heap. This change does not affect the asymptotic runtime of heap_sort, because the second loop still runs in Θ(n log n). But it certainly improves the algorithm.

1.2 Code Organization

The vanilla versions of heap_sort are moved into the namespace alt. Although we no longer use a loop of calls to g_push_heap to build a heap, that algorithm is still useful, notably in implementing O(log n) priority queues, so both g_push_heap and g_pop_heap are retained in the fsu namespace. The algorithms are thus organized as follows in gheap_advanced.h:

namespace fsu
{
  // using a supplied order predicate:
  g_push_heap   (I beg, I end, P& p);
  g_pop_heap    (I beg, I end, P& p);
  g_build_heap  (I beg, I end, P& p);
  g_heap_repair (I beg, I loc, I end, P& p);
  g_heap_sort   (I beg, I end, P& p);
  // using default order:
  g_push_heap   (I beg, I end);
  g_pop_heap    (I beg, I end);
  g_build_heap  (I beg, I end);
  g_heap_repair (I beg, I loc, I end);
  g_heap_sort   (I beg, I end);
}

namespace alt
{
  // vanilla versions
  g_heap_sort (I beg, I end, P& p);
  g_heap_sort (I beg, I end);
}

The implementations for heap_sort (default order versions) in gheap_advanced are:

namespace alt
{
  void g_heap_sort (I beg, I end)
  {
    if (end - beg <= 1) return;
    size_t size = end - beg;
    size_t i;
    for (i = 1; i < size; ++i)
    {
      fsu::g_push_heap(beg, beg + (i + 1));
    }
    for (i = size; i > 1; --i)
    {
      fsu::g_pop_heap(beg, beg + i);
    }
  }
}

namespace fsu
{
  void g_heap_sort (I beg, I end)
  {
    if (end - beg <= 1) return;
    size_t size = end - beg;
    size_t i;
    fsu::g_build_heap(beg, end);
    for (i = size; i > 1; --i)
    {
      g_pop_heap(beg, beg + i);
    }
  }
}

The Θ(n log n) loop of calls to g_push_heap in the alt version is replaced with the Θ(n) call to fsu::g_build_heap(beg, end) in the fsu version.

2 Runtime of the build_heap Algorithms

Comparing the fsu:: and alt:: versions of build_heap, it is clear they have structural similarity:

void fsu::g_build_heap (I beg, I end)
{
  size_t size = end - beg;
  if (size < 2) return;
  for (size_t i = size/2; i > 0; --i)        // <- loop length Θ(n)
  {
    g_heap_repair(beg, beg + (i - 1), end);  // <- runtime O(log n)
  }
}

void alt::g_build_heap (I beg, I end)
{
  size_t size = end - beg;
  if (size < 2) return;
  for (size_t i = 1; i < size; ++i)          // <- loop length Θ(n)
  {
    g_push_heap(beg, beg + (i + 1));         // <- runtime O(log n)
  }
}
At the end of Section 1.1 above we asserted that the fsu::build_heap algorithm has runtime O(n), and we implied also that the runtime of alt::build_heap is Ω(n log n). Taken together these results are surprising, especially after seeing the structural similarity of the code implementing the two, where each is a loop of Θ(n) calls to a function whose worst-case runtime is Θ(log n).

2.1 Runtime of fsu::build_heap

To gain some intuition on why fsu::build_heap is asymptotically faster than alt::build_heap, notice that the fsu version can be described as follows: for each subtree, starting with the smallest and progressing to the largest, repair the structure to be a heap. This process starts at the bottom of the tree - i.e., the leaves of the tree, which are by default already heaps. So until we reach a node with a child, there is nothing to repair (which is why we can start the loop at n/2 - the leaves are the nodes with height 0). We first go through all of the nodes with height 1, repairing as we go; then all the nodes with height 2, and so on, until we reach the node with largest height, the root, which has height log2 n = lg n. Notice that 1/2 of the nodes have height 0, with no repair needed. Also 1/2 of the remaining nodes have height 1, so the repair process requires at most one swap. As we get nearer the top of the tree, where the "tall" nodes are, there are far fewer of them to repair.

Proposition 1. The asymptotic runtime of fsu::build_heap is ≤ O(n).

Proof. Say there are N(k) nodes with height k in the tree. Since heap_repair at one of these nodes requires at worst 2k comparisons, the total number of comparisons is no greater than 2 times the sum of k × N(k), the sum taken over all possible heights k:
comp_count ≤ 2 Σ_{k=0}^{lg n} k × N(k).

The number of nodes of height k can be calculated as no greater than ⌈n/2^(k+1)⌉, and ⌈n/2^(k+1)⌉ ≤ n/2^k for every height k ≤ lg n. Substituting into the summation yields

comp_count ≤ 2 Σ_{k=0}^{lg n} k × (n/2^k) = 2n Σ_{k=0}^{lg n} k/2^k.

A fact from Discrete Math is:

Σ_{k=0}^{∞} k a^k = a/(1 − a)², for |a| < 1.

(The sum extends to infinity. See Rosen, Discrete Math.) Taking a = 1/2, we then have

Σ_{k=0}^{∞} k/2^k = (1/2)/(1 − 1/2)² = 2.

Extending our sum from 0 to infinity and applying this fact, we have

comp_count ≤ 2n Σ_{k=0}^{∞} k/2^k = 2n × 2 = 4n,

which verifies that

comp_count ≤ 4n = O(n),

and the proof is complete. ∎
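The bound in Proposition 1 can also be checked empirically with an instrumented version of the bottom-up build. The sketch below (illustrative code, not the LIB version) counts every order comparison made while building the heap; by the proof, the count never exceeds 4n.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Bottom-up heap construction instrumented to count comparisons.
// Each repair of a node of height k makes at most 2k comparisons,
// so the total stays within the 4n bound of Proposition 1.
std::size_t build_heap_comparisons(std::vector<int>& a)
{
  std::size_t n = a.size(), count = 0;
  if (n < 2) return 0;
  for (std::size_t j = n/2; j > 0; --j)
  {
    std::size_t i = j - 1;           // repair node i (smallest subtrees first)
    while (true)
    {
      std::size_t left = 2*i + 1, right = left + 1, largest = i;
      if (left  < n) { ++count; if (a[largest] < a[left])  largest = left; }
      if (right < n) { ++count; if (a[largest] < a[right]) largest = right; }
      if (largest == i) break;
      std::swap(a[i], a[largest]);
      i = largest;
    }
  }
  return count;
}
```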
One final interesting note: the worst-case comp_count for the algorithm has been established exactly:

comp_count = 2n − 2 s2(n) − e2(n)

in the worst case, where s2(n) is the number of 1's in the binary representation of n and e2(n) is the exponent of 2 in the prime decomposition of n.

2.2 Runtime of alt::build_heap

The opposite conclusion holds for the basic or "alt" version, which builds a heap with a loop of calls to push_heap.

Proposition 2. The asymptotic runtime of alt::build_heap is ≥ Ω(n log n).

Proof. In the alt::build_heap algorithm, the call to push_heap on the sub-range [0,k+1] may require lg(k) comparisons, so the entire algorithm may require

comp_count ≥ Σ_{k=1}^{n-1} lg k.
The sum on the right-hand side is an approximating sum for the integral

∫₁ⁿ log x dx = [x log x − x]₁ⁿ = n log n − n + 1 = Θ(n log n),

and hence

comp_count ≥ Ω(n log n),

and therefore alt::build_heap has worst-case runtime Ω(n log n), completing the proof. ∎
The naming of this basic build_heap algorithm as "alt::build_heap" is not standard, so expect to supply some background explanation when discussing it with people not associated with this class.

2.3 Navigation Optimizations

Another optimization you may implement in the code is lowering the cost of the control arithmetic used in the various algorithms, where multiplying and dividing by 2 are used to calculate the child and parent indices. This integer arithmetic can be made faster by using the observations that, for the unsigned integers used as indices, multiplying by 2 is a left shift by one bit, dividing by 2 is a right shift by one bit, and adding 1 to an even number sets its low-order bit.
For example:

left = 2*i + 1;                  // uses integer arithmetic
left = (i << 1) | (size_t)0x01;  // uses bitwise operations to get the same result

parent = i/2;                    // integer arithmetic
parent = (i >> 1);               // same result using bitwise operations

(With the 0-based child indexing used above, where left = 2*i + 1, the parent of node i is actually (i - 1)/2, equivalently (i - 1) >> 1.) Because integer multiplication and division are implemented as multi-step algorithms, quite a few clock cycles may be needed to perform one division or multiplication. The bitwise operations, on the other hand, have direct hardware support and may run in as little as one clock cycle. Most likely, the compiler you use will make these control optimizations without your help. It's still interesting to do.

2.4 Cormen Algorithms

The Cormen text discusses most of the algorithms in the fsu namespace above. The two main differences are in the names used and in the use of recursion in the implementations.
These are implemented in the library file gheap_cormen.h and are in the namespace cormen. While the recursive implementations are quite different at the source code level, the effects of the algorithms are identical to their counterparts in namespace fsu.

3 Priority Queues

We have seen above that the heap algorithms facilitate the HeapSort algorithm - an important computing technology. Another well known application, perhaps even more important (since there are alternatives to HeapSort), is facilitating priority queues. A priority queue stores elements of typename T with a priority determined by an object of a predicate class P. The operations are syntactically queue-like but have associative semantics (rather than positional semantics, as in an ordinary queue). The operations for a priority queue PQ<T,P> and informal descriptions of them are as follows:

void Push (const T& t)  // places t in the priority queue, position unspecified
void Pop ()             // removes the highest priority item from the priority queue
T Front () const        // returns (usually by const reference) the highest priority item in the priority queue
void Clear ()           // makes the priority queue empty
bool Empty () const     // returns true iff the priority queue is empty
size_t Size () const    // returns the number of elements in the priority queue

Priority queues are used in several important applications, including event-driven simulation, job scheduling, and greedy algorithms such as the Kruskal minimum spanning tree algorithm discussed below.
3.1 Priority Queue Implementation Plans

Priority queues are traditionally built as adaptations of other data structures. For example, if s is a container and we prioritize with less-than (so that the larger element has higher priority):
Note that at least one Push/Pop/Front operation in each of implementations 1, 2, 3, and 4 has runtime Ω(n). The amortized runtime of an operation over a typical use will therefore also be at least Ω(n). Implementation 5, using a BST, does meet the goal of logarithmic runtime for the priority queue operations Push/Pop/Front. However, it has superlinear runtime for Init. Moreover, it has a heavy footprint and requires copying data into the BST to initialize. Implementation 6 is very light, has linear Init time, logarithmic runtime for Push and Pop, and constant runtime for Front. And because it operates on a simple vector or deque, it can easily operate on a client-owned container with no space overhead.

3.2 Priority Queue Class Designs

Two designs for a PriorityQueue class can be useful: a PriorityQueue object pq may own the data itself, or it may operate directly on client data: pq is either an entrepreneur or a consultant. The "entrepreneur" version will own a container in which the objects being prioritized are stored. The "consultant" version will have permission to operate on a container already owned by the client. Either of these models can be useful. We will concentrate first on the "consultant" version and then discuss modifications needed for the other.

Assume that a client program needs a priority queue operating on data that is stored in a vector v. Consider the adaptor class:

template < typename T , class C , class P >
class PriorityQueue
{
public:
  typedef T ValueType;
  typedef C ContainerType;
  typedef P PredicateType;
private:
  ContainerType & c_;
  PredicateType & p_;
public:
  PriorityQueue (C& c, P& p) : c_(c), p_(p) {}
  void Init ();
  void Push (const ValueType& t);
  void Pop ();
  const ValueType& Front () const;
  void Clear () { c_.Clear(); }
  bool Empty () const { return c_.Empty(); }
  size_t Size () const { return c_.Size(); }
};

The constructor receives references to the container c and predicate p from the client and stores (non-const) references to each.
The operations Clear, Empty, and Size are also implemented, independent of any detailed knowledge of the nature of the container c - only requiring that the same names are implemented in c. Thus only the four operations Init, Push, Pop, and Front require implementations that are dependent on the nature of c.

Exercise 1. For each of the implementation strategies 1-5, complete a class from the outline above by supplying appropriate code for the four operations Init, Push, Pop, and Front.

3.3 Priority Queue Implementation

We concentrate now on "plan 6" - the heap-based implementation, as described in the table:

template < typename T , class C , class P >
void PriorityQueue<T,C,P>::Init ()
{
  fsu::g_build_heap(c_.Begin(), c_.End(), p_);  // <- Θ(n)
}

template < typename T , class C , class P >
void PriorityQueue<T,C,P>::Push (const ValueType& t)
{
  c_.PushBack(t);                               // <- amortized Θ(1)
  fsu::g_push_heap(c_.Begin(), c_.End(), p_);   // <- Θ(log n)
}

template < typename T , class C , class P >
void PriorityQueue<T,C,P>::Pop ()
{
  fsu::g_pop_heap(c_.Begin(), c_.End(), p_);    // <- Θ(log n)
  c_.PopBack();                                 // <- Θ(1)
}

template < typename T , class C , class P >
const typename PriorityQueue<T,C,P>::ValueType& PriorityQueue<T,C,P>::Front () const
{
  return c_.Front();                            // <- Θ(1)
}

These implementations allow a client to supply data in any random order in a vector (or deque), along with a predicate object that captures the desired notion of priority, and use the PriorityQueue interface to manipulate data directly in the client-owned vector. Typical uses of PriorityQueue are "behind the scenes" in another algorithm, in similar fashion to the way Partition is used. For example, in the Kruskal Minimum Spanning Tree algorithm operating on an undirected weighted graph, both a Partition object and a PriorityQueue object are used in essential ways to guide the flow of the process.
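The same "consultant" pattern can be reproduced with the C++ standard library's heap algorithms operating on a client-owned std::vector. The VecPQ class below is an illustrative analogue of the design, not the LIB implementation; it uses the default std::less ordering, matching the "larger element has higher priority" convention.

```cpp
#include <algorithm>  // std::make_heap, std::push_heap, std::pop_heap
#include <cstddef>
#include <vector>

// "Consultant" priority queue: holds a reference to a client-owned
// vector and maintains the heap property in place. No data is copied.
class VecPQ
{
  std::vector<int>& c_;   // the client's container
public:
  explicit VecPQ(std::vector<int>& c) : c_(c) {}

  void Init ()                       // Θ(n)
  {
    std::make_heap(c_.begin(), c_.end());
  }
  void Push (int t)                  // amortized Θ(1) + Θ(log n)
  {
    c_.push_back(t);
    std::push_heap(c_.begin(), c_.end());
  }
  void Pop ()                        // Θ(log n)
  {
    std::pop_heap(c_.begin(), c_.end());
    c_.pop_back();
  }
  int  Front () const { return c_.front(); }   // Θ(1)
  bool Empty () const { return c_.empty(); }
  std::size_t Size () const { return c_.size(); }
};
```

A client initializes its own vector, wraps it in the adaptor, calls Init once, and then uses Push/Pop/Front freely.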
3.4 Priority Queue Client Code

The following is sample client code designed to demonstrate how PriorityQueue is set up and also to provide transparency into the inner workings of the operations.

/* fpq.cpp */
typedef uint32_t ElementType;
ElementType sentinal = 0;
typedef fsu::LessThan < ElementType > PredicateType;       // Max PQ
// typedef fsu::GreaterThan < ElementType > PredicateType; // Min PQ
typedef fsu::Vector < ElementType > ContainerType;

int main()
{
  ElementType s;
  ContainerType v;
  PredicateType p;
  fsu::PriorityQueue < ElementType , ContainerType , PredicateType > pq(v,p);
  // ... (interactive test harness loop omitted)
}

Access to the PriorityQueue API in the harness is typical of a test/demo program. Two operations require further explanation: Dump and PopOut are not official components of the PriorityQueue API. These are added for the dual purposes of development support and transparency into the heap structure. Assuming Init has been called:

Dump displays the elements of the underlying container level by level, so the heap can be read as a binary tree.
PopOut repeatedly outputs Front and then calls Pop until the priority queue is empty.
Here are implementations of these developer helpers:

template < typename T , class C , class P >
void PriorityQueue<T,C,P>::Dump (std::ostream& os, int cw) const
{
  size_t n = Size();
  size_t i = 0, k = 2;
  while (i < n)
  {
    // one level of tree
    os << ' ';                 // indent extra space
    while (i < n && i < k - 1)
    {
      os << std::setw(cw) << c_[i];
      ++i;
    }
    os << '\n';
    k *= 2;
  }
}

template < typename T , class C , class P >
void PriorityQueue<T,C,P>::PopOut (std::ostream& os, char ofc)
// note: the default argument ofc = '\0' belongs in the class definition,
// not in this out-of-class definition
{
  if (ofc == '\0')
  {
    while (!Empty())
    {
      os << Front();
      Pop();
    }
  }
  else
  {
    while (!Empty())
    {
      os << ofc << Front();
      Pop();
    }
  }
}

PriorityQueue as above is implemented in LIB/tcpp/pq.h. The test harness (somewhat elaborated) is implemented in LIB/tests/fpq.cpp.

Exercise 2. Locate the executable LIB/notes_support/fpq_i.x and copy it into a directory where you have execute permission. Execute fpq_i.x on various input data until you can comfortably predict the outcome of each selection choice for that data.
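PopOut's behavior - draining the queue front-first - is exactly the claim that a max-priority queue emits its elements in descending order. A small self-contained sketch using the std:: heap algorithms (illustrative function, not the LIB code):

```cpp
#include <algorithm>  // std::make_heap, std::pop_heap
#include <string>
#include <vector>

// Drain a max-priority queue built over v, appending each Front value
// (followed by a space) to the output string: descending order out.
std::string drain_descending(std::vector<int> v)
{
  std::string out;
  std::make_heap(v.begin(), v.end());   // Init
  while (!v.empty())
  {
    out += std::to_string(v.front());   // Front
    out += ' ';
    std::pop_heap(v.begin(), v.end());  // Pop, part 1: max to the back
    v.pop_back();                       // Pop, part 2: shrink the range
  }
  return out;
}
```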