AlgNotes 10


COT 5405
Advanced Algorithms
Chris Lacher
Notes 10: Graph Algorithms 2

Minimum Spanning Trees

A weighted graph consists of:
1. G = (V,E) = graph
2. w:E-->Reals, a weighting of edges of G
If H is a subgraph of the weighted graph (G,w), the total weight of H is given by
w(H) = Σ_{e in H}w(e)
A minimum spanning tree (MST) for (G,w) is a subgraph T of G such that:
1. T connects all vertices of G
2. T has no cycles
3. T has minimal weight among all subgraphs satisfying 1 and 2
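On tiny graphs the definition can be checked directly. The C++ sketch below is illustrative only (`Edge`, `findRoot` and `bruteForceMstWeight` are hypothetical names, not from these notes). It tries every subset of edges, keeps the subsets with exactly n-1 edges and no cycle (on n vertices these are exactly the spanning trees), and returns the smallest total weight w(T). Because it is exponential in e, it is only useful as a reference answer when testing faster algorithms.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <numeric>
#include <vector>

// One undirected edge (u, v) with weight w.
struct Edge { int u, v; double w; };

// Root of x in a small disjoint-set array, with path halving.
int findRoot(std::vector<int>& parent, int x) {
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

// Minimum w(T) over all spanning trees T, straight from the definition:
// a subset of edges counts only if it has exactly n-1 edges and no cycle.
// Exponential in the number of edges: for tiny graphs only.
double bruteForceMstWeight(int n, const std::vector<Edge>& edges) {
    const int m = static_cast<int>(edges.size());
    assert(n >= 1 && m < 25);  // 2^m subsets are enumerated
    double best = std::numeric_limits<double>::infinity();
    for (std::uint32_t mask = 0; mask < (std::uint32_t{1} << m); ++mask) {
        std::vector<int> parent(n);
        std::iota(parent.begin(), parent.end(), 0);
        double total = 0.0;
        int used = 0;
        bool acyclic = true;
        for (int i = 0; i < m && acyclic; ++i) {
            if (((mask >> i) & 1u) == 0) continue;  // edge i not in this subset
            int a = findRoot(parent, edges[i].u);
            int b = findRoot(parent, edges[i].v);
            if (a == b) {
                acyclic = false;                    // edge i would close a cycle
            } else {
                parent[a] = b;
                total += edges[i].w;
                ++used;
            }
        }
        if (acyclic && used == n - 1 && total < best) best = total;
    }
    return best;  // infinity if G has no spanning tree (G disconnected)
}
```

For a triangle with weights 1, 2, 3 this returns 3, the weight of the two lightest edges.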

MST Algorithm Pattern

Assume (G,w) is a connected weighted undirected graph and A is a subgraph of G that is contained in some MST for (G,w). A safe edge for A is an edge e such that A union {e} is also a subset of some MST for (G,w).

MST Algorithm Pattern:
graph A;  // starts out with no edges; becomes an MST
while (A is not an MST for G)
{
  // loop invariant: A is a subset of an MST for G
  find safe edge e for A
  add e to A (along with the vertices of e)
}
return A

Note that when the loop terminates, A is an MST for G. The loop invariant is verified by induction. The base case is trivial: A starts with no edges, so it is a subset of any MST. For the inductive step, a safe edge is by definition an edge e that can be added to A so that A union {e} is still a subset of some MST, so the inductive step is built into the definition of safe edge.

Safe Edge Theorem

Theorem: Assume that G = (V,E) is a connected undirected graph with weight function w:E-->Reals. Let A be a subset of some MST for G. If (S, V - S) is a cut of G that respects A (that is, no edge of A crosses the cut) and e=(u,v) is a minimum-weight edge crossing the cut, then e is safe for A.

Proof: Let T be an MST containing A. If e is in T, then A + {e} is a subset of T, so e is safe by definition.

Suppose e=(u,v) is not in T. Then, because T spans G, there is a path P in T connecting u and v. Because u and v are on opposite sides of the cut, some edge e' = (x,y) in P also crosses the cut. Let T' be the subgraph obtained by removing e' from T and adding e:

T' = T - {e'} + {e}.

Removing e' from T splits it into two components, one containing u and the other containing v, because P is the only path in T from u to v and it passes through e'. Adding e reconnects these components, so T' is again a spanning tree. Moreover,

w(T') = w(T) - w(e') + w(e) <= w(T)

because w(e) is minimal among edges crossing the cut, and e' also crosses the cut. Since T is an MST, w(T) <= w(T'), so w(T') = w(T) and T' is also an MST. Finally, e' is not in A, because the cut respects A and e' crosses it. Hence A + {e} is a subset of T', which shows that e is safe for A.

Corollary: A connected undirected weighted graph has a minimum spanning tree.
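The hypotheses of the theorem translate into two small checks. This C++17 sketch assumes the cut is given as a boolean membership array `inS` for S; `crosses`, `respects` and `safeEdge` are hypothetical helper names. It asserts that no edge of A crosses the cut, then returns a minimum-weight crossing edge, which by the theorem is safe for A (provided A is contained in some MST).

```cpp
#include <cassert>
#include <optional>
#include <vector>

struct Edge { int u, v; double w; };

// The cut (S, V - S) is given by membership flags: inS[v] is true iff v is in S.
bool crosses(const Edge& e, const std::vector<bool>& inS) {
    return inS[e.u] != inS[e.v];
}

// A cut respects A when no edge of A crosses it.
bool respects(const std::vector<bool>& inS, const std::vector<Edge>& A) {
    for (const Edge& e : A)
        if (crosses(e, inS)) return false;
    return true;
}

// Safe Edge Theorem as code: if the cut respects A (and A lies in some MST),
// a minimum-weight edge crossing the cut is safe for A.
std::optional<Edge> safeEdge(const std::vector<Edge>& edges,
                             const std::vector<Edge>& A,
                             const std::vector<bool>& inS) {
    assert(respects(inS, A));  // hypothesis of the theorem
    std::optional<Edge> best;
    for (const Edge& e : edges)
        if (crosses(e, inS) && (!best || e.w < best->w)) best = e;
    return best;  // empty only if no edge crosses the cut
}
```

Ties are broken by keeping the first lightest edge found; the theorem allows any minimum-weight crossing edge.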

MST Algorithm Pattern - 2

We have now shown that a connected undirected graph has an MST. If the graph has n vertices, then every MST also contains all n vertices, and by tree theory a tree on n vertices has exactly n-1 edges. These two facts let us replace the loop test with a loop that runs exactly n-1 times:

MST Algorithm Pattern:
graph A;  // starts out with no edges; becomes an MST
int n = number of vertices of G
for (i = 1; i < n; ++i) 
{
  find safe edge e for A
  add e to A (along with the vertices of e)
}
return A
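One way to make the counted loop concrete, assuming vertex 0 exists and G is connected: in each round let S be vertex 0 together with every vertex already touched by A. A stays a single tree containing vertex 0, so every edge of A lies inside S, the cut respects A, and the lightest crossing edge is safe. This is a deliberately naive O(n e) C++ sketch (`mstPattern` is a hypothetical name); with this choice of cut it is essentially a slow form of Prim's algorithm.

```cpp
#include <cassert>
#include <vector>

struct Edge { int u, v; double w; };

// "MST Algorithm Pattern - 2" with a concrete choice of cut. Each round takes
// S = {vertex 0} plus every vertex touched by A. No edge of A crosses
// (S, V - S), so a lightest crossing edge is safe. O(n * e) overall.
std::vector<Edge> mstPattern(int n, const std::vector<Edge>& edges) {
    assert(n >= 1);
    std::vector<Edge> A;               // starts out with no edges; becomes an MST
    std::vector<bool> inS(n, false);
    inS[0] = true;
    for (int i = 1; i < n; ++i) {      // exactly n-1 rounds
        const Edge* safe = nullptr;    // lightest edge crossing the cut so far
        for (const Edge& e : edges)
            if (inS[e.u] != inS[e.v] && (safe == nullptr || e.w < safe->w))
                safe = &e;
        assert(safe != nullptr);       // no crossing edge: G is disconnected
        A.push_back(*safe);            // add e to A, along with its new vertex
        inS[safe->u] = true;
        inS[safe->v] = true;
    }
    return A;
}
```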

Kruskal and Prim

Two specific MST algorithms differ in the way a safe edge is chosen:

Kruskal's Algorithm: Initialize A as the set of all vertices (no edges), and maintain the property that A is a forest. The safe edge added is a least-weight edge connecting two distinct components of the forest. The algorithm terminates when A is connected.
Prim's Algorithm: Initialize A as a single vertex, and maintain the property that A is a tree. The safe edge added is always a least-weight edge connecting A to a vertex not in A. The algorithm terminates when all vertices have been added to A.

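To prove Kruskal is correct, apply the Safe Edge Theorem with S = vertices of one of the two trees that the chosen edge joins. No edge of the forest crosses this cut, and every edge crossing it joins two distinct components, so the chosen edge is a least-weight crossing edge.

To prove Prim is correct, apply the Safe Edge Theorem with S = vertices of the growing tree.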

Detailed Kruskal

// Kruskal's MST Algorithm 
// G = (V,E,w) is a connected weighted undirected graph, vertices numbered 0..n-1
// resources:
int                            n;    // number of vertices of G
MinPriorityQueue < EdgeType >  E;    // edges of G, prioritized by minimum weight
Set < EdgeType >               F;    // edges of Kruskal's forest
Vector < List < VertexType > > C;    // C[k] = vertices of the forest component labeled k
Vector < int >                 comp; // comp[v] = label of the component containing v

// initialization:
for (each edge e of G)  // (1)
  E.Push(e);            // ensures we encounter edges by non-decreasing weight
for (v = 0; v < n; ++v) // (2)
{
  C[v].PushBack(v);     // start with each vertex its own component,
  comp[v] = v;          // labeled by that vertex
}

// run:
while (!E.Empty())      // (3)
{
  e = E.Pop();                   // (4)
  (x,y) = endpoints of e;
  if (comp[x] != comp[y])        // (5) x and y are not connected in F
  {
    F.Insert(e);                 // (6) add e to Kruskal's forest
    a = comp[x]; b = comp[y];
    if (C[a].Size() < C[b].Size())
      swap(a,b);                 // make C[b] the smaller component
    for (each vertex v in C[b])  // (7) union C[b] into C[a], since connected by e
    {
      C[a].PushBack(v);
      comp[v] = a;               // relabel v so later tests at (5) see the merge
    }
    C[b].Clear();                // (8) make C[b] empty
  }
}
// F = edges of an MST for G

Runtime: O(e log e) = O(e log n) [n = number of vertices, e = number of edges]

(1) e pushes at O(log e) each: O(e log e) (using heap-based PriorityQueue)
(2) n list insertions: O(n)
(3) loop runs at most e times
(4) PriorityQueue Pop: O(log e) each
(5) label comparison: O(1) each
(6) Set Insert: O(log n) each (HBT-based OSet), at most n - 1 times
(7), (8) union: a vertex is relabeled only when its component merges into one at least as large, so its component at least doubles in size each time; each vertex moves at most log n times, and all unions together cost O(n log n)

Loop runtime <= O(e log e + n log n) = O(e log e)
because n <= e+1. Note also that Θ(log n) = Θ(log e), because n - 1 <= e <= n², which gives the alternate statement.
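A runnable C++ sketch of the same scheme, assuming edges are given as an `Edge` list over vertices 0..n-1 and that G is connected (`Edge`, `HeavierFirst` and `kruskal` are illustrative names). `std::priority_queue` with a reversed comparator stands in for `MinPriorityQueue`, the label array `comp` gives O(1) connectivity tests, and each union relabels the smaller component into the larger. The range constructor heapifies all edges at once in O(e) instead of e separate pushes; the overall bound is unchanged.

```cpp
#include <cassert>
#include <queue>
#include <utility>
#include <vector>

struct Edge { int u, v; double w; };

// std::priority_queue is a max-heap; reversing the comparison makes top()
// the lightest edge, so it acts as a MinPriorityQueue of edges.
struct HeavierFirst {
    bool operator()(const Edge& a, const Edge& b) const { return a.w > b.w; }
};

// Kruskal sketch for a connected graph on vertices 0..n-1.
// comp[v] labels the component holding v; members[k] lists the vertices
// labeled k. A union relabels the smaller component into the larger one.
std::vector<Edge> kruskal(int n, const std::vector<Edge>& edges) {
    assert(n >= 1);
    std::priority_queue<Edge, std::vector<Edge>, HeavierFirst> E(edges.begin(), edges.end());
    std::vector<int> comp(n);
    std::vector<std::vector<int>> members(n);
    for (int v = 0; v < n; ++v) {      // each vertex starts as its own component
        comp[v] = v;
        members[v] = {v};
    }
    std::vector<Edge> F;               // edges of Kruskal's forest
    while (!E.empty() && static_cast<int>(F.size()) < n - 1) {
        Edge e = E.top();              // lightest remaining edge
        E.pop();
        int a = comp[e.u];
        int b = comp[e.v];
        if (a == b) continue;          // endpoints already connected: would close a cycle
        F.push_back(e);
        if (members[a].size() < members[b].size()) std::swap(a, b);
        for (int v : members[b]) {     // move the smaller component into the larger
            comp[v] = a;
            members[a].push_back(v);
        }
        members[b].clear();
    }
    assert(static_cast<int>(F.size()) == n - 1);  // fails if G is disconnected
    return F;
}
```

A disjoint-set forest with union by rank and path compression is the more common choice for the component structure; it makes the connectivity bookkeeping nearly linear, but processing edges by weight still dominates at O(e log e).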

Detailed Prim

class Prim  // minor changes from BFS - could be template parameter
{
private: // reference to structure being searched
  GraphBase& g_; // adjacency list representation (vertices indexed 0,1,...n-1)
  Vertex     s_; // starting search here

private: // control variables
  Vector < ColorType > color_;  // white = unseen, gray = in queue, black = in tree
  // Queue < Vertex > conQueue_;   (BFS version)
  MinPriorityQueue < Vertex > conQueue_; // by key_; DecreaseKey is an assumed operation

public:  // informational variables
  Vector < Vertex > parent_; // = parent in the MST; (parent_[v], v) is a tree edge
  Vector < double > key_;    // = weight of the lightest known edge from the tree to v

public:  // methods
  void Init()
  {
    for(each vertex v of g_ except possibly s_) // (1)
    {
      color_[v] = white;
      key_[v] = infinity;
      parent_[v] = NIL;
    }
    color_[s_] = gray;
    key_[s_] = 0;
    parent_[s_] = NIL;
    conQueue_.MakeEmpty();
    conQueue_.Push(s_);
  }

  void Run()
  {
    while (!conQueue_.Empty())  // (2)
    {
      u = conQueue_.Pop();      // (3) u joins the tree via edge (parent_[u], u)
      for each v in g_.ADJ[u]   // (4)
      {
        if (color_[v] == white)            // first edge seen to v
        {
          color_[v] = gray;
          // distance_[v] = distance_[u] + 1;   (BFS version)
          key_[v] = w(u,v);                // edge weight only, not a path length
          parent_[v] = u;
          conQueue_.Push(v);               // (5)
        }
        else if (color_[v] == gray && w(u,v) < key_[v])  // lighter edge to v
        {
          key_[v] = w(u,v);
          parent_[v] = u;
          conQueue_.DecreaseKey(v);        // (5) restore priority order
        }
      } // for
      color_[u] = black;
    } // while
  }
};

Runtime: O(e log n)

(1) loop length = Θ(n)
(2) loop length = Θ(n)
(3) O(log n) operation; TOTAL cost is O(n log n)
(4) loop body runs a TOTAL of 2e times (each edge appears in two adjacency lists)
(5) O(log n) operation; TOTAL cost is O(e log n)

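Thus the total cost of the entire algorithm is O(n log n + e log n) = O(e log n), since e >= n - 1 for a connected graph. One final note: if we use a Fibonacci heap to implement the priority queue, Push and DecreaseKey cost O(1) amortized while Pop costs O(log n) amortized, which improves the runtime to O(e + n log n). This is better for graphs that are not sparse [where e grows faster than n].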
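`std::priority_queue` has no decrease-key operation, so this C++17 sketch swaps in a different technique, lazy deletion: when a lighter edge to v is found, a new (key, v) pair is pushed, and pairs for vertices already in the tree are skipped when popped. The heap can then hold up to 2e entries, giving O(e log e) = O(e log n), the same bound as the heap version above. The adjacency-list type `AdjList` and the function name `prim` are assumptions for illustration, and -1 stands in for NIL.

```cpp
#include <cassert>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// adj[u] holds (v, w(u,v)) pairs; each undirected edge is listed at both ends.
using AdjList = std::vector<std::vector<std::pair<int, double>>>;

// Prim sketch with lazy deletion in place of DecreaseKey. Returns parent,
// where (parent[v], v) is an MST edge for every vertex v reached from s
// other than s itself; -1 plays the role of NIL.
std::vector<int> prim(const AdjList& adj, int s) {
    const int n = static_cast<int>(adj.size());
    assert(0 <= s && s < n);
    std::vector<double> key(n, std::numeric_limits<double>::infinity());
    std::vector<int> parent(n, -1);
    std::vector<bool> inTree(n, false);          // "black" vertices
    using Item = std::pair<double, int>;         // (key, vertex)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;

    key[s] = 0.0;
    pq.push({0.0, s});
    while (!pq.empty()) {
        const int u = pq.top().second;
        pq.pop();
        if (inTree[u]) continue;                 // stale pair: u already in the tree
        inTree[u] = true;                        // u joins via edge (parent[u], u)
        for (const auto& [v, w] : adj[u]) {
            if (!inTree[v] && w < key[v]) {      // lighter edge from the tree to v
                key[v] = w;
                parent[v] = u;
                pq.push({w, v});
            }
        }
    }
    return parent;
}
```

On a 4-cycle 0-1-2-3-0 with weights 1, 2, 1, 2 plus a chord 0-2 of weight 5, starting at vertex 0 yields the tree edges (0,1), (1,2), (2,3) of total weight 4.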