4 Search Trees

Recall several definitions from discrete mathematics. An undirected graph G = (V,E) is connected iff for any two vertices x, y ∈ V there is a path in G from x to y. A component of G is a maximal connected subgraph of G. G is a tree iff G is connected and contains no cycles. G is a forest if each component of G is a tree.

Theorem 4 (Characterization of Trees). The following are equivalent statements about a graph G = (V,E):
(See your discrete math text for proofs.)

4.1 BFS and DFS Trees

Recall that both BFSurvey and DFSurvey collect parent data during the course of the algorithm, storing that information in the vector parent_[]. Assume either DFS or BFS context, and suppose we have run Reset() and then a single call Search(v) for some vertex v. Let's also shorten the name parent_ to p, and define:
Lemma 1 (Tree Lemma). (B(v),P(v)) is a tree with root v.

Proof. First note that v is the unique vertex in B(v) with null parent. Also note that for any black vertex x other than v, (p(x),x) is an edge in the graph (directed from p(x) to x). Following parent vertices from x until a null parent is reached traces, in reverse, a (directed) path from v to x, so every vertex of B(v) is connected to v. Now count the vertices and edges: for each black vertex x other than v, the edge (p(x),x) is distinct from any other (p(y),y) because x ≠ y. Therefore we have a 1-1 correspondence between black vertices not equal to v and edges. Thus (B(v),P(v)) is a connected graph with vertexSize = 1 + edgeSize. Such a graph must be a tree, by statement (4) of the Tree Characterization Theorem. ∎

We call (B(v),P(v)) the search tree generated by the search starting at v. Now assume we have done a full survey with a call Search(), and define
Lemma 2 (Forest Lemma). (V,P) is a forest whose trees are all the search trees generated during the survey.

Proof. By the Tree Lemma, T(v) = (B(v),P(v)) is a tree for each starting vertex v. Suppose some edge in P connects two of these trees, say T(v1) and T(v2). The edge must necessarily be of the form (p(x),x) for some x, where x ∈ B(v1) and p(x) ∈ B(v2). But then the parent-path from x will pass through p(x) to v2, which means that v2 should have been discovered by XFSurvey::Search(v1). The contradiction means that no edge connects T(v1) and T(v2). Therefore the search trees T(v) are the components of (V,P), and each is a tree, which is the definition of a forest. ∎

We call (V,P) the search forest generated by the survey.

4.2 Interpreting BFSurvey

We have alluded to the shortest path property of BFS in previous sections. It is time to make full contact with a proof, and we devote Section 4.2 to doing that. We follow the proof in [Cormen et al 3e]. For any two vertices x,y in G, define the shortest-path-distance from x to y to be
Lemma 3. δ(x,y) = 1 iff x and y are connected by an edge e ∈ E.

Proof. Suppose x and y are connected by an edge e. Since x ≠ y, the distance must be at least 1. The edge e is a path with length 1, so the shortest path distance is no greater than 1. Conversely, if the shortest path distance is 1 then such a path consists of a single edge connecting the two vertices. ∎

Lemma 4 (Triangle Inequality). Let G = (V,E) be a directed or undirected graph and x,y,z ∈ V. If y is reachable from x and z is reachable from y then

δ(x,z) ≤ δ(x,y) + δ(y,z).
Proof. First note that z is reachable from x by concatenating shortest paths from x to y and from y to z. This path from x through y to z has length exactly δ(x,y) + δ(y,z). The shortest path from x to z can be no longer than this path through y. Therefore δ(x,z) ≤ δ(x,y) + δ(y,z). ∎

Assumptions. For the remainder of this section, let G = (V,E) be a directed or undirected graph and suppose BFSurvey::Reset() and BFSurvey::Search(v) have been called for some starting vertex v ∈ V. Let d(x) denote the calculated value distance_[x] for each x ∈ V.

Lemma 5. The path in the search tree subgraph (B(v),P(v)) from v to x has length d(x).

Proof. In the search tree there is only one path from v to x. Clearly the path from v to itself has length 0 = d(v), verifying a base case. Assume the lemma is true for all vertices x with d(x) ≤ k and let x be a vertex with d(x) = k + 1. The unique path to x consists of the path from v to p(x) plus one extra edge of the form (p(x),x). By inspection of the algorithm, d(x) = d(p(x)) + 1, so d(p(x)) = k. By the induction hypothesis, the path from v to p(x) has length k = d(p(x)), and therefore the path from v to x has length k + 1 = d(x), verifying the inductive step. Therefore by the principle of mathematical induction the result is proved. ∎

Corollary. For each vertex x, d(x) ≥ δ(v,x).

Lemma 6. At any point in the run of the algorithm, consider the gray vertices, that is, the vertices in the control queue, and the front vertex f. The values d(x) are non-decreasing in queue order for the gray vertices, and moreover are either constant (equal to d(f)) or have two values, d(f) and d(f) + 1.

Proof. Examine the code to see that when x is pushed onto the control queue, d(x) = d(p(x)) + 1 (and at that time p(x) is at the front of the queue). It then follows by mathematical induction on the number of queue operations that d values are non-decreasing for all vertices in the queue and take at most the two values d(f) and d(f) + 1. Because d values are never changed once a vertex is pushed, if x is pushed before y then d(x) ≤ d(y). ∎

Corollary.
If x and y are both gray vertices (i.e., in the control queue) with x colored gray before y (i.e., x pushed before y), then d(x) ≤ d(y) ≤ d(x) + 1.

Lemma 7. If x and y are reachable from v and x is discovered before y, then d(x) ≤ d(y).

Proof. Examine the code to see that when x is pushed onto the control queue, d(x) = d(p(x)) + 1 (and at that time p(x) is at the front of the queue). It then follows by mathematical induction that d values are non-decreasing in the order in which vertices are pushed. Because d values are never changed once a vertex is pushed, if x is pushed before y then d(x) ≤ d(y). ∎

Lemma 8. d(x) = δ(v,x) for all reachable x.

Proof. Suppose that the result fails. By the corollary to Lemma 5, any failure has the form d(y) > δ(v,y). Let δ be the smallest shortest-path-distance for which the result fails, and let y be a vertex for which d(y) > δ(v,y) = δ. Let x be the next-to-last vertex on a shortest path from v to y. Then δ(v,y) = 1 + δ(v,x) and, because δ(v,x) < δ and δ = δ(v,y) is minimal, d(x) = δ(v,x). We summarize what we know so far:

d(y) > δ(v,y) = δ(v,x) + 1 = d(x) + 1.
Now consider the three possible colors of y at the time x is at the front of the control queue. If y is white, then y will be pushed onto the control queue while x is at the front, making d(y) = d(x) + 1, a contradiction. If y is black, it has been popped and d(y) ≤ d(x) by Lemma 7, again a contradiction. If y is gray, then d(y) ≤ d(x) + 1 by Lemma 6, a contradiction yet again. Therefore under all possibilities our original assumption of failure is false. ∎

Putting these facts together we have:

Theorem 5 (Breadth-First Tree Theorem). Suppose BFSurvey::Search(s) has been called for the graph or digraph G = (V,E). Then for each vertex x that is reachable from s, the "parent path" from s to x in the breadth-first tree is a shortest path from s to x.

4.3 Interpreting DFSurvey

We have already remarked that where BFS focuses on distance, DFS is more about time. We also took care to ensure that the time stamps on vertices during a DFSurvey::Search() are unique, so that one time stamp is used for each change of color of a vertex. These time stamps provide a way to codify the effects of LIFO order in the control system for DFS. We will make use of the more compact mathematical notations
for each vertex x. Inspection of the DFS algorithm shows that discovery occurs before finishing:

Lemma 9. For each vertex x, td(x) < tf(x).

Therefore the interval [td(x),tf(x)] represents the time values for which x is in the control LIFO, that is, the times when x has color gray. Prior to td(x), x is white, and after tf(x), x is black.

Theorem 6 (Parenthesis Theorem). Assume G = (V,E) is a (directed or undirected) graph and that DFSurvey::Search() has been run on G. Then for two vertices x and y, exactly one of the following three conditions holds:
Proof. First suppose x and y belong to different trees in the DFS forest. Then x is discovered during one call Search(v) and y is discovered during a different call Search(w) where v ≠ w. Then x is colored gray and then black during Search(v), and y is colored gray and then black during Search(w). Clearly these two processes do not overlap in time, and condition (1) holds.

Suppose on the other hand that x and y are in the same tree in the search forest. Without loss of generality we assume y is a descendant of x. Then, by inspection of the algorithm, x must be colored gray before y. Hence, td(x) < td(y). But due to the LIFO order of processing, this means that y is colored black before x. Therefore tf(y) < tf(x). That is, [td(y),tf(y)] is a subset of [td(x),tf(x)], and condition (3) holds. A symmetric argument completes the proof. ∎

Theorem 7 (White Path Theorem). In a depth-first forest of a directed or undirected graph G = (V,E), vertex y is a descendant of vertex x iff at the discovery time td(x) there is a path from x to y consisting entirely of white vertices.

Proof. First note that discovery time td(x) = dtime[x] is stamped prior to any processing of x in the DFSurvey::Search algorithm.

Suppose z is a descendant of x. If z = x then {x} is a white path. If z ≠ x then td(x) < td(z) by the Parenthesis Theorem, so z is white at time td(x). Applying the observation to any y in the DFS tree path from x to z shows that the DFS tree path from x to z consists of white vertices.

Conversely, suppose at time td(x) there is a path from x to z consisting entirely of white vertices. If some vertex in this path is not a descendant of x, let y be the one closest to x with this property. Then the predecessor p on the path is a descendant of x. At time td(p), y is white and an unvisited adjacent of p, so y will be discovered and p(y) = p. That is, y is a descendant of p, and hence of x, contradicting the assumption that y is not a descendant of x.
Therefore every vertex on the white path is a descendant of x. ∎

4.4 Classification of Edges

The surveys can be used to classify edges of a graph or directed graph. We will use DFSurvey for this purpose. Given an edge from a vertex x, there are four possibilities: (1) it is in the DFS forest; it goes from x to another vertex in the same tree, either (2) an ancestor or (3) a descendant; or (4) it goes to a vertex that is neither ancestor nor descendant, whether in the same or a different tree.
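The four possibilities above can be sketched in code. The following is a minimal illustration, not the DFSurvey class itself: it assumes a digraph stored as a std::vector of adjacency lists, uses recursion in place of the explicit control stack, and borrows the customary names tree, back, forward, and cross for cases (1) through (4). The names EdgeClassifier, ClassifiedEdge, EdgeKind, and now are hypothetical and chosen only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Color    { white, gray, black };
enum class EdgeKind { tree, back, forward, cross };

struct ClassifiedEdge { std::size_t from, to; EdgeKind kind; };

// Hypothetical stand-in for a depth-first survey (not the fsu::DFSurvey API).
struct EdgeClassifier
{
  const std::vector<std::vector<std::size_t>>& adj;  // adjacency lists
  std::vector<Color>          color;                 // current vertex colors
  std::vector<std::size_t>    dtime;                 // discovery times
  std::size_t                 now = 0;               // global time stamp
  std::vector<ClassifiedEdge> edges;                 // result, in order explored

  explicit EdgeClassifier(const std::vector<std::vector<std::size_t>>& g)
    : adj(g), color(g.size(), Color::white), dtime(g.size(), 0) {}

  // full survey: start a new search tree at every vertex still white
  void Search()
  {
    for (std::size_t v = 0; v < adj.size(); ++v)
      if (color[v] == Color::white) Search(v);
  }

  void Search(std::size_t x)
  {
    assert(x < adj.size());           // adjacency lists must name valid vertices
    color[x] = Color::gray;
    dtime[x] = now++;
    for (std::size_t y : adj[x])
    {
      if (color[y] == Color::white)   // (1) y is discovered along this edge
      {
        edges.push_back({x, y, EdgeKind::tree});
        Search(y);
      }
      else if (color[y] == Color::gray)  // (2) y is still open: an ancestor of x
        edges.push_back({x, y, EdgeKind::back});
      else if (dtime[x] < dtime[y])      // (3) y finished, discovered after x
        edges.push_back({x, y, EdgeKind::forward});
      else                               // (4) y finished, discovered before x
        edges.push_back({x, y, EdgeKind::cross});
    }
    color[x] = Color::black;
  }
};
```

For the digraph with edges 0→1, 0→2, 1→2, 2→0, and 3→2 (adjacency lists in that order), this sketch classifies (0,1) and (1,2) as tree edges, (2,0) as a back edge, (0,2) as a forward edge, and (3,2) as a cross edge.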
For an undirected graph, this classification is based on the first encounter of the edge in the DFSurvey. Note these observations relating the color of the terminal vertex of an edge to the edge classification. Suppose e = (x,y) is an edge of G, and consider the instant in algorithmic time when e is explored. Then:
Theorem 8. In a depth-first survey of an undirected graph G, every edge is either a tree edge or a back edge.

Proof. Let e = (x,y) be an edge of G. Since G is undirected, e is as well, so we can assume that x is discovered before y. At time td(x), y is white. Suppose e is first explored from x. Then y is white at the time, and hence e becomes a tree edge. If e is first explored from y, then x is gray at the time, and e is a back edge. ∎

Theorem 9. A directed graph D contains no directed cycles iff a depth-first search of D yields no back edges.

Proof. If DFS produces a back edge (x,y), then y is an ancestor of x, and adding that edge to the DFS tree path from y to x creates a cycle. If D has a (directed) cycle C, let y be the first vertex discovered in C, and let (x,y) be the preceding edge in C. At time td(y), the vertices of C form a white path from y to x. By the White Path Theorem, x is a descendant of y, so (x,y) is a back edge. ∎

5 Spinoffs from BFS and DFS

If theorems have corollaries, do algorithms have cororithms? Maybe, but that is hard to pronounce. "Spinoff" is a very informal term meaning an extra outcome or simple modification of the algorithm that requires little or no extra verification or analysis.

5.1 Components of a Graph

Suppose G = (V,E) is an undirected graph. G is called connected iff for every pair x,y ∈ V of vertices there is a path in G from x to y. A component of G is a graph C such that
The technology developed in Sections 3 and 4 shows that the following instantiation of the DFS algorithm produces a Vector<N> component such that component[x] is the number of the component containing x for each vertex x of G. All that is needed is to declare the component vector and make a small post-processing adjustment to DFSurvey::Search():

  void DFSurvey::Search()
  {
    unsigned components = 0;
    for (each vertex v of g_)
    {
      if (color[v] == white)
      {
        components += 1;   // v starts a new search tree, hence a new component
        Search(v);         // post-processing of each vertex f: component[f] = components;
      }
    }
  }

Here f denotes the vertex being processed inside Search(v), so the counter components must be visible there. Recall that the DFS forest is a collection of trees, each tree generated by a call to Search(v). The DFS trees are in 1-1 correspondence with the components of G. The algorithm above counts the components and assigns each vertex its component number as it is processed. The algorithm runs in time Θ(|V| + |E|) and results in a mechanism for constant-time lookup of the component of any vertex.

5.2 Topological Sort

A directed graph is acyclic if it has no (directed) cycles. A directed acyclic graph is called a DAG for short. DAGs occur naturally in various places, such as:
In these and other models, it is important to know in what order to consider the vertices. For example, courses need to be taken respecting the prerequisite structure, make needs to build the targets in some order constrained by the dependencies, and a spreadsheet cell should be calculated only after the cells on which it depends have been calculated.

A topological sort of a directed graph G is an ordering of its vertices in such a way that all edges go from lower to higher vertices in the ordering: for each edge (x,y) in G, x < y in the ordering.

Theorem 10. A directed graph G has a topological sort if and only if G has no directed cycles.

Proof. Suppose G has a topological sort. If G had a (directed) cycle { x1, x2, ..., xk = x1 } then we would have x1 < x2 < ... < xk = x1, that is, x1 < x1, an impossibility. If on the other hand G is acyclic, either of the two algorithms below constructs a topological sort for G. ∎
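The definition of topological sort translates directly into a check. The sketch below is illustrative only: it assumes a digraph stored as a std::vector of adjacency lists with vertices numbered 0 to n-1, and the function name IsTopologicalOrder is hypothetical. It records the position of each vertex in the proposed ordering and then verifies that every edge (x,y) satisfies x < y in that ordering.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true iff `order` lists every vertex exactly once and every edge
// (x,y) of the digraph goes from an earlier position to a later position.
bool IsTopologicalOrder(const std::vector<std::vector<std::size_t>>& adj,
                        const std::vector<std::size_t>& order)
{
  const std::size_t n = adj.size();
  if (order.size() != n) return false;

  std::vector<std::size_t> position(n, n);   // the value n means "not yet seen"
  for (std::size_t i = 0; i < n; ++i)
  {
    const std::size_t v = order[i];
    if (v >= n || position[v] != n) return false;  // out of range or repeated
    position[v] = i;
  }

  for (std::size_t x = 0; x < n; ++x)
  {
    for (std::size_t y : adj[x])
    {
      assert(y < n);                               // adjacency lists must name valid vertices
      if (position[x] >= position[y]) return false;  // edge points backward
    }
  }
  return true;
}
```

For the DAG with edges 0→1, 0→2, 1→3, 2→3, both orderings 0,1,2,3 and 0,2,1,3 pass the check, while 1,0,2,3 fails because the edge (0,1) points backward. For the directed 3-cycle 0→1→2→0 every ordering fails, in agreement with Theorem 10.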
Theorem 11. Suppose G is a directed graph and that a complete depth-first survey is performed on G. Then the reversed postorder of the vertices is a topological sort of G if and only if G has no (directed) cycles.

Proof. Note that a postorder is the finishing order of the vertices. First suppose the reversed postorder is a topological sort of G. Then, because all edges point forward in the sort order, there can be no cycle. Next suppose G is a DAG, let e = (x,y) be an edge in G, and consider the moment that e is explored during DFSurvey: x is at the top of the control stack with color gray. Look at the cases:
Thus in all cases y is finished before x and hence precedes x in postorder. In reverse postorder, x precedes y, that is, x < y. ∎

Here is the program referred to in Theorem 10.

  template <class DigraphType, class ResultType>
  void TopSort (const DigraphType& g, ResultType& outQueue)
  {
    fsu::DFSurvey <DigraphType> dfs(g);
    fsu::List <typename DigraphType::Vertex> postorder;
    typename fsu::List <typename DigraphType::Vertex>::Iterator i;
    dfs.Search();              // complete depth-first survey of g
    PostOrder(dfs,postorder);  // collect the vertices in finishing order
    // traverse postorder from back to front to obtain reverse postorder
    for (i = postorder.rBegin(); i != postorder.rEnd(); --i)
    {
      outQueue.Push(*i);
    }
  }

To detect whether there is a cycle in G, modify DFSurvey to detect back edges during Search(v) and return false if one is found.

Exercises
  template <class DigraphType, class ResultType>
  bool TopSort2 (const DigraphType& diGraph, ResultType& outQueue)
  {
    typedef typename DigraphType::Vertex      Vertex;
    typedef typename DigraphType::AdjIterator AdjIterator;

    fsu::Queue < Vertex > conQueue;  // conQueue stores current source vertices prior to processing
    fsu::Vector < Vertex > inDegree(diGraph.VrtxSize(),0);  // current in-degree of each vertex

    // preprocess to get all InDegrees (more efficient than n calls to InDegree)
    for (Vertex v = 0; v < diGraph.VrtxSize(); ++v)
    {
      for (AdjIterator i = diGraph.Begin(v); i != diGraph.End(v); ++i)
      {
        ++inDegree[(size_t)*i];
      }
    }

    // initialize conQueue with the vertices of in-degree zero
    for (Vertex v = 0; v < diGraph.VrtxSize(); ++v)
    {
      if (inDegree[v] == 0)
      {
        conQueue.Push(v);
      }
    }

    // main algorithm: output a source, then remove its outgoing edges
    while (!conQueue.Empty())
    {
      Vertex v = conQueue.Front();
      conQueue.Pop();
      outQueue.Push(v);
      for (AdjIterator i = diGraph.Begin(v); i != diGraph.End(v); ++i)
      {
        --inDegree[(size_t)*i];
        if (inDegree[(size_t)*i] == 0) conQueue.Push(*i);
      }
    } // end while

    // report result: some vertex was never output iff the digraph has a cycle
    if (outQueue.Size() != diGraph.VrtxSize())
      return false;
    return true;
  } // TopSort2
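The earlier remark about detecting back edges during the depth-first survey can be sketched as follows. This is an illustration under stated assumptions rather than a modification of the DFSurvey class: the digraph is a std::vector of adjacency lists, recursion stands in for the control stack, and the names Mark, Visit, and TopSortDFS are hypothetical. The function produces the reverse postorder of Theorem 11 and returns false as soon as an edge to a gray vertex (a back edge) is found.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

enum class Mark { white, gray, black };

// Depth-first visit from x, appending vertices to postorder as they finish.
// Returns false as soon as a back edge (an edge to a gray vertex) is found.
bool Visit(const std::vector<std::vector<std::size_t>>& adj, std::size_t x,
           std::vector<Mark>& mark, std::vector<std::size_t>& postorder)
{
  mark[x] = Mark::gray;
  for (std::size_t y : adj[x])
  {
    assert(y < adj.size());                   // adjacency lists must name valid vertices
    if (mark[y] == Mark::gray) return false;  // back edge, hence a directed cycle
    if (mark[y] == Mark::white && !Visit(adj, y, mark, postorder)) return false;
  }
  mark[x] = Mark::black;
  postorder.push_back(x);                     // x is finished
  return true;
}

// On success, order holds a topological sort (reverse postorder) and the
// result is true. If the digraph has a cycle, order is emptied and the
// result is false.
bool TopSortDFS(const std::vector<std::vector<std::size_t>>& adj,
                std::vector<std::size_t>& order)
{
  std::vector<Mark> mark(adj.size(), Mark::white);
  order.clear();
  for (std::size_t v = 0; v < adj.size(); ++v)
  {
    if (mark[v] == Mark::white && !Visit(adj, v, mark, order))
    {
      order.clear();
      return false;
    }
  }
  std::reverse(order.begin(), order.end());   // postorder -> reverse postorder
  return true;
}
```

For the DAG with edges 0→1, 0→2, 1→3, 2→3 the postorder is 3,1,2,0, so the sketch returns the ordering 0,2,1,3. For the directed 3-cycle 0→1→2→0 it returns false. Like TopSort2, it runs in time Θ(|V| + |E|) when no cycle is present.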
Software Engineering Projects