Graphs 1: Representation, Search, and Survey

1 Definitions, Notation, and Representations

Graphs and directed graphs are studied extensively in the prerequisite math sequence Discrete Mathematics I [FSU::MAD2104], so these notes will not dwell on details. We will, however, mention key concepts in order to establish notation, especially where there may be slight variations in terminology. Generally we denote a graph (directed or undirected) by G = (V,E).

1.1 Undirected Graphs - Theory and Terminology
Theorem 1u. In an undirected graph, ∑_{v ∈ V} deg(v) = 2 × |E|.

1.2 Directed Graphs - Theory and Terminology
There are subtle distinctions in the way graphs and digraphs are treated in definitions. There are also "conversions" from graph to digraph and digraph to graph.
Theorem 1d. In a directed graph, ∑_{v ∈ V} inDeg(v) = ∑_{v ∈ V} outDeg(v) = |E|.

The proof of Theorem 1 uses aggregate analysis. First show the result for directed graphs, where each edge has exactly one initiating vertex and exactly one terminating vertex, so each sum counts every edge exactly once. Then apply 1d to the directed version of an undirected graph and adjust for the double counting. ∎

1.3 Graph and Digraph Representations

There are two fundamental families of representation methods for graphs - connectivity matrices and adjacency sets. Two archetypical examples apply to graphs and digraphs G = (V,E) with the assumption that the vertices are numbered starting with 0 (in C fashion): V = {v_0, v_1, v_2, ..., v_{n-1}}.

1.3.1 Adjacency Matrix Representation

In the adjacency matrix representation we define an n × n matrix M by

  M(i,j) = 1 if there is an edge from v_i to v_j, and M(i,j) = 0 otherwise.
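M is the adjacency matrix representation of G. For example, consider the graph G1 depicted here:

  0 --- 1     2 --- 6 --- 7
  |           |           |
  |           |           |
  3 --- 4 --- 5 --- 8 --- 9

          Graph G1

The adjacency matrix for G1 is:

       0 1 2 3 4 5 6 7 8 9
  0    - 1 - 1 - - - - - -
  1    1 - - - - - - - - -
  2    - - - - - 1 1 - - -
  3    1 - - - 1 - - - - -
  4    - - - 1 - 1 - - - -
  5    - - 1 - 1 - - - 1 -
  6    - - 1 - - - - 1 - -
  7    - - - - - - 1 - - 1
  8    - - - - - 1 - - - 1
  9    - - - - - - - 1 1 -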
(where we have used "-" instead of "0" for readability). Adjacency matrix representations are very handy for some graph contexts. They are easy to define, low in overhead, and there is not a lot of data structure programming involved. And some questions have very quick answers:
Adjacency matrix representations suffer some disadvantages as well, related to the fact that an n × n matrix has n² places to store and visit:
Particularly when graphs are sparse - a somewhat loose term implying that the number of edges is much smaller than it might be - these basic storage and traversal tasks should not be so expensive.

1.3.2 Sparse Graphs

A graph (or digraph) could in principle have an edge from any vertex to any other. (Self-edges are not allowed in undirected graphs unless the context explicitly says so.) In other words, the adjacency matrix representation might have 1's almost anywhere, out of a total of n² possibilities. In practice, most large graphs encountered in real life are quite sparse, with vertex degree significantly limited, so that the number of edges is O(n) and the degree of each vertex may even be bounded by a constant. (For example, Euler's Formula on planar graphs implies that if G is planar with |V| ≥ 3 then |E| ≤ 3|V| - 6, hence |E| = O(|V|).) Observe:
So, a graph with, say, 10,000 vertices, and vertex degree bounded by 100, would have no more than 1,000,000 edges, whereas the adjacency matrix would have storage for 100,000,000 entries, with at least 99,000,000 of these representing "no edge".

The human brain can be modeled loosely as a directed graph with vertices representing neuronal cells and edges representing direct communication from one neuron to another. This digraph is estimated to have about 10¹⁰ vertices with average degree about 10³. An adjacency matrix for this model has 10¹⁰ × 10¹⁰ = 10²⁰ entries with at most 10¹³ "1" entries, thus wasting at least

  10²⁰ - 10¹³ (about 10²⁰)
units of memory.

1.3.3 Adjacency List Representation

The adjacency list representation is designed to require storage to represent edges only when they actually exist in the graph. There is per-edge overhead required, but for sparse graphs this overhead is negligible compared to the space used by adjacency matrices to store "no info".

The adjacency list uses a vector v of lists reminiscent of the supporting structure for hash tables. The vector is indexed in the range [0..n), and the list v[i] consists of the subscripts of vertices that are adjacent from vertex v_i. On the other hand, unlike the situation with hash tables, where we could choose the size of the vector to keep the average list size at approximately 1, we have no such luxury in representing graphs. The index range for the vector represents exactly the vertices of the graph.

For example, the adjacency list representation of the graph G1 illustrated above is:

  v[0]: 1 , 3
  v[1]: 0            0 --- 1     2 --- 6 --- 7
  v[2]: 5 , 6        |           |           |
  v[3]: 0 , 4        |           |           |
  v[4]: 3 , 5        3 --- 4 --- 5 --- 8 --- 9
  v[5]: 2 , 4 , 8
  v[6]: 2 , 7                 Graph G1
  v[7]: 6 , 9
  v[8]: 5 , 9
  v[9]: 7 , 8

As with the matrix representation, time and space requirements for adjacency list representations are based on what needs to be traversed and stored:
Proofs of the last two factoids use aggregate analysis. Note that when G has many edges, for example when |E| = Θ(|V|²), the estimates are the same as for matrix representations. But for sparse graphs, in particular when |E| = O(|V|), the estimates are dramatically better - linear in the number of vertices.

1.3.4 Undirected v. Directed Graphs

Both the adjacency matrix and adjacency list can represent either directed or undirected graphs. And in both systems, an undirected edge has a somewhat redundant representation.

An adjacency matrix M represents an undirected graph when it is symmetric about the main diagonal:

  M(i,j) = M(j,i) for all i, j in [0..n)
One could think of this constraint as a test of whether the underlying graph is directed or not.

An adjacency list v[] represents an undirected graph when all edges (x,y) appear twice in the lists - once making y adjacent from x and once making x adjacent from y:

  ...
  v[x]: ... , y , ...
  ...
  v[y]: ... , x , ...
  ...

1.3.5 Variations on Adjacency Lists

Many graph algorithms require examining all vertices adjacent from a given vertex. For such algorithms, traversals of the adjacency lists will be required. These traversals are inherently Ω(list.size), and replacing list with a faster access time set structure might even slow down the traversal (although not affecting its asymptotic runtime).

If a graph process requires many direct queries of the form "is y adjacent from x" (as distinct from traversing the entire adjacency list), a question that requires Ω(d) time to answer using sequential search in a list of size d, it can be advantageous to replace list with a faster access set structure, which would allow the question to be answered in time O(log d). (In the graph context, d is the size of the adjacency list, which is the outDegree of the vertex whose adjacency list is being searched.)

1.3.6 Dealing with Graph Data

Often applications require maintaining data associated with vertices and/or edges. Vertex data is easily maintained using auxiliary vectors. (An associative array can be used to map vertex data back to vertex number.) Edge data in an adjacency matrix representation is also straightforward to maintain in another matrix.

Edge data is slightly more difficult to maintain in an adjacency list representation, in essence because the edges are only implied by the representation. This problem can be handled in two ways. Adjacency lists can be replaced with edge lists - instead of listing adjacent vertices, list pointers to the edges that connect the adjacent pairs. Then an edge can be as elaborate as needed.
It must at minimum know what its two vertex ends are, of course, something like this:

  template <class T>
  struct Edge
  {
    T        data_;       // whatever needs to be stored
    unsigned from_, to_;  // vertices of edge
  };

Or a hash table can be set up to store edge data based on keys of the form (x,y), where (x,y) is the edge implied by finding y in the list v[x].

1.4 A Possible Graph Framework

We now have four distinct representations that might be useful in actual code: adjacency matrix and adjacency list representations for both undirected and directed graphs. The following class hierarchy provides a roadmap for defining a framework providing all four representations, the advantages being enforcement of terminology uniformity and re-using code by elevating to a parent class where appropriate. Even though we will not implement this entire framework, the pseudo-code listings are important to understand, as they provide fairly precise definitions of how graph classes can be coded.

  Graph                      // abstract base class
    AdjacencyMatrixBase      // base class for adj matrix reps
      AMUGraph               // adj matrix representation, undirected graph
      AMDGraph               // adj matrix representation, directed graph
    AdjacencyListBase        // base class for adj list reps
      ALUGraph               // adj list representation, undirected graph
      ALDGraph               // adj list representation, directed graph

It may seem odd to violate the pattern of not signaling the specific implementation of an ADT to the client programmers. However, it is appropriate in the case of graphs because space usage and many algorithms have widely disparate asymptotic time and space behavior that depends specifically on representation.

The following pseudo-code provides detail to the roadmap.
The methods are sufficient to support many graph models and algorithms:

  class Graph // abstract base class for all representations
  {
  public:
    typedef unsigned Vertex;
    virtual void     SetVrtxSize (unsigned n) = 0;
    virtual void     AddEdge     (Vertex from, Vertex to) = 0;
    virtual bool     HasEdge     (Vertex from, Vertex to) const;
    virtual unsigned VrtxSize    () const;
    virtual unsigned EdgeSize    () const;
    virtual unsigned OutDegree   (Vertex v) const = 0;
    virtual unsigned InDegree    (Vertex v) const = 0;
    ...
  };

  class AdjacencyMatrixBase : public Graph
  {
    ...
    AdjIterator Begin (Vertex x) const;
    AdjIterator End   (Vertex x) const;
    ...
    fsu::Matrix am_;
  };

  class AdjacencyListBase : public Graph
  {
    ...
    AdjIterator Begin (Vertex x) const;
    AdjIterator End   (Vertex x) const;
    ...
    fsu::Vector < fsu::List < Vertex > > al_;
  };

The AdjIterator type needs to be defined for both matrix and list representations. AdjIterator must be at least a forward ConstIterator traversing the collection of vertices adjacent from a given vertex x. For the adjacency list representation AdjIterator is best defined as a list ConstIterator. Begin(x) can be defined as follows:

  AdjIterator AdjacencyListBase::Begin (Vertex x) const
  {
    return al_[x].Begin();
  }

(For the adjacency matrix representation, AdjIterator is defined in terms of a vector iterator on rows of the matrix, skipping over zero entries of the matrix and stopping at entries equal to 1, which represent extant edges. This makes a traversal of the neighbors of x a Θ(|V|) process, very inefficient for a sparse graph. We will not delve further into the adjacency matrix AdjIterator.)
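The row-scanning idea in the parenthetical remark can be illustrated with a small standalone sketch. It does not use the fsu library: the Matrix alias and the helper names MakeUndirected, NextNeighbor, and Neighbors are hypothetical, chosen only for this illustration, and std::vector stands in for fsu::Matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the adjacency matrix: m[x][y] == 1 means edge x -> y.
using Matrix = std::vector<std::vector<unsigned char>>;

// Build the n x n matrix of an undirected graph from a list of edges.
Matrix MakeUndirected(std::size_t n,
                      const std::vector<std::pair<unsigned, unsigned>>& edges)
{
  Matrix m(n, std::vector<unsigned char>(n, 0));
  for (const auto& e : edges)
  {
    assert(e.first < n && e.second < n);
    m[e.first][e.second] = 1;  // both entries are set for an undirected edge
    m[e.second][e.first] = 1;
  }
  return m;
}

// Return the first column y >= from with m[x][y] != 0, or m.size() if none.
// This is the "skip over zero entries" step of a matrix-based adjacency iterator.
std::size_t NextNeighbor(const Matrix& m, unsigned x, std::size_t from)
{
  while (from < m.size() && m[x][from] == 0) ++from;
  return from;
}

// Collect the neighbors of x by scanning all of row x: Theta(|V|) regardless of degree.
std::vector<unsigned> Neighbors(const Matrix& m, unsigned x)
{
  std::vector<unsigned> result;
  for (std::size_t y = NextNeighbor(m, x, 0); y < m.size();
       y = NextNeighbor(m, x, y + 1))
    result.push_back(static_cast<unsigned>(y));
  return result;
}
```

Applied to the ten edges of G1, Neighbors(m, 5) visits all ten columns of row 5 to report the three neighbors 2, 4, 8, which is the inefficiency noted above for sparse graphs.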
The following implementations completely clarify how edges are represented in all four situations:

  void AMDGraph::AddEdge(Vertex x, Vertex y)
  {
    am_(x,y) = 1;
  }

  void AMUGraph::AddEdge(Vertex x, Vertex y)
  {
    am_(x,y) = 1;
    am_(y,x) = 1;
  }

  void ALDGraph::AddEdge(Vertex x, Vertex y)
  {
    al_[x].Insert(y);
  }

  void ALUGraph::AddEdge(Vertex x, Vertex y)
  {
    al_[x].Insert(y);
    al_[y].Insert(x);
  }

Note that it is impossible to add a redundant edge using the adjacency matrix representation, but quite possible in the adjacency list representation. The AddEdge method could be upgraded to check for an existing edge before adding one, but that would require a (sequential) search of the list and make AddEdge runtime linear in the size of the neighbor list instead of the constant time it has now. If the adjacency list is replaced with a unimodal set type (one that stores each element at most once), this problem would go away. Generally, we just leave it to client programs to correctly manage their graphs.

A final recommendation on the graph framework is to use class templates. The advantage is that a member function of a class template is not instantiated unless it is actually used, so template-izing the classes results in leaner code for applications - compiling only what the client needs. The unsigned type used for vertex representation makes a convenient template parameter.

1.5 Actual Code

While the construction of a Graph framework as outlined is an interesting and useful software development project, we will not build the entire framework for LIB. We concentrate on adjacency list representations and therefore get by with two classes: a base class for undirected graphs and a derived class for directed graphs.
  template < typename N >
  class ALUGraph
  {
  public:
    typedef N                                Vertex;
    typedef typename fsu::List<Vertex>       SetType;
    typedef typename SetType::ConstIterator  AdjIterator;

    void     SetVrtxSize (N n);
    size_t   VrtxSize    () const;
    void     AddEdge     (Vertex from, Vertex to);
    bool     HasEdge     (Vertex from, Vertex to) const;
    size_t   EdgeSize    () const;           // Theta (|V| + |E|)
    size_t   OutDegree   (Vertex v) const;
    size_t   InDegree    (Vertex v) const;

    void     Clear       ();
    void     Dump        (std::ostream& os); // Theta (|V| + |E|)

    AdjIterator Begin    (Vertex x) const;
    AdjIterator End      (Vertex x) const;

    ALUGraph ();
    ALUGraph (N n);

  protected:
    fsu::Vector < SetType > al_;
  };

There is no new data in the derived class, and the base class iterator support methods work for the derived class, because the definition of iterator is unchanged. Only three methods need to be re-defined/over-ridden for the directed case (the inherited methods are shown commented out):

  template < typename N >
  class ALDGraph : public ALUGraph <N>
  {
  public:
    typedef N                                  Vertex;
    typedef typename ALUGraph<N>::SetType      SetType;
    typedef typename ALUGraph<N>::AdjIterator  AdjIterator;

    // void     SetVrtxSize (N n);
    // size_t   VrtxSize    () const;
    void     AddEdge     (Vertex from, Vertex to);
    // bool     HasEdge     (Vertex from, Vertex to) const;
    size_t   EdgeSize    () const;           // Theta (|V| + |E|)
    // size_t   OutDegree   (Vertex v) const;
    size_t   InDegree    (Vertex v) const;   // Theta (|V| + |E|)

    // void     Clear       ();
    // void     Dump        (std::ostream& os); // Theta (|V| + |E|)

    // AdjIterator Begin    (Vertex x) const;
    // AdjIterator End      (Vertex x) const;

    ALDGraph ();
    ALDGraph ( N n );

    // new method - creates d as the reverse directed graph of *this
    void Reverse (ALDGraph& d) const;
  };

Implementations are mostly very straightforward based on the discussions above. We present several here to illustrate the use of the adjacency list and AdjIterators.

  template < typename N >
  bool ALUGraph<N>::HasEdge (Vertex from, Vertex to) const
  {
    // sequential search of the neighbor list of from
    AdjIterator i = al_[from].Includes(to);
    if (i == End(from))
      return false;
    return true;
  }
  template < typename N >
  size_t ALUGraph<N>::EdgeSize () const // Theta (|V| + |E|)
  {
    size_t esize = 0;
    for (Vertex v = 0; v < al_.Size(); ++v)
      esize += al_[v].Size();
    return esize/2;  // each undirected edge is stored twice
  }
  template < typename N >
  void ALUGraph<N>::Dump (std::ostream& os) // Theta (|V| + |E|)
  {
    AdjIterator j;
    for (Vertex v = 0; v < VrtxSize(); ++v)
    {
      os << '[' << v << "]->";
      j = this->Begin(v);
      if (j != this->End(v))   // first neighbor, no leading comma
      {
        os << *j;
        ++j;
      }
      for ( ; j != this->End(v); ++j)
      {
        os << ',' << *j;
      }
      os << '\n';
    }
  }
  template < typename N >
  size_t ALDGraph<N>::EdgeSize () const // Theta (|V| + |E|)
  {
    size_t esize = 0;
    for (Vertex v = 0; v < ALUGraph<N>::al_.Size(); ++v)
      esize += ALUGraph<N>::al_[v].Size();
    return esize;  // each directed edge is stored once
  }
  template < typename N >
  size_t ALDGraph<N>::InDegree (Vertex v) const // Theta (|V| + |E|)
  {
    size_t indegree = 0;
    AdjIterator j;
    for (Vertex x = 0; x < ALUGraph<N>::VrtxSize(); ++x)
    {
      // count every appearance of v in the adjacency list of x
      for (j = this->Begin(x); j != this->End(x); ++j)
      {
        if (v == *j) ++indegree;
      }
    }
    return indegree;
  }
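The listings above depend on the fsu library, and Reverse is declared but not shown. As a rough standalone illustration, the sketch below mimics the directed adjacency list with standard containers. The struct Digraph and the method InDegrees are hypothetical names used only here, and the Reverse body is one plausible reading of the comment "creates d as the reverse directed graph of *this", not necessarily the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical stand-in for ALDGraph<unsigned>: al[x] lists the vertices adjacent from x.
struct Digraph
{
  std::vector<std::list<unsigned>> al;

  explicit Digraph(std::size_t n) : al(n) {}

  std::size_t VrtxSize() const { return al.size(); }

  // Constant time; like the AddEdge above, it does not check for duplicates.
  void AddEdge(unsigned from, unsigned to)
  {
    assert(from < al.size() && to < al.size());
    al[from].push_back(to);
  }

  std::size_t OutDegree(unsigned v) const { return al[v].size(); }

  // All in-degrees in a single Theta(|V| + |E|) pass over the lists.
  std::vector<std::size_t> InDegrees() const
  {
    std::vector<std::size_t> indeg(al.size(), 0);
    for (const auto& nbrs : al)
      for (unsigned y : nbrs) ++indeg[y];
    return indeg;
  }

  // Make d the reverse of *this: every edge (x,y) here becomes (y,x) in d.
  void Reverse(Digraph& d) const
  {
    d.al.assign(al.size(), std::list<unsigned>());
    for (unsigned x = 0; x < al.size(); ++x)
      for (unsigned y : al[x]) d.al[y].push_back(x);
  }
};
```

Summing the entries of InDegrees() and summing OutDegree(v) over all v both count each stored edge exactly once, which is Theorem 1d in executable form. Computing all in-degrees at once costs the same Θ(|V| + |E|) as a single call to InDegree(v) above.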
1.6 Exercises
2 Breadth First Search

One of the first things we want to do in a graph or digraph is find our way around. There are two widely used and famous processes to perform a search in a graph, both of which have been introduced and used in other contexts: depth-first and breadth-first search. In trees, for example, preorder and postorder traversals follow the depth-first search process, and levelorder traversal follows the breadth-first process. And solutions to maze problems typically use one or the other to construct a solution path from start to goal. Trees and mazes are representable as special kinds of graphs (or digraphs). Mazes, in particular, provide an excellent context to study these fundamental algorithms.

We will assume throughout the remainder of this chapter that G = (V,E) is a graph, directed or undirected, with |V| vertices and |E| edges. We also assume that the graph is presented to the search algorithms using the adjacency list representation.

2.1 BFSearch(v)

The Breadth-First Search [BFS] process begins at a vertex of G and explores the graph from that vertex. At any stage in the search, BFS considers all vertices adjacent from the current vertex before proceeding deeper into the graph. The process is a direct "upgrade" to the Levelorder Traversal of a tree, using a control queue conQ_. The name of the graph is g_:

  BFSearch( Vertex v )
  {
    conQ_.Push(v);
    mark v visited
    while (!conQ_.Empty())
    {
      // add all unvisited neighbors n of the front to the queue; n = *i
      Vertex front = conQ_.Front();
      for ( AdjIterator i = g_.Begin(front); i != g_.End(front); ++i )
      {
        if (*i is not visited)
        {
          conQ_.Push(*i);
          mark *i visited
        }
      }
      // remove front of queue
      conQ_.Pop();
    }
  }

This pseudocode correctly describes the BFS algorithm, but a couple of things need to be clarified:
Re 1: Maintain an array of vertex numbers "parent_" so that parent_[x] is the vertex from which x was discovered. Then a path from the start vertex v to x is the reverse of the sequence of vertices

  x, parent_[x], parent_[parent_[x]], ..., v

Re 2: One possibility is to maintain a bool array "visited_". However, we want to keep track of another property with 3 possible values: (a) undiscovered, (b) discovered and being processed (that is, still in the control queue), and (c) processing completed (that is, already removed from the control queue). Property (a) is equivalent to "unvisited", so we have access to that information. We follow the exposition in [Cormen et al (2009), Introduction to Algorithms (3rd ed.), MIT Press, Cambridge, MA] and use 3 colors: white for unvisited vertices, grey for vertices in the control queue, and black for vertices that have already cycled through the control queue. Colors are maintained in the array color_.

The following provides the remaining details for BFS Search(v):

  Search( Vertex v )
  {
    conQ_.Push(v);
    color_[v] = grey;
    while (!conQ_.Empty())
    {
      Vertex front = conQ_.Front();
      // Push all unvisited neighbors of the front
      for ( AdjIterator i = g_.Begin(front); i != g_.End(front); ++i )
      {
        if (color_[*i] == white) // unvisited
        {
          parent_[*i] = front;
          color_[*i] = grey;
          conQ_.Push(*i);
        }
      }
      // Pop the queue
      conQ_.Pop();
      color_[front] = black;
    }
  }

The runtime of BFSearch(v) is straightforward to estimate. Note that a vertex only changes color from white to grey to black, so each vertex is processed at most one time in the outer loop (one push and one pop), for a total cost of at most 2×|V|. Note also that the for loop traverses the adjacency list of a vertex only one time, when that vertex is at the front of the queue, so that the cumulative cost of all the for loop executions is at most

  ∑_{x ∈ V} outDeg(x) ≤ 2×|E|
(using Theorem 1 to get the 2×|E| bound). Thus the total cost of BFSearch(v) is at most

  2×|V| + 2×|E| = O(|V| + |E|).
If every vertex is processed, the result is exactly Θ(|V| + |E|). These bounds are as good as possible, because just touching all the vertices and edges has the same cost estimate. These observations are summarized in the theorems at the end of this section.

As an example, performing BFSearch(5) on the graph G1 encounters the vertices as follows (read the conQ column down the left side, then down the right side):

  adj list rep       Graph G1                        BFS::conQ
  ------------       --------                        <--------
  v[0]: 1 , 3                                        null            ...
  v[1]: 0            0 --- 1     2 --- 6 --- 7       5               6 3 9
  v[2]: 5 , 6        |           |           |       5 2             6 3 9 7
  v[3]: 0 , 4        |           |           |       5 2 4           3 9 7
  v[4]: 3 , 5        3 --- 4 --- 5 --- 8 --- 9       5 2 4 8         3 9 7 0
  v[5]: 2 , 4 , 8                                    2 4 8           9 7 0
  v[6]: 2 , 7                                        2 4 8 6         7 0
  v[7]: 6 , 9                                        4 8 6           0
  v[8]: 5 , 9                                        4 8 6 3         0 1
  v[9]: 7 , 8                                        8 6 3           1
                                                     8 6 3 9         null
                                                     ...

  Vertex discovery order: 5 2 4 8 6 3 9 7 0 1
    grouped by distance:  [ (5) (2 4 8) (6 3 9) (7 0) (1) ]

Another set of information illustrated above is distance from the start vertex. We will show later in these notes that the paths calculated by BFSearch(v) are shortest possible paths, so that the length of the path from x back to v is by definition the distance from x to v.

Finally, we note that the BFSearch algorithm works equally well for undirected and directed graphs.

2.2 BFSurvey

We have got to the point that our BFS algorithm is most efficiently discussed and used as a class with its own data. We keep track of both distance and time, along with color and parent info.
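Before turning to the class itself, the colored BFS of Section 2.1 and the parent-path idea can be tried out in a compact standalone sketch. It uses standard containers rather than the fsu library. The names AdjList, BfsResult, Bfs, and PathTo are hypothetical, and the value n = |V| is used here as the "unreachable" marker for both distance and parent, which is a simplification of the conventions adopted for the survey class.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

using AdjList = std::vector<std::vector<unsigned>>;  // g[x] = vertices adjacent from x

struct BfsResult
{
  std::vector<unsigned> order;     // vertices in discovery order
  std::vector<unsigned> distance;  // edges on the parent path back to the start
  std::vector<unsigned> parent;    // parent[x] == n means "no parent"
};

BfsResult Bfs(const AdjList& g, unsigned start)
{
  const unsigned n = static_cast<unsigned>(g.size());
  assert(start < n);
  BfsResult r;
  r.distance.assign(n, n);  // n is unattainable: a simple path has at most n-1 edges
  r.parent.assign(n, n);
  std::vector<char> color(n, 'w');  // 'w' white, 'g' grey, 'b' black
  std::queue<unsigned> conQ;

  conQ.push(start);
  color[start] = 'g';
  r.distance[start] = 0;
  r.order.push_back(start);
  while (!conQ.empty())
  {
    const unsigned front = conQ.front();
    for (unsigned x : g[front])  // one pass over the neighbor list of front
    {
      if (color[x] == 'w')
      {
        color[x] = 'g';
        r.parent[x] = front;
        r.distance[x] = r.distance[front] + 1;
        r.order.push_back(x);
        conQ.push(x);
      }
    }
    conQ.pop();
    color[front] = 'b';
  }
  return r;
}

// Path from the search start to x: follow parent links from x, then reverse.
std::vector<unsigned> PathTo(const BfsResult& r, unsigned x)
{
  const unsigned none = static_cast<unsigned>(r.parent.size());
  std::vector<unsigned> path;
  if (r.distance[x] == none) return path;  // x was never reached
  for (unsigned v = x; v != none; v = r.parent[v]) path.push_back(v);
  std::reverse(path.begin(), path.end());
  return path;
}
```

With g set to the adjacency lists of G1 and start vertex 5, r.order reproduces the discovery order in the trace above, and PathTo(r, 1) follows parent links 1, 0, 3, 4, 5 and returns them reversed.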
  template < class G >
  class BFSurvey
  {
  public:
    typedef G                              Graph;
    typedef typename Graph::Vertex         Vertex;
    typedef typename Graph::AdjIterator    AdjIterator;

    BFSurvey  ( const Graph& g );
    BFSurvey  ( const Graph& g , Vertex start );
    void Search ( );
    void Search ( Vertex v );
    void Reset  ( );
    void Reset  ( Vertex start );

    fsu::Vector < Vertex > distance_;  // distance from search origin
    fsu::Vector < Vertex > dtime_;     // discovery time
    fsu::Vector < Vertex > parent_;    // for BFS tree
    fsu::Vector < char >   color_;     // using chars 'w'=white, 'g'=grey, 'b'=black

  private:
    const Graph&          g_;
    Vertex                start_;      // default is vertex 0
    size_t                time_;       // global sequencing clock
    size_t                infinity_;   // unreachable distance = 1+|E|
    size_t                forever_;    // unreachable time = |V|
    Vertex                null_;       // undefined vertex = |V|
    fsu::Deque < Vertex > conQ_;       // control queue
  };

The template parameter is intended to be a graph type (ALUGraph or ALDGraph). The const reference g_ must be set by the class constructors. The algorithm thus attaches itself to the graph, remora-like. It does not make a copy of the graph; it operates directly on the graph through this reference.

The notions of time and distance require some explanation. The distance distance_[x] represents the length of the "parent path" from a vertex x back to the beginning search vertex v. The longest conceivable path in a graph contains all of the edges, and therefore one more than the number of edges is an unattainable distance in the graph. This allows us to define "infinity":

  infinity_ = 1 + g_.EdgeSize() // impossibly long distance

Time is kept by the "clock" class variable time_, initialized to zero by constructors and Reset methods.
In order to ensure that each search event happens at its own unique time, we adopt the convention that time is always assigned using the postfix increment of the clock variable:

  dtime_[x] = time_++;

Because in BFS we assign a time only when a vertex is discovered (dtime_[x] is the discovery time of x), the sets

  V = { vertices } and
  AssignedTimes = { 0 1 2 ... last_time_assigned }

are in 1-1 correspondence. Thus last_time_assigned is 1 less than the number of vertices, and the first un-assignable time is the number of vertices:

  forever_ = g_.VrtxSize() // impossibly long time

These values are instantiated by constructors and Reset methods.

Caution: Time is assigned twice to each vertex in depth-first search, so the notion of forever is 2*g_.VrtxSize() in a DFS setting.

2.3 Implementation of BFSurvey

Other than setting the graph reference, done only by constructors, constructors and Reset methods accomplish much the same thing:
  // Note: C++ initializes members in declaration order, not initializer-list
  // order, so g_, infinity_, forever_, and null_ must be declared before the
  // vectors that are initialized from them.
  template < class G >
  BFSurvey<G>::BFSurvey (const Graph& g)
    : g_(g), start_(0), time_(0),
      infinity_(1+g_.EdgeSize()),
      forever_(g_.VrtxSize()),
      null_((Vertex)g_.VrtxSize()),
      distance_ (g_.VrtxSize(), infinity_),
      dtime_    (g_.VrtxSize(), forever_),
      parent_   (g_.VrtxSize(), null_),
      color_    (g_.VrtxSize(), 'w'),      // 'w' = white
      conQ_()
  {}

  template < class G >
  BFSurvey<G>::BFSurvey (const Graph& g , Vertex start)
    : g_(g), start_(start), time_(0),
      infinity_(1+g_.EdgeSize()),
      forever_(g_.VrtxSize()),
      null_((Vertex)g_.VrtxSize()),
      distance_ (g_.VrtxSize(), infinity_),
      dtime_    (g_.VrtxSize(), forever_),
      parent_   (g_.VrtxSize(), null_),
      color_    (g_.VrtxSize(), 'w'),      // 'w' = white
      conQ_()
  {}

  template < class G >
  void BFSurvey<G>::Reset()
  {
    time_ = 0;
    conQ_.Clear();
    if (color_.Size() != g_.VrtxSize())    // g_ has changed vertex size
    {
      infinity_ = 1+g_.EdgeSize();         // unreachable distance
      forever_  = g_.VrtxSize();           // unreachable time
      null_     = (Vertex)g_.VrtxSize();   // undefined parent
      distance_.SetSize (g_.VrtxSize(), infinity_);
      dtime_.SetSize    (g_.VrtxSize(), forever_);
      parent_.SetSize   (g_.VrtxSize(), null_);
      color_.SetSize    (g_.VrtxSize(), 'w'); // 'w' = white
    }
    else
    {
      for (Vertex x = 0; x < g_.VrtxSize(); ++x)
      {
        distance_[x] = infinity_;          // unreachable distance
        dtime_[x]    = forever_;           // unreachable time
        parent_[x]   = null_;              // undefined parent
        color_[x]    = 'w';                // 'w' = white
      }
    }
  }

  template < class G >
  void BFSurvey<G>::Reset( Vertex start )
  {
    start_ = start;
    Reset();
  }

Note that Reset allows the possibility that the graph has changed vertex size. But we don't change the reference to the graph itself. If we want to survey a different graph we use a different survey instance.
The Search(v) method is the same algorithm discussed above, enhanced to update all of the class data:

  template < class G >
  void BFSurvey<G>::Search( Vertex v )
  {
    distance_[v] = 0;
    dtime_[v]    = time_++;
    conQ_.PushBack(v);
    color_[v]    = 'g';                    // 'g' = grey
    Vertex front;
    AdjIterator i;
    while (!conQ_.Empty())
    {
      front = conQ_.Front();
      // add all unvisited neighbors of front to queue
      for ( i = g_.Begin(front); i != g_.End(front); ++i )
      {
        if ('w' == color_[*i])             // 'w' = white = unvisited
        {
          distance_[*i] = distance_[front] + 1;
          dtime_[*i]    = time_++;
          parent_[*i]   = front;
          color_[*i]    = 'g';             // 'g' = grey
          conQ_.PushBack(*i);
        }
      }
      // remove front of queue
      conQ_.PopFront();
      color_[front] = 'b';                 // 'b' = black
    }
  }

The Search() method calls Search(v) until all vertices have been visited:

  template < class G >
  void BFSurvey<G>::Search()
  {
    Reset();
    for (Vertex v = start_; v < g_.VrtxSize(); ++v)
    {
      if (color_[v] == 'w')                // 'w' = white = unvisited
        Search(v);
    }
    for (Vertex v = 0; v < start_; ++v)
    {
      if (color_[v] == 'w')                // 'w' = white = unvisited
        Search(v);
    }
  }

Search() is divided into two loops only to accommodate the starting vertex. These loops typically encounter few white vertices after the first call to Search(v). For example, if the graph is undirected and connected then every vertex is reachable from any start vertex, so only the first call to Search(v) is activated. Nevertheless, to be sure that all vertices are visited, we have to check each one for color.

Note that Reset() is called at the beginning of Search() to initialize the survey data and the algorithm control data so that Search() starts out with a blank slate. On the other hand, it would not work for Search(v) to call Reset(), because that would blank out any survey data from previous calls to Search(v). So: if a client wants to run Search(v) for one particular vertex, the client must first make the call to Reset explicitly.

Technical Upgrades. In practice, these technical upgrades to the code should be made:
We left these out of the displayed code for readability. The library versions contain the upgrades, as well as an instrumentation option for examining specific runs of the algorithm.

2.4 Conclusions

Theorem 2s. The runtime of BFSurvey::Search(v) is O(|V| + |E|).

Theorem 2f. The runtime of BFSurvey::Search() is Θ(|V| + |E|).

(Here 's' is for "single search" and 'f' is for "full survey". Proofs of these results are interwoven into 2.1-2.3.)

2.5 Exercises
3 Depth-First Search

Like BFS, the Depth-First Search [DFS] process begins at a vertex of G and explores the graph from that vertex. In contrast to BFS, which considers all adjacent vertices before proceeding deeper into the graph, DFS follows as deep as possible into the graph before backtracking to an unexplored possibility.

One way to understand the relationship between BFS and DFS begins with the following refactoring of the code for BFSearch(v):

  BFSearch(v)
  {
    conQ_.Push(v);            // front or back - conQ_ is empty
    color_[v] = grey;
    parent_[v] = null_;
    while (!conQ_.Empty())
    {
      Vertex x = conQ_.Front();
      if (n = unvisited adjacent from x)
      {
        conQ_.PushBack(n);    // Push at back end
        color_[n] = grey;
        parent_[n] = x;
      }
      else
      {
        conQ_.PopFront();     // Pop at opposite end - Queue behavior
        color_[x] = black;
      }
    }
  }
Taking this code for BFSearch(v) and making one change results in a DFSearch(v) algorithm:

  DFSearch(v)
  {
    conQ_.Push(v);            // front or back - conQ_ is empty
    color_[v] = grey;
    parent_[v] = null_;
    while (!conQ_.Empty())
    {
      Vertex x = conQ_.Front();
      if (n = unvisited adjacent from x)
      {
        conQ_.PushFront(n);   // Push at front end
        color_[n] = grey;
        parent_[n] = x;
      }
      else
      {
        conQ_.PopFront();     // Pop at same end - Stack behavior
        color_[x] = black;
      }
    }
  }

It is surprising how similar these two code blocks are, given that the algorithms behave in very different ways. The comparison does serve to emphasize the essential difference: BFSearch(v) is queue controlled whereas DFSearch(v) is stack controlled.

The optimized version of BFS discussed in Section 2 is much simpler in that we explicitly push all unvisited neighbors of the front of the queue immediately, which allows us to show that only one traversal of each neighbor list is required to implement "unvisited adjacent from front" and ultimately conclude that the runtime of BFSearch(v) is O(|V| + |E|). We will need to provide a way to find the first unvisited neighbor of the top of the stack in order to reach the same conclusion for DFSearch(v).

3.1 DFSearch(v)

The pseudo-code listed above for DFSearch(v) converts directly into actual code. The main obstacle to an efficient implementation is finding an implementation of "next unvisited neighbor" of the top of the control stack whose aggregate cost is O(|V| + |E|). We accomplished this for BFS by refactoring BFSearch(v) to traverse the neighbor list of the front of the queue one time only. This solution does not carry over to DFS, because the top of the control stack is changed by a push operation, so we cannot predict which neighbor list is needed to find the next unvisited neighbor. Three possible solutions to this problem may be advanced:
Re 1: This is the approach we take. The iterators in the array advance in a somewhat ragged fashion, but they always advance, so that when the search process halts each of the adjacency lists has been traversed at most one time, at an aggregate cost bounded above by

  ∑_{x ∈ V} outDeg(x) ≤ 2×|E|
(again using Theorem 1 to get the 2×|E| bound). Just as in BFSearch, each vertex is processed through the stack at most one time (one Push and one Pop), so the total cost of DFSearch(v) is at most

  2×|V| + 2×|E| = O(|V| + |E|).
The use and maintenance of the array of adj iterators adds explicitly to the space overhead and intricacy of the algorithm.

Re 2: The problem with this implementation is that the size of the control stack is limited only by the number of edges. Really it's a kluge.

Re 3: This is the approach taken in [Cormen et al (2009), Introduction to Algorithms (3rd ed.), MIT Press, Cambridge, MA] and [Weiss (2014), Data Structures and Algorithms in C++ (4th ed.), Pearson Education/Addison-Wesley, Upper Saddle River, NJ]. The explicit space overhead in the iterative approach is hidden in the recursive call activation records, making the algorithm simpler to code and explain. (But also easier to "gloss over" details of how the algorithm actually works.)

There is some commentary, for example in the Wikipedia entry for DFS, that if the recursive implementation is the "real" DFSearch then the iterative implementation is not, because it considers the neighbor vertices in a different (actually, reversed) order. This is a distinction without a difference, considering that (a) the order of vertices in an adjacency list is user-dependent and (b) in any case the order of consideration is easily reversed because AdjIterator is bidirectional. In fact, some have advocated making AdjIterators randomized, which would make distinctions of order in adjacency lists disappear entirely.

By way of example, here is the result of running DFSearch(5) on the graph G1 (read the conStack column down the left side, then down the right side):

  adj list rep       Graph G1                        DFS::conStack
  ------------       --------                        -------->
  v[0]: 1 , 3                                        null            ...
  v[1]: 0            0 --- 1     2 --- 6 --- 7       5               5
  v[2]: 5 , 6        |           |           |       5 2             5 4
  v[3]: 0 , 4        |           |           |       5 2 6           5 4 3
  v[4]: 3 , 5        3 --- 4 --- 5 --- 8 --- 9       5 2 6 7         5 4 3 0
  v[5]: 2 , 4 , 8                                    5 2 6 7 9       5 4 3 0 1
  v[6]: 2 , 7                                        5 2 6 7 9 8     5 4 3 0
  v[7]: 6 , 9                                        5 2 6 7 9       5 4 3
  v[8]: 5 , 9                                        5 2 6 7         5 4
  v[9]: 7 , 8                                        5 2 6           5
                                                     5 2             null
                                                     ...

  Vertex discovery order: 5 2 6 7 9 8 4 3 0 1
  Vertex finishing order: 8 9 7 6 2 1 0 3 4 5
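The "one change" relationship between the two searches, together with the per-vertex cursor idea described under Re 1, can be checked with a standalone sketch. It is written against standard containers, not the fsu library. The names AdjList, Control, SearchOrders, and Search are hypothetical, and an index into a std::vector plays the role that an AdjIterator into a list plays in the notes.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

using AdjList = std::vector<std::vector<unsigned>>;  // g[x] = vertices adjacent from x

enum class Control { Queue, Stack };

struct SearchOrders
{
  std::vector<unsigned> discovered;  // order in which vertices turn grey
  std::vector<unsigned> finished;    // order in which vertices turn black
};

// Deque-controlled search: the only difference between BFS and DFS is
// the end of the deque at which a newly discovered vertex is pushed.
SearchOrders Search(const AdjList& g, unsigned start, Control control)
{
  assert(start < g.size());
  SearchOrders out;
  std::vector<char> color(g.size(), 'w');
  std::vector<std::size_t> next(g.size(), 0);  // next[x]: first unexamined slot of g[x]
  std::deque<unsigned> con;

  con.push_back(start);
  color[start] = 'g';
  out.discovered.push_back(start);

  while (!con.empty())
  {
    const unsigned x = con.front();
    // Advance the cursor of x past visited neighbors. A cursor never moves
    // backward, so each adjacency list is traversed at most once overall.
    while (next[x] < g[x].size() && color[g[x][next[x]]] != 'w') ++next[x];

    if (next[x] < g[x].size())       // found an unvisited neighbor n of x
    {
      const unsigned n = g[x][next[x]];
      color[n] = 'g';
      out.discovered.push_back(n);
      if (control == Control::Stack) con.push_front(n);  // same end as the pop
      else                           con.push_back(n);   // opposite end
    }
    else                             // no unvisited neighbors remain
    {
      con.pop_front();
      color[x] = 'b';
      out.finished.push_back(x);
    }
  }
  return out;
}
```

Run on G1 from vertex 5, Control::Stack yields the discovery and finishing orders shown in the trace above, while Control::Queue yields the BFS discovery order 5 2 4 8 6 3 9 7 0 1 from Section 2. In queue mode vertices finish in the same order in which they are discovered, since the control structure is first-in, first-out.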
3.2 DFSurvey

The depth-first survey class definition is completely analogous to that defining breadth-first survey. The notion of distance is not as relevant to DFS and is omitted. But there are two "time" arrays, keeping track of discovery time of a vertex v (when v is pushed onto the control stack) and finishing time (when v is popped off the control stack). The private method NextUnvisitedNeighbor(x) returns the next unvisited neighbor of x, facilitated by the array of AdjIterators named nun_:

  template < class G >
  class DFSurvey
  {
  public:
    typedef G                           Graph;
    typedef typename Graph::Vertex      Vertex;
    typedef typename Graph::AdjIterator AdjIterator;

    DFSurvey    ( const Graph& g );
    DFSurvey    ( const Graph& g , Vertex start );
    void Search ( );
    void Search ( Vertex v );
    void Reset  ( );
    void Reset  ( Vertex start );

    fsu::Vector < Vertex > dtime_;   // discovery time
    fsu::Vector < Vertex > ftime_;   // finishing time
    fsu::Vector < Vertex > parent_;  // for DFS tree
    fsu::Vector < char >   color_;   // various uses

  private: // data
    const Graph&                g_;
    Vertex                      start_;    // default is vertex 0
    size_t                      time_;     // global sequencing clock
    size_t                      forever_;  // unreachable time
    Vertex                      null_;     // undefined vertex
    fsu::Vector < AdjIterator > nun_;      // platoon of iterators supporting NextUnvisitedNeighbor
    fsu::Deque < Vertex >       conQ_;     // control stack

    // method
    AdjIterator NextUnvisitedNeighbor(Vertex x); // returns iterator to next unvisited neighbor of x
  };
The notion of time is analogous to that of BFSurvey. However, note that each vertex x gets two times assigned: discovery time dtime_[x], when x enters the control stack, and finishing time ftime_[x], when x departs the control stack. Thus the time instances used in the algorithm are those in the range [0..2×|V|). The first unreachable time is forever_ = 2*g_.VrtxSize().

3.3 Implementing DFSurvey

All of the method implementations are completely analogous to those of BFSurvey with three exceptions: (1) Search(v) follows the DFS algorithm described above; (2) constructors and Reset methods must add initialization of the vector nun_ to their task lists; and (3) there is a new method NextUnvisitedNeighbor(x) which returns an iterator pointing to the next unvisited neighbor of x. The portion of the code added by these exceptions is displayed in blue.

  template < class G >
  DFSurvey<G>::DFSurvey (const Graph& g)
    : // Members are initialized in declaration order (the public vectors first),
      // so these initializers read the parameter g rather than g_, forever_, or null_.
      dtime_   (g.VrtxSize(), 2*g.VrtxSize()),        // value of forever_
      ftime_   (g.VrtxSize(), 2*g.VrtxSize()),        // value of forever_
      parent_  (g.VrtxSize(), (Vertex)g.VrtxSize()),  // value of null_
      color_   (g.VrtxSize(), 'w'),                   // 'w' = white
      g_(g), start_(0), time_(0),
      forever_ (2*g.VrtxSize()),
      null_    ((Vertex)g.VrtxSize()),
      nun_     (g.VrtxSize()),
      conQ_()
  {
    for (Vertex x = 0; x < g_.VrtxSize(); ++x)
      nun_[x] = g_.Begin(x);
  }
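  template < class G >
  DFSurvey<G>::DFSurvey (const Graph& g , Vertex start )
    : // Members are initialized in declaration order (the public vectors first),
      // so these initializers read the parameter g rather than g_, forever_, or null_.
      dtime_   (g.VrtxSize(), 2*g.VrtxSize()),        // value of forever_
      ftime_   (g.VrtxSize(), 2*g.VrtxSize()),        // value of forever_
      parent_  (g.VrtxSize(), (Vertex)g.VrtxSize()),  // value of null_
      color_   (g.VrtxSize(), 'w'),                   // 'w' = white
      g_(g), start_(start), time_(0),
      forever_ (2*g.VrtxSize()),
      null_    ((Vertex)g.VrtxSize()),
      nun_     (g.VrtxSize()),
      conQ_()
  {
    for (Vertex x = 0; x < g_.VrtxSize(); ++x)
      nun_[x] = g_.Begin(x);
  }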
  template < class G >
  void DFSurvey<G>::Reset()
  {
    time_ = 0;
    conQ_.Clear();
    if (color_.Size() != g_.VrtxSize()) // g_ has changed vertex size
    {
      forever_ = 2*g_.VrtxSize();       // last time stamp is 2|V| - 1
      null_    = (Vertex)g_.VrtxSize(); // |V| is not a valid vertex number
      dtime_.SetSize  (g_.VrtxSize(), forever_);
      ftime_.SetSize  (g_.VrtxSize(), forever_);
      parent_.SetSize (g_.VrtxSize(), null_);
      color_.SetSize  (g_.VrtxSize(), 'w'); // 'w' = white
      nun_.SetSize    (g_.VrtxSize());
      for (Vertex x = 0; x < g_.VrtxSize(); ++x)
        nun_[x] = g_.Begin(x);
    }
    else
    {
      for (Vertex x = 0; x < g_.VrtxSize(); ++x)
      {
        dtime_[x]  = forever_;
        ftime_[x]  = forever_;
        parent_[x] = null_;
        color_[x]  = 'w'; // 'w' = white
        nun_[x]    = g_.Begin(x);
      }
    }
  }
  template < class G >
  void DFSurvey<G>::Reset( Vertex start )
  {
    start_ = start;
    Reset();
  }
  template < class G >
  void DFSurvey<G>::Search( Vertex v )
  {
    dtime_[v] = time_++;
    conQ_.PushBack(v);
    color_[v] = 'g'; // 'g' = grey
    Vertex top;
    AdjIterator i;
    while (!conQ_.Empty())
    {
      top = conQ_.Back();
      i = NextUnvisitedNeighbor(top); // an iterator!
      if (i != g_.End(top))
      {
        dtime_[*i] = time_++;
        conQ_.PushBack(*i);
        parent_[*i] = top;
        color_[*i] = 'g'; // 'g' = grey
      }
      else
      {
        conQ_.PopBack();
        color_[top] = 'b'; // 'b' = black
        ftime_[top] = time_++;
      }
    }
  }
  template < class G >
  void DFSurvey<G>::Search()
  {
    Reset();
    for (Vertex v = start_; v < g_.VrtxSize(); ++v)
    {
      if (color_[v] == 'w') // white = unvisited
        Search(v);
    }
    for (Vertex v = 0; v < start_; ++v)
    {
      if (color_[v] == 'w') // white = unvisited
        Search(v);
    }
  }
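To see how the time stamps fall out of a full survey, here is a self-contained sketch that mirrors Search() and Search(v) using standard containers. It is an illustration under stated assumptions, not the course implementation: AdjList, Survey, DepthFirstSurvey, and the member ticks are hypothetical names, and std::vector stands in for fsu::Vector and fsu::Deque.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative stand-in for the Graph type: adj[x] lists the neighbors of x.
using AdjList = std::vector<std::vector<std::size_t>>;

struct Survey
{
  std::vector<std::size_t> dtime;   // discovery times; 2|V| means "never"
  std::vector<std::size_t> ftime;   // finishing times; 2|V| means "never"
  std::vector<std::size_t> parent;  // parent in the DFS forest; |V| means "none"
  std::size_t ticks = 0;            // global clock, one tick per push or pop
};

Survey DepthFirstSurvey(const AdjList& adj, std::size_t start = 0)
{
  const std::size_t n = adj.size();
  assert(n == 0 || start < n);
  Survey s;
  s.dtime.assign(n, 2 * n);
  s.ftime.assign(n, 2 * n);
  s.parent.assign(n, n);
  std::vector<char> color(n, 'w');       // 'w' white, 'g' grey, 'b' black
  std::vector<std::size_t> next(n, 0);   // next[x]: first unexamined position in adj[x]
  std::vector<std::size_t> stack;        // control stack

  for (std::size_t k = 0; k < n; ++k)
  {
    const std::size_t root = (start + k) % n;  // start, ..., n-1, 0, ..., start-1
    if (color[root] != 'w') continue;
    s.dtime[root] = s.ticks++;
    color[root] = 'g';
    stack.push_back(root);
    while (!stack.empty())
    {
      const std::size_t top = stack.back();
      // advance the cursor of top to its next white neighbor, if any
      while (next[top] < adj[top].size() && color[adj[top][next[top]]] != 'w')
        ++next[top];
      if (next[top] < adj[top].size())
      {
        const std::size_t v = adj[top][next[top]];
        s.dtime[v] = s.ticks++;
        s.parent[v] = top;
        color[v] = 'g';
        stack.push_back(v);
      }
      else
      {
        stack.pop_back();
        color[top] = 'b';
        s.ftime[top] = s.ticks++;
      }
    }
  }
  return s;
}
```

On a small made-up graph with two components, say edges 0-1, 1-2 and 3-4, the survey assigns each of the five vertices one discovery and one finishing time, so the clock stops at 2×|V| = 10 and every assigned time lies in [0..2×|V|).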
  template < class G >
  typename DFSurvey<G>::AdjIterator DFSurvey<G>::NextUnvisitedNeighbor (Vertex x)
  {
    // Note: nun_[x] is already initialized and part way through a traversal!
    // This loop just advances to the next unvisited (white) neighbor of x.
    // Total cost for entire survey is a single traversal of each adjacency list = Theta(|E|)
    while (nun_[x] != g_.End(x) && 'w' != color_[*nun_[x]])
      ++nun_[x];
    return nun_[x];
  }
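The Theta(|E|) remark in the comment above, combined with the stack discipline of Search(v), can be spelled out as a short aggregate count. The symbols a(x) (the number of times ++nun_[x] executes) and T (the total work) are introduced here only for illustration. The undirected case is shown; for a digraph, Theorem 1d gives |E| in place of 2|E|.

```latex
\begin{align*}
a(x) &\le \deg(x)
  && \text{the iterator for } x \text{ only moves forward, from Begin}(x) \text{ to End}(x) \\
\sum_{x \in V} a(x) &\le \sum_{x \in V} \deg(x) = 2\,|E|
  && \text{Theorem 1u} \\
\#\text{pushes} + \#\text{pops} &\le 2\,|V|
  && \text{each vertex enters and leaves the stack at most once} \\
T &= O(|V|) + O(|E|) = O(|V| + |E|)
  && \text{one call to NextUnvisitedNeighbor per push or pop}
\end{align*}
```

In the full survey Search(), every vertex is eventually popped, and a vertex is popped only after its iterator has reached End(x), so a(x) = deg(x) for every x and the bound is tight.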
3.4 Conclusions

Theorem 3s. The runtime of DFSurvey::Search(v), as implemented with an array of AdjIterators, is O(|V| + |E|).

Theorem 3f. The runtime of DFSurvey::Search(), as implemented with an array of AdjIterators, is Θ(|V| + |E|).

3.5 Exercises