Educational Objectives: After completing this assignment, the student should be able to accomplish the following:
Operational Objectives: Design and implement the following classes:
You may have teams of 2 or 3 people. The team should compose a brief summary of work that explains the responsibilities and work products each member of the team accomplished. Also each team member should submit the project individually. Please make certain that the submissions for each member of a team are identical.
Deliverables: Files:
readme.txt     # required for all
bfsurvey.h     # required for all
dfsurvey.h     # required for 3-teams only
symgraph.h     # required for all
moviematch.h   # required for all
makefile       # builds all executables in project, including tests
The official development | testing | assessment environment is gnu g++ on the linprog machines.
Each member of a team submits all team deliverables
Deliverables submitted should be identical across all team members.
The team makeup is listed in the file header documentation of each submitted file (see C++ Style link for standards)
File readme.txt explains how the software was developed, what responsibilities each team member had, how it was tested, and how it is expected to be operated.
Copy all of the test harnesses and graph files from LIB/proj4:
fgraph.cpp      # general test for graph classes
ftopsort.cpp    # another test for directed graphs
fbfsurvey.cpp   # used for fbfsurvey_ug.cpp and fbfsurvey_dg.cpp
fdfsurvey.cpp   # used for fdfsurvey_ug.cpp and fdfsurvey_dg.cpp
KevinBacon.cpp  # client program for MovieMatch
Warning: Submit scripts do not work on the program and linprog servers. Use shell.cs.fsu.edu to submit projects. If you do not receive the second confirmation with the contents of your project, there has been a malfunction.
Class ALUGraph implements the adjacency list representation of a graph whose vertices are assumed to be unsigned integers 0,1,...,n-1. The interface should conform to:
namespace fsu
{
  template < typename N >
  class ALUGraph
  {
  public:
    typedef N      Vertex;
    typedef xxxxx  AdjIterator;

    void         SetVrtxSize (N n);
    void         AddEdge     (Vertex from, Vertex to);
    size_t       VrtxSize    () const;
    size_t       EdgeSize    () const;
    size_t       OutDegree   (Vertex x) const;
    size_t       InDegree    (Vertex x) const;
    AdjIterator  Begin       (Vertex x) const;
    AdjIterator  End         (Vertex x) const;

    ALUGraph ( );
    ALUGraph ( N n );
    ...
  };
} // namespace fsu
where xxxxx is a type that you define. This is an iterator for the adjacency list, which could be fsu::List<Vertex>::ConstIterator, std::list<Vertex>::const_iterator, or some other type. The directed graph API is exactly the same (but for the name of the class):
namespace fsu
{
  template < typename N >
  class ALDGraph
  {
  public:
    typedef N      Vertex;
    typedef xxxxx  AdjIterator;

    void         SetVrtxSize (N n);
    void         AddEdge     (Vertex from, Vertex to);
    size_t       VrtxSize    () const;
    size_t       EdgeSize    () const;
    size_t       OutDegree   (Vertex x) const;
    size_t       InDegree    (Vertex x) const;
    AdjIterator  Begin       (Vertex x) const;
    AdjIterator  End         (Vertex x) const;

    ALDGraph ( );      // constructors take the name of the class
    ALDGraph ( N n );
    ...
  };
} // namespace fsu
Much of the implementation code for the undirected and directed cases is identical, so it can be profitable to derive one of these from the other. In the derived class, only AddEdge, EdgeSize, and InDegree require re-definition.
Begin(x) returns an AdjIterator, a forward ConstIterator that iterates through the adjacency list of the vertex x. End(x) returns the end iterator of that adjacency list. So, the loop
for (typename GraphType::AdjIterator i = g.Begin(x); i != g.End(x); ++i) {/* do something at the vertex *i */}
encounters all of the vertices adjacent from x in the (directed or undirected) graph g.
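As a rough illustration of the interface and loop above, here is a minimal sketch of an undirected adjacency-list graph. It is not the distributed ALUGraph: it assumes std::vector and std::list for storage in place of the fsu:: containers, and the names SketchUGraph and Neighbors are hypothetical, chosen only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical sketch, not the distributed ALUGraph.
template <typename N>
class SketchUGraph
{
public:
  typedef N Vertex;
  typedef typename std::list<Vertex>::const_iterator AdjIterator;

  SketchUGraph() : al_() {}
  explicit SketchUGraph(N n) : al_(static_cast<std::size_t>(n)) {}

  void SetVrtxSize(N n) { al_.resize(static_cast<std::size_t>(n)); }

  // An undirected edge is recorded once in each endpoint's list.
  void AddEdge(Vertex from, Vertex to)
  {
    assert(Index(from) < al_.size() && Index(to) < al_.size());
    al_[Index(from)].push_back(to);
    al_[Index(to)].push_back(from);
  }

  std::size_t VrtxSize() const { return al_.size(); }

  // Every edge appears in two lists, so halve the total list length.
  std::size_t EdgeSize() const
  {
    std::size_t total = 0;
    for (std::size_t i = 0; i < al_.size(); ++i) total += al_[i].size();
    return total / 2;
  }

  std::size_t OutDegree(Vertex x) const { return al_[Index(x)].size(); }
  std::size_t InDegree(Vertex x) const { return OutDegree(x); }

  AdjIterator Begin(Vertex x) const { return al_[Index(x)].begin(); }
  AdjIterator End(Vertex x) const { return al_[Index(x)].end(); }

private:
  // Explicit cast between the Vertex type and a container index.
  static std::size_t Index(Vertex x) { return static_cast<std::size_t>(x); }

  std::vector< std::list<Vertex> > al_;
};

// Collects the vertices adjacent from x with the same loop shape as the text.
template <class GraphType>
std::vector<typename GraphType::Vertex>
Neighbors(const GraphType& g, typename GraphType::Vertex x)
{
  std::vector<typename GraphType::Vertex> result;
  for (typename GraphType::AdjIterator i = g.Begin(x); i != g.End(x); ++i)
    result.push_back(*i);
  return result;
}
```

Because Neighbors touches the graph only through Vertex, AdjIterator, Begin, and End, it would compile unchanged against any class that conforms to the interface, which is the substitution property the algorithm classes are meant to have.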
The template argument is some unsigned integer type. We are using templates mainly as a convenience so that member functions will not be compiled (or even require implementation) if they are not called by client code.
Test graph classes thoroughly using fgraph.cpp.
Algorithms should operate on ALGraph objects via the interface defined above, so that another team's version of ALGraph can be substituted without modification.
Algorithms should be class templates (in line with the graph class template). See discussion of algorithm classes in the Graphs 1 Lecture Notes.
Test algorithms (surveys) thoroughly using the supplied survey tests.
Class SymbolGraph implements a graph whose vertices are symbols (typically strings). The API is largely the same as that of the abstract graph classes discussed above, with the additional ability to adjust the vertex size "on the fly" using the Push() operation.
namespace fsu
{
  template < typename S , typename N >
  class SymbolGraph
  {
  public:
    typedef S      Vertex;
    typedef xxxxx  AdjIterator;

    void         SetVrtxSize (N n);
    void         AddEdge     (Vertex from, Vertex to);
    size_t       VrtxSize    () const;
    size_t       EdgeSize    () const;
    size_t       OutDegree   (Vertex x) const;
    size_t       InDegree    (Vertex x) const;
    AdjIterator  Begin       (Vertex x) const;
    AdjIterator  End         (Vertex x) const;

    void   Push (const S& s); // add s to the vertex set

    // access to underlying data
    const ALUGraph<N>&      GetAbstractGraph() const; // reference to g_
    const HashTable<S,N,H>& GetSymbolMap() const;     // reference to s2n_
    const Vector<S>&        GetVertexMap() const;     // reference to n2s_

    SymbolGraph ( );
    SymbolGraph ( N n );
    ...

  private:
    ALUGraph<N>      g_;
    HashTable<S,N,H> s2n_;
    Vector<S>        n2s_;
    ...
  };
} // namespace fsu
where xxxxx is the adjacency iterator type. There is a directed version SymbolDirectedGraph<S,N> whose implementation is almost identical to the undirected case, except using ALDGraph<N> as the abstract graph underpinning.
The template arguments are S = SymbolType and N = IntegerType. S is the type for the names of vertices, and is typically some form of string. N is the parameter to instantiate the underpinning abstract graph.
s2n_ is an associative array, or mapping, from symbols to vertices in the abstract graph g_. n2s_ is the inverse mapping from vertices in g_ to symbols. The symbol graph uses the two mappings to translate symbols to abstract vertices and calls operations in the abstract graph.
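The two-way translation can be sketched as follows. This is only an illustration under stated assumptions: std::unordered_map stands in for the course HashTable, std::vector for fsu::Vector, and the class SymbolIndex with its members ToVertex, ToSymbol, and Contains is hypothetical. Unlike the Push in the interface above, this Push returns the vertex number for convenience.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the s2n_ / n2s_ pair of a symbol graph.
class SymbolIndex
{
public:
  // Adds s to the vertex set if it is new and returns its vertex number.
  std::size_t Push(const std::string& s)
  {
    Map::const_iterator it = s2n_.find(s);
    if (it != s2n_.end()) return it->second;  // already present
    std::size_t n = n2s_.size();              // next unused vertex number
    s2n_.insert(std::make_pair(s, n));
    n2s_.push_back(s);
    return n;
  }

  bool Contains(const std::string& s) const { return s2n_.count(s) != 0; }

  // symbol -> abstract vertex; the symbol must already be present
  std::size_t ToVertex(const std::string& s) const
  {
    Map::const_iterator it = s2n_.find(s);
    assert(it != s2n_.end());
    return it->second;
  }

  // abstract vertex -> symbol
  const std::string& ToSymbol(std::size_t n) const
  {
    assert(n < n2s_.size());
    return n2s_[n];
  }

  std::size_t Size() const { return n2s_.size(); }

private:
  typedef std::unordered_map<std::string, std::size_t> Map;
  Map                      s2n_;  // symbol -> vertex number
  std::vector<std::string> n2s_;  // vertex number -> symbol
};
```

With such a pair of maps, a symbol-level AddEdge(from, to) would translate both symbols and forward the call to the abstract graph, and a symbol-level adjacency iterator would translate each abstract vertex back to its symbol on dereference.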
MovieMatch should provide services required by KevinBacon.cpp. This will require the following (partial) class definition:
#include <cstring>  // strlen, strcpy

class MovieMatch
{
public:
  MovieMatch (const char* baseActor) : baseActor_(0)
  {
    size_t length = strlen(baseActor);
    baseActor_ = new char [length + 1];
    baseActor_[length] = '\0';
    strcpy (baseActor_,baseActor);
  }

  void Load (const char* filename);
  // loads a movie/actor table

  unsigned long MovieDistance(const char* actor);
  // returns the number of movies required to get from actor to baseActor_
  ...

private:
  char* baseActor_;
  SymbolGraph < fsu::String , size_t > sg_;
  ...
};
(The names can be your choice, except for those required by the distributed client program.)
If you prefer you may build the symbol graph directly in MovieMatch.
The underlying graph should be built from the "database" provided in the text file movies.txt. Each line of this file represents a movie and the actors in the movie. Forward slash '/' is used to delimit the strings representing movie titles and actor names in each line.
It will be helpful to use either the cstring library or std::string to read entire lines and break them up into strings using the '/' delimiter, so that spaces are captured. We will distribute a client program for MovieMatch that illustrates this approach by allowing actor names (with blanks) to be entered through the keyboard.
The Kevin Bacon game is this: given an actor by name, what is his/her Kevin Bacon number?
To solve this we first need a clear definition of the Kevin Bacon number for an actor, or more generally, the movie distance between two actors. The definition is much like the path distance between two vertices in a graph, except using movie chains instead of edges.
A movie chain from actor x to actor y is a sequence m1 m2 ... mk such that
- mj and mj+1 have an actor in common for 0 < j < k
- x is in movie m1
- y is in movie mk
The movie distance md(x,y) is defined to be the number of movies in a shortest movie chain from x to y. If there is no movie chain from x to y, we define md(x,y) = infinity.
The Kevin Bacon number of an actor x is the movie distance from x to Kevin Bacon.
Some consequences are:
To solve the Kevin Bacon game (or any other similar game based on another actor) we use graphs. Specifically, create a graph in which both actors and movies are vertices, and insert an edge whenever an actor is in a movie. Thus each edge has an actor for one vertex and a movie for the other.
A graph is said to be bipartite if the vertices can be colored with two colors, say Red and Blue, such that each edge has different colored vertices, that is, each edge goes between a blue vertex and a red vertex. Clearly the movie-actor graph is bipartite, with actors colored blue and movies colored red.
The following result is proved in discrete math courses and most books on graph theory:
Theorem. In a bipartite graph, a path whose ends have the same color has an even number of edges.
As a consequence, any path from one actor to another in the movie-actor graph has an even number of edges. If P is such a path, with length n, then n is even and n/2 is the number of movies passed through by P. If P is a shortest path from actor x to actor y, then n/2 is the movie distance from x to y.
Thus to solve the Kevin Bacon game, we perform a Breadth-First survey from Kevin Bacon. The Breadth First Tree rooted at Kevin Bacon consists of shortest paths from Kevin Bacon to all other actors who have a finite Kevin Bacon number. Dividing the length of such a path by 2 yields the Kevin Bacon number for the actor at the other end of the path.
In practical terms, we start at an actor x and follow the parent pointers of the BF tree back to Kevin Bacon, counting the steps. Then divide this count by 2 to get the number.
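The parent-following step can be sketched as below. This is an illustration only: the AdjList typedef, the NO_PARENT sentinel, and the functions BFParents and MovieDistance are hypothetical names, and a plain vector-of-vectors graph stands in for the project's graph and BFSurvey classes, which would supply the parent data in the real solution.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <queue>
#include <vector>

// Hypothetical graph type: g[v] lists the vertices adjacent from v.
typedef std::vector< std::vector<std::size_t> > AdjList;

// Sentinel for "no parent"; also returned to stand in for infinity.
const std::size_t NO_PARENT = std::numeric_limits<std::size_t>::max();

// Breadth-first survey from root. Returns the BF-tree parent of every
// vertex (NO_PARENT for the root and for unreachable vertices).
std::vector<std::size_t> BFParents(const AdjList& g, std::size_t root)
{
  assert(root < g.size());
  std::vector<std::size_t> parent(g.size(), NO_PARENT);
  std::vector<bool> visited(g.size(), false);
  std::queue<std::size_t> q;
  visited[root] = true;
  q.push(root);
  while (!q.empty())
  {
    std::size_t v = q.front();
    q.pop();
    for (std::size_t k = 0; k < g[v].size(); ++k)
    {
      std::size_t w = g[v][k];
      if (!visited[w])
      {
        visited[w] = true;
        parent[w] = v;
        q.push(w);
      }
    }
  }
  return parent;
}

// Follows parent pointers from actor back to root, counting steps, then
// halves the count: each movie on the path contributes two edges.
std::size_t MovieDistance(const std::vector<std::size_t>& parent,
                          std::size_t root, std::size_t actor)
{
  std::size_t steps = 0;
  for (std::size_t v = actor; v != root; v = parent[v])
  {
    if (parent[v] == NO_PARENT) return NO_PARENT;  // no movie chain exists
    ++steps;
  }
  return steps / 2;
}
```

For example, if actors 0 and 1 share movie 3, and actors 1 and 2 share movie 4, then the path from 2 back to 0 is 2, 4, 1, 3, 0: four edges, so the movie distance is 2.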
The abstract graph classes are provided, so interpret the admonitions to "thoroughly test" them as "come to thoroughly understand how these are designed and implemented".
Even though our typical use of the graph classes will have the template argument N = size_t, it will be very useful in your implementation code to distinguish between type Vertex and type size_t and carefully cast between the two when the two types have different connotations. For example, given Vertex x and size_t i, write Begin((Vertex)i) and parent[(size_t)x].
Several graph files are distributed in LIB/proj4. Some of these are named graph.v.e and some are named dag.v.e, where v is the number of vertices and e the number of edges of the graph represented by the file. DO NOT rely on these suffixes in your programs; they are for human convenience only (and in some instances may not even be accurate). Those named dag are purported to be acyclic when interpreted as directed graphs, but will have cycles when interpreted as undirected graphs.
A thorough understanding of the material in the Graphs 1 Lecture Notes will be helpful.
The following is output from a test of DFSurvey run on G1 (undirected case):
linprog2> fdfsug.x graph1.10.10
Begin DFSurvey functionality test
 graph type: undirected adjacency list
 Load complete
 Input file: graph1.10.10
 VrtxSize = 10
 EdgeSize = 10

 df survey data
 ==============
 vertex  dtime  ftime  parent  color
 ------  -----  -----  ------  -----
      0      0     19    NULL      b
      1      1      2       0      b
      2      6     15       5      b
      3      3     18       0      b
      4      4     17       3      b
      5      5     16       4      b
      6      7     14       2      b
      7      8     13       6      b
      8     10     11       9      b
      9      9     12       7      b

 Vertex discovery order: 0 1 3 4 5 2 6 7 9 8
 Vertex finishing order: 1 8 9 7 6 2 5 4 3 0
End DFSurvey functionality test
linprog2>
Note the table of survey data and the output of the vertices in preorder and postorder.
The following is output from a test of BFSurvey run on G1 (directed case):
linprog2> fbfsdg.x graph1.10.10
Begin BFSurvey functionality test
 graph type: directed adjacency list
 Load complete
 Input file: graph1.10.10
 VrtxSize = 10
 EdgeSize = 10

 df survey data
 ==============
 vertex  distance  dtime  parent  color
 ------  --------  -----  ------  -----
      0         0      0    NULL      b
      1         1      1       0      b
      2         0      7    NULL      b
      3         1      2       0      b
      4         2      3       3      b
      5         3      4       4      b
      6         1      8       2      b
      7         2      9       6      b
      8         4      5       5      b
      9         5      6       8      b

 Vertex discovery order: 0 1 3 4 5 8 9 2 6 7
     grouped by distance: [ ( 0 ) ( 1 3 ) ( 4 ) ( 5 ) ( 8 ) ( 9 ) ] [ ( 2 ) ( 6 ) ( 7 ) ]
End BFSurvey functionality test
linprog2>
Again the table shows the survey data. The vertex discovery order, grouped by distance from the search vertex, is also shown. (BFS discovers and finishes vertices in the same order.) The discovery order grouped by distance uses [ ] to delimit trees in the forest and ( ) to delimit vertices the same distance away from the root of the tree.
The discovery and finishing order are computed post-survey from the timestamps. (Discovery time was added to the usual BFS to facilitate this.) The discovery order "grouped by distance" output in fbfsurvey uses both distance and time. One could also output a Lisp-syntax record of the search forest for either survey.
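One way to do the post-survey computation is to sort the vertex numbers by their timestamps. The sketch below assumes the timestamps are available as a std::vector indexed by vertex; the function names OrderByTime and CheckSampleDiscoveryOrder are hypothetical, and the distributed test harnesses may compute the order differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the vertices 0..n-1 sorted by increasing time[v].
// Passing discovery times yields the discovery order; passing
// finishing times yields the finishing order.
std::vector<std::size_t> OrderByTime(const std::vector<std::size_t>& time)
{
  std::vector<std::size_t> order(time.size());
  for (std::size_t v = 0; v < order.size(); ++v) order[v] = v;
  std::sort(order.begin(), order.end(),
            [&time](std::size_t a, std::size_t b) { return time[a] < time[b]; });
  return order;
}

// Usage check: the dtime column of the DFSurvey sample output above
// reproduces the "Vertex discovery order" line printed by the test.
void CheckSampleDiscoveryOrder()
{
  const std::size_t dtime[10]    = {0, 1, 6, 3, 4, 5, 7, 8, 10, 9};
  const std::size_t expected[10] = {0, 1, 3, 4, 5, 2, 6, 7, 9, 8};
  std::vector<std::size_t> order =
      OrderByTime(std::vector<std::size_t>(dtime, dtime + 10));
  assert(order == std::vector<std::size_t>(expected, expected + 10));
}
```

Since survey timestamps are distinct, the sort has no ties to break. When the timestamps are known to be exactly 0..n-1, as with the BFSurvey dtime column, a single pass that sets order[time[v]] = v would avoid the sort altogether.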
Sample executables are available in LIB/area51. These show some elaborations such as digraphs, topological sort for digraphs, and survey output that is derived rather than produced directly by the survey. I added discovery time to BFSurvey, which is handy information to have, as illustrated by the post-survey computation of discovery order. Decode the names as follows:
fgraph.x    # general test of Graph classes - can supply detailed log
fdfsud.x    # functionality test of DFSurvey - undirected graphs
fdfsdg.x    # functionality test of DFSurvey - directed graphs
fbfsud.x    # functionality test of BFSurvey - undirected graphs
fbfsdg.x    # functionality test of BFSurvey - directed graphs
ftopsort.x  # functionality test of TopSort - directed graphs only
It will be helpful to have a function (or method in MovieMatch) that gets an entire line from movies.txt and returns a vector v of the individual names delimited by '/' in the line. The first name v[0] would be a movie title and the remaining v[1] .. v[v.Size() - 1] would be actors in that movie.
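A helper of this kind might look like the following sketch, which assumes std::string and std::vector instead of the fsu:: types. The names SplitLine and MovieTitle are hypothetical, and the sample line in the usage note is made up for illustration, not copied from movies.txt.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Splits one line of the movie file at each delimiter. Blanks inside a
// field are preserved because only the delimiter ends a field.
std::vector<std::string> SplitLine(const std::string& line, char delim = '/')
{
  std::vector<std::string> fields;
  std::istringstream is(line);
  std::string field;
  while (std::getline(is, field, delim))
    fields.push_back(field);
  return fields;
}

// The first field is the movie title; the rest are the actors.
const std::string& MovieTitle(const std::vector<std::string>& fields)
{
  assert(!fields.empty());
  return fields[0];
}
```

Given the made-up line "Some Movie (1999)/Doe, Jane/Roe, Richard", SplitLine returns three strings: the title followed by two actor names, each with its embedded blanks intact. A Load routine could then Push every field as a vertex and add an edge from the title to each actor.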
You will need to decide what a "string" is. We recommend using string objects, either fsu::String or std::string.