Project 5: Kevin Bacon

Final Algorithms Project

Educational Objectives: After completing this assignment, the student should be able to accomplish the following:

Describe and implement SymbolGraph
Define bipartite graphs
Explain the basic conclusions about path lengths in bipartite graphs
Describe the back-end design for the Movie Match game solver

Operational Objectives: Design and implement the following classes:

Symbol Graph
Movie Match Game

You may work in teams of 2 or 3 people. The team should compose a brief summary of work that explains the responsibilities and work products of each team member. Each team member should also submit the project individually. Please make certain that the submissions from all members of a team are identical.

Deliverables: Files:

readme.txt     
symgraph.h     
moviematch.h   
kevinbacon.cpp # your client, may be same as kb.cpp or an elaboration
makefile       # builds all executables in project, including tests

Procedural Requirements

The official development | testing | assessment environment is g++47 -std=c++11 -Wall -Wextra on the linprog machines. Code should compile without error or warning.
Each member of a team submits all team deliverables
Deliverables submitted should be identical across all team members.
The team makeup is listed in the file header documentation of each submitted file (see C++ Style link for standards)
File readme.txt explains how the software was developed, what responsibilities each team member had, how it was tested, and how it is expected to be operated.

Copy all files from LIB/proj5, including:

kb.cpp # sample client program for MovieMatch
movies.txt
movies_abbreviated.txt

Copy the file LIB/proj5/proj5submit.sh into your project directory, change its permissions to executable, and submit the project by executing the script.

Warning: Submit scripts do not work on the program and linprog servers. Use shell.cs.fsu.edu to submit projects. If you do not receive the second confirmation with the contents of your project, there has been a malfunction.

Code Requirements and Specifications - SymbolGraph

Class SymbolGraph implements a graph whose vertices are symbols (typically strings). The API is largely the same as that of the abstract graph classes discussed in Project 4, with the additional ability to adjust the vertex size "on the fly" using the Push() operation.

namespace fsu
{
  template < typename S , typename N >
  class SymbolGraph
  {
  public:
    typedef S      Vertex;
    typedef xxxxx  AdjIterator;

    void   SetVrtxSize  (N n);
    void   AddEdge      (Vertex from, Vertex to);
    size_t VrtxSize     () const;
    size_t EdgeSize     () const;
    size_t OutDegree    (Vertex x) const;
    size_t InDegree     (Vertex x) const;
    AdjIterator Begin   (Vertex x) const;
    AdjIterator End     (Vertex x) const;

    void   Push         (const S& s); // add s to the vertex set

    // access to underlying data
    const ALUGraph<N>&      GetAbstractGraph() const; // reference to g_
    const HashTable<S,N,H>& GetSymbolMap() const; // reference to s2n_
    const Vector<S>&        GetVertexMap() const; // reference to n2s_

    SymbolGraph ( );
    SymbolGraph ( N n );
    ...
  private:
    ALUGraph<N>      g_;
    HashTable<S,N,H> s2n_;
    Vector<S>        n2s_;
    ...
  };
} // namespace fsu

where xxxxx is the adjacency iterator type. There is a directed version SymbolDirectedGraph<S,N> whose implementation is almost identical to the undirected case, except using ALDGraph<N> as the underpinning abstract graph type.

The template arguments are S = SymbolType and N = IntegerType. S is the type for the names of vertices, and is typically some form of string. N is the parameter to instantiate the underpinning abstract graph.
s2n_ is an associative array, or mapping, from symbols to vertices in the abstract graph g_. n2s_ is the inverse mapping from vertices in g_ to symbols. The symbol graph uses the two mappings to translate symbols to abstract vertices and calls operations in the abstract graph.
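
The translation between symbols and abstract vertices can be sketched with standard containers. This is only a minimal illustration, not the required implementation: it assumes std::unordered_map in place of HashTable, std::vector in place of Vector, and a plain vector of adjacency lists in place of ALUGraph. The class name SimpleSymbolGraph and the edge counter are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a symbol graph, built on std containers.
class SimpleSymbolGraph
{
public:
  // Add s to the vertex set (no effect if s is already present).
  void Push(const std::string& s)
  {
    if (s2n_.count(s) != 0) return;
    s2n_[s] = n2s_.size();   // symbol -> next unused vertex number
    n2s_.push_back(s);       // vertex number -> symbol
    adj_.emplace_back();     // empty adjacency list for the new vertex
  }

  // Undirected edge: translate both symbols, then update the abstract graph.
  void AddEdge(const std::string& from, const std::string& to)
  {
    assert(s2n_.count(from) != 0 && s2n_.count(to) != 0);
    const std::size_t u = s2n_.at(from);
    const std::size_t v = s2n_.at(to);
    adj_[u].push_back(v);
    adj_[v].push_back(u);
    ++edgeCount_;
  }

  std::size_t VrtxSize() const { return n2s_.size(); }
  std::size_t EdgeSize() const { return edgeCount_; }
  std::size_t OutDegree(const std::string& x) const
  {
    return adj_[s2n_.at(x)].size();
  }

private:
  std::unordered_map<std::string, std::size_t> s2n_;  // symbol to vertex number
  std::vector<std::string>                     n2s_;  // vertex number to symbol
  std::vector<std::vector<std::size_t>>        adj_;  // abstract adjacency lists
  std::size_t                                  edgeCount_ = 0;
};
```

Every public operation takes symbols, looks up the corresponding vertex numbers in s2n_, and then works only with numbers, which is the pattern the fsu class follows with g_.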

Code Requirements and Specifications - MovieMatch

MovieMatch should, at a minimum, provide services required by kb.cpp. This will require the following (partial) class definition:

class MovieMatch
{
public:

  MovieMatch (const char* baseActor) : baseActor_(0)
  {
    size_t length = strlen(baseActor);
    baseActor_ = new char [length + 1];
    baseActor_[length] = '\0';
    strcpy (baseActor_,baseActor);
  }

  void Load (const char* filename);
  // loads a movie/actor file

  unsigned long MovieDistance(const char* actor);
  // returns the number of movies required to get from actor to baseActor_

  ...

private:
  char* baseActor_;
  SymbolGraph < fsu::String , size_t > sg_;
  ...
};

(The names can be your choice, except for those required by the distributed client program.)

If you prefer you may build the symbol graph directly in MovieMatch.

The underlying graph should be built from the "database" provided in the text file movies.txt. Each line of this file represents a movie and the actors in the movie. Forward slash '/' is used to delimit the strings representing movie titles and actor names in each line.
It will be helpful to use either the cstring library or std::string to read entire lines and break them up into strings using the '/' delimiter, so that spaces are captured. We will distribute a client program for MovieMatch that illustrates this approach by allowing actor names (with blanks) to be entered through the keyboard.
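
As a rough sketch of the line-splitting step, assuming std::string and std::getline with a custom delimiter (the function name SplitLine is hypothetical, and skipping empty fields is a choice made here, not a requirement of the assignment):

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split one line of the movie file on '/'.
// Element 0 is the movie title; the remaining elements are actor names.
// Blanks inside a field are preserved because only '/' ends a field.
std::vector<std::string> SplitLine(const std::string& line)
{
  std::vector<std::string> fields;
  std::istringstream in(line);
  std::string field;
  while (std::getline(in, field, '/'))
  {
    if (!field.empty()) fields.push_back(field);  // skip empty fields
  }
  return fields;
}
```

For a made-up line such as "Movie A/Actor One/Actor Two", the result holds three strings, with the title first and the blanks inside each name intact.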

Movie Distance and Kevin Bacon

The Kevin Bacon game is this: given an actor by name, what is his/her Kevin Bacon number?

To solve this we first need a clear definition of the Kevin Bacon number for an actor, or more generally, the movie distance between two actors. The definition is much like the path distance between two vertices in a graph, except using movie chains instead of edges.

A movie chain from actor x to actor y is a sequence m₁ m₂ ... m_k such that

m_j and m_(j+1) have an actor in common for 0 < j < k

x is in movie m₁

y is in movie m_k

The movie distance md(x,y) is defined to be the number of movies in a shortest movie chain from x to y. If there is no movie chain from x to y, we define md(x,y) = infinity.

The Kevin Bacon number of an actor x is the movie distance from x to Kevin Bacon.

Some consequences are:

Kevin Bacon has Kevin Bacon number 0.
All other actors have Kevin Bacon number at least 1.
If x != y and x and y are in the same movie, then md(x,y) = 1
md(x,z) <= md(x,y) + md(y,z)

The actor-movie graph

To solve the Kevin Bacon game (or any other similar game based on another actor) we use graphs. Specifically, create a graph in which both actors and movies are vertices, and insert an edge whenever an actor is in a movie. Thus each edge has an actor for one vertex and a movie for the other.

A graph is said to be bipartite if the vertices can be colored with two colors, say Red and Blue, such that each edge has different colored vertices, that is, each edge goes between a blue vertex and a red vertex. Clearly the movie-actor graph is bipartite, with actors colored blue and movies colored red.

The following result is proved in discrete math courses and most books on graph theory:

Theorem. In a bipartite graph, a path whose ends have the same color has an even number of edges.

As a consequence, any path from one actor to another in the movie-actor graph has an even number of edges. If P is such a path, with length n, then n is even and n/2 is the number of movies passed through by P. If P is a shortest path from actor x to actor y, then n/2 is the movie distance from x to y.

Thus to solve the Kevin Bacon game, we perform a Breadth-First survey from Kevin Bacon. The Breadth-First tree rooted at Kevin Bacon consists of shortest paths from Kevin Bacon to all other actors who have a finite Kevin Bacon number. Dividing the length of such a path by 2 yields the Kevin Bacon number for the actor at the other end of the path.

In practical terms, we start at an actor x and follow the parent pointers of the BF tree back to Kevin Bacon, counting the steps. Then divide this count by 2 to get the number.
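
The parent-pointer walk can be sketched as follows. This is an illustration under stated assumptions rather than the fsu survey classes: the graph is assumed to be a plain vector of adjacency lists over vertex numbers, and the names AdjacencyLists, kNoParent, BreadthFirstParents, and MovieDistance (as a free function) are hypothetical.

```cpp
#include <cstddef>
#include <limits>
#include <queue>
#include <vector>

// Assumed representation: adj[v] lists the neighbors of vertex v in the
// bipartite actor-movie graph; vertices are numbered 0 .. adj.size() - 1.
typedef std::vector<std::vector<std::size_t>> AdjacencyLists;

const std::size_t kNoParent = std::numeric_limits<std::size_t>::max();

// Breadth-first search from root; returns the parent of each vertex in the
// BF tree (kNoParent for the root and for vertices never reached).
std::vector<std::size_t> BreadthFirstParents(const AdjacencyLists& adj,
                                             std::size_t root)
{
  std::vector<std::size_t> parent(adj.size(), kNoParent);
  std::vector<bool> visited(adj.size(), false);
  std::queue<std::size_t> frontier;
  visited[root] = true;
  frontier.push(root);
  while (!frontier.empty())
  {
    const std::size_t v = frontier.front();
    frontier.pop();
    for (std::size_t w : adj[v])
    {
      if (!visited[w])
      {
        visited[w] = true;
        parent[w] = v;
        frontier.push(w);
      }
    }
  }
  return parent;
}

// Count edges from actor back to root along parent pointers, then halve.
// Returns kNoParent (standing in for "infinity") if actor was never reached.
std::size_t MovieDistance(const std::vector<std::size_t>& parent,
                          std::size_t root, std::size_t actor)
{
  std::size_t edges = 0;
  for (std::size_t v = actor; v != root; v = parent[v])
  {
    if (parent[v] == kNoParent) return kNoParent;  // no chain to root
    ++edges;
  }
  return edges / 2;
}
```

With vertices 0 = Kevin Bacon, 1 = a movie, 2 = a co-star, 3 = a second movie, and 4 = an actor in that second movie, connected in a chain, the walk from vertex 4 counts 4 edges and reports a distance of 2.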

Hints

It will be helpful to have a function (or method in MovieMatch) that gets an entire line from movies.txt and returns a vector v of the individual names delimited by '/' in the line. The first name v[0] would be a movie title and the remaining v[1] .. v[v.Size() - 1] would be actors in that movie.
You will need to decide what a "string" is. We recommend using string objects, either fsu::String or std::string.
The "SymbolGraph::Push" method adds a single new vertex, which (in the most straightforward implementation) would entail a new vector allocation at each call. That would be very inefficient and would bog the program down with hours of read time trying to build the movie-actor graph.

One solution is to read the data twice: once to count the vertices and again to build the graph after the vertex size is set. In outline: (1) read the file and push the name of each vertex onto the back of the vertex-to-name vector (n2s_); (2) set up the name-to-vertex associative array (s2n_) and set the vertex size of the graph; (3) read the file again to create all of the edges in the graph.
We recommend this approach. The two reads could later be merged into one, but that would require some technical additions to the support code. These are not hard to make, but they are a distraction while working on the main application.
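
A minimal sketch of the two-read build, assuming standard containers instead of the fsu classes. The struct ActorMovieGraph and the function BuildTwoPass are hypothetical names. One deviation from the outline above: the name-to-vertex map is filled during the first read, so that a name appearing on several lines is recorded only once.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical container for the result of the two-pass build.
struct ActorMovieGraph
{
  std::vector<std::string>                     n2s;  // vertex number to name
  std::unordered_map<std::string, std::size_t> s2n;  // name to vertex number
  std::vector<std::vector<std::size_t>>        adj;  // undirected adjacency lists
};

ActorMovieGraph BuildTwoPass(std::istream& in)
{
  ActorMovieGraph g;
  std::string line, name;

  // Pass 1: record each distinct name once, in order of first appearance.
  while (std::getline(in, line))
  {
    std::istringstream fields(line);
    while (std::getline(fields, name, '/'))
    {
      if (!name.empty() && g.s2n.count(name) == 0)
      {
        g.s2n[name] = g.n2s.size();
        g.n2s.push_back(name);
      }
    }
  }

  // The vertex count is now known, so size the graph exactly once.
  g.adj.resize(g.n2s.size());

  // Pass 2: rewind and connect each movie (first field) to its actors.
  in.clear();
  in.seekg(0);
  while (std::getline(in, line))
  {
    std::istringstream fields(line);
    std::string movie;
    if (!std::getline(fields, movie, '/') || movie.empty()) continue;
    const std::size_t m = g.s2n.at(movie);
    while (std::getline(fields, name, '/'))
    {
      if (name.empty()) continue;
      const std::size_t a = g.s2n.at(name);
      g.adj[m].push_back(a);
      g.adj[a].push_back(m);
    }
  }
  return g;
}
```

The function accepts any seekable std::istream, so it can be exercised with a std::istringstream holding a few made-up lines before it is pointed at a std::ifstream opened on movies.txt.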