Assignment 5
Due: 28 Nov 2012
Educational objectives:
- Primary objectives: Implement an efficient data structure for storing words and searching to see if a word is in a file.
- Secondary objectives: Empirically compare the performance of different data structures.
Statement of work: Implement a good hash function for strings, where the strings are words, for use with the STL tr1/unordered_set container. Also implement another efficient data structure for storing these words. Compare the performance of (i) STL set, (ii) STL tr1/unordered_set with the default hash function, (iii) STL tr1/unordered_set with your hash function, and (iv) your data structure. You will be graded on the performance and correctness of your code, so please use good compiler optimization flags in your makefile. You may also use OpenMP to improve the performance of your code.
Deliverables:
- Turn in a makefile and all header (*.h) and cpp (*.cpp) files that are needed to build your software, as described in www.cs.fsu.edu/~asriniva/courses/DS12/HWinstructions.html. Turn in your development log too, which should be a plain ASCII text file called LOG.txt in your project directory.
Requirements:
- Create a subdirectory called proj5.
- You will need to have a makefile in this directory. In addition, all the header and cpp files needed to build your software must be present here, as well as the LOG.txt file.
- You should create the following additional files.
- MyHash.h: This should implement a hash function object that maps an STL string to an integer type.
- MyDS.h/MyDS.cpp: These files should provide the interface and implementation for your data structure. You are free to implement any data structure you wish, including designing your own data structure or combining multiple data structures. This data structure should store STL strings. It should be in a class called MyDS and implement at least the following member functions: (i) default constructor, (ii) void push(const string &), (iii) bool search(const string &), and (iv) destructor, each of which performs the operation expected from its name. You should not use any STL container other than strings to implement your data structure.
- compare.cpp: This program will be compiled to create an executable called compare-containers, and the executable will be run as follows:
./compare-containers File-List
where File-List is a list of file names. Each file contains words separated by whitespace. For each of the files, your code should create four containers (one of each kind mentioned above) and store each word in the file in each of these containers. Your code will then read words from cin, one word per line, until it reads the word x, and store these words in an STL vector. For each word in the vector, your code should output all the files in which the word is present. You should output this result for each container, so that we can check whether each container works correctly. (If your code is correct, then the result for each container will be the same.) The output format will be:
word Container File-names
where Container is one of set, hash, myhash, or myds, and File-names is a list of the files in which that word is present, separated by blanks. For example:
grade set file1.txt file3.txt
grade hash file1.txt file3.txt
grade myhash file1.txt file3.txt
grade myds file1.txt file3.txt
If the word is not present in any file, then the output will look like:
graed set
graed hash
graed myhash
graed myds
Your code should not perform the above search and output for the last word, x. Instead, for each container, it should output the time taken to initially store the words of all the files, followed by the minimum, maximum, and average search time for a word, and then exit. For example:
set: store 10.1 s, search: min 0.01 s, max 1.0 s, mean 0.05 s
hash: store 2.2 s, search: min 0.02 s, max 1.5 s, mean 0.04 s
myhash: store 9.1 s, search: min 0.01 s, max 0.09 s, mean 0.03 s
myds: store 1.1 s, search: min 0.001 s, max 0.02 s, mean 0.01 s
- result.txt: This is an ASCII text file. It should first describe your data structure (MyDS). It should then discuss the relative performances of the four data structures.
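MyHash.h and MyDS.h/MyDS.cpp leave the choice of hash function and data structure open. The sketch below assumes one possible pairing, shown in a single block for brevity (split it into the required files): MyHash uses 64-bit FNV-1a, a standard byte-wise string hash chosen here for illustration rather than required by the assignment, and MyDS is a separate-chaining hash table built from a raw bucket array and linked nodes, so std::string is the only STL type it relies on. It is written against C++11 std::unordered_set; with GCC's TR1 headers, std::tr1::unordered_set<std::string, MyHash> from <tr1/unordered_set> accepts the hash object the same way. The names MyHashSet, Node, kInitialBuckets, and rehash are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_set>

// MyHash.h (sketch): 64-bit FNV-1a over the bytes of the string.
struct MyHash {
    std::size_t operator()(const std::string &s) const {
        std::uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
        for (unsigned char c : s) {
            h ^= c;                 // fold in one byte
            h *= 1099511628211ULL;  // multiply by the 64-bit FNV prime
        }
        return static_cast<std::size_t>(h);
    }
};

// Container (iii): an unordered_set that uses MyHash instead of the default hash.
typedef std::unordered_set<std::string, MyHash> MyHashSet;

// MyDS (sketch): separate-chaining hash table made of a raw bucket array and
// singly linked nodes; std::string is the only STL type used.
class MyDS {
public:
    MyDS() : buckets_(new Node *[kInitialBuckets]()), nbuckets_(kInitialBuckets), size_(0) {}

    ~MyDS() {
        for (std::size_t i = 0; i < nbuckets_; ++i) {
            Node *n = buckets_[i];
            while (n != nullptr) {
                Node *next = n->next;
                delete n;
                n = next;
            }
        }
        delete[] buckets_;
    }

    // The class owns raw pointers, so copying is disabled.
    MyDS(const MyDS &) = delete;
    MyDS &operator=(const MyDS &) = delete;

    // Store a word; a word that is already present is not stored again.
    void push(const std::string &word) {
        if (search(word)) return;
        if (size_ + 1 > nbuckets_) rehash(2 * nbuckets_);  // keep load factor <= 1
        std::size_t b = hash_(word) % nbuckets_;
        buckets_[b] = new Node{word, buckets_[b]};
        ++size_;
    }

    // True if the word has been pushed.
    bool search(const std::string &word) {
        for (Node *n = buckets_[hash_(word) % nbuckets_]; n != nullptr; n = n->next)
            if (n->word == word) return true;
        return false;
    }

private:
    struct Node {
        std::string word;
        Node *next;
    };
    static constexpr std::size_t kInitialBuckets = 1024;

    // Relink every existing node into a new bucket array of size newCount.
    void rehash(std::size_t newCount) {
        Node **fresh = new Node *[newCount]();
        for (std::size_t i = 0; i < nbuckets_; ++i) {
            Node *n = buckets_[i];
            while (n != nullptr) {
                Node *next = n->next;
                std::size_t b = hash_(n->word) % newCount;
                n->next = fresh[b];
                fresh[b] = n;
                n = next;
            }
        }
        delete[] buckets_;
        buckets_ = fresh;
        nbuckets_ = newCount;
    }

    Node **buckets_;
    std::size_t nbuckets_;
    std::size_t size_;
    MyHash hash_;
};
```

With a hash that spreads words well, chaining gives expected constant-time push and search; doubling the bucket array whenever the word count exceeds the bucket count keeps the chains short. Rehashing relinks the existing nodes instead of copying the strings.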
Note:
- We will test your MyDS class on a piece of code that we will write, so it is important for this class to be exactly as specified.
Bonus points (5):
You may get up to 50 additional points if your code is correct and the fastest in the class. You may get up to 25 bonus points if MyDS, or an unordered_set with your hash function, works correctly and is faster than the two STL containers. (For determining bonus points, the speed of the STL containers will be measured with a piece of single-threaded code that we write.)
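Both the timing report that compare-containers prints and the bonus comparison depend on measuring store and search times the same way for every container. A minimal single-threaded sketch, assuming std::chrono::steady_clock as the timer; the helpers secondsFor, timeSearches, and report and the struct SearchStats are hypothetical names, not part of the specification. Each container is wrapped in a lambda, so the same code can time set, unordered_set, and MyDS.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <limits>
#include <set>
#include <string>
#include <vector>

// Hypothetical summary of per-word search times, in seconds.
struct SearchStats {
    double min = 0.0, max = 0.0, mean = 0.0;
};

// Wall-clock seconds taken by f(); usable for the "store" phase.
template <class F>
double secondsFor(F f) {
    auto t0 = std::chrono::steady_clock::now();
    f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

// Time lookup(w) separately for each query word and summarize the times.
// `lookup` takes a const std::string& and returns bool; the number of words
// found is added to `hits` so that every lookup result is used.
template <class Lookup>
SearchStats timeSearches(const std::vector<std::string> &queries, Lookup lookup,
                         std::size_t &hits) {
    SearchStats st;
    if (queries.empty()) return st;  // all fields stay 0
    st.min = std::numeric_limits<double>::infinity();
    double total = 0.0;
    for (const std::string &w : queries) {
        bool found = false;
        double s = secondsFor([&] { found = lookup(w); });
        if (found) ++hits;
        st.min = std::min(st.min, s);
        st.max = std::max(st.max, s);
        total += s;
    }
    st.mean = total / static_cast<double>(queries.size());
    return st;
}

// Print one line such as "set: store 10.1 s, search: min 0.01 s, max 1 s, mean 0.05 s".
void report(const char *name, double storeSeconds, const SearchStats &st) {
    std::printf("%s: store %g s, search: min %g s, max %g s, mean %g s\n",
                name, storeSeconds, st.min, st.max, st.mean);
}
```

For example, a set can be timed with `secondsFor([&] { for (const std::string &w : words) s.insert(w); })` for storing and `timeSearches(queries, [&](const std::string &w) { return s.count(w) > 0; }, hits)` for searching; for MyDS the lambda would call `ds.search(w)` instead. A single lookup can take less time than the clock resolution, so the reported minimum may be 0.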
Copyright: Ashok Srinivasan, Florida State University.
Last modified: 8 Nov 2011