Assignment 5
Due: 25 Nov 2014
Educational objectives:
- Primary objectives: Implementing efficient data structures for inserting words into a dictionary and searching to see if a word is in a dictionary.
- Secondary objectives: Empirically comparing the performance of different data structures.
Statement of work: Implement a hash function for strings that performs well with the dictionary application using the STL tr1/unordered_set container, implement an efficient data structure for the dictionary application, and compare the performance of the STL set and STL tr1/unordered_set with the default hash function and your hash function against your data structure. You will be graded on the performance and correctness of your code. So, please use good compiler optimization flags in your makefile.
Deliverables:
- Turn in a makefile and all header (*.h) and cpp (*.cpp) files that are needed to build your software, as described in www.cs.fsu.edu/~asriniva/courses/DS14/HWinstructions.html. Turn in your development log too, which should be a plain ASCII text file called LOG.txt in your project directory.
Requirements:
- Create a subdirectory called proj5.
- You will need to have a makefile in this directory. In addition, all the header and cpp files needed to build your software must be present here, as well as the LOG.txt file.
- You should create the following additional files.
- cputime.h: Copy this file from the /home/courses/cop4530/solutions/proj5 directory. You should use this to time your code. Note that this file differs from the assignment 4 version in its semantics. The proj5 directory has an example of its use.
- MyHash.h: This should implement a hash function object named MyHash that maps an STL string to an integer type.
- MyDS.h/MyDS.cpp: These files should provide the interface and implementation for your data structure. You can feel free to implement any data structure that you wish to, including designing your own data structure or combining multiple data structures. This data structure should store STL strings. It should be in a class called MyDS and implement at least the following member functions: (i) default constructor, (ii) void push(const string &), (iii) bool search(const string &), and (iv) destructor, which perform the operations expected from their names. You may use any STL container or algorithm that you wish to.
- compare.cpp: This program will be compiled to create an executable called compare-containers, and the executable will be run as follows.
./compare-containers DictionaryName Filename
where DictionaryName and Filename are the names of files containing words separated by whitespaces. The former will be a dictionary containing distinct words; you can treat /usr/share/dict/words on linprog as a typical example. Each word in the text file Filename contains a string of lower case letters. This code should store all lower case words in the dictionary in four different containers: (i) STL set, (ii) STL unordered_set with the default hash function, (iii) STL unordered_set with your hash function from MyHash.h, and (iv) MyDS. It will then check if each word in Filename is present in the standard dictionary using each of the four containers. For each word, it will output
Answer Container
where Answer = Y if the word is present and N if it is not, and Container = set/hash/myhash/myds. For example:
Y set
Y hash
Y myhash
Y myds
Y set
Y hash
Y myhash
Y myds
N set
N hash
N myhash
N myds
After handling all the words in the input file, your code should output the time taken for storing the entire dictionary, the minimum search time for a word, the maximum search time for a word, and the average search time, for each container. For example:
set: store dictionary 10.1 s, search: min 0.01 s, max 1.0 s, mean 0.05 s
hash: store dictionary 2.2 s, search: min 0.02 s, max 1.5 s, mean 0.04 s
myhash: store dictionary 9.1 s, search: min 0.01 s, max 0.09 s, mean 0.03 s
myds: store dictionary 1.1 s, search: min 0.001 s, max 0.02 s, mean 0.003 s
- result.txt: This is an ASCII text file. It should first describe your data structure (MyDS) and hash function. It should then discuss the relative performances of the four data structures.
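One possible shape for the MyHash and MyDS requirements listed above is sketched below. Nothing here is the expected solution: the hash is FNV-1a (a well-known general-purpose string hash chosen only as an illustration), and MyDS is a sorted vector with binary search, which leans on the permitted assumption that the dictionary arrives in sorted order while staying correct when it does not. The private member names and the MyHashSet alias are hypothetical, the code assumes a C++11 compiler (with TR1 the container would be std::tr1::unordered_set from tr1/unordered_set), and everything is shown in one block for brevity although the assignment asks for MyHash.h, MyDS.h and MyDS.cpp.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Hash function object for STL strings (illustrative choice: 64-bit FNV-1a).
struct MyHash {
    std::size_t operator()(const std::string& s) const {
        std::uint64_t h = 14695981039346656037ULL;   // FNV offset basis
        for (unsigned char c : s) {
            h ^= c;                                   // mix in one byte
            h *= 1099511628211ULL;                    // FNV prime
        }
        return static_cast<std::size_t>(h);
    }
};

// Hypothetical alias showing how MyHash plugs into the STL container.
using MyHashSet = std::unordered_set<std::string, MyHash>;

// Illustrative MyDS: a vector kept sorted, searched with binary search.
class MyDS {
public:
    MyDS() : sorted_(true) {}
    ~MyDS() {}

    // Append a word; remember if the input turned out not to be sorted.
    void push(const std::string& word) {
        if (!words_.empty() && word < words_.back()) {
            sorted_ = false;
        }
        words_.push_back(word);
    }

    // Sort lazily (only if an out-of-order push was seen), then search.
    bool search(const std::string& word) {
        if (!sorted_) {
            std::sort(words_.begin(), words_.end());
            sorted_ = true;
        }
        return std::binary_search(words_.begin(), words_.end(), word);
    }

private:
    std::vector<std::string> words_;  // hypothetical member name
    bool sorted_;                     // true while words_ is in sorted order
};
```

With a sorted dictionary, push never triggers a sort and each search costs O(log n) string comparisons. A trie or a tuned hash table may well be faster in practice, so the choice should be justified by measurement in result.txt.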
Note:
- We may test your MyDS class and hash function object on a piece of code that we will write. So, it is important that they be exactly as specified.
- Your code should be correct for any dictionary and text file that meet the above specifications. However, for the purpose of designing efficient data structures, you may assume that the dictionary contains words that are in sorted order and that the text file contains typical English words. This means that if the above assumptions are satisfied, then your code will run fast. If the above assumptions are not satisfied, then your code should still be correct, even though it may not be efficient.
- Providing fake times will be considered a serious ethical violation.
- You may lose points if your code is slow.
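The per-container summary that compare.cpp must print (store time plus minimum, maximum and mean search time) could be gathered along the following lines. This is only a sketch under stated assumptions: the interface of the course's cputime.h is not shown in this handout, so std::chrono::steady_clock stands in for it, and SearchStats, timedLookup and summaryLine are hypothetical names introduced for illustration.

```cpp
#include <chrono>
#include <cstddef>
#include <sstream>
#include <string>

// Running min / max / mean of per-word search times, in seconds.
struct SearchStats {
    double min_s;
    double max_s;
    double total_s;
    std::size_t count;

    SearchStats() : min_s(0.0), max_s(0.0), total_s(0.0), count(0) {}

    void add(double seconds) {
        if (count == 0 || seconds < min_s) min_s = seconds;
        if (count == 0 || seconds > max_s) max_s = seconds;
        total_s += seconds;
        ++count;
    }

    double mean() const { return count ? total_s / count : 0.0; }
};

// Time one lookup and record it. Stand-in timer: the assignment asks for
// cputime.h, whose interface is not reproduced here.
template <typename Lookup>
bool timedLookup(Lookup lookup, const std::string& word, SearchStats& stats) {
    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    bool found = lookup(word);
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    stats.add(elapsed.count());
    return found;
}

// Build one summary line in the format of the example output.
std::string summaryLine(const std::string& name, double store_s,
                        const SearchStats& stats) {
    std::ostringstream out;
    out << name << ": store dictionary " << store_s << " s, search: min "
        << stats.min_s << " s, max " << stats.max_s << " s, mean "
        << stats.mean() << " s";
    return out.str();
}
```

A call such as timedLookup([&](const std::string& w) { return ds.search(w); }, word, stats) would wrap a MyDS lookup. A single dictionary search can be shorter than the timer's resolution, so the reported minimum may be zero or noisy on some systems.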
Bonus points (5):
You may get up to 50 additional points if your code is correct and the fastest in class. You may get up to 25 bonus points if MyDS or your hash function works correctly and is faster than the two STL containers (in determining the bonus points, the speed of the STL containers will be measured by a piece of code that we write).
Copyright: Ashok Srinivasan, Florida State University.
Last modified: 6 Nov 2014