Project 3: WordSmith
Educational Objectives:
On successful completion of this assignment, the student should be able to
- Define the concept of associative container as a re-usable component in programs
- State the distinction between unimodal and multimodal associative containers, and:
- Give examples of each type
- Describe use cases making each type appropriate
- State the distinction between ordered and unordered associative containers
- Give examples of each type
- Describe use cases making each type appropriate
- Name the primary defining operations for associative containers of these types:
- Unimodal Ordered Set
- Multimodal Ordered Set (aka Ordered Multiset)
- Unimodal Unordered Set
- Multimodal Unordered Set (aka Unordered Multiset)
- Unimodal Ordered Map (aka Ordered Table)
- Multimodal Ordered Map (aka Ordered Multimap)
- Unimodal Unordered Map (aka Unordered Table)
- Multimodal Unordered Map (aka Unordered Multimap)
- Associative Array
Describe the behavior and state the runtime expectations for each operation.
- Describe various implementation plans for ordered associative containers,
and discuss whether and why runtime expectations are met by the implementation.
- Define "leggy" and "bushy" for a binary search tree (BST), and give
examples to illustrate the concepts.
- Give an argument that red-black trees are always "bushy".
- Give complete details on implementing red-black binary trees
- Argue why and how red-black binary trees provide an implementation for
ordered set, including runtime expectations for the principal operations
- Implement ordered set as red-black tree, including testing for functionality
and structural integrity
Background Knowledge Required: Be sure that you have mastered the
material in these chapters before beginning the project:
Binary Trees and Iterators,
Binary Tree Construction,
Associative Containers,
Sets and Maps 1,
Associative Binary Trees.
Operational Objectives:
Create (1) a client WordSmith of CSet<> that serves as a text
analysis application; and (2) an implementation of Binary Search Tree TBST<>
that serves as an implementation platform for CSet<>.
Deliverables:
wordsmith.h,
wordsmith.cpp,
tbst.h,
makefile,
log.txt.
Procedural Requirements
Begin by copying all of the files from the project distribution directory:
proj3/main.cpp # driver program for wordsmith
proj3/makefile # makefile for project
proj3/data? # sample word files
proj3/ftmolist.cpp # ftoac.cpp with Container = TMOList
proj3/ftuolist.cpp # ftoac.cpp with Container = TUOList
proj3/ftbst.cpp # ftoac.cpp with Container = TBST
proj3/proj3submit.sh # submit script
tests/ftoac.cpp # functionality test for ordered associative containers
tests/fcset.cpp # functionality test for CSet
Define and implement the classes TBST<T,P> and TBST<T,P>::Iterator
in the file tbst.h. Also place all supporting definitions and
implementations, such as operator overloads, in this file.
Be sure to fully cite all references used for code and ideas, including
URLs for web-based resources. These citations should be in the file
documentation and if appropriate detailed in relevant code locations.
Test your classes using the distributed test harnesses fcset.cpp
and ftbst.cpp.
Write a brief description of your development and test methods and
results and place this in the file header documentation of tbst.h.
Submit the project using the script proj3submit.sh.
Warning: Submit scripts do not work on the program and
linprog servers. Use shell.cs.fsu.edu to submit projects. If you do
not receive the second confirmation with the contents of your project, there has
been a malfunction.
Project Overview: The project consists of two orthogonal
tasks: (1) creation of the WordSmith application, which is a client of
fsu::CSet, and (2) creation of the BinarySearchTree
associative container used to efficiently implement fsu::CSet. These
tasks are discussed separately below.
- The WordSmith Client
- Functionality Requirements.
- WordSmith can read an arbitrary text file on command and extract all of the
words in the file, maintaining the unique words, along with the frequency of
occurrence of each word, in a set. Letters are converted to lower case before
comparison and storage. A word is understood to be a string of letters and/or
digits, with certain other symbols allowed. Most non-alpha-numeric characters
are ignored. Exceptions are hyphens and apostrophes, which are considered part
of the word, so that contractions and hyphenated constructs are counted as
individual words. (Note: two adjacent apostrophes are not considered part of
a word, since they represent closing of a quotation.)
- WordSmith can write an analysis of its current stored words. This analysis
consists of a lexicographical listing of the unique words together with their
frequencies, followed by a count of the total number of words and the vocabulary
size (number of unique words). Note that this is a cumulative analysis over all
of the input files read since starting up WordSmith (or since the last clearing
operation).
- WordSmith must operate with the supplied driver program
LIB/proj3/main.cpp which has a user interface with the following options:
- Read a file. Read the words of the file into the structure
(and report summary to screen).
- Write an analysis of the current data (including input file names) to a
file (and report summary to screen).
- Erase current data and clear all data from the structure.
- Show current size and send a data summary to the screen.
- display Menu.
- eXit program.
Use the source code in the driver program main.cpp to determine the
syntax requirements for the WordSmith public interface. Use the
executables in area51 to model expected behavior.
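The word rules stated in the first requirement above can be sketched as follows. This is a minimal illustration, not the reference behavior: it uses std::string and std::vector as stand-ins for the course library types, the function name ExtractWords is hypothetical, and it assumes that a pair of adjacent apostrophes acts as a word separator. How leading or trailing hyphens and apostrophes are treated is not specified here, so the sketch simply keeps them; check the area51 executables for the expected behavior.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split text into lower-case words made of letters,
// digits, hyphens, and single apostrophes. Two adjacent apostrophes are
// treated as a separator (an assumption about the intended rule).
std::vector<std::string> ExtractWords(const std::string& text)
{
  std::vector<std::string> words;
  std::string current;
  for (std::size_t i = 0; i < text.size(); ++i)
  {
    unsigned char c = static_cast<unsigned char>(text[i]);
    bool doubled = (c == '\'' && i + 1 < text.size() && text[i + 1] == '\'');
    if (doubled)
    {
      // closing-quote pair: end the current word and skip both characters
      if (!current.empty()) { words.push_back(current); current.clear(); }
      ++i;
    }
    else if (std::isalnum(c) || c == '-' || c == '\'')
    {
      // word character: store in lower case
      current += static_cast<char>(std::tolower(c));
    }
    else if (!current.empty())
    {
      // any other character ends the current word
      words.push_back(current);
      current.clear();
    }
  }
  if (!current.empty()) words.push_back(current);
  return words;
}
```

With this sketch, the input `Don't stop, well-known ''Quote'' 42nd` yields the five words don't, stop, well-known, quote, and 42nd.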
- From any directory having access to the course library and containing your
submission files, entering "make" should result in an executable called
"wordsmith.x".
- Implementation Requirements.
- You should define a class WordSmith, declared in the file
wordsmith.h and implemented in the file wordsmith.cpp. An
object of type WordSmith is used by the driver program to create
the executable wordsmith.x.
- The primary data structure used for storing words and wordcounts should be
an object of type
fsu::CSet < EntryType , ContainerType >,
where ContainerType is an associative container and
EntryType is typedef'd as fsu::TPair<fsu::String, unsigned long>. Note
that an EntryType object holds a word and a wordcount.
- Note that the fsu::TPair template class has comparison operators
defined that emphasize the first coordinate of the pair (called the "key"), so that two pairs
are considered equal, for example, if they have equal keys.
- The structure used for storing file names should be an object of type
fsu::TList<fsu::String>.
- The application should function correctly in every respect using
fsu::TUOList < EntryType > for ContainerType.
- Changing the structure used for ContainerType should be as simple as
changing one typedef statement in the WordSmith class
declaration.
- As usual, you should employ good software design practice. Your application
should be completely robust and all classes you define should be thoroughly
tested for correct function, robust behavior, and against memory leaks. Your wordsmith.x
should mimic, or improve upon, the behavior illustrated in area51/wordsmith_?.x.
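The effect of key-only comparison on a set of (word, count) entries, and the role of the single ContainerType typedef, can be sketched as follows. Everything here is a stand-in chosen for illustration: KeyedPair imitates the comparison behavior described for the library pair class (its member names are assumptions), a sorted std::vector plays the part of the ordered container, and CountWord and TotalWords are hypothetical helpers, not part of the WordSmith interface required by main.cpp.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for the library pair class: comparisons look at the key only.
template <typename K, typename D>
struct KeyedPair
{
  K key;
  D data;
};

template <typename K, typename D>
bool operator<(const KeyedPair<K, D>& a, const KeyedPair<K, D>& b)
{ return a.key < b.key; }

template <typename K, typename D>
bool operator==(const KeyedPair<K, D>& a, const KeyedPair<K, D>& b)
{ return a.key == b.key; }

typedef KeyedPair<std::string, unsigned long> EntryType;
// The one line to change when switching containers (stand-in type here).
typedef std::vector<EntryType> ContainerType;

// Insert word with count 1, or increment the count if it is already stored.
// The vector is kept sorted by key, so listing it gives lexicographic order.
void CountWord(ContainerType& c, const std::string& word)
{
  EntryType probe = { word, 1UL };
  ContainerType::iterator i = std::lower_bound(c.begin(), c.end(), probe);
  if (i != c.end() && *i == probe)
    ++(i->data);          // equal keys: same word, bump its frequency
  else
    c.insert(i, probe);   // new word: insert at its sorted position
}

// Total number of words read is the sum of all frequencies;
// the vocabulary size is simply c.size().
unsigned long TotalWords(const ContainerType& c)
{
  unsigned long total = 0;
  for (std::size_t i = 0; i < c.size(); ++i) total += c[i].data;
  return total;
}
```

Because two entries with the same word compare equal regardless of count, looking up a probe entry finds the stored entry for that word, which is what lets a set of pairs behave like a table.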
- The Binary Search Tree Container
- The container class TBST<T,P> should be declared and implemented in the file tbst.h.
- TBST<T,P> may either be derived from
TBinaryTree<T> [inheritance], use a private
TBinaryTree<T> object [adaptation], or developed stand-alone. The template parameter
P should have a default value P = TLessThan<T>.
- TBST<T,P> should be a proper type and implement the interface
illustrated in Chapter 16, Slide 5.
- The following methods should have runtime O(d) where d
is the depth of the tree:
Iterator Insert (const T& t); // insert in order
Iterator LowerBound (const T& t) const;
Iterator UpperBound (const T& t) const;
Iterator Includes (const T& t) const; // returns LowerBound() or End()
size_t Remove (const T& t); // remove (all copies of) t
- The following methods should have constant runtime O(1):
bool Insert (Iterator& i, const T& t); // insert only if location is correct
bool Remove (Iterator& i); // remove item at i
- All insertion methods should use type "U" semantics.
- TBST<T,P> should compile and function correctly with the
client proj3/ftbst.cpp [tests/ftoac.cpp set to case Cd: uni-bst].
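The O(d) requirement for the search-based methods follows from the fact that each one walks a single path from the root. The sketch below illustrates that idea only; it is not the required TBST interface. MiniBST is a hypothetical stand-alone class, it returns plain pointers instead of iterators, it uses std::less where the assignment uses TLessThan, and it reads type "U" semantics as unimodal, meaning that inserting an element equal to a stored one overwrites it rather than adding a copy. That reading is an assumption.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical minimal BST; pointers stand in for iterators.
template <typename T, typename P = std::less<T> >
class MiniBST
{
  struct Node
  {
    T value;
    Node* left;
    Node* right;
    explicit Node(const T& t) : value(t), left(0), right(0) {}
  };

  Node* root_;
  std::size_t size_;
  P pred_;

  static void Destroy(Node* n)
  {
    if (n) { Destroy(n->left); Destroy(n->right); delete n; }
  }

  MiniBST(const MiniBST&);            // copying disabled in this sketch
  MiniBST& operator=(const MiniBST&);

public:
  MiniBST() : root_(0), size_(0), pred_() {}
  ~MiniBST() { Destroy(root_); }

  std::size_t Size() const { return size_; }

  // Follows one root-to-leaf path, so the cost is O(d).
  // An equal element is overwritten (assumed unimodal behavior).
  const T* Insert(const T& t)
  {
    Node** link = &root_;
    while (*link)
    {
      if (pred_(t, (*link)->value))      link = &(*link)->left;
      else if (pred_((*link)->value, t)) link = &(*link)->right;
      else { (*link)->value = t; return &(*link)->value; }
    }
    *link = new Node(t);
    ++size_;
    return &(*link)->value;
  }

  // First stored element that is not less than t, or 0 if none. O(d).
  const T* LowerBound(const T& t) const
  {
    const Node* n = root_;
    const T* best = 0;
    while (n)
    {
      if (pred_(n->value, t)) n = n->right;
      else { best = &n->value; n = n->left; }
    }
    return best;
  }

  // LowerBound(t) if that element equals t, otherwise 0.
  const T* Includes(const T& t) const
  {
    const T* lb = LowerBound(t);
    return (lb && !pred_(t, *lb)) ? lb : 0;
  }
};
```

Equality is decided with the predicate alone (neither element is less than the other), so the same code works for entry types whose ordering looks only at a key.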
- Scoring
- Level 1 (80 points):
WordSmith based on CSet < EntryType , TUOList < EntryType > >
- Level 2 (100 points): level 1 plus
WordSmith based on CSet < EntryType , TBST < EntryType > >
- Level 3 (120 points): level 2 plus
TBST < T , P > passes all tests using ftbst.cpp
Hints
- Note that the directory LIB/proj3/ contains the
file main.cpp along with some test data files.
Additional helpful files are the ordered associative container test harness LIB/tests/ftoac.cpp,
the sample executables LIB/area51/wordsmith_?.x, and the submit script
LIB/proj3/proj3submit.sh.
- Develop WordSmith first using a sorted-list-based set. Since the
supporting data structures are already in the library, this will allow you to
concentrate on the application. Also, this allows you to develop a large part
of the project before dynamic trees are covered in class. This gets you to the
80 point level.
- Make liberal use of typedef statements in class
WordSmith. This will make your code easier to understand, generate more
efficient identifier names in your object code, and make it much easier to
change from one container to another in the class declaration.
- Once WordSmith is fully functional and tested, you can begin thinking about
developing the binary search tree container. Writing TBST<T,P>
as an adaptor of TBinaryTree<T> will save a lot of work: you shouldn't need to do
any "raw" inserting of tree nodes, rather you can use the various insert
operations of the base class. This is modelled closely by the way TUOList<T,P>
adapts TList<T> (code for which is in LIB/tcpp).
- Only the methods TBST<T,P>::Insert(const T& t) and
TBST<T,P>::Includes(const T& t) (or
TBST<T,P>::LowerBound(const T& t)) need to be implemented for
WordSmith. This gets you to the 100 point level.
- Implementation and testing of the full TBST<T,P> public
interface gets you to the 120 point level. Note this is 20 points extra credit.