Project 5:  Hash Table and Its Application

Due: 11/29/2023 (Part 1) and 12/08/2023 (Part 2)


Educational Objectives:  Understand and get familiar with the implementation and use of the hash table data structure.

Statement of Work: Implement a hash table ADT and develop a spell checker using the hash table.

Project Description:

This project has two parts. In the first part, you will implement a hash table class template named HashTable. In the second part of the project, you will develop a spell checker using the hash table you developed.

Task 1: Requirements of HashTable Class Template

Public HashTable interfaces (T is the template parameter, i.e., the generic data type)

Private HashTable interfaces (T is the template parameter)

You need to write test programs to test various functions of hash table. Moreover, you must implement the hashtable testing functions in the menu function in the incomplete myspell.cpp. You should complete myspell.cpp that provides the same IO interfaces as the sample executable (proj5.x) when it is executed with './proj5.x'. A sample testing input file, test_hashtable.txt, is provided. You can run './proj5.x < test_hashtable.txt > out' to see the expected output.

Task 2: A Spell Checker

In the second part of the project, you will develop a spell checker. The spell checker will take three command-line parameters: filename of the dictionary, name of the file to be checked (check file), name of the output file where the checked (and may be corrected) content should be written to.

Your spell checker will read the content of the check file line by line; one line at a time. After a line is read, your program will check word by word in the line if it is a valid word (i.e., if it is in the hash table), where a word is defined as a sequence of English letters from 'a' to 'z' (and the capital cases). When an invalid word is encountered (i.e., the word is not in the hash table), you will print out the line with the invalid word CAPITALIZED, you will then output 10 candidate words for the user to choose (to replace the wrong word). There are two issues that need to be decided for a spell checker: (1) what words are considered candidate words, and (2) which of the candidate words are to be presented to the user and how.

In this assignment, you will identify the candidate words in the following manner. For each character in the word, you will replace it with characters from 'a' to 'z', one at a time, starting from the left-most character. After a replacement is done, you will check if the new word is in the hash table. It becomes a candidate word if it is in the hash table. As an example, consider an invalid word "jest", we will create new words with replacement such as "aest", "best", "cest" ..., "zest" (the first letter "j" is replaced by one character (from 'a' to 'z'), one at a time), and "jast", "jbst", "jcst", ..., "jzst" (the second letter 'e' is replaced by 'a' to 'z', one at a time), etc. For each created word, you will check if it is in the hash table (a valid word), and if so, it is a candidate word.

The second issue is which of and how the candidate words are to be presented to the user. To have a reasonably good user experience, the spell checker will try to present more common words earlier to the user. This is achieved by utilizing a word rank file (wordrank.txt in the provided files). Each line in wordrank.txt contains a word and a number (rank). The smaller the rank, the more frequent the word is used. The word rank file is smaller than the dictionary and only contains a subset of words -- wordrank.txt only has a few thousand words. You must load the file and store it in a map in order to decide the rank of a word quickly. For each candidate word found, the spell checker should find its ranking. The spell checker should present the word with smaller rank earlier to the user. For candidate words not in the rank file, the one found earlier should be present earlier (the specific order of finding candidate words is described earlier). You must the consider all possible candidate words in order to identify the 10 candidate words in order.

The spell checker outputs the 10 candidate words to the user, who will choose one (or choose 'n' for no change). Note that each line may contain multiple invalid words, you need to process them one by one in the same manner. If there are less than 10 candidate words, you should just output these candidate words in order.

At the end of your program (after the spell check is finished), you should store the content of the text file, with invalid words corrected (or no change due to user selection), into the output file. You still need to write the content of the text file into the output file, even if all words in the original text file are valid.

A few notes for Task 2:

A partially implemented driver program named myspell.cpp is provided to you. This program works in two modes. When the program runs without any command-line parameters (try './proj5.x'), it will test the functions of the hash table. For this purpose, a menu() function is provided in myspell.cpp. You should call this function to display the operations that a user can perform on a hash table. You cannot change the menu() function.

In the second mode of the program, three command-line parameters will be provided (try './proj5.x /usr/share/dict/words test0.txt test0.out'), and the program will work as a spell checker.

Extra-credit oppurtunities (up to 30 points) 

A) (10 points) Incorporate a more advanced candidate word selection/presentation algorithm. To get the extra point, in addition to implementing your more advanced scheme, you must articulate and demonstrate why your scheme is better than the default scheme in this project. You must also implement the default scheme in order to get the extra points.

B) (10 or 20 points) Develop a nicer interface for the spell checker. You can implement a text-based interface like the UNIX ispell command for 10 extra points. Or you can develop a nice graphical user interface (GUI) for this project. Your GUI must support at least the user interfaces supported by the plain-text version user manual in the provided code. You can develop this part on Windows (10 bonus points) or you can work on a Linux machine such as linprog.cs.fsu.edu (20 bonus points). There are a number of choices for you to develop a GUI on a Linux machine, including Qt and GTKMM, and wxWidgets.

You must complete and submit the required components of the project on linprog if you decide to go for the GUI project for extra credits. You will not get any bonus points if you only have a GUI but cannot perform the required functions of this project.

Provided Partial Code

The following partial code has been provided to you.

1.  hashtable.h : partial implementation

2.  hashtable.hpp : partial implementation

3.  myspell.cpp: partial implementation

4. proj5.x: executable of myspell.cpp, compiled on linprog.cs.fsu.edu

5. wordrank.txt: a word rank file.

6. test0.txt, test1.txt, test2.txt: sample files to be checked.

For testing your program, you can use /usr/share/dict/words as the dictionary on linprog.cs.fsu.edu

Deliverables

1.  Deliverables: For Part 1, turn in all your source program files (hashtable.h, hashtable.hpp, myspell.cpp, and makefile to produce executable proj5.x) in a tar file via the Canvas system. It is due November 29, 11:59pm.

For Part 2, turn in all your source program files (hashtable.h, hashtable.hpp, myspell.cpp, and makefile to produce executable proj5.x) in a tar file via the Canvas system. The myspell.cpp program should include the spell check functionality. It is due December 8, 11:59pm.

Grading:

Part 1 (40 points):

Part 2 (65 points):

Hints: