CIS3931 Intro to JAVA Programming

Assignment #3 – A Word/Letter Counter

 

Overview

 

Create a program that reads a file of words and outputs the total number of words, the number of unique words (case insensitive), and a list of the words sorted from most common to least common, with the number of occurrences of each unique word. The program should also output the total number of letters, followed by the count and frequency of each letter of the alphabet. Write this information to both the screen and the destination file. Your output should match the sample output below.

 

 

Details

 

The program will be run from the command line as follows :

java Project3 source destination

where source is the file your program should read from and destination is the file your program should output to.

 

You should perform error checking at the beginning of your program to be sure that the user entered two arguments on the command line. If the user did not, the program should inform the user of the correct syntax for starting the program and exit immediately.
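
The check described above could be sketched as follows. This is only an illustration: the class name ArgCheckSketch and the helper hasSourceAndDestination are hypothetical, and the sketch assumes that "two" means exactly two arguments.

```java
class ArgCheckSketch {
    // Returns true only when exactly two command-line arguments were supplied
    // (assumption: extra arguments are also treated as a syntax error).
    static boolean hasSourceAndDestination(String[] args) {
        return args != null && args.length == 2;
    }

    public static void main(String[] args) {
        if (!hasSourceAndDestination(args)) {
            // Report the correct syntax, then leave main so the program ends at once.
            System.err.println("Usage: java Project3 source destination");
            return;
        }
        String source = args[0];
        String destination = args[1];
        System.out.println("Reading " + source + ", writing " + destination);
    }
}
```

Keeping the test in its own small method makes it easy to call before any file is opened.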

 

Your program should first read in the text file jabberwock.txt. It should then count the total number of words and the total number of unique words, create a list of words sorted from most common to least common (giving the number of occurrences of each unique word), and give the total number of letters followed by the count and frequency of each letter of the alphabet. This data should be both displayed on standard output and written to a text file.

 

Your program must be able to handle file IO errors. If there is a problem with the source file or the destination file, report the error to the user and exit the program cleanly. Points will be deducted if your program crashes on input or output.
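
One possible way to report a destination-file problem without crashing is sketched below. The class name DestinationSketch and the helper openDestination are hypothetical, and returning null on failure is just one design choice among several.

```java
import java.io.FileNotFoundException;
import java.io.PrintWriter;

class DestinationSketch {
    // Tries to open the destination file for writing.
    // On failure the problem is reported and null is returned,
    // so the exception never reaches the user as a crash.
    static PrintWriter openDestination(String fileName) {
        try {
            return new PrintWriter(fileName);
        } catch (FileNotFoundException e) {
            System.err.println("Cannot open destination file '" + fileName + "': " + e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("Usage: java Project3 source destination");
            return;
        }
        PrintWriter out = openDestination(args[1]);
        if (out == null) {
            return; // clean exit: the error has already been reported
        }
        out.println("destination opened successfully");
        out.close();
    }
}
```

The same pattern (catch the exception, print a message, return) can be applied when opening the source file.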

 

Your program should use multiple methods where necessary. Examples of this will be given in the “Program Flow” section.

 

Please note that this program should be CASE INSENSITIVE. This means that “The” and “the” are not unique words. They should be counted as the same word.

 

 

 

 

 

Program Flow

 

The following is a suggested flow for your program. You are welcome to follow this example or create the program however you wish.

 

·      First, check to be sure the user entered two options (one for source and one for destination) on the command line. If the user did not, report the error and inform the user of the correct syntax. The program should exit if the syntax was not correct. If it is correct, the program should continue execution.

·      Try to create the destination file. This should be done with a PrintWriter inside of a try/catch block.

·      Create your variables to store the needed data. You will need the following at minimum:

·      Map (TreeMap) to store the unique word / frequency associations (discussed in class)

·      Integer array to store the letter / frequency associations

·      Integer to store total number of letters

·      Integer to store total number of words

 

·      Next, read in the text file named by the source argument on the command line and convert its contents into a string. You will want to use try and catch blocks for this operation to avoid crashes. It would be wise to do this portion of the program in a method. The following is a partially complete example (note: the only part missing is the declaration of the buffered reader):

 

// Turn the contents of a text file into a String

// This method takes in the name of the file to read from

// This method returns the string of characters from the file

                                                                                                        

    public static String fileToString(String fileName)

    {

        //Read in input file ... store to character array                                                                                                                                                                                                                  

        char[] charArray = new char[(int) (new File(fileName)).length()];

        try

        {

           //INSERT CODE HERE TO CREATE BUFFERED READER CALLED “IN”

            in.read(charArray);

            in.close();

        }

        catch (IOException e)

        {

            throw new RuntimeException(e);

        }

        return (new String(charArray));

    }

 

·      Toss the string into a pattern matcher (discussed in class). This will grab all of the words out of the String that you created from the source file and store them in a matcher object.

·      Use the matcher object to parse through each word. With each word, you will need to do the following :

§      Convert it to lower case.

§      Add 1 to the total number of words integer.

§      Count the number of letters in the word and add that to the total letters integer.

§      Count the occurrences of each letter in the word and add them to the corresponding positions in the integer array.

§      Determine if the word is unique. If it is, add it to the TreeMap with a count of one. If it isn’t, increment the count stored for that word by one (discussed in class).
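
The per-word steps above can be sketched as follows. This is a hedged illustration, not the in-class solution: the class name WordTallySketch, the method tally, and the field wordCounts are hypothetical, and the pattern [A-Za-z]+ is an assumption about what counts as a word (the pattern discussed in class may differ).

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordTallySketch {
    final Map<String, Integer> wordCounts = new TreeMap<>(); // unique word -> occurrences
    final int[] letterCount = new int[26];                   // index 0 is 'a', index 25 is 'z'
    int numWords = 0;
    int numLetters = 0;

    // Assumption: a word is a run of ASCII letters.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    void tally(String text) {
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            // Lower-casing makes "The" and "the" the same word.
            String word = matcher.group().toLowerCase(Locale.ROOT);
            numWords++;
            numLetters += word.length();
            for (int i = 0; i < word.length(); i++) {
                letterCount[word.charAt(i) - 'a']++;
            }
            // New word: store 1. Known word: store the old count plus 1.
            Integer seen = wordCounts.get(word);
            wordCounts.put(word, seen == null ? 1 : seen + 1);
        }
    }

    public static void main(String[] args) {
        WordTallySketch t = new WordTallySketch();
        t.tally("The cat saw the other cat.");
        System.out.println(t.numWords + " words, " + t.wordCounts.size()
                + " unique, " + t.numLetters + " letters");
        System.out.println(t.wordCounts);
    }
}
```

For the sample sentence in main, this prints "6 words, 4 unique, 20 letters" followed by {cat=2, other=1, saw=1, the=2}.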

·      You will now need to sort the entries in descending order of occurrences of each word. This can be done as follows (assuming your original map was called “map”):

       

// Get the map’s entries

                                             

Map.Entry[] entries = (Map.Entry[]) map.entrySet().toArray(new Map.Entry[0]);

       

// sort the entries by descending order of occurrences of the word                                                                                   

Arrays.sort(entries, new Comparator()
{
    public int compare(Object o1, Object o2)
    {
        return ((Integer) ((Map.Entry) o2).getValue()).intValue() -
               ((Integer) ((Map.Entry) o1).getValue()).intValue();
    }
});

·      At this point you will have all the necessary data to create your output. Please note that the code above created a new array of map entries called “entries” (it is an array, not a Map). This is the sorted array that you will use when outputting your data.
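
For comparison, the same descending sort can be written with generics so that no casts are needed. This is a sketch under assumptions: the class name SortByCountSketch and the method sortedByCount are hypothetical, and it copies the entries into a List rather than an array. Because Collections.sort is a stable sort and a TreeMap hands out its entries in alphabetical key order, words with equal counts stay in alphabetical order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SortByCountSketch {
    // Returns the map's entries ordered from most to least frequent.
    static List<Map.Entry<String, Integer>> sortedByCount(Map<String, Integer> map) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(map.entrySet());
        Collections.sort(entries, new Comparator<Map.Entry<String, Integer>>() {
            public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
                // Comparing b to a (instead of a to b) gives descending order.
                return Integer.compare(b.getValue(), a.getValue());
            }
        });
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new TreeMap<>();
        map.put("my", 3);
        map.put("the", 19);
        map.put("jabberwock", 3);
        map.put("and", 14);
        for (Map.Entry<String, Integer> e : sortedByCount(map)) {
            System.out.println(e.getKey() + " : " + e.getValue());
        }
    }
}
```

With the four words in main, the lines come out in the order the, and, jabberwock, my.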

·      Output the total word count

·      Output the total number of unique words

·      Output each word and its occurrences

·      Write the letter frequency information. The following is one way to do this (assuming your integer array was named “letterCount” and your total number of letters integer was called “numLetters”) :

// Create a number formatter object         

NumberFormat pf = NumberFormat.getPercentInstance();

       

//Set decimal formatting for percentages                                           

pf.setMinimumFractionDigits(2);

pf.setMaximumIntegerDigits(0);

 

System.out.println("\nTotal letter count: " + numLetters);

 

//Output letters, occurrences, and percentages  

for (int i = 0; i <= ('z' - 'a'); i++)

    {

   double percent = (double) letterCount[i] / numLetters;

   System.out.println((char) (i + 'a') + " : " +

          intFormat(letterCount[i], 3) + " : " +

          intFormat((int) (100 * percent), 2) +  

          pf.format(percent));

     }

 

·     Note that you will need the following intFormat method for the above code to work …

  

// format (prepend with spaces) an integer to be a specific width                                                                                        

 

    static String intFormat(int integer, int width)

    {

        String result = "";

        int i = width - 1;

        while ((i > 0) && (integer < Math.pow(10, i--)))

        {

            result += " ";

        }

        return (result += integer);

    }
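
As an alternative to NumberFormat plus intFormat, a single String.format call can build one letter line. This is only a sketch of a different technique, not the method given above: the class name LetterLineSketch and the method letterLine are hypothetical, and Locale.US is assumed so the decimal separator is always a period.

```java
import java.util.Locale;

class LetterLineSketch {
    // Builds one output line such as "a :  61 :  8.53%".
    // %3d right-aligns the count in 3 columns; %5.2f right-aligns the
    // percentage in 5 columns with 2 decimals; %% prints a literal percent sign.
    static String letterLine(char letter, int count, int numLetters) {
        double percent = numLetters == 0 ? 0.0 : 100.0 * count / numLetters;
        return String.format(Locale.US, "%c : %3d : %5.2f%%", letter, count, percent);
    }

    public static void main(String[] args) {
        System.out.println(letterLine('a', 61, 715));
        System.out.println(letterLine('e', 80, 715));
        System.out.println(letterLine('q', 0, 715));
    }
}
```

Note that String.format rounds the whole percentage at once, whereas the code above truncates the integer part and formats the fraction separately, so the two approaches can differ on values that round up to the next whole number.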

·      Close your output file!!

 

 

Actual Program Output

 

The following is the output of the program when run with the jabberwock.txt input. Please format your program so that it outputs EXACTLY as follows:

 

Total word count : 167
Number of unique words : 91
the : 19
and : 14
he : 7
in : 6
jabberwock : 3
my : 3
through : 3
all : 2
as : 2
beware : 2
borogoves : 2
brillig : 2
came : 2
did : 2
gimble : 2
gyre : 2
his : 2
it : 2
mimsy : 2
mome : 2
one : 2
outgrabe : 2
raths : 2
slithy : 2
stood : 2
that : 2
thought : 2
toves : 2
twas : 2
two : 2
vorpal : 2
wabe : 2
went : 2
were : 2
with : 2
arms : 1
awhile : 1
back : 1
bandersnatch : 1
beamish : 1
bird : 1
bite : 1
blade : 1
boy : 1
burbled : 1
by : 1
callay : 1
calloh : 1
catch : 1
chortled : 1
claws : 1
come : 1
day : 1
dead : 1
eyes : 1
flame : 1
foe : 1
frabjous : 1
frumious : 1
galumphing : 1
gree : 1
hand : 1
has : 1
head : 1
its : 1
jaws : 1
joy : 1
jujub : 1
left : 1
long : 1
manxome : 1
o : 1
of : 1
rested : 1
shun : 1
slain : 1
snack : 1
snicker : 1
so : 1
son : 1
sought : 1
sword : 1
thou : 1
time : 1
to : 1
took : 1
tulgey : 1
tumtum : 1
uffish : 1
whiffling : 1
wood : 1
 
Total letter count: 715
a :  61 :  8.53%
b :  30 :  4.20%
c :  16 :  2.24%
d :  33 :  4.62%
e :  80 : 11.19%
f :  10 :  1.40%
g :  22 :  3.08%
h :  61 :  8.53%
i :  37 :  5.17%
j :   8 :  1.12%
k :   7 :  0.98%
l :  30 :  4.20%
m :  26 :  3.64%
n :  36 :  5.03%
o :  52 :  7.27%
p :   3 :  0.42%
q :   0 :  0.00%
r :  33 :  4.62%
s :  38 :  5.31%
t :  65 :  9.09%
u :  21 :  2.94%
v :   6 :  0.84%
w :  23 :  3.22%
x :   1 :  0.14%
y :  16 :  2.24%
z :   0 :  0.00%

 

 

Grading

              

5 points – Name, date, and assignment number on top of program

20 points – Proper commenting

15 Points – Programming style

30 Points – Proper user input checking and handling of IO Exceptions

30 Points – Program produces correct values on output to both screen and text file

 

Total : 100 points

 

 

Extra Credit Opportunities

 

+10 : Your program will be outputting to stdout and to a file. You should create a method that performs BOTH of these functions so that you do not have to write a System.out.println and an output.write manually for each output line.
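
A minimal sketch of such a method is shown below. The class name DualOutputSketch and the method name emit are hypothetical, and a PrintWriter is assumed for the file side; the demo writes into a StringWriter so that it does not create a real file.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class DualOutputSketch {
    // Writes the same line to standard output and to the given writer.
    static void emit(PrintWriter file, String line) {
        System.out.println(line);
        file.println(line);
    }

    public static void main(String[] args) {
        // A StringWriter stands in for the destination file in this demo.
        StringWriter buffer = new StringWriter();
        PrintWriter file = new PrintWriter(buffer);
        emit(file, "Total word count : 167");
        emit(file, "Number of unique words : 91");
        file.close();
        System.out.print("Captured for the file:\n" + buffer);
    }
}
```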

 

 

Grading explanation

 

Name, date, assignment number

   Your program should have the following header format

   /*

   Name : {YOUR NAME HERE}

   Date : {PROGRAM SUBMISSION DATE}

   Assignment : {ASSIGNMENT NUMBER AND TITLE}

   */

 

Proper commenting

   Your program should be properly commented. This means that a person who is unfamiliar with the JAVA programming language should be able to understand what each line/section of your code is doing by simply reading your comments. There is no such thing as too much commenting, so feel free to comment as much as you want.

 

Programming style

   This is a catch-all for many things. Your program should be indented correctly. The code should be easy for the grader to read. You should optimize your code so that you are meeting the objectives in as few lines of code as possible. You should attempt to use the most advanced programming technique possible to complete the objective. For example, use loops instead of manually repeating the same block of code. Your variable names should be clear and concise.

 

Proper user input checking

   Self-explanatory.

 

Program produces correct values on test cases

   Be sure to check your program to be sure it works on all imaginable cases. Just because it works for a few numbers doesn’t mean it will work for everything. We will be testing your program with a set of numbers that are designed to test every aspect of the programming objective. Please note: we are not expecting you to check for rounding errors. All values will be given a window of + or - 1 cent.

 

 

 

Program due date and submission instructions

 

This program is due no later than Thursday, 16th of June, 2005.

You may turn in the program on the 16th of June for 10 points extra credit. Otherwise, the program is due on Tuesday, 21st of June 2005.

 

The program should be named Assignment3YOURFULLNAME.java

(Example : Assignment3RobertThornton.java)

 

You are to submit your source code (your .java file) to cis3931 at cs dot fsu dot edu

 

For assignment clarifications, please e-mail thornton at psy dot fsu dot edu

 

For help with your program, please e-mail cis3931 at cs dot fsu dot edu