Assignment #5 - Text File Analyzer
Due Date: Tues, Nov 7
Objectives
- To gain experience with file input/output techniques
- To gain experience with formatting output
- To practice overall problem-solving skills, as well as general
design of a program
Task
Write a program that reads a text file, then prints some statistical
results about its contents to an output file. The results will be
primarily an analysis of the characters in the file, which will include
the frequency of occurrence of certain categories of characters, as well
as the frequency of occurrence of each letter of the alphabet.
Note that in most standard English text, certain letters typically
appear more frequently than others. This kind of information
has been used in designing games like Scrabble (letter frequencies,
point values, etc.), as well as in more serious applications like decoding
encrypted messages and ciphers.
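As a rough sketch of the character-level pass described above, the C++ below reads a stream one character at a time and places each character into one of the four categories listed under Details and Requirements. The names `CategoryCounts`, `tallyChar`, and `countCategories` are hypothetical. The sketch assumes the `<cctype>` classification functions, which the assignment permits.

```cpp
#include <cctype>
#include <iostream>

// Running totals for the first chart (hypothetical struct name).
struct CategoryCounts {
    int total = 0;       // every character, including spaces and newlines
    int letters = 0;
    int whiteSpace = 0;
    int digits = 0;
    int other = 0;
};

// Place one character into exactly one of the four categories.
void tallyChar(char ch, CategoryCounts& c)
{
    // cctype functions expect a value representable as unsigned char
    unsigned char uc = static_cast<unsigned char>(ch);
    ++c.total;
    if (std::isalpha(uc))
        ++c.letters;
    else if (std::isspace(uc))
        ++c.whiteSpace;
    else if (std::isdigit(uc))
        ++c.digits;
    else
        ++c.other;
}

// Tally every character in a stream (in prog5, the opened input file).
CategoryCounts countCategories(std::istream& in)
{
    CategoryCounts c;
    char ch;
    // get() keeps white space; in >> ch would silently skip it
    while (in.get(ch))
        tallyChar(ch, c);
    return c;
}
```

Because every character lands in exactly one bucket, the four counts always add up to `total`. This is why the first chart's percentages sum to 100% (up to rounding).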
Details and Requirements
- Start by asking the user to input the name of the input file, and
then to input the name of an output file. Whenever the user enters a
filename that cannot be opened (for either the input or the output file),
print an error message and ask the user to re-enter it. (We've seen
several code examples in the course notes that illustrate this.)
The input file can be ANY file containing plain text.
- Read the contents of the input file, and then print to the OUTPUT
file the following information about the contents of the input file:
- A header consisting of a general heading and the name of the input
file
- The total number of characters contained in the file
- A chart (with headings) where each row lists a category, the number
of occurrences of that category of character, and the percentage of
all characters in the file that this represents. These are the categories:
- Letters
- White space
- Digits
- Other
Note that the percentages in this chart should add up to 100% (allowing
for rounding in the last decimal place).
- A heading "LETTER STATISTICS"
- Another chart (again, with headings), listing a category, the
number of occurrences of that letter type, and the percentage of
all LETTERS this comprises. This time, the categories are Uppercase
letters, Lowercase letters, and then each of the individual letters
of the alphabet (i.e. 26 of them). So this chart will have 28 rows.
- Note that the percentage of uppercase + lowercase should add up
to 100%
- Also, the percentages of the 26 alphabet letters should add up to
100% (allowing for rounding)
- Specifically, note that these are percentages of the total number
of letters, NOT the total number of characters in the file
- A heading "NUMBER ANALYSIS", followed by the count, sum, and
average (to 2 decimal places) of all integer numbers appearing in
the file, where a number is defined as any consecutive sequence of
digits bounded by non-digits (or by the start or end of the file).
Example:
I am 14 years old and it is now 11:15 AM and my IP address is
123.45.0.204
In the above passage, there are 7 "numbers" (14, 11, 15, 123, 45, 0,
and 204).
- All percentages should be printed to two decimal places, followed by
a space and a % sign. Example: 12.45 %
- Your headings and category labels should match mine (see example
output files below). You may use whatever field widths you like, as
long as your charts line up in neat columns (i.e. your spacing doesn't
have to match mine exactly)
- Category labels in your charts should be left-aligned. All numbers
in your charts should be right-aligned.
- Hint: Instead of declaring 26 different variables to count each
letter, consider using an array of counters
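The re-prompting behavior in the first requirement can be sketched as a single loop that serves both files, since `std::ifstream` and `std::ofstream` both provide `open()` and `is_open()`. This is one possible design, not necessarily the one from the course notes. The function name `openWithRetry`, the char-array filename, and the error-message wording are all assumptions. The user's input and output streams are passed in as parameters only so that the loop can be exercised without a keyboard.

```cpp
#include <fstream>
#include <iostream>

// Prompt until a file opens; works for std::ifstream and std::ofstream alike.
// kind is "input" or "output". userIn/userOut would be std::cin/std::cout in
// the real program. The error-message wording is a placeholder (assumption).
// Returns false only if userIn runs out before a usable name is entered.
template <typename FileStream>
bool openWithRetry(FileStream& file, char fileName[], int size, const char* kind,
                   std::istream& userIn, std::ostream& userOut)
{
    userOut << "Please enter the name of the " << kind << " file.\n";
    while (true) {
        userOut << "Filename: ";
        if (!userIn.getline(fileName, size))
            return false;          // end of input, or a name longer than size - 1
        file.clear();              // reset state left by a previous failed open
        file.open(fileName);
        if (file.is_open())
            return true;
        userOut << "Error: could not open \"" << fileName
                << "\". Please re-enter.\n";
    }
}
```

In the program itself, a call might look like `openWithRetry(inFile, inName, 256, "input", std::cin, std::cout)`, followed by the same call with an `std::ofstream` and `"output"` (the buffer size 256 is arbitrary). Using a char array rather than `std::string` keeps the sketch within the four permitted libraries.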
General Requirements
- No global variables
- You may use any of these libraries (no others):
- iostream
- iomanip
- fstream
- cctype
- Write your source code so that it is readable and
well-documented
- Part of assessing readability and style will be how well you break
your program into appropriate functions. Note: Breaking it up
into functions makes for smaller, easier-to-code segments. You'll
make your work easier if you do so.
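Following the hint about an array of counters, a small per-character helper is one way to keep the letter statistics in a function of their own. This sketch assumes ASCII, where 'a' through 'z' are consecutive codes, so `tolower(ch) - 'a'` maps each letter to an index from 0 to 25. The names `tallyLetter` and `letterFor` are hypothetical.

```cpp
#include <cctype>

// Update letter statistics for one character (hypothetical name).
// letterCounts[0] counts 'a'/'A', letterCounts[1] counts 'b'/'B', ...,
// letterCounts[25] counts 'z'/'Z'. Assumes ASCII letter codes.
void tallyLetter(char ch, int letterCounts[26], int& upper, int& lower)
{
    unsigned char uc = static_cast<unsigned char>(ch);
    if (!std::isalpha(uc))
        return;                          // only letters belong in this chart
    int index = std::tolower(uc) - 'a';  // 'a' -> 0, 'b' -> 1, ...
    if (index < 0 || index >= 26)
        return;                          // skip letters outside a..z (non-ASCII locales)
    if (std::isupper(uc))
        ++upper;
    else
        ++lower;
    ++letterCounts[index];
}

// Reverse mapping for the chart labels: 0 -> 'a', ..., 25 -> 'z'.
char letterFor(int index)
{
    return static_cast<char>('a' + index);
}
```

The caller would declare `int letterCounts[26] = {0};` (which zero-fills all 26 entries) plus two zeroed counters. Afterward, `upper + lower` equals the Letters count from the first chart and serves as the denominator for every percentage in the LETTER STATISTICS chart.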
Sample input files and results
Example 1
Sample input file - file1.txt:
Hello. How are you?
I am 4 years old. My favorite color is blue.
Next year I will be 5. My IP address is 128.4.0.46
Sample program execution
(user input is the filename typed after each "Filename:" prompt)
Please enter the name of the input file.
Filename: file1.txt
Please enter the name of the output file.
Filename: out1.txt
Processing complete
Resulting output file - out1.txt:
Statistics for file: file1.txt
-------------------------------------------------
Total # of characters in file: 119
Category            How many in file         % of file
----------------------------------------------------------------------
Letters                           74           62.18 %
White space                       28           23.53 %
Digits                             9            7.56 %
Other characters                   8            6.72 %
LETTER STATISTICS
Category            How many in file  % of all letters
----------------------------------------------------------------------
Uppercase                          9           12.16 %
Lowercase                         65           87.84 %
a                                  6            8.11 %
b                                  2            2.70 %
c                                  1            1.35 %
d                                  3            4.05 %
e                                  9           12.16 %
f                                  1            1.35 %
g                                  0            0.00 %
h                                  2            2.70 %
i                                  7            9.46 %
j                                  0            0.00 %
k                                  0            0.00 %
l                                  7            9.46 %
m                                  3            4.05 %
n                                  1            1.35 %
o                                  7            9.46 %
p                                  1            1.35 %
q                                  0            0.00 %
r                                  6            8.11 %
s                                  5            6.76 %
t                                  2            2.70 %
u                                  2            2.70 %
v                                  1            1.35 %
w                                  2            2.70 %
x                                  1            1.35 %
y                                  5            6.76 %
z                                  0            0.00 %
NUMBER ANALYSIS
Number of integers in the file: 6
Their sum: 187
Their average: 31.17
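The NUMBER ANALYSIS rule (a number is a maximal run of digits) fits a small state machine that accumulates the current run and closes it at the first non-digit. The sketch below is one way to do this, using assumed names (`NumberStats`, `feedChar`, `finishNumber`, `average`). It also assumes the numbers are small enough to fit in a `long long`.

```cpp
#include <cctype>

// Running state for NUMBER ANALYSIS (hypothetical struct name).
struct NumberStats {
    int count = 0;          // completed digit runs
    long long sum = 0;      // sum of their values
    long long current = 0;  // value of the run in progress
    bool inNumber = false;  // true while inside a run of digits
};

// Close the run in progress, if any.
// Call this at every non-digit, and once more at end of file.
void finishNumber(NumberStats& s)
{
    if (s.inNumber) {
        ++s.count;
        s.sum += s.current;
        s.current = 0;
        s.inNumber = false;
    }
}

// Feed one character: digits extend the current run, anything else ends it.
void feedChar(char ch, NumberStats& s)
{
    if (std::isdigit(static_cast<unsigned char>(ch))) {
        s.current = s.current * 10 + (ch - '0');  // no overflow check (assumption)
        s.inNumber = true;
    } else {
        finishNumber(s);
    }
}

// Average of the numbers found; 0 if there were none.
double average(const NumberStats& s)
{
    return s.count == 0 ? 0.0 : static_cast<double>(s.sum) / s.count;
}
```

Fed the Example 1 text, this yields a count of 6 and a sum of 187. The average, 187 / 6 = 31.1666..., prints as 31.17 with `std::fixed` and `std::setprecision(2)`. The final `finishNumber` call matters: if a file ends on a digit with no trailing newline, its last number would otherwise never be counted.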
Example 2
Sample input file - file2.txt:
The quick brown fox jumped over the lazy dog.
How lazy was he? Well, he was pretty darn lazy, I'll tell you.
And speaking of the fox, why was he jumping over dogs anyways?
He just leaped his 123rd dog.
Didn't he have anything better to do with his time?
When he checked his watch at 12:45 PM, 15 more foxes jumped over
the lazy dog, who so far had dreamed of chasing 100 sheep, 17 cats,
2 rabbits, and 13 mailmen.
Sample program execution
(user input is the filename typed after each "Filename:" prompt)
Please enter the name of the input file.
Filename: file2.txt
Please enter the name of the output file.
Filename: out2.txt
Processing complete
Resulting output file - out2.txt:
Statistics for file: file2.txt
-------------------------------------------------
Total # of characters in file: 417
Category            How many in file         % of file
----------------------------------------------------------------------
Letters                          297           71.22 %
White space                       85           20.38 %
Digits                            17            4.08 %
Other characters                  18            4.32 %
LETTER STATISTICS
Category            How many in file  % of all letters
----------------------------------------------------------------------
Uppercase                         10            3.37 %
Lowercase                        287           96.63 %
a                                 25            8.42 %
b                                  4            1.35 %
c                                  6            2.02 %
d                                 18            6.06 %
e                                 35           11.78 %
f                                  6            2.02 %
g                                  8            2.69 %
h                                 25            8.42 %
i                                 14            4.71 %
j                                  4            1.35 %
k                                  3            1.01 %
l                                 12            4.04 %
m                                  9            3.03 %
n                                 13            4.38 %
o                                 20            6.73 %
p                                  8            2.69 %
q                                  1            0.34 %
r                                 12            4.04 %
s                                 16            5.39 %
t                                 19            6.40 %
u                                  6            2.02 %
v                                  4            1.35 %
w                                 12            4.04 %
x                                  3            1.01 %
y                                 10            3.37 %
z                                  4            1.35 %
NUMBER ANALYSIS
Number of integers in the file: 8
Their sum: 327
Their average: 40.88
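The chart rows in both examples follow one pattern, so a single row printer built on `<iomanip>` can produce every line. In this sketch, the field widths (20, 16, and 18) are arbitrary choices that happen to line up with the headings, and the function names `percentOf`, `printChartHeader`, and `printChartRow` are hypothetical. `std::left` and `std::right` give the left-aligned labels and right-aligned numbers that the spec asks for.

```cpp
#include <iomanip>
#include <iostream>

// Percentage of part in whole, guarding against a zero denominator
// (e.g. a file containing no letters at all).
double percentOf(int part, int whole)
{
    return whole == 0 ? 0.0 : 100.0 * part / whole;
}

// Heading row plus a dashed rule. Widths are arbitrary (assumption):
// label 20, count 16, percentage 18 (16 for the number plus " %").
void printChartHeader(std::ostream& out, const char* percentHeading)
{
    out << std::left << std::setw(20) << "Category"
        << std::right << std::setw(16) << "How many in file"
        << std::setw(18) << percentHeading << '\n';
    // setw(70) applied to an empty string emits 70 fill characters
    out << std::setfill('-') << std::setw(70) << "" << std::setfill(' ') << '\n';
}

// One row: label left-aligned, count and percentage right-aligned,
// percentage to two decimal places followed by " %".
void printChartRow(std::ostream& out, const char* label, int count, int whole)
{
    out << std::left << std::setw(20) << label
        << std::right << std::setw(16) << count
        << std::fixed << std::setprecision(2)
        << std::setw(16) << percentOf(count, whole) << " %\n";
}
```

For example, `printChartRow(out, "Letters", 297, 417)` writes the Letters row of Example 2, ending in `71.22 %`. For the single-letter rows, a two-character array such as `char label[2] = {'a', '\0'};` can serve as the label.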
Submitting
Submit your program (use the filename prog5.cpp) in the
usual way, with the submit5 script:
~myers/csub/submit5 prog5.cpp