Assignment #5 - Text File Analyzer
Due Date: Mon, Nov 7
Objectives
- To gain experience with file input/output techniques
- To gain experience with formatting output
- To practice overall problem-solving skills, as well as general
design of a program
Task
Write a program that reads a text file, then prints some statistical
results about its contents to an output file. The results will be
primarily an analysis of the characters in the file: how often certain
categories of characters occur, and how often each letter of the
alphabet occurs.
Note that in most standard English text, certain letters typically
appear with a higher frequency than others. This kind of information
has been used in setting up games like Scrabble (letter frequencies and
point values, etc.), as well as more serious applications like decoding
encrypted messages and ciphers.
Details and Requirements
- Start by asking the user to input the name of the input file, and
then to input the name of an output file. Whenever the user enters a
filename that cannot be opened (for either the input or the output file),
print an error message and ask the user to re-enter it. (We've seen
several code examples in the course notes that illustrate this.)
The input file can be ANY file containing plain text.
- Read the contents of the input file, and then print to the OUTPUT
file the following information about the contents of the input file:
- A header stating a general heading and the name of the input
file
- The total number of characters contained in the file
- A chart (with headings) where each row lists a category, the number
of occurrences of that category of character, and the percentage of
the total file this makes up. These are the categories:
- Letters
- White space
- Digits
- Other
Note that the percentages in this chart should add up to 100%
- A heading "LETTER STATISTICS"
- Another chart (again, with headings), listing a category, the
number of occurrences of that category, and the percentage of
all LETTERS this comprises. This time, the categories are uppercase
letters, lowercase letters, and then each of the individual letters
of the alphabet (i.e. 26 of them). So this chart will have 28 rows.
- Note that the percentage of uppercase + lowercase should add up
to 100%
- Also, the percentages of the 26 alphabet letters should add up to
100%
- Specifically, note that these are percentages of the total number
of letters, NOT the total number of characters in the file
- All percentages should be printed to two decimal places, along with
a space and a % sign afterwards. Example: 12.45 %
- Your headings and category labels should match mine (see example
output files below). You may use whatever field widths you like, as
long as your charts line up in neat columns (i.e. your spacing doesn't
have to match mine exactly)
- Category labels on your chart should be left-justified (lined up on
the left side of the words). All numbers in your charts should be
right-justified (lined up on the right side).
- Hint: Instead of declaring 26 different variables to count each
letter, consider using an array of counters
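The array-of-counters hint, together with the four character categories, can be sketched roughly as follows. This is only an illustration under stated assumptions, not a required design: the names Counts, tally and ALPHABET_SIZE are hypothetical, the struct is just one way to group the counters, and the index arithmetic assumes the letters 'a' through 'z' are contiguous (as in ASCII).

```cpp
#include <cctype>

const int ALPHABET_SIZE = 26;

// Hypothetical bundle of counters; the assignment does not require a struct.
struct Counts {
    long total;
    long letters;
    long whitespace;
    long digits;
    long other;
    long upper;
    long lower;
    long perLetter[ALPHABET_SIZE];  // [0] counts 'a'/'A', ..., [25] counts 'z'/'Z'
};

// Update the counters for one character read from the file.
void tally(Counts& c, char ch)
{
    // The <cctype> functions expect a value representable as unsigned char.
    unsigned char u = static_cast<unsigned char>(ch);
    ++c.total;
    if (std::isalpha(u)) {
        ++c.letters;
        if (std::isupper(u))
            ++c.upper;
        else
            ++c.lower;
        // Assumption: 'a'..'z' are contiguous, so this maps a letter to 0..25.
        int index = std::tolower(u) - 'a';
        if (index >= 0 && index < ALPHABET_SIZE)
            ++c.perLetter[index];
    } else if (std::isspace(u)) {
        ++c.whitespace;
    } else if (std::isdigit(u)) {
        ++c.digits;
    } else {
        ++c.other;
    }
}
```

A caller would start from zeroed counters (for example, Counts c = {};) and call tally once per character. Reading with in.get(ch) rather than in >> ch matters here, because the extraction operator skips white space and those characters need to be counted.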
General Requirements
- No global variables, other than constants
- You may use any of these libraries:
- iostream
- iomanip
- fstream
- cctype
- Write your source code so that it is readable and
well-documented
- Part of assessing readability and style will be how well you break
your program into appropriate functions. Note: Breaking it up
into functions makes for smaller, easier-to-code segments. You'll
make your work easier if you do so.
- Your program should only use standard ANSI header files (make sure
to follow the directions exactly on the handout for creating Visual C++
projects, so that Windows-specific headers like stdafx.h and conio.h are
not placed into your file)
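As one illustration of breaking the program into functions, the filename prompt described under Details and Requirements could be isolated as shown below. Treat it as a sketch under assumptions rather than the expected solution: the names tryOpenInput, promptForInput and MAX_NAME, the 256-character limit, and the wording of the error message are all made up for this example. Only the two prompt lines are taken from the sample runs.

```cpp
#include <fstream>
#include <iostream>

const int MAX_NAME = 256;  // assumed upper bound on filename length

// Try to open `name` for reading. Returns true on success.
bool tryOpenInput(std::ifstream& in, const char* name)
{
    in.clear();     // reset any failure state left by a previous attempt
    in.open(name);
    return in.is_open();
}

// Keep asking on `user` until a filename opens, echoing prompts to `screen`.
// `name` must have room for MAX_NAME characters.
// Returns false only if the user's input runs out (or a line is too long).
bool promptForInput(std::ifstream& in, char name[],
                    std::istream& user, std::ostream& screen)
{
    screen << "Please enter the name of the input file.\n";
    while (true) {
        screen << "Filename: ";
        if (!user.getline(name, MAX_NAME))
            return false;
        if (tryOpenInput(in, name))
            return true;
        // Hypothetical wording; the assignment only asks for "an error message".
        screen << "Could not open that file. Please try again.\n";
    }
}
```

A typical call would be promptForInput(inFile, name, std::cin, std::cout). The output file could be handled by a parallel pair of functions that take a std::ofstream instead.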
Sample input files and results
Example 1
Sample input file - file1.txt:
Hello. How are you?
I am 4 years old. My favorite color is blue.
Next year I will be 5.
Sample program execution
(the text after each "Filename:" prompt is user input)
Please enter the name of the input file.
Filename: file1.txt
Please enter the name of the output file.
Filename: out1.txt
Processing complete
Resulting output file - out1.txt:
Statistics for file: file1.txt
-------------------------------------------------
Total # of characters in file: 90
Category            How many in file           % of file
----------------------------------------------------------------------
Letters                           61             67.78 %
White space                       22             24.44 %
Digits                             2              2.22 %
Other characters                   5              5.56 %
LETTER STATISTICS
Category            How many in file    % of all letters
----------------------------------------------------------------------
Uppercase                          6              9.84 %
Lowercase                         55             90.16 %
a                                  5              8.20 %
b                                  2              3.28 %
c                                  1              1.64 %
d                                  1              1.64 %
e                                  8             13.11 %
f                                  1              1.64 %
g                                  0              0.00 %
h                                  2              3.28 %
i                                  5              8.20 %
j                                  0              0.00 %
k                                  0              0.00 %
l                                  7             11.48 %
m                                  2              3.28 %
n                                  1              1.64 %
o                                  7             11.48 %
p                                  0              0.00 %
q                                  0              0.00 %
r                                  5              8.20 %
s                                  2              3.28 %
t                                  2              3.28 %
u                                  2              3.28 %
v                                  1              1.64 %
w                                  2              3.28 %
x                                  1              1.64 %
y                                  4              6.56 %
z                                  0              0.00 %
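Each chart row in the output above combines a left-justified label, a right-justified count, and a percentage with two decimal places followed by " %". A minimal sketch of one way to produce such a row with <iomanip> is given below. The function names percent and printRow and the field widths 20, 16 and 18 are assumptions chosen for illustration, and returning 0 when the total is zero (for example, a file with no letters) is a choice made here, since the assignment does not say what to print in that case.

```cpp
#include <iomanip>
#include <iostream>

// Percentage of `whole` represented by `part`.
// Assumption: report 0.0 when `whole` is zero, to avoid dividing by zero.
double percent(long part, long whole)
{
    if (whole == 0)
        return 0.0;
    return 100.0 * part / whole;
}

// Write one chart row: label on the left, numbers lined up on the right.
void printRow(std::ostream& out, const char* label, long count, long whole)
{
    out << std::fixed << std::setprecision(2);          // two decimal places
    out << std::left  << std::setw(20) << label         // left-justified label
        << std::right << std::setw(16) << count         // right-justified count
        << std::setw(18) << percent(count, whole)       // right-justified percent
        << " %\n";
}
```

For instance, printRow(outFile, "Letters", 61, 90) would write the Letters row of Example 1, since 61 out of 90 is 67.78 % to two decimal places. Passing the letter total as `whole` instead of the character total gives the percentages for the LETTER STATISTICS chart.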
Example 2
Sample input file - file2.txt:
The quick brown fox jumped over the lazy dog.
How lazy was he? Well, he was pretty darn lazy, I'll tell you.
And speaking of the fox, why was he jumping over dogs anyways?
Didn't he have anything better to do with his time?
When he checked his watch at 12:45 PM, 15 more foxes jumped over
the lazy dog, who so far had counted 100 sheep, 17 cats, and 2 rabbits.
Sample program execution
(the text after each "Filename:" prompt is user input)
Please enter the name of the input file.
Filename: file2.txt
Please enter the name of the output file.
Filename: out2.txt
Processing complete
Resulting output file - out2.txt:
Statistics for file: file2.txt
-------------------------------------------------
Total # of characters in file: 362
Category            How many in file           % of file
----------------------------------------------------------------------
Letters                          261             72.10 %
White space                       73             20.17 %
Digits                            12              3.31 %
Other characters                  16              4.42 %
LETTER STATISTICS
Category            How many in file    % of all letters
----------------------------------------------------------------------
Uppercase                          9              3.45 %
Lowercase                        252             96.55 %
a                                 21              8.05 %
b                                  4              1.53 %
c                                  6              2.30 %
d                                 14              5.36 %
e                                 30             11.49 %
f                                  5              1.92 %
g                                  6              2.30 %
h                                 22              8.43 %
i                                 11              4.21 %
j                                  3              1.15 %
k                                  3              1.15 %
l                                 10              3.83 %
m                                  6              2.30 %
n                                 12              4.60 %
o                                 19              7.28 %
p                                  7              2.68 %
q                                  1              0.38 %
r                                 10              3.83 %
s                                 13              4.98 %
t                                 19              7.28 %
u                                  6              2.30 %
v                                  4              1.53 %
w                                 12              4.60 %
x                                  3              1.15 %
y                                 10              3.83 %
z                                  4              1.53 %
Submitting
Submit your program (use the filename prog5.cpp) in the
usual way, through the web page.