Homework 2: Stats
Finding the mean and median of numerical data
Educational Objectives: After successfully completing this assignment,
the student should be able to accomplish the following:
- Use a loop structure to read user input of unknown size through std::cin and store it in an array.
- Use conditional branching to selectively perform computational tasks.
- Declare (prototype) and define (implement) functions.
- Declare and define functions with arguments of various types, including pointers, references,
const pointers, and const references.
- Call functions, making appropriate use of the function arguments and their types.
- Make decisions as to appropriate function call parameter type, from among:
value, reference, const reference, pointer, and const pointer.
- Compile and run a C++ program in the Unix/Linux environment using g++.
Operational Objectives:
Create a project that computes the mean and median of a sequence of
integers received via standard input.
Deliverables:
Four files: stats.h, stats.cpp, main.cpp, makefile.
Note that these files constitute a self-contained project.
Background
Given a finite collection of n numbers:
- The mean is the sum of the numbers divided by n, and
- The median is the middle value (in case n is odd) or the
average of the two middle values (in case n is even).
Note that to find the median of a collection of data, it is convenient to first
sort the data, that is, put the data in increasing (or non-decreasing)
order. Then the median is just the middle datum in the sorted sequence (or the
average of the two middle data, if n is even).
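As a quick check of the definitions above, here is a small worked example. The two data sets are made up for illustration and are not part of the assignment's test data.

```latex
\begin{align*}
n = 5:\quad & 7,\ 3,\ 9,\ 1,\ 4
  && \text{mean} = \tfrac{7+3+9+1+4}{5} = \tfrac{24}{5} = 4.8 \\
& \text{sorted: } 1,\ 3,\ 4,\ 7,\ 9
  && \text{median} = 4 \quad \text{(the middle value)} \\
n = 4:\quad & 8,\ 3,\ 9,\ 1
  && \text{mean} = \tfrac{8+3+9+1}{4} = \tfrac{21}{4} = 5.25 \\
& \text{sorted: } 1,\ 3,\ 8,\ 9
  && \text{median} = \tfrac{3+8}{2} = 5.5
\end{align*}
```

Note that the mean does not depend on the order of the data, while the median is read off the sorted sequence.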
One of the simplest sort algorithms is called Selection Sort, which operates
on an array of elements and has a computation which
can be described in one sentence: For each element of the
array, find the smallest element with equal or higher index in the array and swap these two
elements. Here is a "pseudocode" description of the algorithm:
for i in [0...n)            // for each element of array A
  k = i                     // find the smallest element at or after it
  for j in [i+1...n)
    if A[j] < A[k]
      k = j
    endif
  endfor                    // now A[k] is the smallest element in A[i...n)
  swap the values in A[i] and A[k]
endfor
(You could test whether A[k] < A[i] before the swap, but it is not clear
this would speed up the process - swapping may be faster than testing.)
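The pseudocode above might translate to C++ roughly as follows. This is a minimal sketch, not a reference solution: it borrows the names Swap and Sort from the prototypes in the Technical Requirements, writes size_t as std::size_t from <cstddef>, and adds an assert on the pointer that the assignment does not ask for.

```cpp
#include <cassert>
#include <cstddef>

// Interchanges the values of x and y.
void Swap(int& x, int& y)
{
  int temp = x;
  x = y;
  y = temp;
}

// Sorts array[0..size) into non-decreasing order using Selection Sort.
void Sort(int* array, std::size_t size)
{
  // Assumption of this sketch: a null pointer is only acceptable for size 0.
  assert(array != nullptr || size == 0);
  for (std::size_t i = 0; i < size; ++i)
  {
    std::size_t k = i;                  // index of smallest element seen so far
    for (std::size_t j = i + 1; j < size; ++j)
    {
      if (array[j] < array[k])
        k = j;
    }
    // array[k] is now the smallest element in array[i..size)
    Swap(array[i], array[k]);
  }
}
```

Both loop indices count upward from 0, so no subtraction on the unsigned type std::size_t is needed.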
Procedural Requirements:
Create and work within a
separate subdirectory cop3330/hw2.
Review the COP 3330 rules found in Introduction/Work Rules.
Copy these files
LIB/hw2/makefile
LIB/hw2/hw2submit.sh
from the course distribution library into your project directory.
Create three more files
stats.h
stats.cpp
main.cpp
complying with the Technical Requirements and Specifications stated below.
Turn in four files stats.h, stats.cpp, main.cpp,
and makefile using the hw2submit.sh submit script.
Warning: Submit scripts do not work on the program and
linprog servers. Use shell.cs.fsu.edu to submit projects. If you do
not receive the second confirmation with the contents of your project, there has
been a malfunction.
Technical Requirements and Specifications
The project should compile error- and warning-free on linprog with the command
make stats.x.
The number of integers input by the user is not known in advance, except
that it will not exceed 100. Numbers are input through standard input, either
from keyboard or file re-direct. The program should read numbers until a
non-digit or end-of-file is encountered or 100 numbers have been read.
Once the input numbers have been read, the program should calculate the mean and
median and then report these values to standard output.
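The read-until-failure behavior described above can be sketched as a single loop. The helper name ReadData and the constant maxSize are hypothetical and exist only so the loop can be tried on an in-memory stream. In the actual project the loop belongs in main(), reading from std::cin, because no other function may do I/O.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream can stand in for std::cin when trying the loop out

const std::size_t maxSize = 100;  // hypothetical name for the 100-item limit

// Hypothetical helper: reads ints from in until extraction fails
// (non-numeric input or end of file) or capacity items have been stored.
// Returns the number of items stored in data.
std::size_t ReadData(std::istream& in, int* data, std::size_t capacity)
{
  assert(data != nullptr);
  std::size_t size = 0;
  int value;
  // The size test comes first, so nothing is extracted once the array is full.
  while (size < capacity && in >> value)
  {
    data[size] = value;
    ++size;
  }
  return size;
}
```

For instance, given the text "3 1 2 x 7", the loop stores 3, 1, and 2 and stops at the "x". Because the extraction itself is the loop condition, no separate end-of-file test is needed.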
The source code should be structured as follows:
- Implement separate functions with the following prototypes:
float Mean (const int* array, size_t size); // calculates mean of data in array
float Median (int* array, size_t size); // calculates median of data in array
void Swap (int& x, int& y); // interchanges values of x and y
void Sort (int* array, size_t size); // sorts the data in array
- I/O is handled by function main(); no other functions should do any
I/O
- Function main() calls Mean() and Median()
- Function Median() calls Sort()
- Function Sort() calls Swap()
The source code should be organized as follows:
- Prototypes for Mean, Median, Sort, and
Swap should be in file stats.h
- Implementations for Mean, Median, Sort, and
Swap should be in file stats.cpp
- Function main should be in file main.cpp
The Sort() function should implement the Selection Sort
algorithm.
When in doubt, your program should behave like the distributed executable
examples in stats_i.x in area51. Identical behavior is not required, but the
general I/O behavior should be the same. In particular, the data input loop
should not be interrupted by prompts for a next datum - this will make file
redirect cumbersome. Just ask for the data one time, then read until a non-digit
or end of file is encountered.
Hints
- Sample executables are distributed in [LIB]/area51. These are named
stats_i.x and stats_s.x. The suffixes indicate which of the
two architectures the executable is compiled on: *_i.x runs on
Intel/Linux and *_s.x runs on Sun/Unix. (We may not always supply
the _s versions.)
- To run a sample executable, follow these steps: (1) Decide which architecture
you want to use. The program machines are 32-bit Sun architecture running
Sun's version of Unix, and the linprog machines are Intel 64-bit
architecture running Linux. (2) Copy the appropriate executable into your space
where you want to run it. For example, if you are logged in to program
enter the command "cp [LIB]/area51/stats_s.x .". (3) Change permissions
to executable: "chmod 700 stats_s.x". (4) Execute by entering the name
of the executable. If you want to run it on a data file "data1", use
input redirect as in: "stats_s.x < data1". If you want the output to go to
another file, use output redirect: "stats_s.x < data1 > data1.out".
- Test files can be created using the program ranint.cpp, which is distributed as
part of the assignment and is compiled by the supplied makefile. To create
random data files for testing, first build ranint.x with the command
make ranint.x
and then execute. Note that the program expects 3 command line arguments - (1)
file name, (2) upper bound on size of integers, and (3) number of elements to
generate.
It will remind you if you forget. Here are examples:
~/3330/hw2>ranint.x
** required arguments:
1: filename
2: upper bound on absolute size ('0' means no upper bound)
3: count of items
** try again
~/3330/hw2>
(Forgot to give arguments.)
~/3330/hw2>ranint.x d1 99 51
Results stored in file d1
range: -99 .. 98
count: 51
~/3330/hw2>ranint.x d2 99 52
Results stored in file d2
range: -99 .. 98
count: 52
~/3330/hw2>
- The less-than character in the command:
stats.x < data1
is a Unix/Linux shell operator that redirects the contents of data1
into standard input for stats.x. Using > redirects program output. For
example, the command:
stats.x < data1 > data1.out
sends the contents of data1 to standard input and then sends the
program output into the file data1.out. These are very handy operations
for testing programs.
- It is sometimes simpler to develop the code in a single file (such as
project.cpp) that can be edited in one window and test-compiled with a
single command (such as g++ -Wall -Wextra -ostats.x project.cpp) and
split the file up into the deliverables after the initial round of testing and
debugging.
- Note that the array in which input is stored is passed to the functions as a
pointer. In the case of Mean(), this pointer is const,
indicating that the elements of the array may not be changed by the
call. However in the case of Median(), the array element values are
allowed to change. These values are in fact changed by the call to
Sort().
- The function Sort() operates on the array input as a pointer. When
the function returns, the values of the array should be in increasing order.
- The selection sort algorithm requires a nested pair of loops
(one inside the other).
- Sorting the data is essential to calculating the median: in a sorted array,
the middle value (or the two middle values) sits at the middle index (or the two
middle indices) of the array.
- The middle index of an array of n elements, when n is odd, is
[(n-1)/2]. The middle two indices, when n is even, are
[n/2 - 1] and [n/2].
- Be careful when subtracting 1 from an unsigned integer type such as size_t.
- Look at the code examples in Chapter 2 of the lecture notes to find simple
ways to structure your main I/O loop.
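The hints on const pointers, middle indices, and subtraction on size_t can be illustrated together. This is a sketch under stated assumptions, not a reference solution. Mean uses the prototype from the Technical Requirements, while MedianOfSorted is a hypothetical helper that expects data already in non-decreasing order (the assignment's Median() would call Sort() first). Both assume size > 0, since the assignment does not say how empty input should be handled.

```cpp
#include <cassert>
#include <cstddef>

// Calculates the mean of array[0..size). The const in const int* means the
// function cannot modify the elements through this pointer.
float Mean(const int* array, std::size_t size)
{
  assert(size > 0);                     // assumption of this sketch
  long long sum = 0;                    // wider than int to reduce overflow risk
  for (std::size_t i = 0; i < size; ++i)
    sum += array[i];
  return static_cast<float>(sum) / static_cast<float>(size);
}

// Hypothetical helper: median of data that is ALREADY sorted.
float MedianOfSorted(const int* sorted, std::size_t size)
{
  // size is unsigned: if it were 0, size - 1 would wrap around to a huge
  // value instead of becoming -1, so rule that case out before subtracting.
  assert(size > 0);
  if (size % 2 == 1)
    return static_cast<float>(sorted[(size - 1) / 2]);   // single middle index
  // size is even and at least 2 here, so size / 2 - 1 cannot wrap around.
  return (static_cast<float>(sorted[size / 2 - 1])
          + static_cast<float>(sorted[size / 2])) / 2.0f;
}
```

For the sorted data 1, 2, 3, 4, both functions return 2.5. Converting to float before dividing matters, because dividing two integers would discard the fractional part.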