Lab 2 - Multiple-threaded Programming
COP4610/CGS5765, Introduction to Operating Systems, Fall 2003
Florida State University
Points: 100 points.
Due: Week 10, Tuesday, November 4, 2003.
Purpose
To learn how to use multithreaded programming to achieve efficiency and
structure, and how to solve
concurrency problems among cooperating threads using
semaphores.
Background
As email has become a reliable and convenient way for people
to communicate, some are also exploiting it for purposes beyond
reasonable usage.
As a result, unsolicited email (or, spam) is widespread and has
become a problem for regular email users.
To resolve the problem (or at least reduce the time to deal with
spam messages), tools have been developed to filter email messages.
The basic principle common to all message filtering is to automatically
estimate the likelihood that a particular message is spam, based on the
message's content, its sender, its header, and so on, using a set of rules.
Assignment
For this assignment, you are going to create a utility that can be used to
filter messages based on their content. Your program will search
messages for occurrences of specified words.
For each message,
your program will report the frequency of each specified word within
that message and then, based on those frequencies,
save the message to a particular folder.
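The folder choice described above can be sketched as a small function. This is only an illustration: the name pick_category and the two parallel arrays are assumptions, as is the rule that the first word wins a tie. Category 0 is the default folder used when nothing matches, as stated in the requirements below.

```c
#include <stddef.h>

/* counts[i] is how often filter word i occurred in one message;
 * categories[i] is the category number listed for that word.
 * Both arrays and the function name are hypothetical. */
int pick_category(const int *counts, const int *categories, size_t nwords)
{
    int best_count = 0;
    int best_category = 0;   /* category 0: the default folder */

    for (size_t i = 0; i < nwords; i++) {
        /* Strict comparison: on a tie the earlier word wins (assumption). */
        if (counts[i] > best_count) {
            best_count = counts[i];
            best_category = categories[i];
        }
    }
    return best_category;
}
```

A consumer thread would call this once per message, after all filter words have been counted.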
Specific program requirements:
- Your program should take a number of optional choices,
followed by a mailbox file to be filtered (or more than
one mailbox for the multiple mailboxes extra credit option).
The optional choices include:
- "-m" option. It enables
synchronization, which is disabled by default.
When the synchronization is disabled, your program needs to exhibit
race conditions (i.e., the results will be wrong in some way).
When the synchronization is enabled, your program needs to work correctly.
- "-f filename" option. It specifies the filter words file to be used.
The default value for the filter words file
can be "filterwords.dat" or any other file that is convenient for you.
- "-c filename" option. It specifies the category file to be used.
The default value for the category file
can be "categorylist.dat" or any other file that is convenient for you.
- "-t numberthreads" option. It specifies how many consumer threads
(see below) to
create in each searching process.
The default value is 2.
(If you are not comfortable with
dynamic memory allocation, you can hardcode it as the default value.)
- This is for the multiple mailboxes extra credit option only.
Each mailbox file specified on the command line will be handled by a
separate searching process which will be created by the main program.
- Each searching process will read the entire mailbox
file into an array that is shared among all the
threads and then create (numberthreads+1) threads,
where numberthreads is specified by the user using "-t" option (see above).
One thread (called the producer thread) will
scan through the array, find the beginning and end
of each email message in the mailbox, and save that information in a shared
buffer.
The other numberthreads threads (called consumer threads)
will 1) read the information about a message, i.e., its beginning and end
positions, from the shared buffer;
2) match the words in the message to the ones specified in the filter words
file;
3) print the occurrences of each word in the message after the entire
message is compared;
4) save the message to a folder given by the category corresponding
to the word with the most occurrences. If no match is found, save
the message to the default folder (category 0). So that race
conditions can be observed, your program should write one character at a time
to the folder.
- The shared message information
buffer between producer and consumer threads must be implemented as an array
of 2*numberthreads (default is 4) entries.
- The matching should be case-insensitive. In other words,
"SEMINOLE" and "seminole" are the same.
To simplify the implementation, you can assume that there is no
attachment to the messages in the mailboxes.
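One way to meet the case-insensitivity requirement above is to fold both sides to lower case while comparing. The sketch below is only an illustration: the function name count_word, the half-open range [begin, end), and the decision to treat a word as a maximal run of letters and digits are assumptions, not part of the assignment.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Count case-insensitive whole-word occurrences of `word` in
 * text[begin, end). A "word" here is a maximal run of letters and
 * digits (an assumption made for this sketch). */
int count_word(const char *text, size_t begin, size_t end, const char *word)
{
    size_t wlen = strlen(word);
    size_t i = begin;
    int count = 0;

    if (wlen == 0)
        return 0;

    while (i < end) {
        /* Skip separators up to the start of the next token. */
        while (i < end && !isalnum((unsigned char)text[i]))
            i++;
        size_t start = i;
        /* Advance to the end of the token. */
        while (i < end && isalnum((unsigned char)text[i]))
            i++;

        if (i - start == wlen) {
            size_t k = 0;
            while (k < wlen &&
                   tolower((unsigned char)text[start + k]) ==
                   tolower((unsigned char)word[k]))
                k++;
            if (k == wlen)
                count++;
        }
    }
    return count;
}
```

Because the function only reads the shared array between the two positions it is given, several consumer threads can call it on different messages at the same time without locking.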
To satisfy the requirements and make it easier for you,
first you need to implement your program without synchronization and
your program should exhibit race conditions.
Then you need to add synchronization using
semaphores/mutual exclusion locks to
control access to the shared buffer and any other shared resources
such as files and the standard output device.
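The shared buffer of 2*numberthreads entries is a classic bounded buffer. The sketch below shows one possible arrangement using POSIX unnamed semaphores (sem_init, sem_wait, sem_post) together with a pthread mutex; the Solaris or System V calls listed under Additional Information would work equally well. All type and function names here (msg_range, bounded_buffer, bb_put, bb_get, consumer_main) are hypothetical, and using a range with begin < 0 as an end-of-input marker is an assumption of this sketch.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdlib.h>

struct msg_range { long begin; long end; };   /* one message in the array */

struct bounded_buffer {
    struct msg_range *slots;
    int capacity;           /* 2 * numberthreads */
    int head, tail;         /* next slot to read / to write */
    sem_t empty;            /* counts free slots */
    sem_t full;             /* counts filled slots */
    pthread_mutex_t lock;   /* protects head, tail, and slots */
};

int bb_init(struct bounded_buffer *bb, int numthreads)
{
    bb->capacity = 2 * numthreads;
    bb->head = bb->tail = 0;
    bb->slots = malloc((size_t)bb->capacity * sizeof *bb->slots);
    if (bb->slots == NULL)
        return -1;
    if (sem_init(&bb->empty, 0, (unsigned)bb->capacity) != 0 ||
        sem_init(&bb->full, 0, 0) != 0 ||
        pthread_mutex_init(&bb->lock, NULL) != 0)
        return -1;
    return 0;
}

/* Producer side: blocks while the buffer is full. */
void bb_put(struct bounded_buffer *bb, struct msg_range r)
{
    sem_wait(&bb->empty);
    pthread_mutex_lock(&bb->lock);
    bb->slots[bb->tail] = r;
    bb->tail = (bb->tail + 1) % bb->capacity;
    pthread_mutex_unlock(&bb->lock);
    sem_post(&bb->full);
}

/* Consumer side: blocks while the buffer is empty. */
struct msg_range bb_get(struct bounded_buffer *bb)
{
    struct msg_range r;
    sem_wait(&bb->full);
    pthread_mutex_lock(&bb->lock);
    r = bb->slots[bb->head];
    bb->head = (bb->head + 1) % bb->capacity;
    pthread_mutex_unlock(&bb->lock);
    sem_post(&bb->empty);
    return r;
}

void bb_destroy(struct bounded_buffer *bb)
{
    sem_destroy(&bb->empty);
    sem_destroy(&bb->full);
    pthread_mutex_destroy(&bb->lock);
    free(bb->slots);
}

struct consumer_arg { struct bounded_buffer *bb; long messages; long bytes; };

/* Hypothetical consumer body: take ranges until the end marker arrives.
 * A real consumer would match words and save the message here. */
void *consumer_main(void *p)
{
    struct consumer_arg *a = p;
    for (;;) {
        struct msg_range r = bb_get(a->bb);
        if (r.begin < 0)          /* end marker: the producer is done */
            break;
        a->messages++;
        a->bytes += r.end - r.begin;
    }
    return NULL;
}
```

With this arrangement the producer would put one end marker per consumer thread after the last message, so that every consumer eventually leaves its loop and the program can finish. The unsynchronized version required when "-m" is absent would skip the semaphore and mutex calls.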
Submission
You need to submit the following items in hard copy.
- Report -
You should turn in a report explaining how you have
implemented your utility and how you have handled the difficulties
you encountered. Your report must start with the following information:
- User name:
- Executable programs (with the path starting with your home directory):
- Extra credit options:
- Source code - You should also turn in all the
source programs
you developed for this lab along with your report.
You need to make sure that the copy you turn in is
exactly the same as the one you compiled to generate your results.
- Test case - You need to include the output from running
your program. You need to show two cases: one with race
conditions, when synchronization is disabled, and
one without race conditions (correct), when synchronization is
enabled.
Extra Credit
- Phrases (10 %) - Your program should be able to
search for phrases, which may consist of several words. For example, your
program can search for "Florida State University".
You need to design your
matching algorithm very carefully.
- Multiple mailboxes (10 %) - Your main program should
handle multiple mailboxes at the same time using the same filter words file
and the category file. You need to do this by creating multiple processes
in the main program, each of which handles one mailbox.
You need to synchronize
processes using semaphores.
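For the multiple mailboxes option, the main program could create one child process per mailbox with fork and then wait for all of them. The following is a minimal sketch under stated assumptions: run_searchers and search_mailbox are hypothetical names, and search_mailbox is only a placeholder that checks whether the file is readable instead of doing the real filtering.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Placeholder for the real per-mailbox work (hypothetical): returns 0
 * on success. Here it only checks that the mailbox can be read. */
int search_mailbox(const char *path)
{
    return access(path, R_OK) == 0 ? 0 : 1;
}

/* Fork one searching process per mailbox and wait for all of them.
 * Returns the number of mailboxes whose process failed. */
int run_searchers(int nboxes, char **boxes)
{
    int started = 0;
    int failures = 0;

    for (int i = 0; i < nboxes; i++) {
        pid_t pid = fork();
        if (pid < 0) {                /* could not create the process */
            perror("fork");
            failures++;
        } else if (pid == 0) {        /* child: handle one mailbox */
            _exit(search_mailbox(boxes[i]));
        } else {
            started++;                /* parent: keep creating children */
        }
    }

    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) < 0 || !WIFEXITED(status) ||
            WEXITSTATUS(status) != 0)
            failures++;
    }
    return failures;
}
```

The sketch leaves out the synchronization the extra credit asks for. Because the children are separate processes, a pthread mutex in ordinary memory would not protect the shared folders or standard output; a semaphore visible to all processes (for example one created with semget before the fork) would be needed.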
Grading:
- Report, Source Code, and Test Cases -- 30 points.
- Failure to include source code results in a penalty of 10 points.
- Failure to include test cases results in a penalty of 10 points.
- Correct Execution -- 70 points. Note the purpose of
this programming assignment is to learn how to coordinate multiple threads.
Your program may be tested several times and it must produce the desired
results every time. Specifically, each failure will incur the corresponding
penalty.
- Failure to show race conditions without synchronization -- 30 points.
- Failure to use semaphores/mutual exclusion locks correctly
-- 30 points. This includes the case where there is no concurrency among
consumer threads.
- Failure to finish (for example, running forever) -- 20 points.
- Failure to produce the correct results -- 20 points.
- Failure to create threads -- 20 points.
- Phrases - 10 points.
- Multiple mailboxes - 10 points.
Demonstration: For grading purposes, you will be required
to schedule a time slot to demonstrate your program to the TA.
One week's recitation sessions will be used for these demonstrations.
If you fail to demonstrate your program before the deadline (given later),
your program will be graded based on your source code and the worst case
will be assumed.
Test Cases
Test cases used for grading will be made available.
Additional Information:
- Your executable program must work correctly on "program".
- A demonstration program, ~liux/public_html/courses/cop4610/assignments/lab2 will be available. You need to run the program on "program".
- The following system calls/functions are helpful.
- fork, execve
- thr_create, thr_exit, thr_join
- mutex_init, mutex_lock, mutex_unlock, mutex_destroy
- shmget, shmop, shmctl, shmat
- semget, semctl
- pthread_create, pthread_join
- pthread_mutex_init, pthread_mutex_lock, pthread_mutex_unlock,
pthread_mutex_destroy
- POSIX Threads Programming: http://www.llnl.gov/computing/tutorials/workshops/workshop/pthreads/MAIN.html
- Pthreads Tutorial: http://dis.cs.umass.edu/~wagner/threads_html/tutorial.html
- Documentation on The LinuxThreads Library: http://pauillac.inria.fr/~xleroy/linuxthreads
- Some examples on how to use the pthread library for making threaded programs: http://www.cs.ucsb.edu/~tyang/class/pthreads/index_sgi.html
- Both the filter words file and the category file use the following
format. Each line starts with a number, followed by one word, one phrase
(extra credit only), or one category; the number is the category number
associated with the word or the category. Note that category is reserved as the regular
category. A valid filter words file can be found
at http://www.cs.fsu.edu/~liux/courses/cop4610/assignments/filterwords.dat
and a valid category file can be found
at http://www.cs.fsu.edu/~liux/courses/cop4610/assignments/categorylist.dat.
- How to parse a mailbox
- The manual page of mail command (man mail) briefly describes the
format of a message.
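As a rough illustration of what the producer thread has to do, the sketch below assumes the traditional mbox layout, in which every message starts with a line beginning with "From " (the word From followed by a space). The names msg_span and find_messages are hypothetical, and the sketch returns all positions in an array instead of passing them through the shared buffer.

```c
#include <stddef.h>
#include <string.h>

struct msg_span { size_t begin; size_t end; };   /* half-open [begin, end) */

/* Scan buf[0, len) line by line and record where each message starts
 * and ends. Stores at most `max` spans in `out` and returns how many
 * were stored. Assumes a message starts at a line beginning "From ". */
size_t find_messages(const char *buf, size_t len,
                     struct msg_span *out, size_t max)
{
    size_t count = 0;
    size_t pos = 0;               /* always the start of a line */

    while (pos < len) {
        if (len - pos >= 5 && memcmp(buf + pos, "From ", 5) == 0) {
            if (count > 0)
                out[count - 1].end = pos;   /* close the previous message */
            if (count == max)
                return count;
            out[count].begin = pos;
            out[count].end = len;           /* provisional: runs to the end */
            count++;
        }
        /* Move to the start of the next line, if there is one. */
        const char *nl = memchr(buf + pos, '\n', len - pos);
        if (nl == NULL)
            break;
        pos = (size_t)(nl - buf) + 1;
    }
    return count;
}
```

Only a "From " at the very start of a line counts, so the same text in the middle of a line of a message body does not split the message.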