Headers generally match the pattern /^[-a-zA-Z0-9_]+: .*$/ and may be followed by zero or more continuation lines that start with whitespace (usually a tab), matching /^\t.*$/.
For instance, here's an email message:
$ cat /tmp/test/testfile1
Received: from mail.cs.fsu.edu (mail.cs.fsu.edu [128.186.120.4])
        by newmail.cs.fsu.edu (Postfix) with ESMTP id 06476175D4C
Received: by mail.cs.fsu.edu (Postfix) id 95D01F2DC4; Sat, 7 Jun 2008 03:54:40 -0400 (EDT)
Delivered-To: langley
Message-ID: <484A3E21.4090704@fsu.edu>
Date: Sat, 07 Jun 2008 03:52:01 -0400
From: Tom Kitterman
To: nolenet, OTC Help Desk Staff
Subject: [Nolenet] Mailman listserv website down
X-fsucs-MailScanner-SpamCheck: not spam, SpamAssassin (cached, score=-2.599,
        required 5, autolearn=not spam, BAYES_00 -2.60)
X-Spam-Status: No

Hi,

There's something wrong with the mailman listserv website on lists.fsu.edu. This happened when we moved it to the new hardware. It's almost 4AM and I've run out of ideas on how to fix it at the moment so I'm going home to get some sleep and try again tomorrow.

So for now that website is non-functional. The mailman list software is processing messages so this should mostly affect list owners. Until we get it fixed list owners should open a ticket through the help desk in the normal manner for any critical issues.

Sorry for the inconvenience.

Tom K.
_______________________________________________
https://lists.fsu.edu/mailman/listinfo/nolenet

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
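The header form described above can be sketched in a few lines. This is only an illustration in Python, not the Perl program the assignment asks for; the names HEADER_RE and split_headers are made up for this sketch, and it assumes lines end with a plain newline and that the headers stop at the first blank line.

```python
import re

# Same pattern as in the text: a header name, a colon, a space, then the value.
HEADER_RE = re.compile(r"^([-a-zA-Z0-9_]+): (.*)$")

def split_headers(text):
    """Return (headers, body); headers is a list of (name, value) pairs."""
    # Everything before the first blank line is the header area.
    head, _, body = text.partition("\n\n")
    headers = []
    for line in head.split("\n"):
        match = HEADER_RE.match(line)
        if match:
            headers.append((match.group(1), match.group(2)))
        elif line[:1] in (" ", "\t") and headers:
            # Continuation line: fold it into the previous header's value.
            name, value = headers[-1]
            headers[-1] = (name, value + " " + line.strip())
    return headers, body
```

Called on the message above, split_headers would be expected to fold the indented "by newmail.cs.fsu.edu ..." line into the first Received: header and to return everything after the first blank line as the body.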
You can find on 45.56.74.139 a test source directory /usr/local/filter-source, which contains the following test email files:
/usr/local/filter-source$ ls -li
total 1072
123537 -rw-r--r-- 1 root root  34240 Nov 22 15:46 file1
124500 -rw-r--r-- 1 root root  20085 Nov 22 15:45 file2
123099 -rw-r--r-- 1 root root  72006 Nov 22 15:45 file3
124499 -rw-r--r-- 1 root root 821938 Nov 22 15:45 file4
123543 -rw-r--r-- 1 root root  30543 Nov 22 15:45 file5
124533 -rw-r--r-- 1 root root  20260 Nov 22 16:57 file6
123541 -rw-r--r-- 1 root root  63112 Nov 22 15:45 file7
124536 -rw-r--r-- 1 root root  20605 Nov 22 16:57 file8
Your task is to write a program that accepts two options, -s and -d. The first option specifies a source directory, as in -s SOURCEDIRECTORY, and the second specifies a destination directory, as in -d DESTINATIONDIRECTORY. Your program will then open the source directory specified by -s and examine all of the files at the first level (you don't have to recurse into any subdirectories that you find) to see if each file has a Subject: header that indicates spam.
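The option handling and the first-level directory scan might look roughly like the following. It is a sketch in Python for illustration only (the assignment itself calls for Perl); parse_args and top_level_files are hypothetical names, and treating both options as required is an assumption, since the text does not say what happens when one is missing.

```python
import argparse
import os

def parse_args(argv=None):
    """Parse -s SOURCEDIRECTORY and -d DESTINATIONDIRECTORY."""
    parser = argparse.ArgumentParser(description="Filter spam by Subject: header")
    # Assumption: both options are mandatory.
    parser.add_argument("-s", dest="source", required=True,
                        metavar="SOURCEDIRECTORY")
    parser.add_argument("-d", dest="destination", required=True,
                        metavar="DESTINATIONDIRECTORY")
    return parser.parse_args(argv)

def top_level_files(source):
    """Return sorted paths of regular files directly inside source (no recursion)."""
    with os.scandir(source) as entries:
        # Subdirectories are skipped rather than descended into.
        return sorted(entry.path for entry in entries if entry.is_file())
```

Sorting the paths is not required by the assignment; it just makes the processing order predictable.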
This is done by looking for the character strings [SPAM] or {SPAM} (capitalization matters) in the Subject: header. Remember that headers are found only before the first blank line; file8, for instance, has a matching line at line 266, but it is outside the header area.
If the Subject: header indicates spam, or if there is no Subject: header, then no further processing happens for that file. This is the case for test files file1 and file6, each of which has a Subject: line labeled as spam, but not for file8, whose matching line occurs after the headers.
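The decision rule in the two paragraphs above can be sketched as follows, again in Python purely for illustration. The names SPAM_MARKERS, subject_of, and should_copy are invented for this sketch; it assumes the header name is spelled exactly "Subject:" and that only the first Subject: header counts.

```python
SPAM_MARKERS = ("[SPAM]", "{SPAM}")   # case-sensitive, as the text requires

def subject_of(text):
    """Return the Subject: header value, or None if the headers have none."""
    subject = None
    in_subject = False
    for line in text.split("\n"):
        if line == "":
            break                      # first blank line ends the headers
        if line.startswith("Subject: ") and subject is None:
            subject = line[len("Subject: "):]
            in_subject = True
        elif in_subject and line[:1] in (" ", "\t"):
            subject += " " + line.strip()   # continuation of the Subject: header
        else:
            in_subject = False
    return subject

def should_copy(text):
    """True only for messages with a Subject: header that has no spam marker."""
    subject = subject_of(text)
    if subject is None:
        return False                   # no Subject: header, so no further processing
    return not any(marker in subject for marker in SPAM_MARKERS)
```

Because the loop stops at the first blank line, a matching line in the body (as in file8) never reaches the spam check.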
If the file is not spam and does contain a Subject: header, your program should create a file in the destination directory that has the same filename as the original file; the contents of the file should be only the body of the message, with none of the header lines at all.
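Writing the body-only copy could be done along these lines. This is a Python sketch for illustration, not the required Perl; write_body is a hypothetical helper, it assumes the caller has already decided the message is not spam, and it reads the file as bytes so that unusual encodings in the test messages do not matter.

```python
import os

def write_body(source_path, destination_dir):
    """Write the body of source_path to destination_dir under the same filename."""
    with open(source_path, "rb") as handle:
        data = handle.read()
    # The body is everything after the first blank line.
    _, separator, body = data.partition(b"\n\n")
    if not separator:
        body = b""                     # assumption: no blank line means no body
    target = os.path.join(destination_dir, os.path.basename(source_path))
    with open(target, "wb") as handle:
        handle.write(body)
    return target
```

Only the first blank line is treated as the separator, so blank lines inside the body are preserved in the output file.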
Thus, if you processed the above example /tmp/test/testfile1, the session would look like this:
$ cat /tmp/test/testfile1
Received: from mail.cs.fsu.edu (mail.cs.fsu.edu [128.186.120.4])
        by newmail.cs.fsu.edu (Postfix) with ESMTP id 06476175D4C
Received: by mail.cs.fsu.edu (Postfix) id 95D01F2DC4; Sat, 7 Jun 2008 03:54:40 -0400 (EDT)
Delivered-To: langley
Message-ID: <484A3E21.4090704@fsu.edu>
Date: Sat, 07 Jun 2008 03:52:01 -0400
From: Tom Kitterman
To: nolenet, OTC Help Desk Staff
Subject: [Nolenet] Mailman listserv website down
X-fsucs-MailScanner-SpamCheck: not spam, SpamAssassin (cached, score=-2.599,
        required 5, autolearn=not spam, BAYES_00 -2.60)
X-Spam-Status: No

Hi,

There's something wrong with the mailman listserv website on lists.fsu.edu. This happened when we moved it to the new hardware. It's almost 4AM and I've run out of ideas on how to fix it at the moment so I'm going home to get some sleep and try again tomorrow.

So for now that website is non-functional. The mailman list software is processing messages so this should mostly affect list owners. Until we get it fixed list owners should open a ticket through the help desk in the normal manner for any critical issues.

Sorry for the inconvenience.

Tom K.
_______________________________________________
https://lists.fsu.edu/mailman/listinfo/nolenet

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
$ bin/filter.pl -s /tmp/test -d filter-results/
$ cat filter-results/testfile1
Hi,

There's something wrong with the mailman listserv website on lists.fsu.edu. This happened when we moved it to the new hardware. It's almost 4AM and I've run out of ideas on how to fix it at the moment so I'm going home to get some sleep and try again tomorrow.

So for now that website is non-functional. The mailman list software is processing messages so this should mostly affect list owners. Until we get it fixed list owners should open a ticket through the help desk in the normal manner for any critical issues.

Sorry for the inconvenience.

Tom K.
_______________________________________________
https://lists.fsu.edu/mailman/listinfo/nolenet

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
When you run your program over the directory /usr/local/filter-source, the resulting destination directory should look like:
$ bin/filter.pl -s /usr/local/filter-source/ -d filter-results/
$ ls -li filter-results/
total 920
124528 -rw-r--r-- 1 COP4342_test COP4342_test   3178 Nov 22 17:47 file2
124505 -rw-r--r-- 1 COP4342_test COP4342_test  54705 Nov 22 17:47 file3
124529 -rw-r--r-- 1 COP4342_test COP4342_test 804616 Nov 22 17:47 file4
123542 -rw-r--r-- 1 COP4342_test COP4342_test  12916 Nov 22 17:47 file5
124527 -rw-r--r-- 1 COP4342_test COP4342_test  47131 Nov 22 17:47 file7
124502 -rw-r--r-- 1 COP4342_test COP4342_test   4879 Nov 22 17:47 file8
(Note that file1 and file6 are not there because they were identified as spam messages. Note also that the files in the results directory are shorter than the originals since they no longer have any headers.)
Your Perl program should be saved in your account on 45.56.74.139 as ~/bin/filter.pl so that I can test it. Please also submit the program on Blackboard by 11:59pm on Wednesday, November 30.