Two of the most popular services visibly provided by servers are email and web-type services.
Full email setups generally consist of an MTA such as sendmail or postfix, a delivery agent such as procmail or maildrop, a pop/imap server, and perhaps a webmail interface such as openwebmail, Outlook Web Access (OWA), horde, or squirrelmail.
They may also include various spam and virus programs, such as MailScanner, spamassassin, amavis, clamav, dcc, razor, and many others, and other types of mail filters such as the popular milter library programs (e.g., milter-ahead).
Web services generally center around an Apache web server, some CGI-friendly regime such as Perl (anywhere from embedded Perl to mod_perl with any of the numerous CGI packages), Python, PHP, Ruby, JSP, or ASP, and a database such as MySQL, PostgreSQL, Oracle, or SQLite. They may also include other bits such as SOAP or RSS services.
Sendmail functions as an MTA (and also as an RFC 2476 MSA). It is generally configured to listen on port 25 (and on 587 for MSA functions), and its configuration files are now generally stored in /etc/mail.
The primary configuration file for administrators is typically /etc/mail/sendmail.mc. It contains m4 directives that control the creation of /etc/mail/sendmail.cf.
Sendmail is quite powerful. A common application for sendmail is to serve as a gateway mail server.
(You can also do this type of thing with Exchange; see Microsoft's website for a document called ``Using a Windows SMTP Relay Server in a Perimeter Network'' which gives an overview, and for details, look at ``How to Configure a Windows Server 2003 Server as a Relay Server or Smart Host''.)
One quite clever idea came from MailScanner's author, Julian Field at the University of Southampton. Email going into sendmail is put into a queue, and instead of the usual process of another sendmail process acting as a queue handler to deliver it, MailScanner first processes the mail (looking for spam and viruses, and comparing it against blacklists and whitelists), and then enqueues the message into a different queue directory for the second sendmail queue handler to find. (You can often view mail queues with the command ``mailq'', which for sendmail is equivalent to ``sendmail -bp''; the postfix equivalent is ``postqueue -p''.)
As we saw from the .mc files, sendmail doesn't actually do local delivery of email. Ordinary delivery is typically by procmail (other candidates include the old binmail program or maildrop).
procmail is a very powerful mail delivery agent; it can be configured to do many, many things. See http://www.procmail.org for ``recipes''. For instance, a typical procmail recipe might look like:
    :0
    * ^From: unpleasant@user.com
    /dev/null

    :0:
    ${DEFAULT}
Heads up: procmail is very picky about such items as colons. A single missing colon can be very bad, since it might be the one that indicates that a mailbox is to be locked before it receives a delivery, and failing to lock a shared mailbox file might prove unpleasant.
Finally, you have to decide one more thing (or perhaps two) about delivery: do you want email to go into a traditional mbox, which is just one long file of messages separated by the delimiter ^From .*\n, or do you want to use the more modern maildir approach, where each message is written to a separate file? I think that the latter is preferable. If you do choose the mbox format, you will also have to make sure that procmail, imap/pop, and any other client software such as openwebmail all agree on a common locking mechanism.
Maildirs are safer in many ways than the traditional mbox format. The problems with traditional mailbox locking are discussed on USAH p. 549 and on the maildir webpage.
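To make the mbox delimiter concrete, here is a minimal Python sketch that splits an mbox-style string on lines matching ^From .*\n. The function name split_mbox and the sample messages are invented for illustration; a real reader would also have to deal with locking and with body lines that happen to begin with ``From ''.

```python
import re

# Each message in an mbox file is introduced by a line starting with "From "
# (with a trailing space and no colon), as described in the text.
FROM_LINE = re.compile(r"^From .*\n", re.MULTILINE)


def split_mbox(text: str) -> list[str]:
    """Return the messages in an mbox-style string, minus their "From " lines."""
    matches = list(FROM_LINE.finditer(text))
    bodies_start = [m.end() for m in matches]
    # Each message runs until the next "From " line, or to the end of the text.
    bodies_end = [m.start() for m in matches[1:]] + [len(text)]
    return [text[s:e] for s, e in zip(bodies_start, bodies_end)]


# Hypothetical two-message mailbox used only for this demonstration.
SAMPLE = (
    "From alice@example.com Thu Feb 23 16:26:31 2006\n"
    "Subject: first\n"
    "\n"
    "Hello.\n"
    "\n"
    "From bob@example.com Thu Feb 23 16:30:00 2006\n"
    "Subject: second\n"
    "\n"
    "Hi again.\n"
)

messages = split_mbox(SAMPLE)
```

Because the delimiter is just a line pattern, a body line that starts with ``From '' would be mistaken for a new message; mbox writers conventionally escape such lines (for example as ``>From ''). Python's standard mailbox module handles these details for real mailboxes.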
Maildirs keep every email message in a separate file, and never use any type of locking mechanism.
Traditional mailbox (mbox) format is not safe over NFS.
Every maildir setup will have the subdirectories tmp, new, and cur, and may have others. Mail is first delivered to tmp, then safely moved to new.

HOW A MESSAGE IS DELIVERED

The tmp directory is used to ensure reliable delivery, as discussed here. A program delivers a mail message in six steps. First, it chdir()s to the maildir directory. Second, it stat()s the name tmp/time.pid.host, where time is the number of seconds since the beginning of 1970 GMT, pid is the program's process ID, and host is the host name. Third, if stat() returned anything other than ENOENT, the program sleeps for two seconds, updates time, and tries the stat() again, a limited number of times. Fourth, the program creates tmp/time.pid.host. Fifth, the program NFS-writes the message to the file. Sixth, the program link()s the file to new/time.pid.host. At that instant the message has been successfully delivered.

[ ... ]

NFS-writing means (1) as usual, checking the number of bytes returned from each write() call; (2) calling fsync() and checking its return value; (3) calling close() and checking its return value. (Standard NFS implementations handle fsync() incorrectly but make up for it by abusing close().)
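The six steps quoted above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the code any real delivery agent uses: the function name deliver and the retry count are invented, absolute paths stand in for the chdir() of step one, the subdirectories are created on the fly only so the demonstration is self-contained, and removing the tmp entry after the link is a cleanup step added here rather than one of the six quoted steps.

```python
import os
import socket
import tempfile
import time


def deliver(maildir: str, message: bytes, retries: int = 5) -> str:
    """Deliver one message into maildir and return its path under new/."""
    # Demo convenience (assumption): make sure tmp, new, and cur exist.
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)

    host = socket.gethostname()
    for _ in range(retries):
        # Step 2: stat() the candidate name tmp/time.pid.host.
        name = f"{int(time.time())}.{os.getpid()}.{host}"
        tmp_path = os.path.join(maildir, "tmp", name)
        try:
            os.stat(tmp_path)
        except FileNotFoundError:
            break  # ENOENT: the name is free
        # Step 3: name in use, so sleep two seconds and retry with a new time.
        time.sleep(2)
    else:
        raise RuntimeError("no unused name found in tmp/")

    # Step 4: create the file; O_EXCL refuses to reuse an existing name.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        # Step 5: "NFS-write", i.e. check each write() count, then fsync().
        view = memoryview(message)
        while view:
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)
    finally:
        os.close(fd)  # raises OSError if close() fails

    # Step 6: link into new/; at this instant the message is delivered.
    new_path = os.path.join(maildir, "new", name)
    os.link(tmp_path, new_path)
    os.unlink(tmp_path)  # cleanup of the tmp entry (added assumption)
    return new_path


# Deliver a hypothetical message into a throwaway maildir.
with tempfile.TemporaryDirectory() as box:
    path = deliver(box, b"Subject: test\n\nbody\n")
    with open(path, "rb") as handle:
        delivered = handle.read()
    leftover = os.listdir(os.path.join(box, "tmp"))
```

Note that nothing here takes a lock: a reader only ever looks in new and cur, and a file appears there only after it has been completely written.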
dovecot: an increasingly popular imap and pop server is dovecot, which handles mbox and maildir format with aplomb. It also handles virtual users quite well, including those existing only in databases.
courier: also popular.
cyrus: uses its own mailbox format; it is more formidable to configure than other imap setups.
What is imap/pop? These are protocols that allow a user to remotely retrieve email from a mailhost. imap (RFC 3501), unlike pop (RFC 1939), supports the idea of separate folders on the server machine, and it has more functionality built in. Generally, you leave your mail messages on an imap server, and you retrieve them from a pop server.
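The main commands for POP are USER and PASS (authenticate), STAT and LIST (inspect the mailbox), RETR (retrieve a message), DELE (mark a message for deletion), and QUIT.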
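As a rough illustration of how a POP conversation is shaped, the following Python sketch builds the command lines a simple fetch-and-delete client might send and parses the server's status lines. It uses only commands defined in RFC 1939 (USER, PASS, STAT, RETR, DELE, QUIT); the helper names pop3_status and fetch_all_script, and the sample credentials, are invented for this example. Nothing is sent over the network.

```python
def pop3_status(line: bytes) -> tuple[bool, str]:
    """Split a POP3 status line into (ok, text).

    Every POP3 reply starts with the status indicator +OK or -ERR.
    """
    text = line.decode("ascii", errors="replace").rstrip("\r\n")
    indicator, _, rest = text.partition(" ")
    if indicator == "+OK":
        return True, rest
    if indicator == "-ERR":
        return False, rest
    raise ValueError(f"not a POP3 status line: {text!r}")


def fetch_all_script(user: str, password: str, count: int) -> list[bytes]:
    """Return the command lines for 'download every message, then delete it'."""
    commands = [f"USER {user}", f"PASS {password}", "STAT"]
    for number in range(1, count + 1):
        commands.append(f"RETR {number}")  # retrieve message `number`
        commands.append(f"DELE {number}")  # mark it deleted; applied at QUIT
    commands.append("QUIT")
    # POP3 command lines are terminated by CRLF.
    return [c.encode("ascii") + b"\r\n" for c in commands]


# Hypothetical mailbox with two messages.
script = fetch_all_script("jdoe", "secret", 2)
ok, detail = pop3_status(b"+OK 2 320\r\n")
```

For real use, Python's standard poplib module wraps these same commands behind methods such as user(), pass_(), stat(), retr(), dele(), and quit().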
IMAP commands are ``tagged''. This means that you need to put a short, unique identifier before each command; the response to that command will use the same tag. The main commands for IMAP checking are LOGIN, SELECT, FETCH, and LOGOUT.
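The tagging idea can be sketched in a few lines of Python. The helpers make_tagger and find_completion are invented names, and the server reply below is a made-up example in the style of RFC 3501; the point is only that the client picks the tag and then looks for the reply line carrying the same tag, while lines starting with * are untagged data.

```python
import itertools


def make_tagger(prefix: str = "a"):
    """Return a function that yields a001, a002, ... on successive calls."""
    counter = itertools.count(1)

    def next_tag() -> str:
        return f"{prefix}{next(counter):03d}"

    return next_tag


def find_completion(tag: str, response_lines: list[str]):
    """Return (status, text) from the line that carries our tag, else None."""
    for line in response_lines:
        head, _, rest = line.partition(" ")
        if head == tag:
            status, _, text = rest.partition(" ")
            return status, text
    return None


next_tag = make_tagger()
tag = next_tag()
command = f"{tag} SELECT INBOX\r\n"

# Hypothetical server reply: untagged data first, then the tagged completion.
reply = [
    "* 3 EXISTS",
    "* 0 RECENT",
    "a001 OK [READ-WRITE] SELECT completed",
]
result = find_completion(tag, reply)
```

Python's standard imaplib module generates and matches tags in much the same way, so client code rarely has to do this by hand.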
There are two types of clients: (1) those that read email via a protocol such as IMAP, POP, or the ``Microsoft'' way, and (2) those that access mail via a filesystem.
Web clients: The very popular squirrelmail (http://www.squirrelmail.org) is an example of type (1) that uses IMAP. openwebmail (http://www.openwebmail.org) is an example of (2). It reads directly from either MBOX or Maildir format.
Dedicated interface clients: most of these now handle both file stores and IMAP/POP. Examples include Outlook, Thunderbird, Evolution, Sylpheed, Eudora, Pegasus, and a host of others.
Working on the latter setups can be interesting, since the client may silently be fetching its email from entirely different machines as well.
I have worked on a setup where just determining where the client's email was coming from required tcpdump and a lot of patience. In that case, a single user was having trouble accessing his mailbox. It turned out that the client interface (a very old version of a web email client) could not handle bad headers in email messages, could not handle very large messages, and was configured to terminate any handler that took longer than 30 seconds, so it could never handle a mailbox that had a large number of messages to move. It used POP instead of IMAP, and thus ended up initially doing RETR, then DELE, after it had pulled the messages into a maildir-like format.
An important web service is simple delivery of html over http (hypertext transfer protocol).
The current version of http in use is 1.1, defined in RFC 2616. (There was an early stab at an http 1.2, but it didn't jell.)
The most popular webserver is Apache, which has an overall 46% market share according to Netcraft's current webserver survey and powers 67% of the most active sites.
http://news.netcraft.com/archives/web_server_survey.html
Apache has two release lines, 1.3 and 2.x, but 1.3 is now considered a ``legacy'' system, and Apache now recommends:
``Apache 1.3.41 is the current stable release of the Apache 1.3 family. We strongly recommend that users of all earlier versions, including 1.3 family release, upgrade to the current 2.2 version as soon as possible.''
http://www.apache.org/dist/httpd/Announcement1.3.html
Security Space publishes another webserver survey that uses a somewhat different methodology than Netcraft's, where you can view many different server statistics:
http://www.securityspace.com/s_survey/data/200902/index.html
What does a typical conversation look like? Here's one request and answer for a page:
    Hypertext Transfer Protocol
        GET /rfcs/rfc2612.html HTTP/1.1
            Request Method: GET
            Request URI: /rfcs/rfc2612.html
            Request Version: HTTP/1.1
        Host: www.faqs.org
        User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20060202 Red Hat/1.7.12-1.1.3.4.centos3
        Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
        Accept-Language: en-us,en;q=0.5
        Accept-Encoding: gzip,deflate
        Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
        Keep-Alive: 300
        Connection: keep-alive

    Hypertext Transfer Protocol
        HTTP/1.1 200 OK
            Request Version: HTTP/1.1
            Response Code: 200
        Date: Thu, 23 Feb 2006 16:26:31 GMT
        Server: Apache
        Last-Modified: Thu, 23 Feb 2006 07:01:53 GMT
        ETag: "5f8977-910a-43fd5de1"
        Accept-Ranges: bytes
        Content-Length: 37130
        Keep-Alive: timeout=5, max=100
        Connection: Keep-Alive
        Content-Type: text/html
    Line-based text data: text/html
        <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
        <HTML>
        <HEAD>
        <TITLE>RFC 2612 (rfc2612) - The CAST-256 Encryption Algorithm</TITLE>
        <META name="description" content="RFC 2612 - The CAST-256 Encryption Algorithm">
        <script language="JavaScript1.2">
        function erfc(s)
        {document.write("<A href="/rfccomment.php?rfcnum="+s+"" target="_blank" onclick="window.open('/rfccomment.php?rfcnum="+s+"', 'Popup','toolbar=no,location=no,status=no,menubar=no,scrollbars=yes, resizable=yes,width=680,height=530,left=30
        //-->
        </script>
        </HEAD>
        <BODY BGCOLOR="#ffffff" TEXT="#000000">
        <P ALIGN=CENTER><IMG SRC="/images/library.jpg" HEIGHT=62 WIDTH=150 BORDER="0" ALIGN="MIDDLE" ALT=""></P>
        <H1 ALIGN=CENTER>RFC 2612 (RFC2612)</H1>
        <P ALIGN=CENTER>Internet RFC/STD/FYI/BCP Archives</P>
        <DIV ALIGN=CENTER>[ <a href="/rfcs/">RFC Index</a> | <A HREF="/rfcs/rfcsearch.html"> RFC Search</A> | <a href="/faqs/">Usenet FAQs</a> | <a href="/contrib/">Web FAQs</a> | <a href="/docs/">Documents</a> | <a href="http://www.city-data.com/"
        <P>
        <STRONG>Alternate Formats:</STRONG>
        <A HREF="/ftp/rfc/rfc2612.txt">rfc2612.txt</A> |
        <A HREF="/ftp/rfc/pdf/rfc2612.txt.pdf">rfc2612.txt.pdf</A></DIV>
        <p align=center><script language="JavaScript"><!-- erfc("2612"); // --></script></p>
        <h3 align=center>RFC 2612 - The CAST-256 Encryption Algorithm</h3>
        <HR SIZE=2 NOSHADE>
        <TT>
        Network Working Group C. Adams
        Request for Comments: 2612 J. Gilchrist
        Category: Informational Entrust Technologies
        June 1999
        The CAST-256 Encryption Algorithm
        Status of this Memo
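On the wire, the exchange above is just lines of text separated by CRLF, with a blank line ending the headers. The following Python sketch builds a minimal request and picks apart the head of a response. The function names build_get and parse_head are invented here, the request carries far fewer headers than a real browser sends, and the sample response is a shortened stand-in for the one in the trace.

```python
def build_get(host: str, path: str) -> bytes:
    """Build a minimal HTTP/1.1 GET request (Host is the one required header)."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # blank line ends the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_head(raw: bytes):
    """Return (status code, reason, headers dict, body bytes) from a response."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body


request = build_get("www.faqs.org", "/rfcs/rfc2612.html")

# Shortened, hypothetical response modelled on the trace above.
sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Server: Apache\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<HTML>"
)
code, reason, headers, body = parse_head(sample)
```

This sketch ignores repeated headers and folded header lines; Python's standard http.client module handles those cases for real traffic.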
Here's a request and ``not modified'' answer for a page:
    Hypertext Transfer Protocol
        GET /rfcs/rfc2616.html HTTP/1.1
            Request Method: GET
            Request URI: /rfcs/rfc2616.html
            Request Version: HTTP/1.1
        Host: www.faqs.org
        User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20060202 Red Hat/1.7.12-1.1.3.4.centos3
        Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
        Accept-Language: en-us,en;q=0.5
        Accept-Encoding: gzip,deflate
        Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
        Keep-Alive: 300
        Connection: keep-alive
        Referer: http://www.google.com/search?num=100&hl=en&lr=&q=http+protocol+rfc&btnG=Search
        If-Modified-Since: Thu, 23 Feb 2006 07:01:53 GMT
        If-None-Match: "5f897b-63239-43fd5de1"
        Cache-Control: max-age=0

    Hypertext Transfer Protocol
        HTTP/1.1 304 Not Modified
            Request Version: HTTP/1.1
            Response Code: 304
        Date: Thu, 23 Feb 2006 16:11:36 GMT
        Server: Apache
        Connection: Keep-Alive
        Keep-Alive: timeout=5, max=100
        ETag: "5f897b-63239-43fd5de1"
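The server's decision in the exchange above can be sketched as a small Python function. This is a simplified illustration, not Apache's actual logic: the name conditional_status is invented, entity tags are compared as plain strings (weak tags with a W/ prefix are not treated specially), and the sketch assumes that when If-None-Match is present it alone decides the outcome and the date is consulted only otherwise.

```python
from email.utils import parsedate_to_datetime


def conditional_status(request_headers: dict, etag: str, last_modified: str) -> int:
    """Return 304 if the client's cached copy is still good, else 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # The header may carry several comma-separated tags, or "*".
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if (etag in candidates or "*" in candidates) else 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        # Not modified if the resource is no newer than the client's copy.
        if parsedate_to_datetime(last_modified) <= parsedate_to_datetime(if_modified_since):
            return 304
    return 200


# Values taken from the trace above.
ETAG = '"5f897b-63239-43fd5de1"'
MODIFIED = "Thu, 23 Feb 2006 07:01:53 GMT"

revalidation = conditional_status(
    {"If-None-Match": ETAG, "If-Modified-Since": MODIFIED}, ETAG, MODIFIED
)
first_visit = conditional_status({}, ETAG, MODIFIED)
```

A 304 answer carries no body, which is why the second response in the trace is so much shorter than the first: the browser simply reuses its cached copy.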
While in theory content encoding allows for any arbitrary encoding of the body, in practice http-level encoding is used to let a server optimize its use of bandwidth by choosing when it would like to compress (gzip) a body.
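A minimal Python sketch of that server-side choice follows. It is an illustration only: the function name encode_body and the size threshold are invented, and q-values in Accept-Encoding are ignored, so a client that sent gzip;q=0 would wrongly be given gzip.

```python
import gzip

# Hypothetical threshold: very small bodies are not worth compressing.
MIN_COMPRESS_BYTES = 256


def encode_body(body: bytes, accept_encoding: str):
    """Return (bytes to send, extra response headers) for this client."""
    # Keep only the coding names; parameters such as ";q=0.5" are dropped.
    offered = [item.split(";")[0].strip().lower() for item in accept_encoding.split(",")]
    if "gzip" in offered and len(body) >= MIN_COMPRESS_BYTES:
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    return body, {}


page = b"<p>hello</p>" * 200

# The Accept-Encoding value is the one sent by the browser in the traces.
encoded, extra_headers = encode_body(page, "gzip,deflate")
plain, no_headers = encode_body(page, "identity")
```

The client learns what happened from the Content-Encoding response header and undoes the compression before rendering.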
Chunking is almost the reverse: it instead embeds redundant information into the message body to let the client make decisions about buffering and early rendering of data. If chunking occurs, it is usually for dynamically generated data.
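The extra information that chunking embeds is a length line in front of each piece of the body. A minimal Python sketch of HTTP/1.1 chunked transfer coding follows; the function names are invented, and the decoder ignores chunk extensions and any trailer headers.

```python
def encode_chunked(pieces) -> bytes:
    """Encode an iterable of byte strings as a chunked message body."""
    out = bytearray()
    for piece in pieces:
        if piece:  # an empty chunk would be read as the terminator
            # Each chunk is: size in hex, CRLF, the data, CRLF.
            out += f"{len(piece):x}\r\n".encode("ascii") + piece + b"\r\n"
    out += b"0\r\n\r\n"  # zero-length chunk marks the end of the body
    return bytes(out)


def decode_chunked(data: bytes) -> bytes:
    """Reassemble the body from chunked data (trailers are not parsed)."""
    body = bytearray()
    pos = 0
    while True:
        eol = data.index(b"\r\n", pos)
        # Drop any ";extension" after the hex size.
        size = int(data[pos:eol].split(b";")[0], 16)
        pos = eol + 2
        if size == 0:
            break
        body += data[pos:pos + size]
        pos += size + 2  # skip the data and its trailing CRLF
    return bytes(body)


wire = encode_chunked([b"Hello, ", b"world"])
roundtrip = decode_chunked(wire)
```

Because each chunk announces its own size, a server generating a page dynamically can start sending before it knows the total length, and the client can begin rendering as chunks arrive.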
Where you put your configuration data varies widely; while /etc/httpd is certainly common, you also might see /etc/apache2 and other places. Also widely varying is where you might find your actual html files, the ``DocumentRoot''. On Red Hat machines, /var/www/html has been the default directory.
On openSUSE, you would see /srv/www/htdocs.
The most important configuration file is httpd.conf.