P4.html

COP4610: Operating Systems & Concurrent Programming

Programming Assignment #4
Multithreaded TCP Echo Server

Spring 2015

Please see the supplemental file P4_supplement.html after reading this file.

Overview:

You are to modify a simple TCP server program, to make it multithreaded. You will start with a single-threaded version that can handle only one clienty connection at a time, and modify it to allow multiple concurrent connections using a pool of threads. This will require you to use the POSIX system calls pthread_create() and pthread_join(), along with operations related to thread creation attributes, as well as the synchronization operations on a mutex and a condition variable (CV).

Deadline:

See course calendar.

Educational Objectives:

Learn to program a concurrent application, including synchronization and protection of critical sections, using POSIX threads.
Become familiar with application of the BSD socket abstraction, as it applies to TCP/IP servers.
Learn more about writing portable code, using configuration scripts and encapsulating system-dependencies in a configurion file.

Deliverables:

Modified version of multisrv.c, as described below.

Tasks:

Familiarize youself with the code in the directory ~baker/opsys/assign/P4/P4.tgz. This consists of:
- echosrv.c : a simple TCP/IP server, that accepts connections and echos back whatever text is sent to it.
- multisrv.c : initially, a copy of echosrv.c. You will modify it to support multiple concurrent connections, using a pool of threads. There are comments indicating where you should insert your new code, and which parts I believe will not need changing. However, you are free to make whatever changes you need to this file, so long as you do not modify any of the other files and the end product meets the behavior requirements.
- echocli.c : a simple TCP/IP client, that makes a connection and copies from standard input (e.g., a terminal) to the connection.
- checks.h : some macros that I found convenient for checking the return values of system calls. As noted in the comments, and evident in places where the provided code checks return values explicitly, they will not necessarily cover all cases that you encounter.
- echolib.h : functions used by the two programs above for input and output, which I hope will enable you to focus more on the thread programming issues, without having to code for and debug problems related partial reads, buffering, interrupted system calls, and disconnects.
- echolib.c : the implementation code for the functions declared in echolib.h
- Makefile : a Makefile to build the programs
- configure : a shell script that runs experiments on the system and creates the system-dependent files Config and config.h. Be sure to re-run this program and redo make every time you switch between systems!
Modify the program multisrv.c so that it is multithreaded. Your main objective is to allow the program to serve multiple connections concurrently. That is, it should be possible to start up one copy of the server, and have it serve multiple concurrent clients. Several clients should be able to be connected concurrently, and they should be able then to stay connected, all sending data to the server and having their data echoed.

You are required to implement this as a multi-threaded process using the POSIX threads API. The main thread of the program should listen for connections, and then pass off the connections to threads, one thread per connection. The threads should be allocated from a pool of server threads. The maximum number of concurrent connections will be limited to the number of server threads in the pool. The number of threads in the pool should default to four (4), but be able to be set via the first command line argument to any value between one (1) and ten (10).

By coincidence, the number of queued connections for the listening socket is set to 4. This will allow up to 4 clients to make connections to the server (and start to send, without receiving any response), beyond the number that it is currently serving. So, for example, if you have four threads, the server can have up to eight (8) connections: four that are actively interacting with the server, and four that are queued but receiving no response yet.

The server should be shut down by sending it a SIGINT signal (control-C from a terminal).
Test your modified program. Correct any bugs you find. Once you have the basic functionality working, try to stress it. For example, try many clients, clients that die while they are connected, etc.
Verify that it is portable. Compile and test on the other operating system (e.g., SunOS if you started with Linux). Correct any bugs, then go back and verify that your changes have not broken if for the other OS. Repeat until it works equally well on both OSs.

Additional Instructions:

Since this is your next-to-last assignment, you will need to write a bit more new code than on previous assignments. For example, my multi-threaded multisrv.c has a little over 100 additional lines of code. Also, fewer hints are given and you are expected to work out more of the details indivudally than for the preceding assignments.

Make certain you have read the relevant sections of your textbook before your start. See the list of sections under References below.

I suggest you make one initial pass through the code, to get the overall picture, before you start work on any modifications to multisrv.c. Then go back to read some parts in more detail when you discover that more detailed understanding is necessary. You will need to understand the overall function of echocli.c, since it will be the client for your server. You will need to understand what the functions in echolib.h and echolib.c do, because they are called by multisrv.c. You will not modify the Makefile or the configure script, and so you may be able to get your program working for this assignment without reading them carefully. However, I hope you will read and try to understand them, because they illustrate an important pattern for writing code that is portable across diverse systems, which you will find useful if you ever do professional software development. (You may also find that they are covered on exams.)

To get started, get a copy of the provided files, as follows:

mkdir P4
cd P4
tar -xzvpf ~baker/opsys/assign/P4/P4.tgz

Then run the configuration script, and make the programs, as follows:

./configure
make

Whenever you switch between operatng sytems, you will need to re-do the configuration, like this:

make realclean
./configure
make

You can start up and test the server like this, replacing "linprog3.cs.fsu.edu" by the name of the system on which you are running:

./multisrv&

It will output a line with the port number that it chose, like this:

server using port 35898

You can then run a client, like this:

SERVERHOST=linprog3.cs.fsu.edu
export SERVERHOST
./echocli 35898

The port number provided to the client must match the port number chosen by the server. You only need to set SERVERHOST once per session, but remember to export it. The above assumes you are using a Bourne-type shell (sh or bash). If you are using tcsh you will use setenv SERVERHOST linprog3.cs.fsu.edu, which includes the effect of setting and exporting in one command. Note also that you can run the client and server on different machines, in which case make certain you set SERVERHOST to the machine where the server is running, not the client. If you are doing everything on your local machine, you can set SERVERHOST to localhost, which is the conventional alias for the local host.

If it is working, whatever you type to the client should be echoed back, until you type control-D (end of file) to terminate the client. You should also be able to terminate it with control-C (SIGING). Here is a compelete example of an interaction from testing on my MacBook:

bub: SERVERHOST=localhost
bub: export SERVERHOST
bub: ./multisrv&g
[2] 6560
bub: server using port 61257
./echocli 61257
connection wait time = 2333 microseconds
kjjl;kl;jl
kjjl;kl;jl
klj;lj;kj;lj;lj;lkj
klj;lj;kj;lj;lj;lkj
jjljk
jjljk
bub:

Note that at this point echocli has terminated, but multisrv is still running. Don't forget to kill multisrv (or echosrv) before you quit, and when debugging remember to always kill all executing copies of these servers before you test a recompiled version. An easy way to do this is pkill -KILL srv. Another way is to bring the process back into the forground, using fg, and then terminate it with control-C (SIGINT). However, if you do this try it more than once, in case you have more than one copy of the server running in the background.

You may try opening additional windows to run additional clients, up to (and above) the limit set in the command-line parameter of echoserv.

This mode of testing should be sufficient for you to debug your code, though you may wish to write some scripts to automate the repeated steps.

I have provided a test driver program (driver.c) and a script (testone) that are similar to one we will using in grading. It tests multisrv using a variable number of concurrent clients. However, I do not recommend using that driver until you have first done some basic testing and debugging working directly from the shell on simpler examples. In particuar, if your server contains debugging trace output code the output from the multiple client processes can be confusing. The output from each client process goes to a separate pipe and is printed out separately at the end, so it will not appear in global chronological order if you are trying to match agsinst the server's trace output.

The script testone makes use of another program timer.c to manage execution of multisrv and keep track of its resource usage. This information can be useful in detecting some kinds of errors. However, Linux and Solaris each only support a subset of the resources, so some counts will come out a zeros even though a non-trivial amount of the resource was used.

Don't forget to comment your changes, including at least your name, the date of the changes, and the purpose of the changes.

There are two reasonable patterns for thread management in a multi-threaded server application:

new-thread-per-connection: In this model a new thread is created each time the listener accepts a new connection. The thread terminates when the connection is closed. This model is dangerous, since unbounded growth of connections leades to unbounded growth of threads, and that leads to system or application failure.
pool-of-threads: In this model there is a fixed-size pool of threads, which are reused (don't terminate until the server shuts down). New connections are allocated to threads in this pool as the connections are accepted. When all the threads are busy, no more connections are allowed until a thread becomes free. This model is better, since it responds to heavy load in a more controlled and predictable way. The idea is similar to the free-list of available storage nodes, which you saw applied in the simulator on which you worked for assignment P1, but in this case we have a list of free server threads.

For this assignment you are to implement the pool-of-thread model, with a fixed number of threads in the pool, as specified under Tasks above. There are two reasonable ways to create the pool of threads:

Create all the threads during program initialization, before accepting any connections. When a connection is made, assign it to one of the available threads. Pause accepting connections when all threads are busy serving connections. When a connection is closed, return the thread to the pool, and resume accepting connections.
Create new threads as needed, up to the specified maximum. When a new connection is made, first look in the pool and assign it to one of the available threads if one is available. Otherwise, check whether the specified maximum has been reached, and create a new thread if it has not been reached; otherwise, suspend accepting connections as above. The actions when a connection is closed are the same as the first case.

Model (i) is probably simpler if you choose to design and implement a full at once, but model (ii) may be easier if you decide to follow the incremental approach described below.

When I worked out the trial solution I used to separate out the parts of the program you need to change, and for my estimate of 100 lines of new code, I followed model (1).

However, students who are entirely new to threads and/or sockets, or who have had difficulty with the preceding assignments, may benefit from approaching this in an incremental fashion, making changes to the given bare-bones program in phases. This will mean more re-coding, since code you introduce in one phase may need to be thrown out and replaced when you move to the next phase. On the other hand, it will allow you to focus on understanding and debugging one aspect at a time. Also, if you do not finish everything you will still have a compilable and partially working solution to turn in. The following is one way the project can be divided into phases.

Make the additions necessary to implement the new-thread-per-connection model. Ignore the maximum specified by argv[1]. To to this you will need to understand the parts of the Pthreads API that deal with thread creation and termination, but you will not need to learn to use mutexes and CVs, since there will be no sharing of variables or synchronization between the server threads. Save your working version in case you run out of time on the next phase.
Implement a mechanism for limiting the number of active threads to the value specified by command-line parameter argv[1], with a default value of four (4) threads. To do this, you will need to implement a counter for the number of threads are currently active (or, conversely, the remaining number that you can create without exceeding the maximum). The counter could be similar to a counting semaphore. You can design it to count up or count down. Suppose you choose to count down. You would initialize the counter to the specified maximum. When the listening (main) thread gets ready to accept a connection, it checks that the counter is non-zero, and suspends itself (waits on a CV) if the counter is zero. Otherwise, when the counter is positive, it decrements the counter and creates a new thread to accept the connection. When a connection is closed the server thread should increment the counter, and signal the CV that the listener threads waits on when it finds the counter has reached zero. You will need a mutex to protect the counter, since it is accessed by multiple threads. See the examples of implementing a counting semaphore using a mutex and CV, and the general discussion on coding design-level monitors using mutexes and CVs for additional information. Again, if you get this working, save a copy in case you do not complete the next phase. You will need to throw out this semaphore/counter and replace it by something more complicated when you move to the next phase.
Add a data structure to keep a list (e.g., queue or stack) of available threads, and modify your existing code to take a server thread from this list if one is available, before creating a new thread. If it takes a server from the list, the listener thread will wake the server thread up, by signaling a CV on which the idle thread is waiting. When connection is closed, instead of exiting (terminating) the server thread inserts itself into the list of available servers, and suspends itself on the CV. This may be viewed as a monitor.
There are many reasonable ways of working out the details. An important set of decisions you need to make is your choice of data structures, which will then have an effect on how many locks and CVs you need. Thinks about the operations you want to implement. As mentioned above, you will want to maintain a dynamic set of available servers. In addition, you will also need a way of iterating through all of the existing server threads, in order to do a phread_join() call when the server shuts down.
There are many data structures to choose from. The sets could be represented using an array, or a linked list. It could be singly or doubly linked, or you could even use the list.h doubly-linked circular list.

Depending on the structure you choose, you may need fewer or more CVs. The critical issue here is that if more than one thread is waiting on a CV there is no way to predict which threads will wake up; it may be any one, several, or even all of them. So, if your data structure for the set of available server threads requires that you delete from one end (say a singly-linked stack or queue) you will need one CV per thread or per list node, to wake up just that one specific thread. On the other hand, if your structure allows efficent deletion of arbitrary elements (e.g., and array or a Linux-style list) you can use a singel CV, and have the first thread to wake up delete itself from the set of available servers.

There so many options for the data structures, that I hope you will exercise your creative freedom on this aspect of the project to design one that is both simple and efficient for the purpose. Here are just a few examples, to illusrate the range of possibilities. (a) You might keep one static list or array of all server threads, and use a flag to indicate which is available; finding a free server would then be an O(n) operation, and the maximum number of threads would be fixed, but freeing a server would still be O(1) and you would have only one structure. (b) You could maintain two linked lists: one of available servers, and one of all servers. (c) You could maintain a linked list of busy servers and a list of available servers, and move the servers between the two lists; when you shut down the server you could run through each list, doing pthread_join() with each of the threads in each list. There are many other possibilities.

Adding threads to the pool on demand will take a bit more code than if create the wholepool of threads all at once, before accepting connections. The difference is one less case to deal with when the listener thread looks for a server thread. This is a trade-off you will have to decide.

For testing, it will save you a lot of time in the long run if you develop your own scripts to speed up the repeated commands involved in testing. You can create a script for each scenario you want to test, and take advantage of parameters for variations, like the number of clients.

Take care to shut down all the threads when the program stops, and especially before you log out. In theory (according to the POSIX/Xopen standard) all threads should be terminated when the main program terminates. However the last time I checked (I have not checked this year) Linux did not yet implement this correctly. Orphan threads would be left if the main process thread was killed or exited before terminating and collecting all the threads with pthread_join(). I have made provision for orderly termination, in the form of the shutting_down flag and a handler for SIGINT, but until you have this aspect working you may find that you have such orphan threads when you stop your server. Check for this using ps -u to look for active threads after each test run, and before you log out. Use kill (e.g., kill -KILL <processid>) or pkill (e.g., pkill -KILL multisrv for the echo-server) to kill them, as necessary.

In order for the above to work, I recommend that you not log into the programming servers via the aliases "linprog" and "program", but rather pick one of the servers and log into it directly, e.g. "linprog1.cs.fsu.edu", "linprog2.cs.fsu.edu", etc. The reasons: You may want to open multiple windows, and you want them to be open on the same machine. When you check for runaway processes or threads, you need to be on the same machine where they are running. The aliases will log you into a random machine, which may not be the one where your processes/threads are running.

Besides deadlock and race conditions due to unprotected critical sections, some considerations often overlooked by students include:

Providing for orderly shutdown of the server, including all its threads.
Handling lost connections (e.g., when a client dies) in a way that does not cause leakage of threads or storage.

Common failures include servers that cannot shut down, servers that lock up after all the threads in the server pool have enountered errors (and failed to recover from the errors), and servers that keep generating new threads without recovering the old ones (eventually hitting a system limit on memory or the number of active threads).

Testing concurrent programs is difficult. You may get "lucky" when running a test, and it may seem your program is working even though you are actually unlucky in the sense you have failed to detect errors in your synchronization code. Those errors may show up later when we test it, or we may just notice them when we read the code. Here are some examples of typical errors that we will notice if we read the code, but might not show up in testing:

Failure to protect a critical section using mutexes. Failures include not only missing calls to mutex lock/unlock functions, but sometimes splitting up a single critical section into pieces that still permit unsafe interleavings.
Exceptional flows of control that allow failure to release a mutex lock, under unusual or error conditions.
Use of pthread_cond_wait() that does not allow for the possibility of spurious (early) wakeup, i.e., without testing for the corresponding logical condition upon wakeup, and looping back to repeat the cond-wait call if the condition has not been satisfied.

If you try using gdb or any other debugger, beware that it behaves strangely in a multi-threaded environment. For example, if you interrupt an executing program, you may end up in any one of the threads. Single-stepping also works strangely. If you want to use a debugger, read up on the tricks of using it from some of the articles on the Web. Personally, I have found it sufficiently frustrating that I mostly rely on a combination of assertions and trace outputs to stderr. However, you need to be careful about even trace outputs. Since the various threads will all be writing to the same stream, you will lose some output and find the outputs you get are interleaved in a way that is very confusing. This all puts a premium on reading your code carefully and getting it right.

There is no really good way to collect a unified trace log from a multi-threaded or multi-process application. You have three choices:

Let the output interleave. This is a pretty bad solution, but easy. Be careful not to believe what you see. Allow for interleaved, garbage, and lost output due to one thread overwriting the buffered I/O of another.
Provide some form of safe locking protocol, that allows only one thread to write to the log at a time. This is actually a bad idea, since it forces more serialization between the threads than would otherwise be necessary. If you leave the logging in, it slows down the program, resulting in poor performance. If you take it out, new race conditions are likely to pop up, which will disappear again if you turn the logging back on.
Keep separate logs, one for each thread, and re-assemble them after the program completes. This is analogous to what I do in driver., with multiple processes. It is probably the best solution, but it can be quite cumbersome. In order to allow later integration of the logs, you want to add timestamps to the output. The timestamps need to be fine-enoug grained, for example like the Intel x86 hardware clock-cycle counter, if you can count on it being the same across all cores.

Getting concurrent code right through test-and-fix is generally not practical. You need to rely mainly on careful design and proof-reading of your code. The KISS ("keep it simple, stupid!") principle applies. Design your code to be simple, and keep looking for ways to simplify it further. Divide the work up into chunks that can be done by functions/methods that are short and simple enough that you can understand them at a glance - certainly no longer than will allow you to see the entire function in one editor window, preferably a lot shorter than that.

References:

Section 3.6.1 of your textbook, on sockets in the context of client-server systems.
Section 5.8 of your textbook, for the monitor concept and use of mutexes to implement a monitor.
Section 5.9.4 of your textbook, on Pthreads synchronization.
Chapter 4 of your textbook, for a general introduction to multi-threaded programming ocncepts.
Chapters 5 & 6 of your textbook, for information on mutual exclusion and synchronization in general, and POSIX threads in specific.
My Supplemental Notes under the heading "Supplemental for Concurrent Programming & Synchronization", including the example code. I especially recommend the following:
- POSIX Threads API - most important
- Semaphores, versus Mutexes & CVs - for the example of implementing a monitor using mutexes and CVs
- Mutual Exclusion & POSIX Mutexes - for detailed explanation of Pthread mutexes
Xopen.org standard POSIX man-pages

Delivery Method:

First read the Study Guide section on Submitting Assignments, for general instructions.

Log into shell.cs.fsu.edu.
cd to the directory where you have your working solution, and make certain all of the files are in that directory.
Execute the shell script submitp4.sh. If it works correctly, you should see a message indicating the files that were sent, and receive the usual two e-mail confirmations.

Assessment:

Start by reading the Grading Criteria for Programming Assignments section of the Study Guide for general grading criteria.

Your program will be evaluated primarily for correct usage of the POSIX thread constructs and how well it accomplishes the objective of providing efficient concurrent service for multiple clients. Solutions that make use of a pool of threads created in advance will be considered superior to solutions that create and destroy a new thread for each connection.

Grading will be according to the following performance rubric, applied to the results of testing on both operating systems. However, the number of points shown on each line is a maximum; you can expect to have your score reduced if the code that implements that feature does not adhere to the instructions given in this file, or only compiles on one of the two operating systems.

The rubric below may be modified, after discussion with our grader, Danielle Anderson.

Criterion	Points
Program is multithreaded and handles at least 4 concurrent connections correctly. This requires correct uses of pthread_create() and pthread_join(). Please do not use detached threads. I know that this can work as an alternative, but want you to verify that each thread has completed at the end, using pthread_join().	35
In addition, it correctly limits the number of concurrent threads (and connections) to the value specified by the command line parameter. Attempts to connect above this limit should block until a server thread becomes available. Both Pthread mutexes and at least one condition variables must be used correctly to protect all critical sections involving variables that may be accessed concurrently by more than one thread. (This includes freedom of deadlock and race conditions.)	35
In addition, it makes use of a fixed pool of threads that are re-used across multiple connections, whether all created at program initialization time, or created on demand until the specified limit is reached. The same concerns about correct use of mutexes and condition variables apply.	15
Everything compiles and works correctly on SunOS as well as Linux	5
Appropriate comments and coding style	10

Points may be deducted from any or all of the above, even if a program passes tests, if there are defects visible upon examination of the code. Examples of such possible defects include unprotected critical sections, incorrect usage of system calls, leakage of memory or other resources (file descriptors, process ids, thread ids, etc.), and redundant code.

Submissions that come in past the deadline will have 5 points deducted for each day they are late. No submissions will be accepted after grading has started.

Addendum: Frequent Pthread programming errors

The following are serious errors are frequenly encountered when grading programs using threads. They likely to go undetectedt in testing, but which are likely to be caught we "eyeball" the code and which will incur heavy penalties.

Unprotected critical sections: Suppose you are implementing a semaphore-like counter, c. You need to protect both the increment c++ and the decrement c-- operations and any code that checks the value of the counter must also be protected.
Busy-wait loops: You textbook talks about spinlocks, and other forms of busy-waiting (i.e., looping and checking some condition until it changes from false to true, or vice versa). This kind of looping can lead to deadlock on a single processor, since it keeps the processor busy and prevents the other threads that would change the condition and break you out of the loop from executing. On a multicore (like the "linprog" and "program" systems), it is still bad, since it ties up a processor doing nothing but spinning. This is why pthread_cond_wait() is provided. It allows a thread o block (give up the processor) until it is signaled to recheck the condition it is waiting for.
Failing to recheck the logical condition (loop) with pthread_cond_wait(): While pthread_cond_wait() will suspend the thread, the standard allows for the possibility of "spurious" wake-ups, i.e., situations other than a call to pthread_cond_signal() might cause the thread to wake up early, or to wake up more than one thread that is waiting on that CV. Note that the return from pthread_cond_wait() involves an implici relocking of the mutex M. If more than one thread wakes up, the first thread to get the mutex is likely to change the state of the logical condition you were waiting for, possibly setting back to false. So, you must alway recheck the condition. For example, see the following:
void semaphore_wait () { pthread_mutex_lock (&S.M); while (S.value == 0) pthread_cond_wait (&S.CV, &S.M); S.value--; pthread_mutex_unlock (&S.M); }
Nothing above can be omitted or re-ordered. The while-loop in the semaphore wait is essential. The test (S.value == 0) is the logical condition the thread is waiting for.
void semaphore_signal () { pthread_mutex_lock (&S.M); S.value++ ; if (S.value == 1) pthread_cond_signal (&S.CV); pthread_mutex_unlock (&S.M); }
In the matching operation, above, there is only one thing that could be different: the call to pthread_mutex_unlock could be done before the call to pthread_cond_signal(), but then you would need to take care not to move the logical condition test (S.value == 1) outside the protection of the mutex. The following is one way this could be done.
void semaphore_signal () { pthread_mutex_lock (&S.M); S.value++ ; pthread_mutex_unlock (&S.M); pthread_cond_signal (&S.CV); }
However, this results in unnecessary signaling, and spurious wakeups. The following eliminates this problem, at the expense of a bit more code.
void semaphore_signal () { pthread_mutex_lock (&S.M); S.value++ ; if (S.value == 1) { pthread_mutex_unlock (&S.M); pthread_cond_signal (&S.CV); } else { pthread_mutex_unlock (&S.M); } }
This could be the best solution in cases with multiple processors, since it may reduce unnecessary context switches caused by a waiting thread waking up from pthread_cond_wait() before the lock is released, then having to block and yield its processor to another thread in order for the signaling thread to unlock the mutex.

Failing to release a lock through exceptional flow of control, e.g. some case that is unlikely to come up in your testing:
pthread_mutex_lock (&M); ... if (...) return; pthread_mutex_unlock (&M);

Calling another blocking system call while holding a mutex, e.g.
pthread_mutex_lock (&M); ... some blocking system call, like connect(), read(), listen(), etc. ... pthread_mutex_unlock (&M); ...

Deadlock involving inconsistent double-locking, e.g.
pthread_mutex_lock (&M1); ... pthread_mutex_lock (&M2); ... pthread_mutex_unlock (&M2); ... pthread_mutex_unlock (&M1); ... pthread_mutex_lock (&M2); ... pthread_mutex_lock (&M1); ... pthread_mutex_unlock (&M2); ... pthread_mutex_unlock (&M1);

Using a single CV with multiple mutexes, e.g,
pthread_mutex_lock (&M1); while (...) pthread_cond_wait (&CV, &M2); ... pthread_mutex_unlock (&M1);
or
pthread_mutex_lock (&M1); while (...) pthread_cond_wait (&CV, &M1); ... pthread_mutex_unlock (&M1); ... pthread_mutex_lock (&M2); ... pthread_cond_signal (&CV); pthread_mutex_unlock (&M2);

The above list certainly not complete.

T. P. Baker.

Programming Assignment #4 Multithreaded TCP Echo Server