Lecture #2: Distributed Operating Systems: an introduction

Topics for today

Overview of major issues in distributed operating systems
Terminology
Communication models
Remote procedure calls

These topics are from Chapter 4 in the Advanced Concepts in OS text.

What is a distributed system?

It consists of multiple computers that do not share a memory.
Each computer has its own memory and runs its own operating system.
The computers can communicate with each other through a communication network.
See Figure 4.1 for the architecture of a distributed system.

Why build a distributed system?

Microprocessors are getting more and more powerful.
A distributed system combines (and increases) the computing power of individual computers.
Some advantages include:
- Resource sharing
  (but not as easily as if on the same machine)
- Enhanced performance
  (but 2 machines are not as good as a single machine that is 2 times as fast)
- Improved reliability & availability
  (but the probability that some component fails increases, as does the difficulty of recovery)
- Modular expandability
Distributed OSs have not been economically successful!

System models:

the minicomputer model (several minicomputers, each supporting multiple users and providing access to remote resources).
the workstation model (each user has a workstation; the system provides some common services, such as a distributed file system).
the processor pool model (the system allocates processors to a user according to the user's needs).

Where is the knowledge of distributed operating systems likely to be useful?

custom OS's for high performance computer systems
OS subsystems, like NFS, NIS
distributed "middleware" for large computations
distributed applications

Issues in Distributed Systems

the lack of global knowledge
naming
scalability
compatibility
process synchronization (requires global knowledge)
resource management (requires global knowledge)
security
fault tolerance, error recovery

Lack of Global Knowledge

Communication delays are at the core of the problem.
Information may become false before it can be acted upon.
These create some fundamental problems:
- no global clock: can requests still be scheduled from a FIFO queue when timestamps taken on different machines cannot be compared?
- no global state: what is the state of a task? What is a correct program?

Naming

named objects: computers, users, files, printers, services
namespace must be large
unique (or at least unambiguous) names are needed
logical to physical mapping needed
mapping must be changeable, expandable, reliable, fast

Scalability

How large is the system designed for?
How does increasing number of hosts affect overhead?
Design options such as broadcasting primitives or directories stored at every computer will not work for large systems.

Compatibility

Binary level: all machines share the same architecture, so the same object code runs everywhere.
Execution level: the same source code can be compiled and executed on every machine.
Protocol level: only requires all system components to support a common set of protocols.

Process synchronization

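A test-and-set instruction won't work: it operates on a memory word shared by all contenders, and the computers share no memory.
Entirely new synchronization mechanisms are needed for distributed systems.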

Distributed Resource Management

Data migration: data are brought to the location that needs them.
- distributed filesystem (file migration)
- distributed shared memory (page migration)
Computation migration: the computation migrates to another location.
- remote procedure call: computation is done at the remote machine.
- process migration: processes are transferred to other processors.

Security

Authentication: guaranteeing that an entity is what it claims to be.
Authorization: deciding what privileges an entity has and making only those privileges available.

Structuring

the monolithic kernel: one piece
the collective kernel structure: a collection of processes
object oriented: the services provided by the OS are implemented as a set of objects.
client-server: servers provide the services and clients use the services.

Communication Networks

WAN and LAN
Traditional operating systems implement the TCP/IP protocol stack: host-to-network layer, IP layer, transport layer, application layer.
Most distributed operating systems are not concerned with the lower-layer communication primitives.

Communication Models

message passing
remote procedure call (RPC)

Message Passing Primitives

Send (message, destination), Receive (source, buffer)
buffered vs. unbuffered
blocking vs. nonblocking
reliable vs. unreliable
synchronous vs. asynchronous

Example: Unix socket I/O primitives

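#include <sys/socket.h>  /* sendto, recvfrom */
#include <poll.h>        /* poll */
#include <sys/select.h>  /* select */

/* Send a datagram to dest_addr; returns bytes sent, or -1 on error. */
ssize_t sendto(int socket, const void *message,
  size_t length, int flags,
  const struct sockaddr *dest_addr, socklen_t dest_len);
/* Receive a datagram; if address is non-NULL, the sender's address is stored there. */
ssize_t recvfrom(int socket, void *buffer,
  size_t length, int flags, struct sockaddr *address,
  socklen_t *address_len);
/* Wait until a descriptor is ready or timeout (in milliseconds) expires. */
int poll(struct pollfd fds[], nfds_t nfds,
  int timeout);
/* Older interface: wait on descriptor sets, with the timeout as a struct timeval. */
int select(int nfds, fd_set *readfds, fd_set *writefds,
  fd_set *errorfds, struct timeval *timeout);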

You can find more information on these and other socket I/O operations in the Unix man pages.

RPC

With message passing, the application programmer must worry about many details:

parsing messages
pairing responses with request messages
converting between data representations
knowing the address of the remote machine/server
handling communication and system failures

RPC is introduced to help hide and automate these details.

RPC is based on a "virtual" procedure call model:

client calls server, specifying operation and arguments
server executes operation, returning results

RPC Issues

Stubs (see the Unix rpcgen tool, for example)
- are generated automatically, e.g. by a stub compiler
- do the "dirty work" of communication
Binding method
- server address may be looked up by service-name
- or port number may be looked up
Parameter and result passing
Error handling semantics
