Lecture 19: Distributed Shared Memory
These topics are from Chapter 10 (Distributed Shared Memory) in Advanced
Concepts in OS
Topics for Today
- Distributed Shared Memory
Motivation
Two types of programming paradigms for developing distributed applications
- Explicit message passing
- Distributed shared memory (DSM): from the user's point of view, referencing
a remote memory location is the same as referencing a local memory location;
communication is done implicitly.
Some advantages of DSM:
- Programming is a lot easier
- No need to deal with communication details.
- Easy to handle complex data structures
- DSM systems are much cheaper than tightly coupled multiprocessor
systems (DSMs can be built over commodity components).
- DSM takes advantage of memory reference locality -- data is moved in units
of pages.
- The combined memories of the nodes form a large physical memory.
- Programs for shared memory multiprocessors can easily be ported to DSMs
Challenges in DSM
- How to keep track of the location of remote data?
- How to overcome the communication delays and high overhead associated with
references to remote data?
- How to allow "controlled" concurrent accesses to shared data?
Algorithms for implementing DSM
- The Central-Server Algorithm
- A central server maintains all the shared data.
- For a read: the server simply returns the data.
- For a write: the server updates the data and sends an acknowledgment
to the client.
- A simple, working solution for providing shared memory to distributed
applications.
- Low efficiency -- the server is a bottleneck and every access incurs a long
memory access latency.
- Data can be distributed across multiple servers -- a directory is then
needed to record the location of each page.
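A minimal sketch of the central-server idea (the class and method names below
are illustrative, not from any particular DSM implementation; a real system
would use RPC instead of direct calls):

    # Every access goes through one server that holds all shared pages.
    class CentralServer:
        def __init__(self):
            self.pages = {}               # page id -> page contents

        def handle_read(self, page_id):
            # Return the current contents of the page (None if never written).
            return self.pages.get(page_id)

        def handle_write(self, page_id, value):
            # Update the page and acknowledge so the client can continue.
            self.pages[page_id] = value
            return "ACK"

    class Client:
        def __init__(self, server):
            self.server = server          # stands in for an RPC stub

        def read(self, page_id):
            return self.server.handle_read(page_id)

        def write(self, page_id, value):
            assert self.server.handle_write(page_id, value) == "ACK"

    server = CentralServer()
    c1, c2 = Client(server), Client(server)
    c1.write("p0", 42)
    print(c2.read("p0"))                  # 42 -- served by the central server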
- The Migration Algorithm
- Data is shipped to the location of the data access request --
subsequent accesses are local.
- For both reads and writes: migrate the remote page to the local machine,
then perform the operation locally.
- Keeping track of memory locations: a location service, a home machine for
each page, or broadcast.
- Problems: thrashing -- pages move between nodes frequently -- and false
sharing.
- Reads by multiple nodes can be costly, since the page must migrate for
each access.
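A small sketch of the migration idea, assuming a simple location directory
that records which node currently holds each page (all names are invented for
illustration):

    class Directory:
        def __init__(self):
            self.owner = {}                   # page id -> node holding the page

    class Node:
        def __init__(self, name, directory):
            self.name, self.dir = name, directory
            self.pages = {}                   # pages resident on this node

        def access(self, page_id, write=False, value=None):
            if page_id not in self.pages:     # page fault: migrate the page here
                holder = self.dir.owner.get(page_id)
                if holder is not None:
                    self.pages[page_id] = holder.pages.pop(page_id)
                else:
                    self.pages[page_id] = None    # first touch: create the page
                self.dir.owner[page_id] = self
            if write:
                self.pages[page_id] = value       # subsequent accesses are local
            return self.pages[page_id]

    d = Directory()
    a, b = Node("A", d), Node("B", d)
    a.access("p0", write=True, value=1)   # page p0 now lives on A
    print(b.access("p0"))                 # fault on B: p0 migrates from A to B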
- The Read-Replication Algorithm
- On a read, the page is replicated and marked as having multiple readers.
- On a write, all copies except one must be updated or invalidated.
- Multiple readers, one writer: several nodes may read a page concurrently.
- The locations of all replicas must be tracked: location service / home
machine.
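A sketch of the multiple-reader/single-writer behaviour, assuming a
hypothetical home directory that tracks the copy set of each page and that
writes invalidate the other replicas:

    class Home:
        def __init__(self):
            self.value = {}                 # page id -> latest value
            self.copyset = {}               # page id -> nodes holding a replica

    class Node:
        def __init__(self, name, home):
            self.name, self.home = name, home
            self.cache = {}                 # local replicas

        def read(self, page_id):
            if page_id not in self.cache:   # read fault: fetch, join copy set
                self.cache[page_id] = self.home.value.get(page_id)
                self.home.copyset.setdefault(page_id, set()).add(self)
            return self.cache[page_id]

        def write(self, page_id, value):
            # Invalidate all other replicas before performing the write.
            for node in self.home.copyset.get(page_id, set()) - {self}:
                node.cache.pop(page_id, None)
            self.home.copyset[page_id] = {self}
            self.home.value[page_id] = value
            self.cache[page_id] = value

    h = Home()
    a, b = Node("A", h), Node("B", h)
    a.write("p0", 7)
    print(b.read("p0"))    # 7 -- B now holds a read replica
    a.write("p0", 8)       # B's replica is invalidated by the write
    print(b.read("p0"))    # 8 -- B re-fetches on its next read fault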
- The Full-Replication Algorithm
- Allows multiple readers and multiple writers.
- Access to shared memory must be controlled, e.g., by imposing a global
order on the writes.
- Practical?
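One way to control concurrent access under full replication is to impose a
global order on writes; the sketch below assumes a central sequencer that
numbers each write and broadcasts it to every replica (the sequencer and its
interface are illustrative assumptions, not a prescribed design):

    class Node:
        def __init__(self):
            self.pages = {}                    # full local replica
            self.last_seq = -1

        def apply(self, seq, page_id, value):
            assert seq == self.last_seq + 1    # apply writes in global order
            self.pages[page_id] = value
            self.last_seq = seq

        def read(self, page_id):
            return self.pages.get(page_id)     # reads are purely local

    class Sequencer:
        def __init__(self, nodes):
            self.nodes = nodes
            self.next_seq = 0

        def submit_write(self, page_id, value):
            seq = self.next_seq                # assign a global sequence number
            self.next_seq += 1
            for node in self.nodes:            # broadcast the sequenced write
                node.apply(seq, page_id, value)

    nodes = [Node(), Node(), Node()]
    seq = Sequencer(nodes)
    seq.submit_write("p0", "x")
    print(nodes[2].read("p0"))                 # "x" on every replica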
Memory Coherence:
- The set of allowable memory access orderings forms the memory consistency
model.
- A memory is coherent if the value returned by a read operation is always
the value that the programmer expects.
- The strict consistency model is typical of uniprocessors: a read returns
the most recently written value.
- It is very costly to enforce strict consistency in distributed systems:
how do we determine the last write?
- To improve performance, we need relaxed memory consistency models.
Relaxed memory consistency models
- Sequential consistency: the result of any execution is the same as if the
operations of all the processors were executed in some sequential order, and
the operations of each processor appear in this sequence in program order.
- General consistency: all the copies of a memory location eventually
contain the same data once all the writes issued by every processor have
completed.
- Weak consistency: synchronization accesses are sequentially consistent.
All data accesses must be completed before each synchronization access is
performed.
- Other consistency models: processor consistency, release consistency, etc.
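The classic flag/data example below makes the difference concrete; the
threading is only illustrative, since in a real DSM the ordering guarantees
would come from the consistency model, not from Python:

    import threading

    data, flag = 0, 0

    def writer():
        global data, flag
        data = 1          # write the payload first
        flag = 1          # then signal that it is ready

    def reader():
        if flag == 1:
            # Sequential consistency: seeing flag == 1 implies data == 1,
            # because all processors observe the writes in a single sequential
            # order that respects each processor's program order.
            # Weak consistency: this is only guaranteed if flag is accessed as
            # a synchronization variable, so the write to data completes first.
            print("data =", data)

    t1 = threading.Thread(target=writer)
    t2 = threading.Thread(target=reader)
    t1.start(); t2.start()
    t1.join(); t2.join()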
Coherence Protocols
- The need to keep the data replicas consistent.
- Two types of basic protocols
- Write-Invalidate Protocol: a write to shared data causes the
invalidation of all copies except one before the write is performed.
- Write-Update Protocol: a write to shared data causes all copies
of that data to be updated.
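A toy contrast of the two protocols for a single shared data item, with the
set of copies modelled as a dictionary (purely illustrative):

    def write_invalidate(copies, writer, value):
        # Invalidate every copy except the writer's, then perform the write.
        for node in list(copies):
            if node != writer:
                del copies[node]
        copies[writer] = value

    def write_update(copies, writer, value):
        # Push the new value to every copy, keeping all replicas valid.
        for node in copies:
            copies[node] = value

    copies = {"A": 0, "B": 0, "C": 0}
    inv = dict(copies)
    write_invalidate(inv, "A", 1)
    print(inv)      # {'A': 1}: the other copies were invalidated
    write_update(copies, "A", 1)
    print(copies)   # {'A': 1, 'B': 1, 'C': 1}: all copies hold the new value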
- Case Study: Cache coherence in the PLUS system.
- Write update protocol
- General consistency
- Unit of replication: a page (4 KB)
- Coherence is maintained in units of one word
- A virtual page in PLUS corresponds to a list of replicas; one of the
replicas is the master copy. The locations of the other replicas are
maintained through a distributed linked list (the copy-list) (Figure 10.6).
- On a read fault: if the address maps to local memory, read the local
memory. Otherwise, send a request to the specified remote node and get the
data.
- On a write: the master copy is updated first, and the update is then
propagated to the copies linked by the copy-list. On a write fault, if the
address indicates a remote node, the update request is sent to that remote
node. If the copy there is not the master copy, the update request is
forwarded to the node containing the master copy for updating and further
propagation.
- Writes are nonblocking.
- A read on a location is blocked until all pending writes to that location
have completed.
- A write-fence is used to flush (wait for the completion of) all previous
writes.
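A much-simplified sketch of the write path described above; the data
structures and names are guesses for illustration only and do not reflect the
actual PLUS hardware:

    class Replica:
        def __init__(self, node_name, is_master=False):
            self.node_name = node_name
            self.is_master = is_master
            self.word = None                   # the one word kept coherent
            self.next_copy = None              # next entry in the copy-list
            self.master_link = None            # where to find the master copy

    def write(replica, value, pending):
        # Non-blocking write: locate the master copy, queue the propagation.
        master = replica
        while not master.is_master:
            master = master.master_link
        def propagate():
            r = master                         # master copy is updated first,
            while r is not None:               # then the update walks the list
                r.word = value
                r = r.next_copy
        pending.append(propagate)              # the writer does not wait

    def write_fence(pending):
        # Block until all previously issued writes have been propagated.
        while pending:
            pending.pop(0)()

    # A three-replica copy-list: master on node0, copies on node1 and node2.
    master = Replica("node0", is_master=True)
    copy1, copy2 = Replica("node1"), Replica("node2")
    master.next_copy, copy1.next_copy = copy1, copy2
    copy1.master_link = copy2.master_link = master

    pending = []
    write(copy2, 99, pending)                  # issued at a non-master copy
    write_fence(pending)                       # flush all previous writes
    print(master.word, copy1.word, copy2.word) # 99 99 99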
Granularity
- Granularity: size of the shared memory unit
- the page size is usually a multiple of the size provided by the underlying
hardware and memory management system.
- Large page size -- more locality and less communication overhead, but
more contention and more false sharing.
- The unit of replication can be separated from the unit of coherence
maintenance.
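A tiny illustration of false sharing with a large page size: two unrelated
variables that land on the same page force the page to bounce between nodes
even though the nodes never touch each other's data (numbers and names are
made up):

    PAGE_SIZE = 4096

    def page_of(addr):
        return addr // PAGE_SIZE

    x_addr, y_addr = 100, 200              # distinct variables, same 4 KB page
    assert page_of(x_addr) == page_of(y_addr)

    owner = {}                             # page id -> node holding the page
    transfers = 0

    def write(node, addr):
        global transfers
        page = page_of(addr)
        if owner.get(page) != node:        # fault: the whole page must move
            transfers += 1
            owner[page] = node
        # ... the actual write to the local copy would happen here ...

    for _ in range(1000):                  # A only writes x, B only writes y
        write("A", x_addr)
        write("B", y_addr)
    print("page transfers caused by false sharing:", transfers)   # 2000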
Page replacement
- Least recently used (LRU) may not be appropriate -- data can be accessed
in different modes: shared, private, read-only, writable, etc.
- The replacement policy needs to take access modes into consideration,
e.g., private data should be replaced before shared data.
- Read-only pages can simply be discarded (they can be fetched again later).
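A sketch of mode-aware victim selection, with an illustrative eviction
preference (read-only pages first, since they can simply be discarded; private
pages next; shared pages last), breaking ties by LRU:

    # Lower number = evicted sooner (an illustrative choice, not a fixed rule).
    EVICTION_PREFERENCE = {"read-only": 0, "private": 1, "shared": 2}

    def choose_victim(resident_pages):
        # resident_pages: list of (page_id, access_mode, last_used_time).
        # Prefer the cheapest access mode; break ties with least-recently-used.
        return min(resident_pages,
                   key=lambda page: (EVICTION_PREFERENCE[page[1]], page[2]))

    pages = [("p1", "shared", 10), ("p2", "read-only", 50), ("p3", "private", 5)]
    print(choose_victim(pages))   # ('p2', 'read-only', 50): discard, no write-back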
Summary
- Distributed shared memory is an implementation of the shared memory
concept in distributed systems (no physically shared memory).
- Main goals of DSM: (1) to overcome the architectural limitations (memory
size) and (2) to support a better programming paradigm.
- The major challenge: high cost of communication.
- Some factors that affect performance: granularity and the replacement
algorithm.