Lecture 17: Voting Protocol

These topics are from Chapter 13 (Fault Tolerance) in Advanced Concepts in OS

Topics for Today

voting algorithms

Voting Protocols

replicated data, at multiple sites
each site has some number of votes
access to replicated data requires a majority of votes
votes determine which version is the current one

Static Voting

replicated data, at multiple sites
each file access requires obtaining a lock
reader-writer locks are supported
every site has a lock manager
every file has a version number = number of changes made
each replica has some number of votes
vote allocation is on stable storage
reads and writes require a quorum

Static Voting

For a read or write request initiated by site i:

issue Lock_Request to local lock manager
local lock manager eventually grants request and then sends Vote_Request to all sites

at site j:

on receipt of Vote_Request from i, issue Lock_Request to local lock manager
if the lock request is granted by the local lock manager, send the version number VN_j of its replica and the number of votes V_j of its replica to site i

at site i:

after votes are in, perform quorum test

Read Quorum Test

$V_r = \sum_{k \in P} V_k \geq r$

where P is the set of sites that replied.

Write Quorum Test

$V_w = \sum_{k \in Q} V_k \geq w$

where $M = \max\{VN_j \mid j \in P\}$ is the largest version number reported in the vote, and
$Q = \{j \in P \mid VN_j = M\}$ includes only the votes that correspond to that version number.
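The two quorum tests can be sketched in a few lines of Python. This is only an illustration: the function names and the shape of `replies` (a dictionary from site number to a `(version number, votes)` pair) are assumptions, not part of the protocol description.

```python
def read_quorum_ok(replies, r):
    """replies maps each site k in P to (VN_k, V_k)."""
    return sum(votes for _vn, votes in replies.values()) >= r


def write_quorum_ok(replies, w):
    """Count only the votes of replicas holding the highest version M."""
    if not replies:
        return False
    m = max(vn for vn, _votes in replies.values())
    return sum(votes for vn, votes in replies.values() if vn == m) >= w


# Hypothetical replies: sites 1 and 3 are current (version 4), site 4 is stale.
replies = {1: (4, 1), 3: (4, 2), 4: (3, 1)}
print(read_quorum_ok(replies, r=3))   # all 4 votes in P are counted
print(write_quorum_ok(replies, w=4))  # only the 3 votes in Q are counted
```

The stale replica at site 4 contributes to the read quorum but not to the write quorum, which is the only difference between the two tests.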

Voting Algorithm (continued)

at site i:

If quorum test fails, issue Release_Lock to local manager and all sites in P that returned positive votes
If quorum test succeeds, check whether local copy is current. If not, obtain a fresh copy from another site.
For a read, just use the local copy.
For a write, update the local copy, then update VN_i and send the updates and VN_i to all the sites in Q.
Issue a Release_Lock to local manager and all the sites in P

at other sites:

on receiving update messages, update own local copies
on Release_Lock, release all local locks
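The steps at site i after a successful quorum test can be sketched as follows. This is a simplified, single-process illustration: locking and messaging are omitted, replicas are plain dictionaries, P is assumed to include site i itself, and the name `finish_access` is hypothetical.

```python
def finish_access(i, replicas, replies, op, new_data=None):
    """Carry out a read or write at site i once the quorum test has passed.

    replicas: site -> {"vn": version number, "data": contents}
    replies:  site -> (version number, votes) for every site in P
    """
    m = max(vn for vn, _votes in replies.values())
    q = [j for j, (vn, _votes) in replies.items() if vn == m]

    local = replicas[i]
    if local["vn"] < m:
        # Local copy is obsolete: fetch a fresh copy from a current site.
        source = replicas[q[0]]
        local["vn"], local["data"] = source["vn"], source["data"]

    if op == "read":
        return local["data"]

    # Write: update the local copy and its version number,
    # then propagate both to the sites in Q.
    local["data"] = new_data
    local["vn"] += 1
    for j in q:
        replicas[j]["vn"], replicas[j]["data"] = local["vn"], local["data"]
    return local["data"]


replicas = {
    1: {"vn": 3, "data": "old"},
    2: {"vn": 4, "data": "cur"},
    3: {"vn": 4, "data": "cur"},
}
replies = {1: (3, 1), 2: (4, 1), 3: (4, 2)}
finish_access(1, replicas, replies, "write", "new")
print(replicas)
```

In the example, site 1 starts with an obsolete copy, refreshes it from a site in Q, and then installs version 5 locally and at sites 2 and 3.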

Vote Assignment

If v is the total number of votes, we want to choose r and w such that
r + w > v and
w > v/2

Why?

Consequences

no obsolete copies are updated due to a write operation
the current local replicas have at least w votes
every read quorum and write quorum overlap by at least one site
there cannot be simultaneous writes on distinct sets of replicas
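One way to convince yourself of the overlap properties is a brute-force check over all subsets of sites. The sketch below is illustrative only; the function names and the sample vote assignment are assumptions chosen for the demonstration.

```python
from itertools import combinations


def quorums(votes, threshold):
    """Yield every set of sites whose combined votes reach the threshold."""
    sites = list(votes)
    for n in range(1, len(sites) + 1):
        for group in combinations(sites, n):
            if sum(votes[s] for s in group) >= threshold:
                yield set(group)


def quorums_always_overlap(votes, r, w):
    """True if every read/write pair and every write/write pair share a site."""
    reads = list(quorums(votes, r))
    writes = list(quorums(votes, w))
    read_write = all(a & b for a in reads for b in writes)
    write_write = all(a & b for a in writes for b in writes)
    return read_write and write_write


votes = {1: 1, 2: 1, 3: 2, 4: 1}              # v = 5
print(quorums_always_overlap(votes, r=3, w=3))  # r + w > v and w > v/2
print(quorums_always_overlap(votes, r=2, w=3))  # r + w = v: a read can miss the latest write
print(quorums_always_overlap(votes, r=4, w=2))  # w <= v/2: two disjoint write quorums exist
```

Two disjoint quorums would need at least r + w (or 2w) votes between them, which is impossible when those sums exceed v; the second and third calls show what goes wrong when either condition is dropped.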

How many temporary site failures can we tolerate?

What happens when a site comes back on line after failing?

What happens if the network is partitioned?

Tuning Example

Site	Votes	Read Access Time
1	1	75ms
2	1	750ms
3	2	750ms
4	1	100ms

If r=1 and w=5, the read access time is 75ms and the write access time is 750 ms. Any single site failure will prevent writes.

If r=3 and w=3, the access times are unchanged, but writes are still possible with a single site failure.
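The failure-tolerance claims for the two settings can be checked mechanically. The sketch below assumes the vote assignment from the table above and uses a hypothetical helper, `survives_any_single_failure`; it says nothing about access times.

```python
def survives_any_single_failure(votes, threshold):
    """True if the remaining sites still reach the threshold
    no matter which single site fails."""
    total = sum(votes.values())
    return all(total - lost >= threshold for lost in votes.values())


votes = {1: 1, 2: 1, 3: 2, 4: 1}  # 5 votes in total

for r, w in [(1, 5), (3, 3)]:
    print(
        f"r={r}, w={w}:",
        "reads survive" if survives_any_single_failure(votes, r) else "reads blocked",
        "/",
        "writes survive" if survives_any_single_failure(votes, w) else "writes blocked",
    )
```

With w=5 every vote is needed, so losing any site blocks writes; with w=3 even the loss of site 3 (2 votes) leaves exactly 3 votes available.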

If site 4 is more reliable, we can further improve reliability by readjusting the votes as follows.

Site	Votes	Read Access Time
1	1	75ms
2	1	750ms
3	1	750ms
4	2	100ms

Dynamic Voting Protocols

Change the set of sites that can form a majority
Change the distribution of votes

Dynamic Vote Reassignment

number of votes per site changes
two kinds:

group consensus on new assignment
autonomous reassignment, ratified by majority of sites

What are the strengths and weaknesses of each?

Autonomous Vote Reassignment

each site i has vector V_i representing its belief of the global vote assignment
V_i[j] is how many votes i thinks j is entitled to have
each site i has version-number vector N_i
N_i[j] is the version number of V_i[j]
each site i has vector v_i representing the votes it has seen
v_i[j] is how many votes i sees j is trying to cast

Vote Increasing Protocol

When site i wants to increase V_i[i]:

send V_i and N_i along with new vote value x to all communicating sites
wait for a majority of sites to respond
if a majority is collected, update V_i[i] to the new value and increment N_i[i].

When site j receives a vote-increasing request from site i with V_i, N_i, and x:

V_j[i] = x
N_j[i] = N_i[i] + 1
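A minimal sketch of the vote-increasing exchange, assuming each site's vectors are kept in a dictionary and that message passing is replaced by direct function calls. The names `new_site`, `on_increase_request`, and `increase_votes` are hypothetical, and the majority decision is passed in as a flag rather than computed here.

```python
def new_site(sites):
    """Initial state: every site believes each site holds one vote."""
    return {"V": {s: 1 for s in sites}, "N": {s: 0 for s in sites}}


def on_increase_request(state_j, i, N_i, x):
    """Site j records the requested vote value for site i."""
    state_j["V"][i] = x
    state_j["N"][i] = N_i[i] + 1


def increase_votes(state, i, reachable, x, majority_collected):
    """Site i asks the reachable sites to accept x as its new vote count."""
    for j in reachable:
        on_increase_request(state[j], i, state[i]["N"], x)
    if majority_collected:
        state[i]["V"][i] = x
        state[i]["N"][i] += 1
    return majority_collected


sites = [1, 2, 3]
state = {s: new_site(sites) for s in sites}
increase_votes(state, 1, reachable=[2, 3], x=2, majority_collected=True)
print(state[1]["V"][1], state[1]["N"][1])
print(state[2]["V"][1], state[2]["N"][1])
```

Because site j stores N_i[i] + 1 while site i increments its own counter only after collecting a majority, both ends finish with the same version number for site i's vote.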

Vote Decreasing Protocol

When site i wants to decrease V_i[i]:

set V_i[i] to the new value
increment N_i[i]
send V_i and N_i to the other sites

When site j receives a vote-decreasing request from site i with V_i and N_i:

V_j[i] = V_i[i]
N_j[i] = N_i[i]
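The decreasing case needs no majority, since giving up votes cannot create a second majority elsewhere. A minimal sketch under the same kind of simplifications (dictionary state, direct calls instead of messages, hypothetical function name `decrease_votes`):

```python
def decrease_votes(state, i, reachable, x):
    """Site i lowers its own vote count to x and informs the reachable sites.

    state: site -> {"V": vote vector, "N": version-number vector}
    """
    if x >= state[i]["V"][i]:
        raise ValueError("new value must be smaller than the current one")
    # Site i applies the change locally first; no majority is required.
    state[i]["V"][i] = x
    state[i]["N"][i] += 1
    # Each receiver j copies site i's entry and its version number.
    for j in reachable:
        state[j]["V"][i] = state[i]["V"][i]
        state[j]["N"][i] = state[i]["N"][i]


sites = [1, 2, 3]
state = {s: {"V": {t: 2 for t in sites}, "N": {t: 0 for t in sites}} for s in sites}
decrease_votes(state, 1, reachable=[2], x=1)
print(state[2]["V"][1], state[2]["N"][1])  # site 2 has the new value
print(state[3]["V"][1], state[3]["N"][1])  # site 3 was unreachable and is stale
```

Site 3 keeps an outdated entry until it learns the newer value through a later vote collection.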

Vote Collecting Protocol

for each reply V_j and N_j received by site i:
- v_i[j] = V_j[j]
- if V_j[j] > V_i[j] or (V_j[j] < V_i[j] and N_j[j] > N_i[j]) then
  V_i[j] = V_j[j]; N_i[j] = N_j[j]
  end if;
if site j did not respond to site i:
- Find k ∈ G such that N_k[j] = max {N_p[j] | p ∈ G}, where G is the set of all sites that replied to i. That is, find the site that has the latest information on the votes assigned to site j.
- v_i[j] = V_k[j]; V_i[j] = V_k[j]; N_i[j] = N_k[j]
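The collection rules translate fairly directly into code. The sketch below is an illustration under stated assumptions: vectors are dictionaries keyed by site number, `replies` holds the `(V_j, N_j)` pair from each site in G, site i's own vectors are included in `replies` so that its own vote is counted, and the name `collect_votes` is hypothetical.

```python
def collect_votes(V_i, N_i, replies, all_sites):
    """Update V_i and N_i in place from the replies and return v_i."""
    if not replies:
        raise ValueError("no replies to collect")
    v_i = {}
    # Sites that replied: take their own claim, and adopt it if it is an
    # increase, or a decrease backed by a newer version number.
    for j, (V_j, N_j) in replies.items():
        v_i[j] = V_j[j]
        if V_j[j] > V_i[j] or (V_j[j] < V_i[j] and N_j[j] > N_i[j]):
            V_i[j], N_i[j] = V_j[j], N_j[j]
    # Sites that did not reply: trust the responder with the newest entry.
    for j in all_sites:
        if j in replies:
            continue
        k = max(replies, key=lambda p: replies[p][1][j])
        V_k, N_k = replies[k]
        v_i[j], V_i[j], N_i[j] = V_k[j], V_k[j], N_k[j]
    return v_i


# Site 1 collects; site 3 is unreachable, but site 2 knows 3 now holds 2 votes.
V_1, N_1 = {1: 1, 2: 1, 3: 1}, {1: 0, 2: 0, 3: 0}
replies = {
    1: (dict(V_1), dict(N_1)),
    2: ({1: 1, 2: 1, 3: 2}, {1: 0, 2: 0, 3: 1}),
}
print(collect_votes(V_1, N_1, replies, all_sites=[1, 2, 3]))
```

In the example, site 1 learns about site 3's increased vote indirectly through site 2, whose entry for site 3 carries the higher version number.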

Deciding the Outcome

Let K be the set of all sites, and G be the set of sites that responded to the ballot.

$\mathrm{TOT} = \sum_{k \in K} v_i[k]$

$\mathrm{RCVD} = \sum_{k \in G} v_i[k]$

Site i has a majority iff RCVD > TOT/2.
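As a small worked example of the majority test, assuming `v_i` is a dictionary over all sites in K and `responders` is the set G (the function name `has_majority` is hypothetical):

```python
def has_majority(v_i, responders):
    """Return True iff the votes received exceed half of the total."""
    tot = sum(v_i.values())                  # TOT: over all sites in K
    rcvd = sum(v_i[k] for k in responders)   # RCVD: over the sites in G
    return rcvd > tot / 2


v_i = {1: 1, 2: 1, 3: 2}          # TOT = 4
print(has_majority(v_i, {1, 2}))  # RCVD = 2, exactly half
print(has_majority(v_i, {1, 3}))  # RCVD = 3
```

The strict inequality matters: two disjoint groups holding exactly half of the votes each must both be refused, otherwise both sides of a partition could proceed.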

Vote Increasing Policies

All of the above leaves open the question of when a site should try to increase or decrease its vote.

This is normally done in response to detection of an apparent failure.

overthrow technique -- one site increases its vote
alliance technique -- all active sites increase their votes
