To deal with failure, we sometimes replicate services for efficiency and reliability.
Read-only: availability boost and performance boost (CDNs, server load)
Read-only replication is easy, but read-write replication is tricky (consistency is harder to achieve).
strict consistency: every operation takes effect everywhere at the same time, with no latency (too perfect to be achievable)
sequential consistency: all nodes see all operations in the same global order
causal consistency: when one write is "concurrent" (in the Lamport clock sense) with another write, we do not treat them as causally related, and therefore we do not constrain the order of those two events.
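One way to make "concurrent" concrete is with vector clocks. This is an assumption on top of the notes, which only mention the Lamport clock idea; the function names below are illustrative. A minimal sketch:

```python
def happened_before(a, b):
    """True if vector clock a causally precedes b (a <= b component-wise, a != b)."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a, b):
    """Two distinct writes are concurrent when neither causally precedes the other."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)


w1 = (1, 0, 0)  # write issued by node 0
w2 = (0, 1, 0)  # write issued by node 1 without having seen w1
w3 = (1, 2, 0)  # write issued by node 1 after it has seen w1

print(concurrent(w1, w2))       # neither saw the other, so no order is imposed
print(happened_before(w1, w3))  # w3 depends on w1, so every node must see w1 first
```

Under causal consistency only the second pair needs to be seen in the same order by every node.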
What to replicate:
Notification (invalidation): only tells you there is an update, but not what it is.
Any COMMAND
(active replication, or state machine replication: deterministic log replay)
Any data changes associated with COMMAND
(useful when most operations are reads)
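The difference between shipping commands and shipping data changes can be sketched as follows. The command format (`op`, `key`, `arg`) and the helper `apply_command` are hypothetical, chosen only for illustration; the important assumption is that applying a command is deterministic.

```python
def apply_command(state, command):
    """Deterministically apply a command to a dict; return the resulting data change."""
    op, key, arg = command
    if op == "set":
        new_value = arg
    elif op == "incr":
        new_value = state.get(key, 0) + arg
    else:
        raise ValueError(f"unknown op: {op}")
    state[key] = new_value
    return {key: new_value}


commands = [("set", "x", 1), ("incr", "x", 5), ("set", "y", 7)]

# The primary executes each command and records the data change it produced.
primary = {}
change_log = [apply_command(primary, cmd) for cmd in commands]

# Option 1: ship the commands; the replica replays the same log in the same order.
replica_a = {}
for cmd in commands:
    apply_command(replica_a, cmd)

# Option 2: ship the data changes; the replica only overwrites values.
replica_b = {}
for change in change_log:
    replica_b.update(change)

print(primary, replica_a, replica_b)
```

Both replicas end in the same state as the primary. Replaying commands repeats the computation on every replica, while shipping changes does the work once and sends the results.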
When to replicate: Push vs Pull
Pull: good for mostly read-only data
Push: stateful (the server must keep track of which replicas to update)
We assume:
- there is a manager that allows replica nodes to join and leave
- a fail-stop (not Byzantine) failure model
- failures can be detected
- messages may be delayed or lost
- servers don't lie (a server never claims to have completed something it has not)
- we have a failure detector, but it has latency
Remote Write Protocol:
writes: always go to the primary server and are executed sequentially (a write blocks, for consistency)
reads: may go to any replica, which might not be up to date in real time
Failure Handling:
Replica failure: clients can read from other replicas
Primary server failure: fail over to a backup
Asynchronous Replication: If you don't care about maintaining sequential consistency, the primary can reply to the client before reaching agreement with the backups (sometimes called "asynchronous replication").
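The trade-off between a blocking (synchronous) write and asynchronous replication can be sketched with a toy in-memory model. The classes `Replica` and `Primary`, the `synchronous` flag, and the `flush` method are all hypothetical names; there is no real network or failure here.

```python
class Replica:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)


class Primary(Replica):
    def __init__(self, backups, synchronous=True):
        super().__init__()
        self.backups = backups
        self.synchronous = synchronous
        self.pending = []  # updates not yet sent to backups (asynchronous mode only)

    def write(self, key, value):
        self.apply(key, value)
        if self.synchronous:
            # Blocking write: every backup is updated before the client hears back.
            for backup in self.backups:
                backup.apply(key, value)
        else:
            # Asynchronous replication: reply first, propagate later.
            self.pending.append((key, value))
        return "ok"

    def flush(self):
        """Propagate queued updates to the backups, in write order."""
        for key, value in self.pending:
            for backup in self.backups:
                backup.apply(key, value)
        self.pending.clear()


backup = Replica()
Primary([backup], synchronous=True).write("x", 1)
sync_read = backup.read("x")         # backup already has the write

lazy_backup = Replica()
async_primary = Primary([lazy_backup], synchronous=False)
async_primary.write("x", 1)
stale_read = lazy_backup.read("x")   # backup has not seen the write yet
async_primary.flush()
fresh_read = lazy_backup.read("x")

print(sync_read, stale_read, fresh_read)
```

In the asynchronous case the client gets its reply sooner, but a read from the backup can miss a write that was already acknowledged.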
Advantage:
- fast response time even under failures
- no master; the system operates as long as a majority of machines are still alive
- to handle f failures, we must have 2f + 1 replicas
- replicated-write // QUESTION: also, for replicated-write => write to all replicas, not just one
- Paxos, from Leslie Lamport, is a famous protocol
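The 2f + 1 figure follows from requiring a majority that is still reachable after f failures, and from the fact that any two majorities share at least one machine. A small check of that arithmetic (the helper names are illustrative, and f = 2 is an arbitrary example):

```python
from itertools import combinations


def replicas_needed(f):
    """Number of replicas required to tolerate f fail-stop failures."""
    return 2 * f + 1


def majority(n):
    """Smallest group that is more than half of n machines."""
    return n // 2 + 1


f = 2
n = replicas_needed(f)
q = majority(n)

# With f machines down, the survivors must still form a majority.
still_live = (n - f) >= q

# Any two majorities overlap in at least one machine,
# so two conflicting decisions cannot both gather a majority.
quorums = list(combinations(range(n), q))
all_overlap = all(set(a) & set(b) for a in quorums for b in quorums)

print(n, q, still_live, all_overlap)
```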
Fischer-Lynch-Paterson Impossibility Result: under asynchronous communication, no deterministic algorithm exists that is guaranteed to reach consensus in a bounded amount of time for all runs if even one process may fail. (In practice, network delay is random.)
To create a replicated state machine, we only need a consistent replicated log (the same commands, applied in the same order, on every replica). It is the job of the consensus algorithm to keep the replicated log consistent.
Problem: pick a single value once out of all proposed values.
Requirements:
Safety: exactly one value is chosen, and it must be a value that was proposed
Liveness: some proposed value is eventually chosen and learned across all servers
Each machine consists of two parts
Proposers: propose a value to the acceptors
Acceptors: vote on (accept or reject) the values proposed to them
Terminology:
accepted: an acceptor voted "yes" for the value
chosen: a majority of acceptors voted "yes" for the value
Bad Approaches to Problems:
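single proposer: can't handle a proposer crash
always accept only the first value seen: with several proposers the votes can split, so no value reaches a majority
always accept every value: more than one value could end up chosen, which violates safety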
ProposalSN: a total order on proposals
Server ID: lower bits
Round Number: higher bits (every server tracks its own round number)
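Packing the round number into the high bits and the server ID into the low bits gives unique, totally ordered proposal numbers. A minimal sketch, assuming an 8-bit server ID field (`SERVER_ID_BITS` and both function names are illustrative choices, not part of the notes):

```python
SERVER_ID_BITS = 8  # assumption: at most 256 servers


def make_proposal_sn(round_number, server_id):
    """Round number in the high bits, server ID in the low bits."""
    if not 0 <= server_id < (1 << SERVER_ID_BITS):
        raise ValueError("server_id does not fit in the low bits")
    return (round_number << SERVER_ID_BITS) | server_id


def next_proposal_sn(highest_seen_sn, server_id):
    """Pick a number larger than any seen so far by moving to the next round."""
    round_seen = highest_seen_sn >> SERVER_ID_BITS
    return make_proposal_sn(round_seen + 1, server_id)


a = make_proposal_sn(3, 1)
b = make_proposal_sn(3, 2)   # same round, different server: still distinct
c = make_proposal_sn(4, 0)   # a later round beats any server ID of an earlier round
d = next_proposal_sn(c, 1)

print(a, b, c, d)
```

Because the round number dominates the comparison, a server can always outbid the highest number it has seen by bumping its round, and ties between servers are broken by the ID bits.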
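Phases:
Prepare Phase
Proposer: choose a proposal number n and a value v, then send PREPARE(n) to all acceptors.
Acceptor: compare n with n', the highest proposal number it has seen so far; if n is greater, then record it, since the acceptor only wants to accept the latest proposal. Respond with the already-accepted proposal and its value (n_acc, v_acc), if there is one.
Proposer: on responses from a majority, set v to the v_acc that comes with the highest n_acc, if one exists (since that v_acc has possibly been chosen already).
Accept Phase
Proposer: send ACCEPT(n, v) to all acceptors.
Acceptor: accept if n >= n' (n equals n' when no higher PREPARE arrived in between); otherwise reject and return the current maximum proposal number n'.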
Notice:
A PREPARE with a higher n cuts off acceptance of any lower-numbered proposal.
If the PREPARE and ACCEPT of a greater proposal sandwich the ACCEPT of a smaller one, then the greater is chosen.
If the PREPARE and ACCEPT of a smaller proposal sandwich the PREPARE of a greater one, then the smaller is not chosen.
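The two phases and the ordering effects noted above can be tried out in a single-process sketch. This is not a full implementation: messages are plain method calls, nothing is lost or delayed, and the names `Acceptor`, `propose`, and `n_promised` (standing for n') are illustrative.

```python
class Acceptor:
    def __init__(self):
        self.n_promised = 0   # n': highest proposal number seen so far
        self.n_acc = None     # number of the accepted proposal, if any
        self.v_acc = None     # value of the accepted proposal, if any

    def prepare(self, n):
        if n > self.n_promised:
            self.n_promised = n
            return True, self.n_acc, self.v_acc
        return False, self.n_promised, None

    def accept(self, n, v):
        if n >= self.n_promised:
            self.n_promised = n
            self.n_acc, self.v_acc = n, v
            return True, n
        return False, self.n_promised


def propose(acceptors, n, v):
    """Run both phases once; return the chosen value, or None if no majority."""
    majority = len(acceptors) // 2 + 1
    replies = [a.prepare(n) for a in acceptors]
    promises = [(n_acc, v_acc) for ok, n_acc, v_acc in replies if ok]
    if len(promises) < majority:
        return None
    accepted = [p for p in promises if p[0] is not None]
    if accepted:
        # Adopt the value of the highest-numbered accepted proposal.
        v = max(accepted, key=lambda p: p[0])[1]
    votes = sum(1 for a in acceptors if a.accept(n, v)[0])
    return v if votes >= majority else None


group = [Acceptor() for _ in range(3)]
first = propose(group, n=1, v="red")
second = propose(group, n=2, v="blue")   # must adopt the already chosen value

# A greater PREPARE arrives between the smaller PREPARE and the smaller ACCEPT.
fresh = [Acceptor() for _ in range(3)]
for a in fresh:
    a.prepare(1)
for a in fresh:
    a.prepare(2)
small_votes = [a.accept(1, "red")[0] for a in fresh]
big_votes = [a.accept(2, "blue")[0] for a in fresh]

print(first, second, small_votes, big_votes)
```

The second proposer ends up proposing "red" even though it wanted "blue", which is how a chosen value stays chosen. In the interleaved run every acceptor rejects the smaller ACCEPT and accepts the greater one.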
Problem: combine several instances of basic Paxos to agree on a series of values, creating the log