Lecture 011 - Replication

Distributed Replication

To deal with failure, we sometimes replicate services for efficiency and reliability.

Read-only replication is easy, but read-write replication is tricky because consistency is harder to achieve.

• strict consistency: every operation takes effect everywhere at the same time, with no latency (too perfect to achieve in practice)

• sequential consistency: all nodes see all operations in the same global order, one that respects each node's own program order

• causal consistency: causally related writes must be seen by every node in the same order. When one write is "concurrent" with another (in the Lamport happens-before sense), we do not treat them as causally related, and therefore we do not constrain the order of those two events.
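Whether two writes are concurrent can be checked mechanically. The following is a minimal sketch, assuming each write carries a vector clock (a swapped-in technique: plain Lamport timestamps order events but cannot detect concurrency on their own). The function names and sample clocks are hypothetical.

```python
def happened_before(a, b):
    """True if vector clock a causally precedes vector clock b."""
    return all(x <= y for x, y in zip(a, b)) and a != b

def concurrent(a, b):
    """Two distinct writes are concurrent when neither causally precedes the other."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)

w1 = (1, 0, 0)   # write issued by node 0
w2 = (0, 1, 0)   # write issued by node 1 without having seen w1
w3 = (1, 1, 1)   # write issued by node 2 after seeing both w1 and w2
```

Here w1 and w2 are concurrent, so causal consistency lets replicas apply them in either order, while w3 must be applied after both.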

What to replicate:

• A notification only: tell replicas that an update happened, but not what it is.

• The COMMAND itself (active replication, or state machine replication: every replica replays the same deterministic log)

• The data changes produced by the COMMAND (useful when most operations are reads)
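The COMMAND option relies on determinism: replicas that replay the same log in the same order end in the same state. A minimal sketch, assuming a toy key-value state machine with hypothetical `set` and `add` commands:

```python
def apply_command(state, command):
    """Apply one deterministic command and return the new state."""
    op, key, value = command
    new_state = dict(state)
    if op == "set":
        new_state[key] = value
    elif op == "add":
        new_state[key] = new_state.get(key, 0) + value
    return new_state

def replay(log):
    """Rebuild the state from scratch by replaying the whole log in order."""
    state = {}
    for command in log:
        state = apply_command(state, command)
    return state

log = [("set", "x", 1), ("add", "x", 4), ("set", "y", 7)]
replica_a = replay(log)
replica_b = replay(log)
reordered = replay([("add", "x", 4), ("set", "x", 1), ("set", "y", 7)])
```

Both replicas agree because they saw the same order; the reordered log produces a different value for `x`, which is why the log order must be agreed on.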

When to replicate: Push vs Pull

• Pull: good for mostly read-only data

• Push: stateful (the server must keep track of which replicas to update)

Primary-backup Replication Model

We assume:

• there is a manager that allows replica nodes to join and leave

• a fail-stop (not Byzantine) failure model

• failures can be detected

• messages may be delayed or lost

• servers do not lie: a server never claims to have completed something it has not

• we have a failure detector, but it has latency

Remote Write Protocol:

• write: always goes to the primary server, which executes writes sequentially (writes are blocking, to preserve consistency)

• read: may go to any replica, which might not be up to date in real time
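The write path above can be sketched as follows. This is a minimal in-process illustration, assuming no failures and no network; the class names `Replica` and `Primary` are hypothetical.

```python
class Replica:
    def __init__(self):
        self.data = {}

    def read(self, key):
        # Reads can be served by any replica.
        return self.data.get(key)

class Primary(Replica):
    def __init__(self, backups):
        super().__init__()
        self.backups = backups

    def write(self, key, value):
        # Blocking write: apply locally, then update every backup,
        # and only then reply to the client.
        self.data[key] = value
        for backup in self.backups:
            backup.data[key] = value
        return "ok"

backups = [Replica(), Replica()]
primary = Primary(backups)
reply = primary.write("x", 1)
```

Because `write` returns only after every backup has been updated, a read that starts after the reply sees the new value on any replica in this sketch.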

Failure Handling:

• Replica failure: clients can read from other replicas

• Primary server failure: fail over to a backup

Asynchronous Replication: If you don't need to maintain sequential consistency, the primary can reply to the client before reaching agreement with the backups (sometimes called "asynchronous replication").
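The cost of replying early is a window in which backups are stale. A minimal sketch, assuming backups are plain dictionaries and that a hypothetical `flush` method stands in for the background shipment of updates:

```python
from collections import deque

class AsyncPrimary:
    def __init__(self, backups):
        self.data = {}
        self.backups = backups      # each backup is a plain dict in this sketch
        self.pending = deque()      # updates not yet shipped to the backups

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))
        return "ok"                 # reply before the backups have the update

    def flush(self):
        # Ship queued updates to every backup, in the order they were written.
        while self.pending:
            key, value = self.pending.popleft()
            for backup in self.backups:
                backup[key] = value

backup = {}
primary = AsyncPrimary([backup])
primary.write("x", 1)
stale = backup.get("x")   # the backup has not caught up yet
primary.flush()
fresh = backup.get("x")   # the backup now has the update
```

If the primary crashed between `write` and `flush`, the acknowledged update would be lost, which is the trade-off for the faster reply.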

Consensus Replication Model

Advantages:

• fast response time even under failures

• no master; the system operates as long as a majority of machines is still alive

• to handle $f$ failures, we must have $2f + 1$ replicas

• replicated-write: a write goes to all replicas, not just one (marked QUESTION in the notes)

• Paxos, from Leslie Lamport, is a famous protocol
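The $2f + 1$ requirement follows from majority arithmetic: after $f$ failures, the $f + 1$ survivors must still form a majority. A small sketch with hypothetical helper names:

```python
def replicas_needed(f):
    """Replicas required to tolerate f fail-stop failures."""
    return 2 * f + 1

def majority(n):
    """Smallest number of replicas that is more than half of n."""
    return n // 2 + 1

for f in range(1, 4):
    n = replicas_needed(f)
    survivors = n - f
    # The survivors are exactly a majority, so the system can still make progress.
    print(f"f={f}: n={n}, majority={majority(n)}, survivors={survivors}")
```

Any two majorities of the same $n$ replicas overlap in at least one replica, since $2(\lfloor n/2 \rfloor + 1) > n$; that overlap is what lets a later quorum learn what an earlier quorum decided.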

Fischer-Lynch-Paterson Impossibility Result: Under asynchronous communication, no deterministic algorithm exists that guarantees reaching consensus in a bounded amount of time on every run, even if only one process can fail. (In practice, network delay is random, so such bad runs are unlikely.)

Paxos Consensus Algorithm

To create a replicated state machine, we only need a consistent replicated log (every replica receives the same commands in the same order). It is the job of the consensus algorithm to keep the replicated log consistent.

Basic Paxos

Problem: pick a single value once out of all proposed values.

Requirements:

• Safety: only a single value is chosen, and it must be a value that was proposed

• Liveness: some proposed value is eventually chosen and learned across all servers

Each machine plays two roles:

• Proposers: propose a value to acceptors

• Acceptors:

• respond to proposers' messages and vote on a value
• learn which value was chosen

Terminology:

• accepted: an acceptor votes "yes" on a value

• chosen: a majority of acceptors vote "yes" on the same value

Simple approaches that do not work:

• single proposer: cannot handle a proposer crash

• always accept the first value seen: every acceptor accepts some value, but the votes can split so that no value reaches a majority, and there is no consensus. Solution: we need more than one round to reach consensus.

• always accept every value: acceptors end up accepting multiple values, so more than one value could be chosen. Solution: we need to place a total order on proposals.

ProposalSN: a total order on proposals

• Server ID: lower bits (keeps numbers from different servers distinct)

• Round Number: higher bits (every server tracks its own round number)
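The bit layout above can be sketched with integer shifts. The 8-bit width for the server ID and the function names are assumptions made for illustration; the notes do not fix them.

```python
SERVER_ID_BITS = 8  # assumed width of the server ID field

def make_proposal_sn(round_number, server_id):
    """Pack the round number into the high bits and the server ID into the low bits."""
    assert 0 <= server_id < (1 << SERVER_ID_BITS)
    return (round_number << SERVER_ID_BITS) | server_id

def next_proposal_sn(highest_seen, server_id):
    """Pick a number greater than any seen so far, tagged with this server's ID."""
    round_number = (highest_seen >> SERVER_ID_BITS) + 1
    return make_proposal_sn(round_number, server_id)

a = make_proposal_sn(1, 2)
b = make_proposal_sn(1, 3)
c = make_proposal_sn(2, 0)
d = next_proposal_sn(b, 1)
```

Two servers in the same round never collide because their low bits differ, and a later round always outranks an earlier one regardless of server ID.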

Phases:

• Prepare Phase

• Proposer: choose a proposal number n and a value v
• Proposer: broadcast PREPARE(n) to all acceptors
• Acceptors: if n is greater than the highest proposal number seen so far, record n, since an acceptor only wants to accept the latest proposal. Respond with the previously accepted proposal and its value (n_acc, v_acc), if any.
• Proposer: wait until a majority has responded. If any response carries an accepted proposal, replace v with the v_acc that has the highest n_acc (since that v_acc may already have been chosen).

• Accept Phase

• Proposer: send ACCEPT(n, v) to all acceptors
• Acceptors: accept if n >= n', where n' is the highest proposal number recorded so far; otherwise reject and return n'.
• Proposer: if rejected, try again with a greater proposal number
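The two phases can be sketched in a few lines. This is a simplified single-process illustration, assuming direct method calls instead of messages, no crashes, and no message loss; the class and function names are hypothetical, and an acceptor here accepts when n is at least the highest PREPARE number it has seen.

```python
class Acceptor:
    def __init__(self):
        self.n_promised = 0   # highest proposal number seen so far
        self.n_acc = None     # number of the accepted proposal, if any
        self.v_acc = None     # value of the accepted proposal, if any

    def prepare(self, n):
        if n > self.n_promised:
            self.n_promised = n
            return ("promise", self.n_acc, self.v_acc)
        return ("reject", self.n_promised, None)

    def accept(self, n, v):
        if n >= self.n_promised:
            self.n_promised = n
            self.n_acc, self.v_acc = n, v
            return ("accepted", n)
        return ("reject", self.n_promised)

def propose(acceptors, n, v):
    """Run one round; return the chosen value, or None if the round failed."""
    majority = len(acceptors) // 2 + 1

    # Prepare phase: collect promises and any previously accepted proposals.
    replies = [a.prepare(n) for a in acceptors]
    promises = [r for r in replies if r[0] == "promise"]
    if len(promises) < majority:
        return None
    accepted = [(n_acc, v_acc) for _, n_acc, v_acc in promises if n_acc is not None]
    if accepted:
        # Adopt the value of the highest-numbered accepted proposal.
        v = max(accepted, key=lambda p: p[0])[1]

    # Accept phase: the value is chosen once a majority accepts it.
    acks = [a.accept(n, v) for a in acceptors]
    if sum(1 for r in acks if r[0] == "accepted") >= majority:
        return v
    return None

acceptors = [Acceptor() for _ in range(3)]
first = propose(acceptors, n=1, v="red")
second = propose(acceptors, n=2, v="blue")    # must adopt the already chosen value
stale = propose(acceptors, n=1, v="green")    # outdated number, rejected in prepare
```

The second proposer asks for "blue" but ends up re-proposing "red", which is how a value stays fixed once it has been chosen. A real implementation would also need retries with greater proposal numbers and persistent acceptor state.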

Notice:

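• a PREPARE with a higher n cuts off acceptance of any proposal with a lower number

• if the PREPARE and ACCEPT of a higher-numbered proposal sandwich the ACCEPT of a lower-numbered one at an acceptor, the lower ACCEPT is rejected and the higher proposal is the one accepted there

• if the PREPARE and ACCEPT of a lower-numbered proposal sandwich the PREPARE of a higher-numbered one at an acceptor, the lower ACCEPT is rejected there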

Multi-Paxos

Problem: combine several instances of basic Paxos to agree on a series of values, creating the log
