The Problem: Several divisions of the Byzantine army are camped outside an enemy city, each division commanded by its own general. After observing the enemy, they must decide upon a common plan of action. Some of the generals may be traitors, trying to prevent the loyal generals from reaching agreement.
commander: sends a command
lieutenants: listen and act on the command
Goal: All loyal generals decide upon the same plan of action. A small number of traitors cannot cause the loyal generals to adopt a bad plan. (Each nonfaulty process learns the true values sent by each of the nonfaulty processes.)
Byzantine Failure: adversarial (faulty nodes may behave arbitrarily, including lying)
fail-stop: not adversarial (faulty nodes simply crash)
Example Paxos under Byzantine Faults: with 3 servers, a malicious server can always respond with ACCEPT, causing the other two servers to commit different results.
Example Quorums under Byzantine Faults:
- with fail-stop, we only need 2f+1 nodes (i.e., we can tolerate about half failing), since Paxos relies on overlapping quorums
- if the intersection of two quorums happens to be occupied entirely by Byzantine nodes, the two quorums cannot be linked by an honest node, so the overlap must contain at least f+1 nodes
- for liveness, the quorum size is at most N - f, since the f Byzantine nodes may simply never respond, and a larger quorum could then never assemble
- so for a quorum size of N - f, the overlap of two quorums is at least (N-f) + (N-f) - N = N - 2f. We want this to be ≥ f + 1 as stated above, giving N ≥ 3f + 1
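The arithmetic above can be checked with a short script (a sketch; the function name `min_cluster_size` is mine):

```python
def min_cluster_size(f: int) -> int:
    """Smallest N such that two quorums of size N - f overlap in at
    least f + 1 nodes, i.e. (N - f) + (N - f) - N >= f + 1."""
    N = 1
    while (N - f) + (N - f) - N < f + 1:
        N += 1
    return N

for f in range(1, 5):
    # Each printed N equals 3f + 1, matching the derivation above.
    print(f, min_cluster_size(f))
```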
Impossibility: No solution with fewer than 3f + 1 generals can cope with f traitors.
Byzantine Agreement Assumptions:
ordered messages
bounded communication delay
no lost messages
synchronous communication
unicast channels
sender of each message is known
Problem: the Byzantine voting problem.
Intuition: if process A lies about its vote, then we can define process A's vote as what the other processes B, C, D hear from A.
Algorithm:
Assume P3 is malicious. P1 can create the following table by collecting information from P2, P3, P4. `x` in an entry is malicious data (where the original data is `1`), and `-` represents N/A (since P1 does not hear from itself).
Hear From \ Vote | P1 | P2 | P3 | P4 |
---|---|---|---|---|
P1 | - | - | - | - |
P2 | 1 | 1 | x | 1 |
P3 | x | x | x | x |
P4 | 1 | 1 | x | 1 |
Summing along the columns, we know it is likely that P1, P2, P4 all voted `1` and P3 voted `x`. Then, from this result, we know `1` is the majority.
Problem: correctly replicate an `opcode` in a system of Replicated State Machines (RSM). An example is Ethereum Classic.
[CS198.2x Week 1] Practical Byzantine Fault Tolerance
Practical Byzantine Fault Tolerance Assumptions:
assume relatively small message delay
only a small fraction of nodes are Byzantine
static configuration (the 3f+1 nodes do not leave or join)
Primary-Backup Replication + Quorums
3-phase protocol to agree on sequence numbers (to deal with a malicious primary)
big quorum size of 2f+1 out of 3f+1 nodes (to deal with loss of agreement)
authenticated communication (public-key signatures, MACs)
Replica Stores:
`replicaID`
`viewNumber` (Primary = `viewNumber % N`)
`[]log` of `<opcode, sequenceNumber, {PRE-PREPARED, PREPARED, COMMITTED}>`
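The per-replica state above can be sketched as a small Python class (the names are mine; real PBFT replicas also store checkpoints, which these notes omit):

```python
from dataclasses import dataclass, field

PRE_PREPARED, PREPARED, COMMITTED = "PRE-PREPARED", "PREPARED", "COMMITTED"

@dataclass
class Replica:
    replica_id: int
    n: int                      # total replicas, N = 3f + 1
    view_number: int = 0
    # log of (opcode, sequenceNumber, phase) entries
    log: list = field(default_factory=list)

    def primary(self) -> int:
        # Primary = viewNumber % N, as stated above
        return self.view_number % self.n

    def is_primary(self) -> bool:
        return self.replica_id == self.primary()

# Usage: in view 2 of a 4-node cluster, replica 2 is the primary.
r = Replica(replica_id=2, n=4, view_number=2)
print(r.primary(), r.is_primary())
```

Deriving the primary from the view number means a view change (incrementing `viewNumber`) deterministically rotates leadership without extra coordination.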
Algorithm: using a Replicated State Machine (RSM) with 3f+1 replicas
Client sends `opcode` to `Primary`
`Primary` broadcasts `Pre-prepare<viewNumber, sequenceNumber, opcode>` (and puts it in its log)
`Replicas` receive the broadcast and determine if the `Pre-prepare<>` is valid; if so, broadcast `Prepare<replicaID, viewNumber, sequenceNumber, opcode>` (and put it in the log), then wait until 2f+1 `Prepare<>` arrive from other replicas doing the same thing. Validity checks:
- the `viewNumber` in the message matches the stored `viewNumber`
- no conflicting `Pre-prepare<>` with the same `viewNumber` and `sequenceNumber` was accepted: if `<opcode1, viewNumber, sequenceNumber, replicaID1>` is in `[]log`, then there is no `<opcode2, viewNumber, sequenceNumber, replicaID2>` in `[]log`
- the `opcode` matches between `Prepare<>` and `Pre-prepare<>`
`Replicas` that received 2f+1 `Prepare<>` send `Commit<replicaID, viewNumber, sequenceNumber, opcode>` and wait until 2f+1 `Commit<>` arrive from other replicas doing the same thing
`Replicas` that received 2f+1 `Commit<>` (if we assume `viewNumber` can change, then we only need f+1) put it in the log and send the result to the client
`Client` waits for f+1 matching replies before committing
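The happy path of the three phases can be sketched as a toy simulation (all names are mine; real PBFT also needs signatures, view changes, and checkpoints, and here the single faulty replica is modeled as simply silent):

```python
# Toy PBFT happy path with N = 3f + 1 = 4 replicas, f = 1.
f = 1
N = 3 * f + 1
quorum = 2 * f + 1   # 2f + 1 messages needed in the prepare/commit phases
faulty = {3}         # replica 3 sends nothing

view, seq, opcode = 0, 1, "SET x=5"

# Phase 1: primary (view % N == 0) broadcasts Pre-prepare.
pre_prepare = (view, seq, opcode)

# Phase 2: honest replicas that accept the Pre-prepare broadcast Prepare.
prepares = [(r, *pre_prepare) for r in range(N) if r not in faulty]

# Phase 3: replicas that collected a quorum of Prepares broadcast Commit.
commits = []
if len(prepares) >= quorum:
    commits = [(r, *pre_prepare) for r in range(N) if r not in faulty]

# The request commits once a quorum of Commits is seen; each committing
# replica replies to the client, which waits for f + 1 matching replies.
committed = len(commits) >= quorum
replies = len(commits)
print(committed, replies >= f + 1)
```

With one silent replica out of four, the three honest replicas still form the 2f+1 quorum in both phases, so the request commits and the client sees more than f+1 matching replies.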