The Problem: Several divisions of the Byzantine army are camped outside an enemy city, each division commanded by its own general. After observing the enemy, they must decide upon a common plan of action. Some of the generals may be traitors, trying to prevent the loyal generals from reaching agreement.
commander: sends a command
lieutenants: listen and act on the command
Goal: All loyal generals decide upon the same plan of action. A small number of traitors cannot cause the loyal generals to adopt a bad plan. (Each nonfaulty process learns the true values sent by each of the nonfaulty processes.)
Byzantine Failure: adversarial (faulty nodes may behave arbitrarily, including lying)
fail-stop: not adversarial (faulty nodes simply crash)
Example Paxos under Byzantine Faults: with 3 servers, a malicious server can always respond with ACCEPT, causing the other two servers to commit different results.
Example Quorums under Byzantine Faults:
- with fail-stop, we only need 2f+1 nodes (i.e., we can tolerate about half failing), since Paxos relies on overlapping quorums
- if the intersection of two quorums happens to be occupied entirely by Byzantine nodes, the two quorums cannot be linked by an honest node, so the overlap must contain at least f+1 nodes
- for liveness, the quorum size is at most N - f, since the f Byzantine nodes may simply never respond, and a larger quorum could then never assemble
- so for a quorum size of N - f, the overlap of two quorums is at least (N-f) + (N-f) - N = N - 2f. We want this to be ≥ f + 1 as stated above, giving N ≥ 3f + 1
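The arithmetic above can be checked with a short script (a sketch; the function name `min_cluster_size` is mine):

```python
def min_cluster_size(f: int) -> int:
    """Smallest N such that two quorums of size N - f overlap in at
    least f + 1 nodes, i.e. (N - f) + (N - f) - N >= f + 1."""
    N = 1
    while (N - f) + (N - f) - N < f + 1:
        N += 1
    return N

for f in range(1, 5):
    # Each printed N equals 3f + 1, matching the derivation above.
    print(f, min_cluster_size(f))
```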
Impossibility: No solution with fewer than 3f + 1 generals can cope with f traitors.
Byzantine Agreement Assumptions:
ordered messages
bounded communication delay
no lost messages
synchronous communication
unicast channels
sender of each message is known
Problem: the Byzantine voting problem.
Intuition: if process A lies about its vote, then we can define process A's vote as what the other processes B, C, D hear from A.
Algorithm:
Assume P3 is malicious. P1 can create the following table by collecting information from P2, P3, P4. `x` in an entry is malicious data (where the original data is `1`), and `-` represents N/A (since P1 does not hear from itself).
Hear From \ Vote | P1 | P2 | P3 | P4 |
---|---|---|---|---|
P1 | - | - | - | - |
P2 | 1 | 1 | x | 1 |
P3 | x | x | x | x |
P4 | 1 | 1 | x | 1 |
Summing along the columns, we know it is likely that P1, P2, P4 all voted `1` and P3 voted `x`. Then, from this result, we know `1` is the majority.
Problem: correctly replicate an `opcode` in a system of Replicated State Machines (RSM). An example is Ethereum Classic.
[CS198.2x Week 1] Practical Byzantine Fault Tolerance
Practical Byzantine Fault Tolerance Assumptions:
assume relatively small message delay
only a small fraction of nodes are Byzantine
static configuration (the 3f+1 nodes do not leave or join)
Primary-Backup Replication + Quorums
3-phase protocol to agree on sequence numbers (to deal with a malicious primary)
big quorum size of 2f+1 out of 3f+1 nodes (to deal with loss of agreement)
authenticated communication (public-key signatures, MACs)
Replica Stores:
`replicaID`
`viewNumber` (Primary = `viewNumber % N`)
`[]log` of `<opcode, sequenceNumber, {PRE-PREPARED, PREPARED, COMMITTED}>`
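The per-replica state above can be sketched as a small Python class (the names are mine; real PBFT replicas also store checkpoints, which these notes omit):

```python
from dataclasses import dataclass, field

PRE_PREPARED, PREPARED, COMMITTED = "PRE-PREPARED", "PREPARED", "COMMITTED"

@dataclass
class Replica:
    replica_id: int
    n: int                      # total replicas, N = 3f + 1
    view_number: int = 0
    # log of (opcode, sequenceNumber, phase) entries
    log: list = field(default_factory=list)

    def primary(self) -> int:
        # Primary = viewNumber % N, as stated above
        return self.view_number % self.n

    def is_primary(self) -> bool:
        return self.replica_id == self.primary()

# Usage: in view 2 of a 4-node cluster, replica 2 is the primary.
r = Replica(replica_id=2, n=4, view_number=2)
print(r.primary(), r.is_primary())
```

Deriving the primary from the view number means a view change (incrementing `viewNumber`) deterministically rotates leadership without extra coordination.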
Algorithm: using a Replicated State Machine (RSM) with 3f+1 replicas
Client sends `opcode` to `Primary`
`Primary` broadcasts `Pre-prepare<viewNumber, sequenceNumber, opcode>` (and puts it in its log)
`Replicas` receive the broadcast and determine if the `Pre-prepare<>` is valid; if so, broadcast `Prepare<replicaID, viewNumber, sequenceNumber, opcode>` (and put it in the log), then wait until 2f+1 `Prepare<>` arrive from other replicas doing the same thing. Validity checks:
- the `viewNumber` in the message matches the stored `viewNumber`
- no conflicting `Pre-prepare<>` with the same `viewNumber` and `sequenceNumber` was accepted: if `<opcode1, viewNumber, sequenceNumber, replicaID1>` is in `[]log`, then there is no `<opcode2, viewNumber, sequenceNumber, replicaID2>` in `[]log`
- the `opcode` matches between `Prepare<>` and `Pre-prepare<>`
`Replicas` that received 2f+1 `Prepare<>` send `Commit<replicaID, viewNumber, sequenceNumber, opcode>` and wait until 2f+1 `Commit<>` arrive from other replicas doing the same thing
`Replicas` that received 2f+1 `Commit<>` (if we assume `viewNumber` can change, then we only need f+1) put it in the log and send the result to the client
`Client` waits for f+1 matching replies before committing
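The happy path of the three phases can be sketched as a toy simulation (all names are mine; real PBFT also needs signatures, view changes, and checkpoints, and here the single faulty replica is modeled as simply silent):

```python
# Toy PBFT happy path with N = 3f + 1 = 4 replicas, f = 1.
f = 1
N = 3 * f + 1
quorum = 2 * f + 1   # 2f + 1 messages needed in the prepare/commit phases
faulty = {3}         # replica 3 sends nothing

view, seq, opcode = 0, 1, "SET x=5"

# Phase 1: primary (view % N == 0) broadcasts Pre-prepare.
pre_prepare = (view, seq, opcode)

# Phase 2: honest replicas that accept the Pre-prepare broadcast Prepare.
prepares = [(r, *pre_prepare) for r in range(N) if r not in faulty]

# Phase 3: replicas that collected a quorum of Prepares broadcast Commit.
commits = []
if len(prepares) >= quorum:
    commits = [(r, *pre_prepare) for r in range(N) if r not in faulty]

# The request commits once a quorum of Commits is seen; each committing
# replica replies to the client, which waits for f + 1 matching replies.
committed = len(commits) >= quorum
replies = len(commits)
print(committed, replies >= f + 1)
```

With one silent replica out of four, the three honest replicas still form the 2f+1 quorum in both phases, so the request commits and the client sees more than f+1 matching replies.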