Lecture 021 - Byzantine

The Problem: Several divisions of the Byzantine army are camped outside an enemy city, each division commanded by its own general. After observing the enemy, they must decide upon a common plan of action. Some of the generals may be traitors, trying to prevent the loyal generals from reaching agreement.

commander: sends the command

lieutenants: listen to the command and act on it truthfully

Goal: All loyal generals decide upon the same plan of action. A small number of traitors cannot cause the loyal generals to adopt a bad plan. (Each nonfaulty process learns the true values sent by each of the nonfaulty processes.)

Byzantine Failure: adversarial

fail-stop: not adversarial

Failures

Example Paxos under Byzantine Faults: with 3 servers, a malicious server can always respond with ACCEPT, causing the other servers to commit different results.

Example Quorums under Byzantine Faults (with fail-stop we only need 2f+1 nodes, i.e. we can tolerate about half, since Paxos relies on overlapping quorums):

  - If the intersection happens to be occupied only by Byzantine nodes, the two Quorums cannot communicate, so the overlap must contain at least f+1 nodes (guaranteeing at least one honest node).
  - For liveness, the Quorum size is at most N - f, otherwise the f faulty nodes could prevent any Quorum from ever forming by simply not responding.
  - So for a Quorum size of N - f, the overlap region is (N-f) + (N-f) - N = N - 2f. We want this to be \geq f + 1 as stated above, giving N \geq 3f + 1.
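
The same quorum-size argument written out as a short derivation (Q denotes the quorum size; this just makes the bullet points above explicit):

```latex
\begin{align*}
Q &= N - f
  && \text{(liveness: a quorum must form without the $f$ faulty nodes)} \\
|Q_1 \cap Q_2| &\geq 2Q - N = (N-f) + (N-f) - N = N - 2f \\
N - 2f &\geq f + 1
  && \text{(safety: the overlap must contain at least one honest node)} \\
N &\geq 3f + 1
\end{align*}
```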

Impossibility: No solution with fewer than 3f + 1 generals can cope with f traitors.

Can Agreement be achieved in Faulty Systems? Red blocks indicate typical systems. Agreement can be reached if there is an X.

Byzantine Fault Tolerance (Lamport)

Byzantine Agreement Assumption:

Problem: Byzantine voting problem.

Intuition: if process A lies about its vote, then we can define process A's vote as what the other processes B, C, D hear from A.

Algorithm:

  1. every process broadcasts its vote
  2. every process broadcasts what it heard from every other process
  3. a first round of majority voting decides what each server actually voted for
  4. a second round of majority voting over those reconstructed votes decides the result

Assume P3 is malicious. P1 can create the following table by collecting information from P2, P3, and P4. An x entry is malicious data (the original data is 1), and - represents N/A since P1 does not hear from itself.

| Hear From \ Vote | P1 | P2 | P3 | P4 |
| --- | --- | --- | --- | --- |
| P1 | - | - | - | - |
| P2 | 1 | 1 | x | 1 |
| P3 | x | x | x | x |
| P4 | 1 | 1 | x | 1 |

Taking the majority along each column, it is likely that P1, P2, and P4 all voted 1 while P3 voted x. Then, taking the majority over these reconstructed votes, 1 is the result.
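
A minimal sketch of this worked example in Python, assuming the same four processes with P3 Byzantine; the helper names (heard_in_round1, relayed, decide) are illustrative, not part of the lecture:

```python
from collections import Counter

true_votes = {"P1": 1, "P2": 1, "P3": "x", "P4": 1}   # P3 is Byzantine; its "vote" is arbitrary
byzantine = {"P3"}

def heard_in_round1(receiver, sender):
    # Round 1: what `receiver` heard `sender` vote; a traitor may send garbage ("x")
    return "x" if sender in byzantine else true_votes[sender]

def relayed(relay, about):
    # Round 2: what `relay` tells everyone it heard from `about` in round 1
    if relay in byzantine:
        return "x"                                    # a traitor may relay anything
    return heard_in_round1(relay, about)

def decide(me):
    # Build the table above from the point of view of honest process `me`,
    # then apply majority steps 3 and 4 of the algorithm.
    senders = [p for p in true_votes if p != me]
    reconstructed = {}
    for about in true_votes:                          # step 3: majority per column
        reports = [relayed(s, about) for s in senders]
        reconstructed[about] = Counter(reports).most_common(1)[0][0]
    return Counter(reconstructed.values()).most_common(1)[0][0]   # step 4

print([decide(p) for p in ("P1", "P2", "P4")])        # all honest processes decide 1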

Async. Practical Byzantine Fault Tolerance (Liskov)

Problem: correctly replicate an opcode in a Replicated State Machine (RSM) system. An example is Ethereum Classic.

[CS198.2x Week 1] Practical Byzantine Fault Tolerance

Practical Byzantine Fault Assumption:

PBFT Normal Condition

Replica Stores:

Algorithm: using a Replicated State Machine (RSM) with 3f+1 replicas

  1. the client sends an opcode to the Primary
  2. the Primary broadcasts Pre-prepare<viewNumber, sequenceNumber, opcode> (and puts it in its log)
  3. Replicas receive the broadcast and check whether the Pre-prepare<> is valid; if so, each broadcasts Prepare<replicaID, viewNumber, sequenceNumber, opcode> (and puts it in its log), then waits for 2f+1 Prepare<> from other replicas doing the same thing. Validity checks:
    1. the crypto signature is valid
    2. the viewNumber in the message matches the stored viewNumber
    3. it has not accepted another Pre-prepare<> with the same viewNumber
    4. it has not accepted the same sequenceNumber
    5. the above ensures that if <opcode1, viewNumber, sequenceNumber, replicaID1> is in the log, then there is no <opcode2, viewNumber, sequenceNumber, replicaID2> in the log
    6. the above ensures that all honest nodes that are prepared have the same opcode
    7. the above ensures that at least f+1 honest nodes have sent Prepare<> and Pre-prepare<>
  4. once a Replica has received 2f+1 Prepare<>, it sends Commit<replicaID, viewNumber, sequenceNumber, opcode> and waits for 2f+1 Commit<> from other replicas doing the same thing
  5. once a Replica has received 2f+1 Commit<> (if we assume the viewNumber can change, then we only need f+1), it puts it in the log and sends the result to the client
  6. the client waits for f+1 matching replies before committing
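
A minimal sketch of this normal-case flow, assuming all 3f+1 replicas are honest and modeling only the message phases and quorum thresholds (the Msg type and function names are illustrative, not PBFT's actual implementation; the per-replica validity checks above are elided):

```python
from dataclasses import dataclass

f = 1
N = 3 * f + 1                                # 3f+1 replicas tolerate f Byzantine faults

@dataclass(frozen=True)
class Msg:
    phase: str                               # "pre-prepare" | "prepare" | "commit"
    replica: int
    view: int
    seq: int
    opcode: str

def normal_case(opcode="x := 1", view=0, seq=7):
    primary = view % N
    log = []

    # steps 1-2: client sends the opcode to the Primary, which broadcasts Pre-prepare
    log.append(Msg("pre-prepare", primary, view, seq, opcode))

    # step 3: every replica validates the Pre-prepare and broadcasts Prepare
    for r in range(N):
        log.append(Msg("prepare", r, view, seq, opcode))

    # a replica is "prepared" once it holds 2f+1 matching Prepare messages
    prepares = [m for m in log
                if m.phase == "prepare" and (m.view, m.seq, m.opcode) == (view, seq, opcode)]
    assert len(prepares) >= 2 * f + 1

    # step 4: prepared replicas broadcast Commit and wait for 2f+1 matching Commits
    for r in range(N):
        log.append(Msg("commit", r, view, seq, opcode))
    commits = [m for m in log if m.phase == "commit"]
    assert len(commits) >= 2 * f + 1

    # steps 5-6: replicas reply; the client accepts after f+1 matching replies
    replies = [opcode] * (f + 1)
    return len(set(replies)) == 1

print(normal_case())                         # True: the opcode commits in the fault-free case
```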
