Lecture 009 - 2PC

Concurrency Control

We assume no failures.

Composite Operations

Composite operations are still problematic even when each individual operation is thread-safe:

Insufficient Atomicity: threads run concurrently, so the steps of different composite operations can interleave
Fault Tolerance: a thread can crash between two thread-safe operations, leaving the composite operation half done
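A minimal sketch of the atomicity problem. The `Account` class and the transfer scenario are hypothetical and only illustrate the point. The interleaving is simulated in a single thread by reading the total between the two steps, which is what a concurrently scheduled thread could observe.

```python
import threading

class Account:
    """Hypothetical account: every method is individually thread-safe."""
    def __init__(self, balance):
        self._lock = threading.Lock()
        self._balance = balance

    def withdraw(self, amount):
        with self._lock:
            self._balance -= amount

    def deposit(self, amount):
        with self._lock:
            self._balance += amount

    def balance(self):
        with self._lock:
            return self._balance

a, b = Account(100), Account(0)

# Composite operation "transfer 30 from a to b" built from two thread-safe steps.
a.withdraw(30)
# A thread scheduled at this point sees money missing from the system:
observed_total = a.balance() + b.balance()
b.deposit(30)
final_total = a.balance() + b.balance()
```

The invariant "total is 100" holds before and after the transfer but not in between, even though no single method call is unsafe.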

We want the programmer to be able to specify that a set of operations should happen atomically.

A "transaction" either (1) commits: executes completely and correctly, or (2) aborts: has no effect at all.

ACID Properties:

Atomicity: a transaction either completes or aborts. If it aborts, it leaves no side effects
Consistency: each transaction preserves a set of invariants about global state
Isolation: each transaction executes as if it were the only one able to read/write shared global state
Durability: a committed transaction's effects persist

Atomic Operation: Atomicity + Isolation

Transactions can be nested.

Single Server Consistency

To avoid deadlock, we acquire locks according to a consistent global order: if the system has locks $\{L_1, L_2, L_3, ..., L_n\}$ and every transaction acquires the locks it needs in increasing order under one total order on all $L_i$, then no cycle of waiting can form, so there is no deadlock.
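A minimal sketch of ordered acquisition. The `OrderedLock` class, its `rank` field, and the helper names are assumptions made for illustration. Two threads ask for the same two locks in opposite orders, which could deadlock if each acquired in its own order, but both sort by rank first.

```python
import threading

class OrderedLock:
    """Hypothetical lock carrying its position in the global total order."""
    def __init__(self, rank):
        self.rank = rank
        self.lock = threading.Lock()

def acquire_in_order(wanted):
    # Always acquire in increasing rank, whatever order the caller listed.
    ordered = sorted(wanted, key=lambda l: l.rank)
    for l in ordered:
        l.lock.acquire()
    return ordered

def release_all(held):
    for l in reversed(held):
        l.lock.release()

L1, L2 = OrderedLock(1), OrderedLock(2)
counter = 0

def worker(wanted, rounds=1000):
    global counter
    for _ in range(rounds):
        held = acquire_in_order(wanted)
        counter += 1          # safe: both locks are held here
        release_all(held)

t1 = threading.Thread(target=worker, args=([L1, L2],))
t2 = threading.Thread(target=worker, args=([L2, L1],))  # opposite request order
t1.start(); t2.start()
t1.join(); t2.join()
```

Both threads finish because neither can hold `L2` while waiting for `L1`.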

Two-Phase Locking

Two types of locks:

s-lock: shared locks for read
x-lock: exclusive locks for writes

	Shared	Exclusive
Shared	Compatible	Not compatible
Exclusive	Not compatible	Not compatible
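The compatibility table can be encoded directly. The sketch below assumes a hypothetical `can_grant` helper that a lock manager might call with the modes other transactions currently hold on one item.

```python
# (held mode, requested mode) -> compatible?
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}

def can_grant(requested, held_modes):
    """Grant only if the request is compatible with every lock already held."""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)
```

For example, `can_grant("S", ["S", "S"])` allows a third reader, while `can_grant("X", ["S"])` makes a writer wait.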

2-phase Locking: plotted over time, the number of locks a transaction holds has to look like a "mountain" but not a "mountain range". Once you decide to go down, don't go up.

2-phase Locking (2PL): growing, shrinking

Growing: acquire or escalate locks (from s-lock to x-lock) as needed (atomically or according to some total order)
Shrinking: once you release a lock, you can no longer acquire any lock (because nothing can be acquired after the first release, we need to figure out all the locks we might need before releasing)
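The "mountain" rule can be checked mechanically. This is a small sketch in which the event format, a list of `("acquire" | "release", lock_name)` pairs, is an assumption for illustration.

```python
def is_two_phase(events):
    """Return True if no lock is acquired after the first release."""
    shrinking = False
    for action, _lock in events:
        if action == "release":
            shrinking = True          # the shrinking phase has started
        elif action == "acquire" and shrinking:
            return False              # going back up: a "mountain range"
    return True

mountain = [("acquire", "A"), ("acquire", "B"), ("release", "A"), ("release", "B")]
mountain_range = [("acquire", "A"), ("release", "A"), ("acquire", "B"), ("release", "B")]
```

Here `mountain` satisfies 2PL, while `mountain_range` acquires `B` after releasing `A` and does not.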

Plain 2-phase locking works as long as every transaction commits rather than aborts. The bad case is "cascading aborts" (a late decision to abort forces a rollback of edits that others may already have seen): 1. acquire lock for A 2. write data to A 3. release lock for A 4. decide to abort for some reason

In this case, the lock on A was released while A still held uncommitted data, so another thread may have read that data. If it did, it has to be aborted as well.

Strong Strict 2-phase Locking (SS2PL): release all locks at once, only after the transaction has committed or aborted

this solves the cascading aborts problem because any abort (and its rollback) happens before the locks are released
we still need to figure out every lock the transaction might need before locking or reading anything, so that we can acquire them atomically or in the total order.
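A minimal sketch of SS2PL under simplifying assumptions: a hypothetical `Transaction` class, an in-memory dict as the database, one exclusive lock per key, and all keys declared up front and locked in sorted order. Writes are undone before any lock is released, so no other transaction can see uncommitted data.

```python
import threading

class Transaction:
    """Hypothetical SS2PL transaction over a dict-based store."""
    def __init__(self, store, locks, keys):
        self.store = store
        self.locks = locks
        self.keys = sorted(keys)        # total order on lock names
        self.undo = {}                  # key -> value before the first write
        for key in self.keys:           # growing phase: everything up front
            self.locks[key].acquire()

    def write(self, key, value):
        if key not in self.keys:
            raise KeyError(f"{key!r} was not declared for this transaction")
        self.undo.setdefault(key, self.store[key])
        self.store[key] = value

    def _release_all(self):
        for key in self.keys:
            self.locks[key].release()

    def commit(self):
        self._release_all()             # locks are dropped only at the very end

    def abort(self):
        for key, old in self.undo.items():
            self.store[key] = old       # roll back while still holding locks
        self._release_all()

store = {"A": 1, "B": 2}
locks = {key: threading.Lock() for key in store}

txn = Transaction(store, locks, keys=["B", "A"])
txn.write("A", 10)
locked_during = locks["A"].locked()     # still held after the write
txn.abort()
```

After `abort()` the store is back to its original contents and both locks are free.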

There is a way to avoid acquiring all possible locks beforehand: use a lock manager library that tracks the locks and tells us when there is a deadlock. (Acquiring locks on demand, neither atomically nor in total order, is still correct, but it can deadlock.)

Build Graph: the lock manager builds a "wait-for" graph. On finding a cycle, it forces an offending transaction to abort and try again.
Timeout: if a transaction takes too long, force it to abort.
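Cycle detection on a wait-for graph can be sketched with a depth-first search. The graph representation (a dict mapping each transaction to the set of transactions it waits for) and the function name are assumptions. A real lock manager would also need a policy for choosing which transaction in the cycle to abort.

```python
def find_cycle(wait_for):
    """Return a list of transactions forming a cycle, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2        # unvisited, on current path, finished
    color = {}
    path = []

    def visit(txn):
        color[txn] = GREY
        path.append(txn)
        for nxt in wait_for.get(txn, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:           # back edge: nxt is on the current path
                return path[path.index(nxt):]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[txn] = BLACK
        return None

    for txn in list(wait_for):
        if color.get(txn, WHITE) == WHITE:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None

deadlocked = {"T1": {"T2"}, "T2": {"T3"}, "T3": {"T1"}}
healthy = {"T1": {"T2"}, "T2": {"T3"}, "T3": set()}
```

`find_cycle(deadlocked)` reports the three transactions waiting on each other. Aborting any one of them breaks the cycle.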

Distributed Transactions

Distributed Database: Partition databases across multiple machines for scalability

We require that either all machines commit the transaction, or none do.

One-phase Commit:

Participant: manages its own database and does whatever the coordinator says
Coordinator: tells all participants to commit

One-phase commit only works when the transaction can commit on every database with no violation. If one server has to abort because some rule is violated, it has no way to tell the other servers, which found no violation in their own databases, to abort too.

2-phase Commit:

Coordinator sends the transaction request to all participants
Participants work out all state changes; if the transaction is not possible due to a violation, vote ABORT, else vote OK to the coordinator
Coordinator broadcasts the result of the vote to all participants: either COMMIT or ABORT (COMMIT if and only if all participants vote OK)
Participants COMMIT or ABORT accordingly.
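The four steps above can be sketched in a single process, with method calls standing in for network messages. The `Participant` class, its `can_commit` flag, and the state names are assumptions for illustration. Failures, logging, and timeouts are left out.

```python
class Participant:
    """Hypothetical participant; can_commit models whether a local rule is violated."""
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "INIT"

    def prepare(self, txn):
        # Steps 1 and 2: receive the request, work out the changes, and vote.
        self.state = "READY" if self.can_commit else "ABORTED"
        return "OK" if self.can_commit else "ABORT"

    def finish(self, decision):
        # Step 4: apply the coordinator's decision.
        self.state = "COMMITTED" if decision == "COMMIT" else "ABORTED"

def two_phase_commit(participants, txn):
    votes = [p.prepare(txn) for p in participants]                    # phase 1
    decision = "COMMIT" if all(v == "OK" for v in votes) else "ABORT"
    for p in participants:                                            # phase 2
        p.finish(decision)
    return decision

all_ok = [Participant("p1"), Participant("p2")]
one_bad = [Participant("p1"), Participant("p2", can_commit=False)]
result_ok = two_phase_commit(all_ok, "txn-1")
result_bad = two_phase_commit(one_bad, "txn-2")
```

A single ABORT vote is enough to abort everywhere, which is what one-phase commit cannot express.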

Properties of 2-phase commit:

Correctness: yes
Performance: $3n$ messages per transaction with $n$ participants (ACKs not counted)
Failure: use timeouts, recover using logging to stable storage
- participant fails after CanCommit (request received, no vote sent yet): nothing has changed
- participant fails after VoteAbort: on recovery it can abort based on local information (it knows it must abort)
- participant fails after VoteCommit (it voted OK): it cannot recover based on local information. It needs to ask the coordinator for the decision (if the coordinator has failed or no one got the decision, the transaction is blocked)
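The recovery cases can be sketched as a function over the participant's log. The record names (`VOTE_OK`, `VOTE_ABORT`, `COMMIT`, `ABORT`) and the `ask_coordinator` callback are hypothetical. The sketch also assumes that a participant which never voted may abort on its own, since the coordinator cannot have decided COMMIT without its vote.

```python
def recover(log, ask_coordinator):
    """Decide what a restarted participant does, given its stable-storage log."""
    if "COMMIT" in log:
        return "COMMIT"                 # decision already known locally
    if "ABORT" in log or "VOTE_ABORT" in log:
        return "ABORT"                  # local information is enough
    if "VOTE_OK" in log:
        # Voted OK but never learned the outcome: only others can tell us.
        decision = ask_coordinator()
        return decision if decision in ("COMMIT", "ABORT") else "BLOCKED"
    return "ABORT"                      # crashed before voting (assumption above)

def coordinator_down():
    return None                         # models "no one has the decision"

def coordinator_says_commit():
    return "COMMIT"
```

Only the "voted OK" case depends on someone else, and that is the case in which the participant can end up blocked.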

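blocked: the transaction can neither commit nor abort until the decision is known. Because of the failure, participants that voted OK have to keep waiting (and holding their locks), even though the data alone would allow a commit.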

2-phase commit (2PC) is a blocking protocol because it blocks if no one got the decision.

For performance, in practice, since a transaction is very likely to succeed, every server commits optimistically and rolls back if necessary.

2PC is used in MySQL, PostgreSQL, Cloud Spanner, Kafka, NDB Cluster...
We use logging for crash recovery (very powerful and resilient when paired with RAID)

Note that with 2-phase commit, it is still possible to have a deadlock on a local machine (cyclic dependency of locks). In this case, a participant is unable to respond to the voting request. We can handle it with a timeout: if a participant times out, the coordinator assumes an ABORT vote from that participant and retries the transaction (be careful about livelock when retrying).
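A rough sketch of treating silence as an ABORT vote. The `collect_votes` helper, the use of threads in place of network calls, and the 0.5 second timeout are all assumptions. Retrying and livelock avoidance are not shown.

```python
import queue
import threading
import time

def collect_votes(participants, timeout):
    """participants: dict name -> zero-argument function returning 'OK' or 'ABORT'."""
    inbox = queue.Queue()

    def ask(name, vote_fn):
        inbox.put((name, vote_fn()))

    for name, vote_fn in participants.items():
        threading.Thread(target=ask, args=(name, vote_fn), daemon=True).start()

    votes = {name: "ABORT" for name in participants}   # silence counts as ABORT
    deadline = time.monotonic() + timeout
    for _ in participants:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            name, vote = inbox.get(timeout=remaining)
        except queue.Empty:
            break
        votes[name] = vote
    return votes

stuck = threading.Event()   # not set during voting: models a locally deadlocked participant

def healthy():
    return "OK"

def deadlocked():
    stuck.wait()            # never answers before the timeout
    return "OK"

votes = collect_votes({"p1": healthy, "p2": deadlocked}, timeout=0.5)
decision = "COMMIT" if all(v == "OK" for v in votes.values()) else "ABORT"
stuck.set()                 # let the helper thread exit
```

The coordinator never waits longer than the timeout, and the unanswered vote forces the decision to ABORT.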

There are other methods like 3PC, which has correctness issues in asynchronous networks.
