# Lecture 012 - RAID

## Measuring Failure

Hard errors: a damaged component that experiences a fail-stop failure (bad solder, defective DRAM bank)

Soft errors: a flipped signal or bit, caused by an external source or a faulty component (cosmic radiation, alpha particles)

Mean Time to Failure (MTTF)

Mean Time to Repair (MTTR)

Mean Time Between Failures (MTBF) = MTTF + MTTR

Availability = MTTF / (MTTF + MTTR)

• airlines: 99.9993%

• 911 phone service: 99.994%

• standard phone: 99.99%

• internet: 95%~99.6%
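To make the availability formula concrete, here is a small sketch that computes availability from MTTF and MTTR and converts an availability figure into expected downtime per year. The function names and the choice of hours and minutes as units are assumptions made for illustration.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Availability = MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def downtime_minutes_per_year(avail: float) -> float:
    """Expected unavailable minutes in a 365-day year."""
    return (1.0 - avail) * 365 * 24 * 60


if __name__ == "__main__":
    # A component that runs 999 hours between failures and takes 1 hour to repair.
    print(availability(999, 1))
    # Yearly downtime implied by the "standard phone" figure of 99.99%.
    print(downtime_minutes_per_year(0.9999))
```

For example, 99.99% availability leaves a budget of roughly 53 minutes of downtime per year.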

Availability Service Level Objective (SLO): An availability threshold that your system targets

Availability Service Level Agreement (SLA): An availability threshold that you guarantee to customers

Failure correlation: statistically, failures are not correlated when the components are physically separated

Fault-Tolerant Design Considerations

• The probability of failure of each component

• The cost of failure

• The cost of implementing fault tolerance

Error Detection: timeout, parity, checksum

Error Correction: retry

## Error Detection

Goals:

• detect failure

• correct failure

Error Detecting Code: general scheme

1. imagine a network transmission scenario
2. the sender sends data $D$ together with a hash $f(D)$
3. on receipt of $(D', h)$, the receiver recomputes $f(D')$ and checks that it equals $h$
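A minimal sketch of this scheme, assuming SHA-256 as the function $f$ (the notes do not name a specific function; `send` and `receive` are hypothetical names):

```python
import hashlib
from typing import Tuple


def send(data: bytes) -> Tuple[bytes, bytes]:
    """Sender side: transmit the data together with f(data)."""
    return data, hashlib.sha256(data).digest()


def receive(data: bytes, tag: bytes) -> bool:
    """Receiver side: recompute f over the received data and compare."""
    return hashlib.sha256(data).digest() == tag


if __name__ == "__main__":
    data, tag = send(b"hello")
    print(receive(data, tag))      # intact transmission
    print(receive(b"hellp", tag))  # corrupted in transit
```

The receiver never compares the data to the hash directly; it compares two hash values, one received and one recomputed.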

Single Bit Parity: cannot reliably detect multi-bit (burst) errors

1. given 7 data bits
2. we append 1 bit at the end, calculated as the sum of the 7 bits (mod 2)
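The two steps above can be sketched as follows. The helper names are assumptions; the example also shows why an even number of flipped bits goes unnoticed.

```python
def add_parity(bits):
    """Append one parity bit: the sum of the data bits mod 2."""
    return bits + [sum(bits) % 2]


def parity_ok(word):
    """A valid word (data + parity bit) always has an even number of 1s."""
    return sum(word) % 2 == 0


if __name__ == "__main__":
    word = add_parity([1, 0, 1, 1, 0, 0, 1])
    print(parity_ok(word))  # True: no error

    one_flip = word.copy()
    one_flip[2] ^= 1
    print(parity_ok(one_flip))  # False: single-bit error detected

    two_flips = one_flip.copy()
    two_flips[3] ^= 1
    print(parity_ok(two_flips))  # True: double-bit error slips through
```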

Checksum: a little better than Single Bit Parity since there are more check bits

• Simple to implement

• Relatively weak detection

• Still tricked by typical error patterns - e.g. burst errors
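As an illustration of the weak detection, here is a sketch that assumes a simple 8-bit additive checksum (the notes do not specify which checksum is meant). Because addition is commutative, corruption that reorders bytes leaves the checksum unchanged.

```python
def checksum8(data: bytes) -> int:
    """Sum of all bytes, truncated to 8 bits."""
    return sum(data) % 256


if __name__ == "__main__":
    original = b"abcd"
    print(checksum8(original) == checksum8(b"abce"))  # False: changed byte is caught
    print(checksum8(original) == checksum8(b"abdc"))  # True: swapped bytes go unnoticed
```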

Cyclic Redundancy Check (CRC):

• treat data $D$ as polynomial coefficients

• choose $r+1$ bits as coefficients of generator polynomial $G$ (agreed on in advance)
• add $r$ bits to the packet as CRC bits $R$
• so packet $(D, R)$ should be divisible by generator $G$
• can detect all burst errors of fewer than $r+1$ bits

• Efficient streaming implementation in hardware
• x86 has an instruction to calculate CRC
• used in Ethernet and hard drives
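The divisibility property can be sketched with bit strings and long division over GF(2), where subtraction is XOR. This is a teaching sketch, not how hardware or production libraries compute CRCs; the function name and the small generator `1011` ($r = 3$) are assumptions chosen for illustration.

```python
def mod2_remainder(bits: str, generator: str) -> str:
    """Remainder of bits divided by generator, using XOR as subtraction."""
    r = len(generator) - 1
    work = [int(b) for b in bits]
    gen = [int(g) for g in generator]
    for i in range(len(work) - r):
        if work[i]:  # leading bit set: subtract (XOR) the generator here
            for j, g in enumerate(gen):
                work[i + j] ^= g
    return "".join(str(b) for b in work[-r:])


if __name__ == "__main__":
    G = "1011"
    D = "11010011101100"
    # Sender: divide D followed by r zero bits, keep the remainder as R.
    R = mod2_remainder(D + "000", G)
    print(R)  # 100
    # Receiver: the packet (D, R) must leave a zero remainder.
    print(mod2_remainder(D + R, G) == "000")
    # A 3-bit burst in the middle of the packet breaks divisibility.
    corrupted = D[:5] + "100" + D[8:] + R
    print(mod2_remainder(corrupted, G) == "000")
```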

## Error Correction

Two schemes of error recovery

• redundancy (forward recovery)

• retry (backward recovery)

### Error Correcting Codes (ECC)

Courses

• 15-853 Algorithms in the real world

• 15-848 Practical information and coding theory for computer systems

### Replication and Voting

Triple modular redundancy:

• Send the same request to 3 different instances of the system

• Compare the answers, take the majority

Widely used in space applications, which have both the budget to spend and a high rate of failure due to cosmic rays.
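A minimal sketch of the voting step, assuming the three replies are already collected in a list (`vote` is a hypothetical helper, and raising an error when there is no majority is one possible policy):

```python
from collections import Counter


def vote(replies):
    """Return the answer given by a strict majority of replicas."""
    value, count = Counter(replies).most_common(1)[0]
    if count * 2 <= len(replies):
        raise ValueError("no majority among replicas")
    return value


if __name__ == "__main__":
    # One of three replicas returns a corrupted answer; the majority masks it.
    print(vote([42, 42, 7]))
```

With three replicas, a single faulty answer is outvoted; two simultaneous faults that disagree leave no majority and have to be handled some other way.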

### Retry

We separate "detection" from "correction": first detect the error, then retry the transmission.
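A sketch of detect-then-retry, assuming a hypothetical `send_once` callable that returns `True` when the receiver's error check passed, and an arbitrary limit of three attempts:

```python
def send_with_retry(send_once, max_attempts=3):
    """Retry until detection reports success; return the attempt number used."""
    for attempt in range(1, max_attempts + 1):
        if send_once():
            return attempt
    raise RuntimeError("transmission failed after %d attempts" % max_attempts)


if __name__ == "__main__":
    # Simulated channel: the first two transmissions are detected as corrupt.
    outcomes = iter([False, False, True])
    print(send_with_retry(lambda: next(outcomes)))  # 3
```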

## Redundant Array of Inexpensive Disks (RAID)

Hard drive: a sequence of small data sectors (4KB each), stored on spinning disks

RAID: Use multiple disks to form a single logical disk

Definitions

• Reliability: # of disk failures we can tolerate

• Latency: time to process Read/Write requests

• Throughput: bandwidth for R/W requests

We assume random reads and writes. We also assume every disk has the same throughput and latency for reads and writes.

RAID Levels:

• RAID 0: Data striping without redundancy

  • Interleave a file's data across multiple disks (no fault tolerance)
  • Parallel reads and writes across multiple disks
  • Poor reliability

• RAID 1: Mirroring of independent disks

  • Make two or more copies of the same data (tolerate 1 disk failure)
  • Need to write to every copy, can read from any copy
  • Poor capacity

• RAID 2: ...

• RAID 3: ...

• RAID 4: Data striping plus parity disk

  • Maintain $D_1 \oplus D_2 \oplus D_3 = D_p$, where $D_p$ is the parity disk; if $D_1$ fails, recover it as $D_2 \oplus D_3 \oplus D_p = D_1$
  • Every write must also update the parity disk, so the parity disk can easily become a bottleneck
  • Adding disks does not improve write performance, since every write still goes through the parity disk

• RAID 5: Data striping plus striped (rotating) parity

  • Distribute the parity blocks across all disks instead of using a dedicated parity disk
  • Good compromise choice

• RAID 6: ...
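The parity scheme used by RAID 4 and RAID 5 is a byte-wise XOR across the data blocks of a stripe. A minimal sketch, assuming three data blocks of equal size held in memory (`xor_blocks` and the sample block contents are made up for illustration):

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


d1, d2, d3 = b"\x0f\xaa", b"\xf0\x55", b"\x33\x33"

# Parity block stored on the parity disk.
parity = xor_blocks(d1, d2, d3)

# Disk 1 fails: rebuild its block from the survivors plus parity.
recovered = xor_blocks(d2, d3, parity)

# Small write to disk 2: new parity = old parity XOR old data XOR new data.
new_d2 = b"\x00\x01"
new_parity = xor_blocks(parity, d2, new_d2)
```

The small-write update touches only the target disk and the parity block, which is why a single dedicated parity disk sees traffic on every write.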

Mean Time To First Data Loss (MTTDL): calculated from MTTF

• Sequential $n$ devices: $MTTDL_n = MTTF_1 / n$

• Parallel $n$ devices: $MTTDL_n = \sum_{i = 1}^n \frac{MTTF_1}{i}$

• $k$ parity, $n$ data: $MTTDL_n = \sum_{i = n}^{n+k} \frac{MTTF_1}{i}$
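The three cases can be evaluated numerically. This sketch assumes identical devices with the same $MTTF_1$ and no repair between failures; the function names are hypothetical. Each term $MTTF_1 / i$ is the expected wait until the next failure while $i$ devices are still running, so the parity case sums over $i$ from $n + k$ down to $n$.

```python
def mttdl_sequential(mttf: float, n: int) -> float:
    """Data is lost on the first failure among n devices."""
    return mttf / n


def mttdl_parallel(mttf: float, n: int) -> float:
    """Data is lost only after all n devices have failed."""
    return sum(mttf / i for i in range(1, n + 1))


def mttdl_parity(mttf: float, n_data: int, k_parity: int) -> float:
    """Data is lost on failure number k+1 among n+k devices."""
    return sum(mttf / i for i in range(n_data, n_data + k_parity + 1))


if __name__ == "__main__":
    print(mttdl_sequential(120.0, 4))  # 30.0
    print(mttdl_parallel(120.0, 3))    # 120 + 60 + 40
    print(mttdl_parity(120.0, 4, 1))   # 120/5 + 120/4
```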

Modeling disk failures as a continuous-time Markov chain gives an MTTDL of roughly $MTTF_{disk} / n$ for $n$ disks: the more disks, the sooner the first failure is expected.

Restoring redundancy after a failure: start reconstruction when the first drive fails

• Modes

  • Normal mode
  • Degraded mode: some disk is unavailable
  • Rebuild mode: reconstructing the lost disk’s contents onto a spare