Lecture 007

Distributed File System

Example: IPFS

Blue: machines, Yellow: process, Red: data

Why Distributed File System

Mission of andrew.cmu.edu:

// QUESTION: why is IPFS successful while AFS is not?

Usage:

Challenges:

Building Distributed File System

Assumptions Driven Design: user-centric

Virtual File System

We need to modify:

Remote Procedure Calls

Assuming one centralized server.

Instead of making kernel calls, the client sends an RPC to the server for every file operation. Files are stored entirely on the centralized server, and the software that issues file operations runs on the client.
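A minimal sketch of "one RPC per file operation", assuming a JSON message format and an in-process function call standing in for the network. The names `FileServer`, `RemoteFileStub`, and the message fields are hypothetical and do not come from any real RPC library.

```python
import json


class FileServer:
    """Hypothetical central server: it owns the contents of every file."""

    def __init__(self):
        self.files = {}  # path -> contents

    def handle(self, request: str) -> str:
        # Decode the request, dispatch on the operation name, encode the reply.
        msg = json.loads(request)
        op, path = msg["op"], msg["path"]
        if op == "read":
            if path not in self.files:
                return json.dumps({"ok": False, "error": "no such file"})
            return json.dumps({"ok": True, "data": self.files[path]})
        if op == "write":
            self.files[path] = msg["data"]
            return json.dumps({"ok": True})
        return json.dumps({"ok": False, "error": "unknown op"})


class RemoteFileStub:
    """Client-side stub: every file operation becomes exactly one RPC."""

    def __init__(self, transport):
        self.transport = transport  # callable: request string -> reply string
        self.rpc_count = 0

    def _call(self, **msg):
        self.rpc_count += 1
        reply = json.loads(self.transport(json.dumps(msg)))
        if not reply["ok"]:
            raise OSError(reply["error"])
        return reply.get("data")

    def read(self, path):
        return self._call(op="read", path=path)

    def write(self, path, data):
        self._call(op="write", path=path, data=data)


server = FileServer()
client = RemoteFileStub(server.handle)  # in-process stand-in for the network
client.write("/notes.txt", "hello")
print(client.read("/notes.txt"))  # hello
print(client.rpc_count)           # 2
```

The `rpc_count` field makes the cost visible: with no cache, every read and write pays a round trip to the server.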

Tradeoff:

Caching

Client-side Caching: store copy on client

Update Visibility Issue: a client's write is not propagated to the server immediately (or is lost if the client crashes).

Cache Staleness Issue: a read is served from a cached copy that a later remote write has made stale.

Ideal Model: one-copy semantics: from the user's perspective, the system behaves exactly as it would with no cache.

Broadcast Invalidation

Broadcast Invalidation: every update is broadcast to every client that uses the file.

Check on Use: the client checks with the server before each use of a cached copy.
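Check on use can be sketched as below, assuming the server keeps a per-file version number that the client compares against its cached copy. The classes `VersionedServer` and `CheckOnUseClient` are hypothetical names for illustration only.

```python
class VersionedServer:
    def __init__(self):
        self.store = {}  # path -> (version, data)

    def write(self, path, data):
        version = self.store.get(path, (0, None))[0] + 1
        self.store[path] = (version, data)

    def version(self, path):
        return self.store[path][0]

    def fetch(self, path):
        return self.store[path]


class CheckOnUseClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}   # path -> (version, data)
        self.fetches = 0  # number of full transfers from the server

    def read(self, path):
        cached = self.cache.get(path)
        # One small validation round trip before every use of the cache.
        if cached is None or cached[0] != self.server.version(path):
            self.cache[path] = self.server.fetch(path)
            self.fetches += 1
        return self.cache[path][1]


check_server = VersionedServer()
check_server.write("f", "v1")
check_client = CheckOnUseClient(check_server)
first = check_client.read("f")    # cache miss: full fetch
second = check_client.read("f")   # validated, served from cache
check_server.write("f", "v2")     # remote write
third = check_client.read("f")    # version mismatch: refetch
```

The cache never serves stale data here, but every read still costs a round trip to validate the version.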

Callbacks: a client registers with the server when it obtains a copy; the server is supposed to multicast to all registered clients whenever the file is updated.
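A minimal sketch of the callback scheme, assuming the server notifies clients by calling an `invalidate` method directly (a real system would send a network message). `CallbackServer` and `CallbackClient` are hypothetical names.

```python
class CallbackServer:
    def __init__(self):
        self.data = {}
        self.callbacks = {}  # path -> set of clients holding a cached copy

    def fetch(self, path, client):
        # Handing out a copy registers a callback for that client.
        self.callbacks.setdefault(path, set()).add(client)
        return self.data[path]

    def store(self, path, value, writer=None):
        self.data[path] = value
        # Notify every registered client except the writer itself.
        for client in self.callbacks.pop(path, set()):
            if client is not writer:
                client.invalidate(path)
        if writer is not None:
            self.callbacks[path] = {writer}


class CallbackClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def read(self, path):
        if path not in self.cache:  # no server contact on a cache hit
            self.cache[path] = self.server.fetch(path, self)
        return self.cache[path]

    def write(self, path, value):
        self.cache[path] = value
        self.server.store(path, value, writer=self)

    def invalidate(self, path):
        self.cache.pop(path, None)


cb_server = CallbackServer()
cb_server.store("f", "v1")
a, b = CallbackClient(cb_server), CallbackClient(cb_server)
a.read("f")
b.read("f")
a.write("f", "v2")                # server breaks b's callback
b_was_invalidated = "f" not in b.cache
print(b.read("f"))                # v2
```

Compared with check on use, reads that hit the cache need no server contact at all; the cost moves to the server, which must remember who holds copies.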

Leases: the client is granted exclusive access to the server's file and works on its cached copy; leases are normally time-limited, so the grant expires unless it is renewed.
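A lease can be sketched as a lock with an expiry time. The sketch assumes leases are time-limited and renewable by their holder; the class `LeaseServer`, the `duration` parameter, and the injected `clock` are hypothetical choices made so the behavior is easy to follow.

```python
class LeaseServer:
    def __init__(self, duration, clock):
        self.duration = duration  # lease length in seconds
        self.clock = clock        # callable returning the current time
        self.leases = {}          # path -> (holder, expiry time)

    def acquire(self, path, client):
        now = self.clock()
        holder, expiry = self.leases.get(path, (None, 0.0))
        if holder not in (None, client) and now < expiry:
            return False  # another client holds an unexpired lease
        # Grant a new lease, or renew the caller's existing one.
        self.leases[path] = (client, now + self.duration)
        return True


fake_now = [0.0]  # mutable fake clock for the demonstration
lease_server = LeaseServer(duration=10.0, clock=lambda: fake_now[0])

got_a = lease_server.acquire("f", "A")        # granted until t = 10
got_b_early = lease_server.acquire("f", "B")  # refused: A's lease is live
fake_now[0] = 11.0                            # A's lease has expired
got_b_late = lease_server.acquire("f", "B")   # granted until t = 21
```

Because the lease expires on its own, a crashed client cannot block the file forever, which is the main advantage over a plain lock.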

NFSv2

NFSv2 performs all file operations locally, but its data consistency is poor (unacceptable for distributed applications).

Failure Handling:

AFS:

- also caches on disk
- prefetching (good for sequential access)
- callbacks

File Access Consistency

UNIX: sequential consistency (atomic operations)

NFS: 30-second window

AFS: session semantics

AFSv2:

The last client to close the file always wins (Remzi, OS Book, "AFS")
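The "last closer wins" behavior of session semantics can be sketched as follows, assuming whole-file copies are taken at open and written back at close. `SessionServer` and `Session` are hypothetical names, not AFS code.

```python
class SessionServer:
    def __init__(self):
        self.files = {}  # path -> contents


class Session:
    """One open/close session on a file."""

    def __init__(self, server, path):
        self.server, self.path = server, path
        # open: copy the whole file to the client
        self.buffer = server.files.get(path, "")

    def write(self, data):
        self.buffer = data  # local only; invisible to other clients

    def close(self):
        # close: the whole local copy replaces the server's copy
        self.server.files[self.path] = self.buffer


afs = SessionServer()
afs.files["f"] = "old"
s1, s2 = Session(afs, "f"), Session(afs, "f")
s1.write("from s1")
s2.write("from s2")
s2.close()
after_s2 = afs.files["f"]  # "from s2"
s1.close()
after_s1 = afs.files["f"]  # "from s1": the last close overwrote s2's version
```

Updates are never interleaved at the byte level; the server ends up with one session's complete file, whichever closed last.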

AFS benefits vs NFS:

Naming AFS vs NFS: NFS file names are not consistent across clients, but AFS file names are

Authentication

Say Hanke logs onto the [email protected] machine and wants to access [email protected]; the question is:

So we need authentication at application level.

Symmetric Crypto (Private Key)

Asymmetric Crypto (Public Key)

Key Distribution Center (KDC)

Key Distribution Center (KDC): the party that stores your password and logs you in. It can also encrypt messages so that you can talk secretly with others who are registered with the same KDC.

The key distribution center uses symmetric crypto, but it is a single point of failure: the KDC knows everybody's key, so a compromised KDC exposes every user.
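A toy sketch of how a KDC can let two registered parties share a secret using only symmetric keys. Everything here is an assumption made for illustration: the principals `alice` and `bob`, the `KDC` class, and especially `toy_encrypt`, which is a hash-based XOR stream written with the standard library and is NOT secure. Real protocols such as Kerberos also add identities, timestamps, and integrity checks, which are omitted.

```python
import hashlib
import secrets


def _keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    # Toy keystream: SHA-256 over key, nonce, and a block counter.
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    # Illustration only: no authentication, not a real cipher.
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


class KDC:
    def __init__(self):
        self.keys = {}  # principal -> long-term secret shared with the KDC

    def register(self, name):
        self.keys[name] = secrets.token_bytes(32)
        return self.keys[name]

    def session(self, a, b):
        # Fresh session key, wrapped once under each party's long-term key.
        k = secrets.token_bytes(32)
        return toy_encrypt(self.keys[a], k), toy_encrypt(self.keys[b], k)


kdc = KDC()
alice_key = kdc.register("alice")
bob_key = kdc.register("bob")

for_alice, ticket_for_bob = kdc.session("alice", "bob")
k_alice = toy_decrypt(alice_key, for_alice)    # Alice unwraps her copy
k_bob = toy_decrypt(bob_key, ticket_for_bob)   # Bob unwraps the ticket

message = toy_encrypt(k_alice, b"hi bob")
print(toy_decrypt(k_bob, message))  # b'hi bob'
```

The `keys` dictionary is the weakness described above: whoever controls the KDC holds every principal's long-term key and can mint or read any session key.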

CODA's Replica Control

CODA: successor of AFS that allows clients to operate in disconnected mode

Pessimistic Replica Control: requires client C to obtain a lock before going offline

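Optimistic Replica Control: clients modify their cached copies freely while offline, and conflicts are detected and resolved on reconnection, as in Git (.git)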
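Optimistic replica control can be sketched with a single version counter, assuming the server rejects any update that was based on an out-of-date version. `ReplicaServer` and `commit` are hypothetical names; this is not Coda's actual reintegration algorithm.

```python
class ReplicaServer:
    def __init__(self):
        self.version = 0
        self.data = ""

    def commit(self, base_version, new_data):
        # Accept the update only if nobody else committed since the
        # client took its copy; otherwise report a conflict.
        if base_version != self.version:
            return False
        self.data = new_data
        self.version += 1
        return True


replica = ReplicaServer()

# Two clients disconnect holding the same base version.
base_a = replica.version
base_b = replica.version

ok_a = replica.commit(base_a, "A's edit")  # first to reconnect succeeds
ok_b = replica.commit(base_b, "B's edit")  # conflict: base is now stale

# B resolves the conflict against the current state and retries.
ok_merge = replica.commit(replica.version, "A's edit + B's edit")
```

No lock is taken before going offline, so clients are never blocked; the price is that someone (B here) must merge when a conflict is detected.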

Examples of Version Control System:
