Lecture 013 - MPI, MapReduce, Hadoop

Cluster Computing: MPI & Map Reduce

High Performance Computing

HPC Architecture

Typical HPC machine: higher end than usual clusters

Example: TaihuLight with 10,649,600 cores, 1,310,720 GB memory, 93,014.6 TFlop/s. Oak Ridge Frontier with 8,730,112 cores specialized for tensor computation (1,102,000 TFlop/s)

HPC Programming Model

Typical HPC Operation

Message Passing Interface (MPI)

barrier: a synchronization point that expresses a dependency in the computational graph

Message Passing Interface (MPI): standardized communication protocol for programming parallel computers, with functions like send, receive, broadcast, scatter, gather, and reduce

Standardized set of group communication methods

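The group communication methods can be illustrated with a minimal single-process Python sketch. The names mirror MPI collectives (MPI_Scatter, MPI_Gather, MPI_Allreduce), but this is an illustration of their semantics, not the MPI API itself; real MPI runs across distributed processes, while here "ranks" are just list indices.

```python
def scatter(data, n_ranks):
    """Split data into one chunk per rank (like MPI_Scatter)."""
    chunk = len(data) // n_ranks
    return [data[i * chunk:(i + 1) * chunk] for i in range(n_ranks)]

def gather(chunks):
    """Concatenate per-rank chunks back together (like MPI_Gather)."""
    return [x for c in chunks for x in c]

def allreduce(values, op=sum):
    """Combine one value per rank; every rank gets the result (like MPI_Allreduce)."""
    result = op(values)
    return [result] * len(values)

chunks = scatter(list(range(8)), 4)   # rank i holds chunks[i]
partial = [sum(c) for c in chunks]    # local computation on each rank
totals = allreduce(partial)           # every rank learns the global sum
```

The scatter/compute/allreduce pattern above is the core of many MPI programs: distribute data, compute locally, then combine results so every rank sees the same answer.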
HPC Fault Tolerance by Checkpoint

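Checkpointing periodically saves program state to disk so that a failed run can resume from the last saved step instead of restarting from scratch. A minimal sketch (the file name `state.ckpt` and the loop body are hypothetical):

```python
import os
import pickle

CKPT = "state.ckpt"  # hypothetical checkpoint file name

def save_checkpoint(step, state, path=CKPT):
    # Write atomically: dump to a temp file, then rename over the old checkpoint,
    # so a crash mid-write never corrupts the last good checkpoint.
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump((step, state), f)
    os.replace(tmp, path)

def load_checkpoint(path=CKPT):
    # Resume from the last checkpoint, or start fresh if none exists.
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return 0, 0

step, total = load_checkpoint()
for step in range(step, 10):
    total += step                     # one unit of "work"
    save_checkpoint(step + 1, total)  # after a crash, we restart at step + 1
```

In real HPC systems, checkpoints are coordinated across all processes (every rank must save a consistent global state), which is far more involved than this single-process sketch.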
But application writers usually don't need such a low-level model as HPC provides...

Actor Model

Instead of programming processes directly, we abstract communication as message passing between actors.

We can have multiple actors, each with its own mailbox, in one application. They are not constrained by physical locality.
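A minimal actor sketch: a thread draining its own mailbox queue. This illustrates the mailbox idea only; real actor frameworks (Erlang, Akka) add supervision, addressing, and network transparency.

```python
import queue
import threading

class Actor:
    """Minimal actor: a thread that processes messages from its own mailbox."""
    def __init__(self, handler):
        self.mailbox = queue.Queue()
        self.handler = handler
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def send(self, msg):
        self.mailbox.put(msg)  # asynchronous: the sender never waits for processing

    def _run(self):
        while True:
            msg = self.mailbox.get()
            if msg is None:    # sentinel message: stop the actor
                break
            self.handler(msg)

results = []
doubler = Actor(lambda msg: results.append(msg * 2))
for i in range(3):
    doubler.send(i)
doubler.send(None)
doubler.thread.join()
# results is now [0, 2, 4]
```

Because all communication goes through the mailbox, the handler never races with senders, and the same send/receive interface works whether the actor is local or remote.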

Typical Cluster

The network is slow compared to local storage access.

Time it takes to read/write or to transfer 10 TB of data

Therefore we want to move our data as little as possible.
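A back-of-envelope calculation makes the point concrete. The bandwidth figures below are illustrative assumptions (roughly 1 GB/s sequential local disk, 1 Gb/s ≈ 0.125 GB/s Ethernet), not numbers from the lecture:

```python
# How long does 10 TB take to read locally vs. ship over the network?
DATA_TB = 10
DATA_GB = DATA_TB * 1000   # 10 TB = 10,000 GB (decimal units)

disk_gbps = 1.0            # assumed ~1 GB/s sequential local disk/SSD
net_gbps = 0.125           # assumed 1 Gb/s Ethernet ~= 0.125 GB/s

disk_hours = DATA_GB / disk_gbps / 3600
net_hours = DATA_GB / net_gbps / 3600
# disk: ~2.8 hours; network: ~22.2 hours
```

An order of magnitude slower over the network: hence we ship computation to the data, not data to the computation.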

In a typical cluster, application programs are written in terms of high-level data operations, and the runtime system controls scheduling, load balancing, etc.

MapReduce Cluster Model

To compute word frequency in a book:

  1. Map: break the book into smaller sections and distribute one to each node
  2. Compute: each node computes a local word-frequency dictionary of (k, v) pairs
  3. Sort: each node sends each pair (k, v) to the node responsible for key k
  4. Reduce: each responsible node combines all the values it received for its keys
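The four steps above can be sketched as a single-process Python simulation of the distributed flow (the hash-based routing stands in for sending (k, v) to node k):

```python
from collections import Counter, defaultdict

def map_phase(section):
    """Each node computes a local word-frequency dictionary (k, v)."""
    return Counter(section.split())

def shuffle(local_counts, n_nodes):
    """Route every (word, count) pair to the node that owns that word."""
    buckets = [defaultdict(int) for _ in range(n_nodes)]
    for counts in local_counts:
        for word, c in counts.items():
            buckets[hash(word) % n_nodes][word] += c
    return buckets

def reduce_phase(buckets):
    """Each bucket already holds combined counts for its keys; merge them all."""
    result = {}
    for b in buckets:
        result.update(b)
    return result

book = ["the cat sat", "the dog sat", "the cat ran"]  # pre-split sections
local = [map_phase(s) for s in book]                  # Map + Compute
final = reduce_phase(shuffle(local, 2))               # Sort + Reduce
# final["the"] == 3, final["cat"] == 2
```

Because all counts for a given word are routed to the same bucket, each "node" can reduce its keys independently of every other node.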

Before every step finishes, nodes typically persist the previous step's state to disk for failure recovery.

Hadoop Project

Hadoop's Map Reduce

Hadoop Project: HDFS Fault Tolerance + MapReduce Programming Environment

In a typical cluster, machines are heterogeneous, and code is written at a high level.

Hadoop MapReduce API

MapReduce: Simplified Data Processing on Large Clusters

Run Big Projects:

  1. Choose Google's GFS or Hadoop's HDFS (built-in reliability via replication)
  2. Break the work into tasks
  3. Load data into the file system
  4. Run the program
  5. Retrieve results from the file system

MapReduce Provides Coarse-Grained Parallelism

Hadoop's Fault Tolerance

Dynamically scheduled: if a node fails, the manager node detects it (via heartbeat) and migrates its tasks to other nodes.
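Heartbeat-based failure detection can be sketched as follows. Timestamps are simulated constants rather than real clock reads, so the example is deterministic; node names and the timeout are made up for illustration:

```python
TIMEOUT = 3.0  # seconds without a heartbeat before a node is presumed dead

# Last time each worker pinged the manager (simulated timestamps).
last_heartbeat = {"node-a": 10.0, "node-b": 10.5, "node-c": 6.0}

def dead_nodes(now, heartbeats, timeout=TIMEOUT):
    """Manager-side check: any node silent for longer than `timeout` has failed."""
    return [n for n, t in heartbeats.items() if now - t > timeout]

def migrate(tasks, failed, live):
    """Reassign every task owned by a failed node to some live node."""
    return {t: (live[0] if owner in failed else owner)
            for t, owner in tasks.items()}

failed = dead_nodes(now=11.0, heartbeats=last_heartbeat)  # node-c is silent
tasks = {"t1": "node-a", "t2": "node-c"}
tasks = migrate(tasks, failed, live=["node-a", "node-b"])
# t2 has been migrated off the failed node-c
```

A real scheduler would balance migrated tasks across live nodes rather than dumping them on one, but the detect-then-migrate structure is the same.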

The Hadoop Project is important because big jobs used to be infeasible: the probability that at least one node fails grows as the number of nodes increases. Hadoop gets such jobs done by: - breaking the work into many short-lived tasks - using disk storage to hold intermediate results, so a failed task can be rescheduled

Advantage of clusters:

Disadvantage of clusters:

Stragglers: tasks that take a long time to execute due to bugs, flaky hardware, or poor partitioning. In this case, we detect them and raise an error.

When most tasks are finished, we speculatively reschedule the remaining tasks (if not buggy) on other nodes to reduce the overall run time.
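Speculative execution can be sketched with simulated task durations. The numbers are illustrative, and the sketch ignores the time already spent before a backup copy launches; the point is that the job finishes when the faster of the two copies does:

```python
# Simulated task durations in seconds; t3 is the straggler.
durations = {"t1": 10, "t2": 11, "t3": 95}

def runtime_without_backup(durations):
    """The job is only as fast as its slowest task."""
    return max(durations.values())

def runtime_with_backup(durations, straggler, backup_duration):
    """Launch a second copy of the straggler; keep whichever copy finishes first."""
    effective = dict(durations)
    effective[straggler] = min(effective[straggler], backup_duration)
    return max(effective.values())

runtime_without_backup(durations)         # 95: the whole job waits on t3
runtime_with_backup(durations, "t3", 12)  # 12: the backup copy wins
```

This only pays off because tasks are short-lived and idempotent: running a second copy wastes a little work but never changes the result.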

MPI vs. Map Reduce

Low vs High end

Example: calculate popularity of a social network account
