Lecture 017 - Scaling

Scaling Techniques and Architecture

Scale Up: vertical scaling (add resources to a single node)

Scale Out: horizontal scaling (add more nodes)

\text{load} = \frac{E[\text{arrival rate}]}{E[\text{service rate}]}

By queueing theory, once work starts to accumulate in the queue (\text{load} > 0.5), the response time grows very quickly and blows up as the load approaches 1 (see my 15-259 note on queueing theory).
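To get a feel for how fast response time degrades with load, here is a minimal sketch that assumes the simplest textbook model, an M/M/1 queue (one server, Poisson arrivals, exponential service times). That model is an assumption, not something these notes specify, and the function names are made up for illustration. Under M/M/1 the mean time in system is 1 / (service rate - arrival rate).

```python
def load(arrival_rate: float, service_rate: float) -> float:
    """Load (utilization) = mean arrival rate / mean service rate."""
    return arrival_rate / service_rate


def mm1_mean_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean time in system for an M/M/1 queue: E[T] = 1 / (mu - lambda).

    Assumes one server, Poisson arrivals, exponential service times.
    """
    if arrival_rate >= service_rate:
        # load >= 1: the queue grows without bound
        return float("inf")
    return 1.0 / (service_rate - arrival_rate)


if __name__ == "__main__":
    service_rate = 10.0  # hypothetical: 10 requests/second per server
    for rho in (0.1, 0.5, 0.8, 0.9, 0.99):
        t = mm1_mean_response_time(rho * service_rate, service_rate)
        print(f"load={rho:.2f}  mean response time={t:.3f}s")
```

In this model, a load of 0.5 already doubles the response time relative to an idle server, and going from 0.9 to 0.99 multiplies it by ten.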

Timing of scaling is crucial.

A typical situation is that you are renting someone else's servers (e.g., Google Cloud), so you can launch or shut down a server quickly, in under 1 minute. You can therefore add and remove servers dynamically based on the queue length.

You can set a (\text{minimum}, \text{maximum}) pair of queue-length thresholds to trigger scaling. When the queue length falls below \text{minimum}, shut down servers. When it rises above \text{maximum}, add servers. Hysteresis: to avoid wasteful oscillations, you need a significant gap between the upper and lower thresholds.
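The threshold rule with hysteresis can be sketched as a small controller. This is an illustrative sketch only: the class name, the one-server-per-step policy, and the server limits are assumptions, and a real autoscaler would call the cloud provider's API instead of updating a counter.

```python
from dataclasses import dataclass


@dataclass
class HysteresisScaler:
    low: int              # shut down a server when queue length < low
    high: int             # add a server when queue length > high
    min_servers: int = 1  # hypothetical lower bound on fleet size
    max_servers: int = 10 # hypothetical upper bound on fleet size
    servers: int = 1      # current number of running servers

    def __post_init__(self) -> None:
        if self.low >= self.high:
            raise ValueError("low must be strictly below high to leave a gap")

    def step(self, queue_length: int) -> int:
        """Observe the queue length once and adjust by at most one server."""
        if queue_length > self.high and self.servers < self.max_servers:
            self.servers += 1
        elif queue_length < self.low and self.servers > self.min_servers:
            self.servers -= 1
        # Between low and high: do nothing. This dead band is the hysteresis.
        return self.servers


if __name__ == "__main__":
    scaler = HysteresisScaler(low=10, high=100)
    for q in (150, 120, 60, 40, 5, 5):
        print(f"queue={q:4d} -> servers={scaler.step(q)}")
```

With a wide gap between `low` and `high`, queue lengths that hover in the middle cause no scaling actions at all, which is what prevents servers from being launched and shut down in rapid alternation.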

Caution: queue length alone is often insufficient. You might have to resort to some degree of overprovisioning.
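One simple way to reason about overprovisioning is to size the fleet for a target load below 1 rather than for full utilization. The sketch below assumes traffic is split evenly across identical servers; the function name and the default target of 0.5 are illustrative choices, not values from the lecture.

```python
import math


def servers_needed(arrival_rate: float,
                   service_rate_per_server: float,
                   target_load: float = 0.5) -> int:
    """Servers required to keep per-server load at or below target_load.

    Assumes identical servers and an even split of incoming requests.
    """
    if not 0 < target_load <= 1:
        raise ValueError("target_load must be in (0, 1]")
    capacity_per_server = service_rate_per_server * target_load
    return max(1, math.ceil(arrival_rate / capacity_per_server))


if __name__ == "__main__":
    # Hypothetical numbers: 900 requests/s, each server handles 100 requests/s.
    print(servers_needed(900, 100, target_load=1.0))  # 9: no headroom
    print(servers_needed(900, 100, target_load=0.5))  # 18: 2x overprovisioned
```

The lower the target load, the more idle capacity you pay for, but the more room there is to absorb a burst while new servers are still starting.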

Service Level Agreement (SLA):

Typical Growth

Typical Growth: Starting Point

Typical Growth: 2-tier Website

Typical Growth: 3-tier Website

Typical Growth: Large Website

Typical Growth: Highly-scaled Website

Monolithic Architecture: can't be decoupled

Micro-Service Architecture: better team management

Failure handling
