Lecture 017 - Scaling

Scaling Techniques and Architecture

Scale Up: vertical scaling (add resources to single node)

no application changes
no new failure modes or latency concerns
more expensive
hits hardware limits quickly

Scale Out: horizontal scaling (add more nodes)

application design will be affected
more complex failure modes and added latency concerns
scales better

$\text{load} = \frac{E[\text{arrival rate}]}{E[\text{service rate}]}$

By queueing theory, once requests start to accumulate in the queue (roughly $\text{load} > 0.5$ ), the response time grows sharply and nonlinearly, and becomes unbounded as the load approaches 1 (see my 15-259 note on queueing theory)
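To make the load formula concrete, here is a minimal sketch that assumes an M/M/1 queue (Poisson arrivals, exponential service times, one server). That model is my assumption, not something these notes specify, and the function names and the rates in the demo are made up for illustration.

```python
def load(arrival_rate: float, service_rate: float) -> float:
    """Load (utilization) = mean arrival rate / mean service rate."""
    if service_rate <= 0:
        raise ValueError("service_rate must be positive")
    return arrival_rate / service_rate


def mm1_mean_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean response time E[T] = 1 / (mu - lambda) for a stable M/M/1 queue."""
    if load(arrival_rate, service_rate) >= 1.0:
        return float("inf")  # unstable: the queue grows without bound
    return 1.0 / (service_rate - arrival_rate)


if __name__ == "__main__":
    mu = 100.0  # hypothetical service rate, in requests per second
    for lam in (10.0, 50.0, 80.0, 90.0, 99.0):
        rho = load(lam, mu)
        t_ms = 1000.0 * mm1_mean_response_time(lam, mu)
        print(f"load={rho:.2f}  mean response time={t_ms:.1f} ms")
```

Under this model each step closer to a load of 1 costs more response time than the previous one, which is why scaling decisions should happen well before saturation.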

Timing of scaling is crucial

too late: long period of suboptimal response time
too soon: greater overhead and underutilized resources

A typical situation is that you are renting servers from a cloud provider (e.g. Google Cloud), so you can launch or shut down a server very quickly, in under 1 min. You can therefore add and remove servers dynamically based on the queue length.

You can set a $(\text{minimum}, \text{maximum})$ queue length to trigger scale out. When below $\text{minimum}$ , close servers. When above $\text{maximum}$ , add servers. Hyteresis: to avoid wasteful oscillations, you need to has a significant gap between upper and lower thresholds.

Caution: queue length alone is often an insufficient signal. You might have to resort to some degree of overprovisioning.

Service Level Agreement (SLA):

contractual agreement by cloud owner to service owner
typical: $90\%$ responses $<200\,\text{ms}$ , $90\%$ responses $<300\,\text{ms}$ , $99\%$ responses $<400\,\text{ms}$ .
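Checking measured latencies against targets of this shape could look like the sketch below. The function name `meets_sla`, the `(fraction, threshold_ms)` target format, and the sample latencies are assumptions made for illustration; the two targets used in the demo are taken from the list above.

```python
def meets_sla(latencies_ms, targets):
    """Return True if every (fraction, threshold_ms) target is satisfied.

    A target (0.90, 300) means: at least 90% of responses took < 300 ms.
    """
    n = len(latencies_ms)
    if n == 0:
        raise ValueError("need at least one latency sample")
    for fraction, threshold_ms in targets:
        within = sum(1 for t in latencies_ms if t < threshold_ms)
        if within / n < fraction:
            return False
    return True


if __name__ == "__main__":
    targets = [(0.90, 300), (0.99, 400)]
    # Hypothetical sample: 90 fast, 9 slower, 1 very slow response.
    sample = [100] * 90 + [350] * 9 + [500]
    print("SLA met:", meets_sla(sample, targets))
```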

Typical Growth

Starting point

characteristics
- static pages
- Common Gateway Interface (CGI) scripts for custom content
- .htaccess in Apache HTTP Server: redirections, custom error, force HTTPS, password protection, prevent hotlinking...
- run on small server: AWS EC2 "micro" instance, Google Cloud "f1-micro" instance
performance
- content: easy to add user content, limited storage
- decouple: poor; administratively, it is hard to maintain user accounts
- load: limited to one machine

2-tier Website

characteristics
- frontend web server: more cores, RAM
- backend database server: more disks
- application server might be the next bottleneck
performance
- content: more, effectively unlimited storage
- decouple: better, since DB is separate from logic

3-tier Website

characteristics
- scale out storage
- middle tier application servers
- open to probing (crawlers) from Internet
- A TLS termination proxy (or SSL termination proxy, or SSL offloading): a proxy server that acts as an intermediary point between client and server applications, and is used to terminate and/or establish TLS (or DTLS) tunnels by decrypting and/or encrypting communications.
- frontend server might be next bottleneck

Large website

characteristics
- load-balancing switch: map single IP to multiple servers
- database might be next bottleneck
Databases: notoriously difficult to scale out, since critical transactional operations work best on a single machine
- Option 1: only use database for critical things
- Option 2: Use DB as master store, but have some form of cache in front of it
Traditional Database alternatives:
- Key-value stores: provide simple interface for storing key-value pairs
- Memcache: RAM-only storage layer, used as a cache for DB or disk-based KV store
- In-memory DBs: sacrifice durability for performance
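Option 2 (DB as master store with a cache in front) is often implemented with a read-through, invalidate-on-write pattern. The sketch below is one such arrangement under simplifying assumptions: plain dictionaries stand in for both the database and the RAM cache, and the class name `CachedStore` and its `db_reads` counter are hypothetical.

```python
class CachedStore:
    """Reads check the cache first; writes go to the DB and invalidate."""

    def __init__(self, database: dict):
        self._db = database   # stand-in for the master store
        self._cache = {}      # stand-in for a RAM-only cache layer
        self.db_reads = 0     # counts how often the DB is actually hit

    def get(self, key):
        if key in self._cache:
            return self._cache[key]      # cache hit: no DB access
        self.db_reads += 1
        value = self._db.get(key)        # cache miss: read the master store
        if value is not None:
            self._cache[key] = value     # populate the cache for next time
        return value

    def put(self, key, value):
        self._db[key] = value            # the DB stays the source of truth
        self._cache.pop(key, None)       # drop any stale cached copy


if __name__ == "__main__":
    store = CachedStore({"user:1": "alice"})
    store.get("user:1")
    store.get("user:1")
    print("DB reads after two gets:", store.db_reads)
```

Because the cache lives only in memory, losing it costs performance but not data, which is what makes it safe to put in front of the master store.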

Highly-scaled Website

characteristics
- load balancer, limited by its ingress link, might be next bottleneck
- Database cached in memory
- Georeplication:
  - Deploy to multiple sites around the globe
  - Each site is a large-scale web service
  - DNS resolver resolves to different IP addresses based on client location
  - (or the choice can be randomized to help with load balancing)
  - New challenge: data consistency
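The location-based resolution described above can be sketched as a lookup table with a randomized fallback. This only illustrates the idea and is not how any particular DNS service is configured: the region names, the `SITES` table, and the `resolve` function are hypothetical, and the addresses come from the IP ranges reserved for documentation.

```python
import random

# Hypothetical mapping from client region to the IPs of the nearest site.
SITES = {
    "us": ["192.0.2.10", "192.0.2.11"],
    "eu": ["198.51.100.10"],
    "asia": ["203.0.113.10"],
}


def resolve(client_region: str, rng=random) -> str:
    """Pick an IP for the client's region; unknown regions get a random site."""
    candidates = SITES.get(client_region)
    if candidates is None:
        # No site for this region: spread such clients across all sites.
        candidates = [ip for ips in SITES.values() for ip in ips]
    # Random choice among candidates also spreads load within a region.
    return rng.choice(candidates)


if __name__ == "__main__":
    for region in ("us", "eu", "antarctica"):
        print(region, "->", resolve(region))
```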

Monolithic Architecture: can't be decoupled

Micro-Service Architecture: better team management

Failure handling

web server: stateless, therefore restart if failed
filesystem: redundancy
database: Replicated DB with "hot spare" (a component, usually disk, is only put into use when there is a failure of other components), logging
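As a rough illustration of the database case, the sketch below keeps a spare copy current by replaying a write log and promotes it when the primary fails. It is a toy model under strong assumptions (in-memory dictionaries, a single process, no concurrency); the class `HotSpareDB` and its method names are hypothetical.

```python
class HotSpareDB:
    """Primary plus a hot spare that is kept current from a write log."""

    def __init__(self):
        self._primary = {}
        self._spare = {}
        self._log = []              # ordered (key, value) writes
        self._applied = 0           # log entries the spare has replayed
        self._primary_failed = False

    def write(self, key, value):
        self._log.append((key, value))       # log first, then apply
        if self._primary_failed:
            self._spare[key] = value         # the spare is now serving
            self._applied = len(self._log)
        else:
            self._primary[key] = value

    def sync_spare(self):
        """Replay log entries the spare has not yet seen."""
        for key, value in self._log[self._applied:]:
            self._spare[key] = value
        self._applied = len(self._log)

    def fail_over(self):
        """Primary has failed: catch the spare up and promote it."""
        self.sync_spare()
        self._primary_failed = True

    def read(self, key):
        active = self._spare if self._primary_failed else self._primary
        return active.get(key)


if __name__ == "__main__":
    db = HotSpareDB()
    db.write("x", 1)
    db.fail_over()
    print("x after failover:", db.read("x"))
```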
