Scale Up: vertical scaling (add resources to a single node)
no application changes
no new failure modes or latency concerns
more expensive
hits hardware limits quickly
Scale Out: horizontal scaling (add more nodes)
application design is affected
complex failure modes and latency concerns
scales better
By queueing theory, once requests start to accumulate in the queue (\text{load} > 0.5), response time grows sharply and nonlinearly (see my 15-259 note on queueing theory)
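As a worked example, assuming the simple M/M/1 model (an assumption, not stated in this note): with arrival rate \lambda, service rate \mu, and utilization \rho = \lambda / \mu, the mean response time is E[T] = \frac{1}{\mu - \lambda} = \frac{1/\mu}{1 - \rho}, which blows up as \rho \to 1. For instance, with \mu = 100 requests/s, \rho = 0.5 gives E[T] = 20\text{ ms}, while \rho = 0.9 gives E[T] = 100\text{ ms}.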
Timing of scaling is crucial
too late: a long period of suboptimal response times
too soon: greater overhead and underutilized resources
A typical situation is that you are running on someone else's servers (e.g., Google Cloud), so you can launch or shut down a server very quickly (under a minute). Therefore, you can open and close servers dynamically based on queue length.
You can set a (\text{minimum}, \text{maximum}) pair of queue-length thresholds to trigger scaling: when the queue length drops below \text{minimum}, shut down servers; when it rises above \text{maximum}, add servers. Hysteresis: to avoid wasteful oscillation, keep a significant gap between the upper and lower thresholds.
Caution: queue length alone is often an insufficient signal; you might have to resort to some degree of overprovisioning.
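Below is a minimal sketch of such a queue-length controller with hysteresis. The thresholds and the callables (get_queue_length, launch_server, etc.) are hypothetical placeholders, not a real provider API.

```python
import time

# Hypothetical thresholds; keep a wide gap between them (hysteresis) to avoid oscillation.
MIN_QUEUE_PER_SERVER = 2     # scale in below this
MAX_QUEUE_PER_SERVER = 10    # scale out above this
MIN_SERVERS = 1              # overprovisioning floor, since queue length alone is a noisy signal


def autoscale(get_queue_length, get_server_count, launch_server, terminate_server,
              poll_interval_s=30):
    """Queue-length-based autoscaler with hysteresis.

    The four callables stand in for whatever monitoring and cloud APIs are actually used.
    """
    while True:
        per_server = get_queue_length() / max(get_server_count(), 1)

        if per_server > MAX_QUEUE_PER_SERVER:
            launch_server()        # scale out: requests are piling up
        elif per_server < MIN_QUEUE_PER_SERVER and get_server_count() > MIN_SERVERS:
            terminate_server()     # scale in: servers are sitting idle
        # inside the band between the two thresholds: do nothing

        time.sleep(poll_interval_s)
```

The do-nothing band between the two thresholds is what prevents the wasteful launch/terminate oscillation mentioned above.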
Service Level Agreement (SLA):
a contractual agreement by the cloud owner to the service owner
typical: 90\% of responses <200ms, 90\% <300ms, 99\% <400ms.
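A small sketch of checking measured latencies against percentile targets like these (standard library only; the example target list below just reuses the numbers above):

```python
import math


def meets_sla(latencies_ms, targets):
    """Return True if the observed latencies satisfy every (percentile, limit) target.

    latencies_ms: observed response times in milliseconds.
    targets: e.g. [(0.90, 200), (0.99, 400)] meaning 90% of responses <200ms, 99% <400ms.
    """
    ordered = sorted(latencies_ms)
    if not ordered:
        return False
    for pct, limit_ms in targets:
        rank = max(1, math.ceil(pct * len(ordered)))   # nearest-rank percentile
        if ordered[rank - 1] >= limit_ms:              # that percentile already exceeds the limit
            return False
    return True
```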
Starting point
characteristics
.htaccess
in Apache HTTP Server: redirections, custom error pages, force HTTPS, password protection, prevent hotlinking, ...
performance
2-tier Website
characteristics
performance
3-tier Website
Large website
characteristics
Databases: notoriously difficult to scale out, since critical transactional operations work best on a single machine
Traditional Database alternatives:
Highly-scaled Website
Failure handling
web server: stateless, so simply restart it if it fails
filesystem: redundancy
database: replicated DB with a "hot spare" (a standby component, usually a disk, that is only put into use when another component fails), plus logging
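A minimal sketch of the restart/failover idea, assuming a hypothetical health-check URL and a promote_spare callable standing in for whatever promotion or restart mechanism the deployment actually provides:

```python
import time
import urllib.request


def monitor_and_failover(primary_health_url, promote_spare, check_interval_s=5):
    """Poll the primary's health endpoint; promote the hot spare when it stops responding.

    promote_spare is a placeholder for the real promotion/restart mechanism; after
    promotion, uncommitted transactions would be recovered from the log.
    """
    while True:
        try:
            with urllib.request.urlopen(primary_health_url, timeout=2) as resp:
                healthy = (resp.status == 200)
        except OSError:
            healthy = False

        if not healthy:
            promote_spare()   # hot spare takes over (for a stateless web server, just restart it)
            return

        time.sleep(check_interval_s)
```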