System Design · 10 lessons · 20 quiz questions
Scalability Patterns
Scalability is about delaying the need to rewrite your system. The mental model: scale up first (simple), then scale out (requires stateless design), then add caching (reduce load), then shard (last resort). Each step adds complexity. Add only when needed.
What You Will Learn
- ✓ Scale Up vs Scale Out
- ✓ Stateless Services
- ✓ Auto-Scaling
- ✓ Queue-Based Load Leveling
- ✓ Bulkhead Pattern
- ✓ Back-Pressure
- ✓ Connection Pooling at Scale
- ✓ Designing Scalable Systems — The Scale Cube
- ✓ Performance vs Scalability
- ✓ System Design Mock: Scalability
Horizontal vs Vertical Scaling — The Core Trade-off
Vertical Scaling (Scale Up)
Add more resources to a single machine: more CPU cores, RAM, faster storage.
Advantages:
Zero application changes required
No distributed systems complexity
Shared memory — no serialization overhead between components
Strong consistency is trivial — everything lives on a single node
Disadvantages:
Hard upper limit: the largest AWS instances top out around 448 vCPUs and 24 TB RAM, at roughly $220/hour
Single point of failure — one machine crash = total outage
Cost increases super-linearly: 2x resources ≈ 4-8x cost at the high end
Maintenance window requires full downtime
Horizontal Scaling (Scale Out)
Add more machines of the same size, distribute load across them.
Advantages:
Theoretically unlimited scale
No single point of failure
Cost scales linearly with capacity
Disadvantages:
Application must be stateless
Added latency for inter-node communication (1–10 ms vs nanoseconds in-memory)
Operational complexity: service discovery, load balancing, distributed tracing
Stateless Services — The Key to Horizontal Scaling
A stateless service stores no session state in memory. Every request carries all the information needed to process it, or the service fetches state from a shared external store.
What Must Move Out of the Application Server
| State | Move to |
| --- | --- |
| Session data | Redis / DynamoDB |
| File uploads | Shared object storage / CDN |
| Background jobs | SQS worker fleet |
| WebSocket connections | Shared Redis (Pub/Sub) |
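A minimal sketch of the idea: the handler keeps no session data in instance fields and looks everything up from a shared external store, so any replica can serve any request. Here a `ConcurrentHashMap` stands in for Redis/DynamoDB, and the class and method names are illustrative, not from the lesson.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a stateless request handler: no per-user state between
// calls, everything fetched from a shared external store.
public class StatelessHandler {
    // Stand-in for Redis/DynamoDB; in production this would be a
    // network client shared by every node.
    static final Map<String, String> SESSION_STORE = new ConcurrentHashMap<>();

    public String handle(String sessionId) {
        // Every request carries its own key; the handler holds nothing
        // in memory between calls, so any replica can process it.
        String user = SESSION_STORE.getOrDefault(sessionId, "anonymous");
        return "hello, " + user;
    }
}
```

Because the handler is interchangeable across nodes, a load balancer can route each request anywhere — which is exactly what horizontal scaling requires.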
Auto-Scaling
Key auto-scaling numbers:
EC2 instance launch: 90–180 seconds (AMI user-data boot)
Kubernetes pod scale-up: 5–30 seconds (image cached) or 2–5 minutes (image pull)
Lambda cold start: 100–500 ms (Python/Node) or 1–10 s (JVM)
This means auto-scaling cannot react in time to sudden traffic spikes — you need pre-warming or buffer capacity (20–30% headroom).
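The headroom rule can be made concrete: provision `ceil(peakRps × (1 + headroom) / perInstanceRps)` instances so the fleet absorbs a spike while new capacity is still booting. A sketch with made-up numbers (the method name and figures are illustrative):

```java
public class Headroom {
    // Instances needed to serve peakRps with a safety buffer, since
    // new capacity takes 90+ seconds to come online.
    public static int instancesNeeded(int peakRps, int perInstanceRps, double headroom) {
        double withBuffer = peakRps * (1.0 + headroom);
        return (int) Math.ceil(withBuffer / perInstanceRps);
    }
}
```

For example, 10,000 RPS at 800 RPS per instance with 25% headroom gives 16 instances rather than the bare-minimum 13.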
Interview Q&A
Q: When would you use vertical scaling over horizontal?
A: Early stage (< 10k RPS), stateful applications that are expensive to refactor, databases that do not shard well, or when operational simplicity outweighs cost. Always start vertical; scale horizontally when you hit the ceiling.
Q: How do you handle WebSocket connections during horizontal scaling?
A: Use a sticky load balancer (IP hash or cookie-based) for the connection, plus Redis Pub/Sub for cross-node message delivery. Or use a dedicated WebSocket gateway (AWS API Gateway WebSocket, Socket.io with Redis adapter).
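The sticky-routing half of that answer can be sketched as an IP hash: the same client always maps to the same node, while Redis Pub/Sub (not shown) handles cross-node message delivery. Node names and the class are placeholders, not production load-balancer logic.

```java
import java.util.List;

// Toy IP-hash sticky routing: a given client IP always lands on the
// same node, keeping its WebSocket connection pinned there.
public class IpHashBalancer {
    private final List<String> nodes;

    public IpHashBalancer(List<String> nodes) {
        this.nodes = nodes;
    }

    public String route(String clientIp) {
        // floorMod keeps the bucket non-negative even if hashCode() is negative.
        int bucket = Math.floorMod(clientIp.hashCode(), nodes.size());
        return nodes.get(bucket);
    }
}
```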
Q: What is the thundering herd problem in auto-scaling?
A: When a spike hits, all waiting requests pile up. New instances come online and immediately receive a tsunami of queued requests. Mitigate with request queuing, circuit breakers, and gradual traffic shifting.
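One way to sketch the request-queuing mitigation is queue-based load leveling with a bounded buffer: the queue absorbs the spike, and submissions beyond capacity are rejected immediately instead of piling onto freshly launched instances. A toy illustration, not the lesson's code:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Toy queue-based load leveling: a bounded buffer between producers
// and workers. offer() fails fast when full — the back-pressure
// signal that protects newly launched instances from the herd.
public class LevelingQueue {
    private final BlockingQueue<String> queue;

    public LevelingQueue(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Returns false (shed load) instead of blocking when the buffer is full.
    public boolean submit(String request) {
        return queue.offer(request);
    }

    public String take() throws InterruptedException {
        return queue.take();
    }
}
```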
Java Implementation
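The lesson's runnable code lives in the full app; as a stand-in, here is a hedged sketch of one pattern from the list above — the bulkhead — using a `Semaphore` to cap concurrent calls to a dependency so a slow downstream cannot exhaust every thread. Class and method names are illustrative.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Bulkhead sketch: at most maxConcurrent in-flight calls to one
// dependency; excess calls fail fast with a fallback instead of
// tying up threads waiting on a slow downstream.
public class Bulkhead {
    private final Semaphore permits;

    public Bulkhead(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    public <T> T call(Supplier<T> task, T fallback) {
        if (!permits.tryAcquire()) {
            return fallback;   // compartment full: shed load
        }
        try {
            return task.get();
        } finally {
            permits.release(); // always free the slot
        }
    }
}
```

Each dependency gets its own `Bulkhead`, so a failure in one compartment cannot sink the whole ship — the analogy the pattern is named for.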
Sample Quiz Questions
1. What is the primary advantage of horizontal scaling over vertical scaling?
Remember · Difficulty: 1/5
2. Which of the following makes a service stateless?
Understand · Difficulty: 2/5
3. Back-pressure allows a consumer to signal a producer to slow its output rate.
Remember · Difficulty: 1/5
+ 17 more questions available in the full app.
Master Scalability Patterns for Your Next Interview
Get access to full lessons, adaptive quizzes, cheat sheets, code playground, and progress tracking — completely free.