System Design · 10 lessons · 20 quiz questions

Scalability Patterns

Scalability is about delaying the need to rewrite your system. The mental model: scale up first (simple), then scale out (requires stateless design), then add caching (reduce load), then shard (last resort). Each step adds complexity. Add only when needed.

What You Will Learn

  • Scale Up vs Scale Out
  • Stateless Services
  • Auto-Scaling
  • Queue-Based Load Leveling
  • Bulkhead Pattern
  • Back-Pressure
  • Connection Pooling at Scale
  • Designing Scalable Systems - The Scale Cube
  • Performance vs Scalability
  • System Design Mock: Scalability

Overview

Horizontal vs Vertical Scaling: The Core Trade-off

Vertical Scaling (Scale Up)

Add more resources to a single machine: more CPU cores, more RAM, faster storage.

Advantages:

  • Zero application changes required
  • No distributed-systems complexity
  • Shared memory: no serialization overhead between components
  • Strong consistency comes for free on a single node

Disadvantages:

  • Hard upper limit: the largest AWS instance offers 448 vCPUs and 24 TB RAM at roughly $220/hour
  • Single point of failure: one machine crash means a total outage
  • Cost increases super-linearly: 2x the resources can cost 4-8x at the high end
  • Maintenance windows require full downtime

Horizontal Scaling (Scale Out)

Add more machines of the same size and distribute load across them.

Advantages:

  • Theoretically unlimited scale
  • No single point of failure
  • Cost scales linearly with capacity

Disadvantages:

  • The application must be stateless
  • Added latency for inter-node communication (1–10 ms versus nanoseconds in memory)
  • Operational complexity: service discovery, load balancing, distributed tracing

Stateless Services: The Key to Horizontal Scaling

A stateless service stores no session state in memory. Every request carries all the information needed to process it, or the service fetches state from a shared external store.

What must move out of the application server, and where it goes:

  • Session state → Redis / DynamoDB
  • File uploads → SQS worker fleet
  • WebSocket connections → Shared Redis or CDN

Auto-Scaling

Key auto-scaling numbers:

  • EC2 instance launch: 90–180 seconds (AMI + user-data boot)
  • Kubernetes pod scale-up: 5–30 seconds (image cached) or 2–5 minutes (image pull)
  • Lambda cold start: 100–500 ms (Python/Node) or 1–10 s (JVM)

This means auto-scaling cannot react to sudden traffic spikes; you need pre-warming or buffer capacity (20–30% headroom).
Interview Q&A

Q: When would you use vertical scaling over horizontal?
A: Early stage (under 10k RPS), stateful applications that are expensive to refactor, databases that do not shard well, or when operational simplicity outweighs cost. Start vertical; go horizontal when you hit the ceiling.

Q: How do you handle WebSocket connections during horizontal scaling?
A: Use a sticky load balancer (IP hash or cookie-based) for the connection, plus Redis Pub/Sub for cross-node message delivery. Alternatively, use a dedicated WebSocket gateway (AWS API Gateway WebSocket APIs, Socket.IO with the Redis adapter).

Q: What is the thundering herd problem in auto-scaling?
A: When a spike hits, waiting requests pile up. New instances come online and immediately receive a flood of queued requests. Mitigate with request queuing, circuit breakers, and gradual traffic shifting.

Java Implementation
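One way to illustrate the queuing and back-pressure ideas above in Java is a bounded work queue in front of a fixed worker pool. This is a sketch under simplifying assumptions: the pool sizes and burst size are illustrative, and `CallerRunsPolicy` is one of several possible back-pressure strategies (it makes the producer do the work itself when the queue is full, which slows submission instead of dropping requests).

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Queue-based load leveling: a bounded buffer absorbs bursts, and the
// rejection policy applies back-pressure instead of losing work.
public class LoadLevelingDemo {
    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor workers = new ThreadPoolExecutor(
                2, 2,                        // fixed pool: two "instances"
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(10),             // bounded buffer
                new ThreadPoolExecutor.CallerRunsPolicy() // back-pressure
        );

        AtomicInteger processed = new AtomicInteger();
        // Simulate a burst far larger than the queue capacity.
        for (int i = 0; i < 100; i++) {
            workers.submit(() -> {
                try { Thread.sleep(5); } catch (InterruptedException ignored) {}
                processed.incrementAndGet();
            });
        }

        workers.shutdown();
        workers.awaitTermination(30, TimeUnit.SECONDS);
        // Every task completed: some ran in the pool, some ran on the
        // submitting thread when the queue was full.
        System.out.println("processed = " + processed.get()); // processed = 100
    }
}
```

The same shape applies at system scale: the bounded queue plays the role of SQS in front of a worker fleet, and back-pressure is what keeps a thundering herd from overwhelming freshly launched instances.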

Sample Quiz Questions

1. What is the primary advantage of horizontal scaling over vertical scaling?

Remember · Difficulty: 1/5

2. Which of the following makes a service stateless?

Understand · Difficulty: 2/5

3. Back-pressure allows a consumer to signal a producer to slow its output rate.

Remember · Difficulty: 1/5

+ 17 more questions available in the full app.

Master Scalability Patterns for Your Next Interview

Get access to full lessons, adaptive quizzes, cheat sheets, code playground, and progress tracking — completely free.