System Design10 lessons20 quiz questions

Rate Limiting & Throttling

Rate limiting protects your system from being overwhelmed. Mental model: think of it as a valve on a pipe. The algorithms differ in how they 'shape' the flow. Token bucket allows bursts. Leaky bucket smooths output. Sliding window is accurate. For distributed systems, Redis is the common shared...

What You Will Learn

  • Rate Limiting Algorithms: Token Bucket, Leaky Bucket, and Sliding Windows
  • Distributed Rate Limiting with Redis and System Design Patterns
  • Token Bucket Algorithm
  • Sliding Window Log
  • Sliding Window Counter (Approximate)
  • Distributed Rate Limiting
  • API Quota Management
  • Rate Limiting at Different Layers
  • Edge Cases and Production Concerns
  • System Design Mock: Rate Limiter

Overview

Rate limiting protects your system from being overwhelmed. Mental model: think of it as a valve on a pipe. The algorithms differ in how they 'shape' the flow. Token bucket allows bursts. Leaky bucket smooths output. Sliding window is accurate. For distributed systems, Redis is the common shared state store. Why Rate Limiting Exists Without rate limiting, a single misbehaving client can consume all your server capacity — whether malicious (DDoS, credential stuffing) or accidental (runaway retry loop). Rate limiting protects: Availability: prevent resource exhaustion Fairness: ensure no single tenant starves others Cost: cap unbounded API usage that maps to cloud spend Security: slow brute-force and enumeration attacks --Token Bucket Algorithm The most widely deployed algorithm. Imagine a bucket that holds tokens: Properties: Allows bursting up to tokens Smooth average rate of requests/second Memory: O(1) per user — only stores and Used by: AWS API Gateway, Stripe, GitHub API --Leaky Bucket Algorithm Requests enter a queue (the bucket) and are processed at a fixed rate — like water leaking from a hole. Excess requests overflow and are dropped. Properties: Outputs at perfectly constant rate — ideal for downstream systems with fixed capacity Does NOT allow bursting — a sudden spike is immediately throttled Used by: network traffic shaping, leaky bucket packet schedulers Token Bucket vs Leaky Bucket: Token Bucket Yes (up to capacity) Average R/s, variable instantaneous API rate limiting --Fixed Window Counter Divide time into fixed windows (e.g., 1-minute intervals). Count requests per window per user. Problem: Boundary Attack A user makes 100 requests at 00:59 and 100 more at 01:Both windows allow them, but they effectively sent 200 requests in 2 seconds — double the limit. --Sliding Window Log Maintain a sorted set of request timestamps per user. On each request: Remove timestamps older than the window Count remaining entries If count < limit, add current timestamp and allow Properties: Precise, no boundary attack. Cost: O(requests in window) memory per user — expensive at high traffic. --Sliding Window Counter (Hybrid) Approximate sliding window using two fixed-window counters: Example: window=60s, limit=At second 45 of the current window: Previous window had 80 requests Current window has 30 requests Contribution from prev: 80 × (15/60) = 20 Estimated total: 20 30 = 50 → allowed Best of both worlds: O(1) memory, no boundary attack, slight approximation (~0.003% error at typical traffic). --Algorithm Selection Guide Choosing the wrong algorithm creates either poor user experience (dropping legitimate bursts) or poor protection (allowing exploitation). Here is the decision framework: Rule of thumb for APIs: Token bucket for public endpoints, sliding window counter for internal APIs where exactness matters. --Rate Limiting in Practice: Stripe's Approach Stripe published a detailed blog post about their rate limiting system.

Continue learning Rate Limiting & Throttling with full lessons, quizzes, and interactive exercises.

Continue Learning on Guru Sishya →

Sample Quiz Questions

1. What HTTP status code should a rate limiter return when a limit is exceeded?

Remember·Difficulty: 1/5

2. Which rate limiting algorithm allows a burst of requests followed by a sustained rate?

Understand·Difficulty: 2/5

3. The leaky bucket algorithm smooths output regardless of input burst rate.

Remember·Difficulty: 1/5

+ 17 more questions available in the full app.

Related Topics

Master Rate Limiting & Throttling for Your Next Interview

Get access to full lessons, adaptive quizzes, cheat sheets, code playground, and progress tracking — completely free.