System Design Cases · 10 lessons · 22 quiz questions
Design: Web Crawler
10-session plan from basic BFS crawling to distributed petabyte-scale web crawling.
What You Will Learn
- ✓ Core Concepts & Requirements
- ✓ URL Frontier Design
- ✓ URL Deduplication
- ✓ Fetcher & robots.txt
- ✓ HTML Parsing & Link Extraction
- ✓ Content Deduplication
- ✓ Data Model & Storage
- ✓ Fault Tolerance & Recrawl
- ✓ Distributed Architecture
- ✓ Mock Interview
Design: Web Crawler: Requirements & Scope
Functional Requirements
Before designing anything, clarify what the system needs to do:
Core features: What are the must-have features?
User types: Who uses the system? End users, admins, API consumers?
Scale: How many users? What's the read/write ratio?
Non-Functional Requirements
Availability: 99.9% (about 8.76 hours of downtime per year) or 99.99% (about 53 minutes per year)?
Latency: < 100ms for reads? < 500ms for writes?
Consistency: Strong or eventual? CAP trade-off?
Durability: Can we lose data? How much?
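The availability figures follow from simple arithmetic: the allowed downtime is the unavailable fraction multiplied by the hours in a year. A minimal sketch, assuming a 365-day year; the function name `downtime_hours_per_year` is illustrative and not part of the lesson:

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (assumption)


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the yearly downtime budget implied by an availability target."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * HOURS_PER_YEAR


# Compare the two targets named in the requirement above.
for target in (99.9, 99.99):
    hours = downtime_hours_per_year(target)
    print(f"{target}% -> {hours:.2f} hours/year ({hours * 60:.0f} minutes)")
```

Each extra "nine" cuts the downtime budget by a factor of ten, which is why the choice between 99.9% and 99.99% is worth settling before any design work.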
Clarifying Questions to Ask
What's the expected DAU (Daily Active Users)?
What's the read-to-write ratio?
Do we need real-time updates?
What's our budget constraint?
What geography do we serve?
Interview Tip
"I always spend the first 5 minutes clarifying requirements. This shows structured thinking and prevents designing the wrong system. I explicitly separate functional from non-functional requirements."
Sample Quiz Questions
1. Why is BFS (Breadth-First Search) preferred over DFS for web crawling?
· Difficulty: easy
2. What is the purpose of a Bloom filter in a web crawler?
· Difficulty: medium
3. A website generates infinite unique URLs like /products?page=1, /products?page=2 ... /products?page=1000000. How do you handle this?
· Difficulty: hard
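The three sample questions above touch on breadth-first traversal, Bloom-filter deduplication, and unbounded URL spaces. The sketch below shows one way these ideas might fit together; it is not the course's reference answer. The names `BloomFilter`, `bfs_crawl`, `get_links`, and `max_pages_per_host` are hypothetical, `get_links` stands in for real fetching and parsing, and the per-host page budget is only one of several possible guards against crawler traps.

```python
import hashlib
from collections import defaultdict, deque
from urllib.parse import urlsplit


class BloomFilter:
    """Fixed-size bit array: may give false positives, never false negatives."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 5):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: str):
        # Derive k bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def bfs_crawl(seeds, get_links, max_pages_per_host=1000):
    """Visit URLs breadth-first and return them in visit order.

    get_links(url) is a hypothetical callback standing in for fetch + parse.
    """
    seen = BloomFilter()
    frontier = deque()          # FIFO queue gives breadth-first order
    pages_per_host = defaultdict(int)
    visited = []
    for url in seeds:
        seen.add(url)
        frontier.append(url)
    while frontier:
        url = frontier.popleft()
        host = urlsplit(url).netloc
        if pages_per_host[host] >= max_pages_per_host:
            continue            # budget spent: possible trap or very large site
        pages_per_host[host] += 1
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


def fake_links(url):
    # Simulated trap: every page links to the "next" page forever.
    base, _, page = url.partition("?page=")
    return [f"{base}?page={int(page) + 1}"]


pages = bfs_crawl(
    ["https://example.com/products?page=1"], fake_links, max_pages_per_host=3
)
print(pages)
```

The Bloom filter keeps the seen-set small at the cost of occasionally treating a new URL as already seen, so a few pages may be skipped. The per-host budget stops the simulated pagination trap after three pages; a production crawler would likely combine it with URL normalization and depth limits.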