Design: Web Crawler (Hard)
A fetcher node crashes after popping a URL from the queue but before completing the crawl. How do you prevent URL loss?
— Tests your understanding of fault tolerance and at-least-once delivery in distributed work queues.
Answer Options
A. The URL is lost permanently
B. Use in-flight tracking: mark the URL as 'crawling' with a TTL; if it is not completed within 5 minutes, return it to the queue
C. Use a transaction log on the fetcher
D. Duplicate all URLs in two queues
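The in-flight tracking idea in option B can be sketched with a minimal in-memory queue: a popped URL is marked 'crawling' with a timestamp, and any URL whose fetcher has not acknowledged completion within the TTL is returned to the pending queue. This is an illustrative assumption, not the site's official answer; a production crawler would back this with something like Redis or an SQS-style visibility timeout, and the 5-minute TTL below mirrors the figure in option B.

```python
import time


class CrawlQueue:
    """In-memory sketch of in-flight URL tracking with a TTL."""

    def __init__(self, ttl_seconds=300):  # 300 s = the 5 minutes from option B
        self.ttl = ttl_seconds
        self.pending = []    # URLs waiting to be crawled
        self.in_flight = {}  # url -> monotonic time it was handed to a fetcher

    def push(self, url):
        self.pending.append(url)

    def pop(self):
        """Hand a URL to a fetcher and mark it 'crawling' with a deadline."""
        self.requeue_expired()
        if not self.pending:
            return None
        url = self.pending.pop(0)
        self.in_flight[url] = time.monotonic()
        return url

    def ack(self, url):
        """Fetcher reports successful completion; the URL leaves the system."""
        self.in_flight.pop(url, None)

    def requeue_expired(self):
        """Return URLs whose fetcher crashed or stalled past the TTL."""
        now = time.monotonic()
        expired = [u for u, t in self.in_flight.items() if now - t > self.ttl]
        for url in expired:
            del self.in_flight[url]
            self.pending.append(url)
```

If a fetcher node crashes after `pop()` but before `ack()`, the URL simply times out of `in_flight` and is re-queued, so nothing is lost; the trade-off is at-least-once delivery, so the crawler must tolerate occasionally fetching the same URL twice.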
This question is from the Design: Web Crawler topic (System Design Cases).
More Design: Web Crawler Questions
Why is BFS (Breadth-First Search) preferred over DFS for web crawling? (Hard)
What is the purpose of a Bloom filter in a web crawler? (Hard)
A website generates infinite unique URLs like /products?page=1, /products?page=2 ... /products?page=1000000. How do you handle this? (Hard)
What does robots.txt's 'Crawl-delay: 10' directive mean? (Hard)
How do you crawl JavaScript-rendered Single Page Applications (SPAs)? (Hard)