Design: Web Crawler (Hard)
How should a crawler handle a 404 Not Found response?
Answer Options
A. Immediately retry the URL
B. Remove from queue, never retry
C. Mark as failed, retry with exponential backoff, eventually abandon
D. Crawl parent directory instead
This question is from the Design: Web Crawler topic (System Design Cases).
More Design: Web Crawler Questions
Why is BFS (Breadth-First Search) preferred over DFS for web crawling? (Hard)
What is the purpose of a Bloom filter in a web crawler? (Hard)
A website generates infinite unique URLs like /products?page=1, /products?page=2 ... /products?page=1000000. How do you handle this? (Hard)
What does robots.txt's 'Crawl-delay: 10' directive mean? (Hard)
How do you crawl JavaScript-rendered Single Page Applications (SPAs)? (Hard)