
System Design Question

Design a Web Crawler

Functional Requirements:

  • Accept a list of seed URLs to start crawling from.
  • Download and store the HTML content of web pages.
  • Extract new URLs from the downloaded pages and add them to the crawl queue.
  • Avoid crawling the same URL multiple times within a short period.
  • Allow configuration of crawl depth and rate limits.
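The functional requirements above can be sketched as a single-threaded crawl loop. This is a minimal illustration, not a production design: the `fetch` callable (returning a page's HTML and its extracted links) is an assumed stand-in for a real downloader and parser, and the in-memory `store` dict stands in for durable storage.

```python
import time
from collections import deque
from urllib.parse import urljoin

class Crawler:
    """Minimal crawl loop: seed URLs, a depth limit, dedup within a
    recrawl window, and a simple delay-based rate limit."""

    def __init__(self, fetch, max_depth=2, delay_s=0.0, recrawl_after_s=3600):
        self.fetch = fetch                    # fetch(url) -> (html, extracted_links)
        self.max_depth = max_depth            # configurable crawl depth
        self.delay_s = delay_s                # politeness delay between requests
        self.recrawl_after_s = recrawl_after_s
        self.last_crawled = {}                # url -> timestamp of last fetch
        self.store = {}                       # url -> html (stand-in for durable storage)

    def crawl(self, seeds):
        queue = deque((url, 0) for url in seeds)  # crawl frontier
        while queue:
            url, depth = queue.popleft()
            now = time.time()
            # avoid re-crawling the same URL within a short period
            if now - self.last_crawled.get(url, 0) < self.recrawl_after_s:
                continue
            html, links = self.fetch(url)
            self.last_crawled[url] = now
            self.store[url] = html
            if depth < self.max_depth:
                for link in links:
                    # resolve relative links against the current page
                    queue.append((urljoin(url, link), depth + 1))
            time.sleep(self.delay_s)
        return self.store
```

In a distributed crawler the frontier `deque`, the `last_crawled` map, and the `store` would each become shared services (e.g. a queue, a key-value dedup store, and blob storage), but the control flow stays the same.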

Non-Functional Requirements:

  • Scalability: Should be able to handle crawling a large number of pages.
  • Reliability: Should handle failures gracefully and retry failed downloads.
  • Data durability: Crawled data should not be lost once stored.
  • Respect robots.txt and politeness policies to avoid overloading websites.
  • Simple monitoring and logging for operational visibility.
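Two of the non-functional requirements, robots.txt compliance and retrying failed downloads, can be illustrated with the Python standard library's `urllib.robotparser` plus a small exponential-backoff wrapper. This is a sketch: the `fetch` callable and the injectable `sleep` parameter are assumptions made for testability, not part of any prescribed API.

```python
import time
from urllib.robotparser import RobotFileParser

def allowed(robots_txt, user_agent, url):
    """Check a URL against already-downloaded robots.txt rules."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)

def fetch_with_retries(fetch, url, max_attempts=3, base_delay_s=1.0, sleep=time.sleep):
    """Retry a failed download with exponential backoff (1s, 2s, 4s, ...);
    re-raise after the final attempt so the URL can be logged and re-queued."""
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay_s * 2 ** attempt)
```

In practice each crawler worker would cache the parsed robots.txt per host and also honor any `Crawl-delay` directive, which ties this check back into the rate-limiting requirement above.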

System Design Diagrams

(Diagram omitted in this text version.)