
System Design Question

Design a Web Crawler

Functional Requirements:

  • Accept a list of seed URLs to start crawling from.
  • Download and store the HTML content of web pages.
  • Extract new URLs from the downloaded pages and add them to the crawl queue.
  • Avoid crawling the same URL multiple times within a short period.
  • Allow configuration of crawl depth and rate limits.
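The functional requirements above can be sketched as a single-threaded crawl loop. This is a minimal illustration, not a production design: the `fetch` callable (returning a page's HTML and its extracted links) is an assumed stand-in for a real downloader and parser, and the in-memory `store` dict stands in for durable storage.

```python
import time
from collections import deque
from urllib.parse import urljoin

class Crawler:
    """Minimal crawl loop: seed URLs, a depth limit, dedup within a
    recrawl window, and a simple delay-based rate limit."""

    def __init__(self, fetch, max_depth=2, delay_s=0.0, recrawl_after_s=3600):
        self.fetch = fetch                    # fetch(url) -> (html, extracted_links)
        self.max_depth = max_depth            # configurable crawl depth
        self.delay_s = delay_s                # politeness delay between requests
        self.recrawl_after_s = recrawl_after_s
        self.last_crawled = {}                # url -> timestamp of last fetch
        self.store = {}                       # url -> html (stand-in for durable storage)

    def crawl(self, seeds):
        queue = deque((url, 0) for url in seeds)  # crawl frontier
        while queue:
            url, depth = queue.popleft()
            now = time.time()
            # avoid re-crawling the same URL within a short period
            if now - self.last_crawled.get(url, 0) < self.recrawl_after_s:
                continue
            html, links = self.fetch(url)
            self.last_crawled[url] = now
            self.store[url] = html
            if depth < self.max_depth:
                for link in links:
                    # resolve relative links against the current page
                    queue.append((urljoin(url, link), depth + 1))
            time.sleep(self.delay_s)
        return self.store
```

In a distributed crawler the frontier `deque`, the `last_crawled` map, and the `store` would each become shared services (e.g. a queue, a key-value dedup store, and blob storage), but the control flow stays the same.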

Non-Functional Requirements:

  • Scalability: Should be able to handle crawling a large number of pages.
  • Reliability: Should handle failures gracefully and retry failed downloads.
  • Data durability: Crawled data should not be lost once stored.
  • Respect robots.txt and politeness policies to avoid overloading websites.
  • Simple monitoring and logging for operational visibility.
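Two of the non-functional requirements, robots.txt compliance and retrying failed downloads, can be illustrated with the Python standard library's `urllib.robotparser` plus a small exponential-backoff wrapper. This is a sketch: the `fetch` callable and the injectable `sleep` parameter are assumptions made for testability, not part of any prescribed API.

```python
import time
from urllib.robotparser import RobotFileParser

def allowed(robots_txt, user_agent, url):
    """Check a URL against already-downloaded robots.txt rules."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)

def fetch_with_retries(fetch, url, max_attempts=3, base_delay_s=1.0, sleep=time.sleep):
    """Retry a failed download with exponential backoff (1s, 2s, 4s, ...);
    re-raise after the final attempt so the URL can be logged and re-queued."""
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay_s * 2 ** attempt)
```

In practice each crawler worker would cache the parsed robots.txt per host and also honor any `Crawl-delay` directive, which ties this check back into the rate-limiting requirement above.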

System Design Diagrams

(Diagram omitted in this text version.)