news-please

On the site since December 18, 2022 01:34
news-please is an integrated web crawler and information extractor for news that just works. It is an open source, easy-to-use news crawler that extracts structured information from almost any news website. It can recursively follow internal hyperlinks and read RSS feeds to fetch both the most recent articles and older, archived ones. To crawl a site completely, you only need to provide the root URL of the news website.
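To make the idea of recursively following internal hyperlinks from a root URL concrete, here is a minimal sketch using only the Python standard library. It is not news-please's own implementation or API: the names `LinkParser`, `crawl_internal`, `fetch`, and the `news.example` site are hypothetical, and the pages are served from an in-memory dictionary rather than over HTTP.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect the href values of all <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl_internal(root_url, fetch):
    """Breadth-first crawl from root_url, staying on the root's host.

    `fetch` is a hypothetical callable that maps a URL to its HTML,
    or returns None when the page is unavailable.
    """
    host = urlparse(root_url).netloc
    seen = {root_url}
    queue = deque([root_url])
    visited = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop "#fragment" parts.
            absolute = urldefrag(urljoin(url, href)).url
            # Only queue links on the same host that are not yet seen.
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Toy in-memory "site" standing in for real HTTP responses (assumption).
SITE = {
    "https://news.example/": '<a href="/a">A</a> <a href="https://other.example/x">ext</a>',
    "https://news.example/a": '<a href="/b#top">B</a> <a href="/">home</a>',
    "https://news.example/b": "<p>No links here.</p>",
}

visited = crawl_internal("https://news.example/", SITE.get)
print(visited)
```

Only the root URL is supplied; every other page is discovered through links, and the external link to `other.example` is skipped. A real crawler would additionally need HTTP fetching, politeness delays, robots.txt handling, and article extraction, none of which are shown here.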