Goose

On the site since 12 December 2022, 09:18
Goose was originally an article extractor written in Java that was most recently (August 2011) converted to a Scala project. This is a complete rewrite in Python. The aim of the software is to take any news article or article-type web page and extract not only the main body of the article but also all metadata and the most probable image candidate.
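
To make the three outputs named above concrete (main body, metadata, image candidate), here is a minimal sketch using only the Python standard library. It is not Goose's algorithm or API: the names `ArticleSketchParser`, `extract_article`, `min_words`, and the dictionary keys are hypothetical, and the "body" heuristic (keep paragraphs with at least a few words) and the use of the `og:image` meta tag as the image candidate are simplifying assumptions chosen for illustration.

```python
from html.parser import HTMLParser


class ArticleSketchParser(HTMLParser):
    """Collects the <title>, <meta> tags, and <p> text from an HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.paragraphs = []
        self._in_title = False
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Metadata may be keyed by name= (description) or property= (Open Graph).
            key = attrs.get("name") or attrs.get("property")
            if key and attrs.get("content") is not None:
                self.meta[key.lower()] = attrs["content"]
        elif tag == "p":
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_p:
            self._in_p = False
            # Collapse runs of whitespace inside the paragraph.
            text = " ".join("".join(self._buf).split())
            if text:
                self.paragraphs.append(text)

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_p:
            self._buf.append(data)


def extract_article(html, min_words=5):
    """Return title, metadata, image candidate, and body text (hypothetical helper)."""
    parser = ArticleSketchParser()
    parser.feed(html)
    parser.close()
    # Assumption: very short paragraphs are navigation or share widgets, not body text.
    body = [p for p in parser.paragraphs if len(p.split()) >= min_words]
    return {
        "title": parser.title.strip(),
        "meta_description": parser.meta.get("description", ""),
        "top_image": parser.meta.get("og:image", ""),
        "cleaned_text": "\n\n".join(body),
    }


SAMPLE_HTML = """
<html><head>
<title>Example headline</title>
<meta name="description" content="A short summary.">
<meta property="og:image" content="http://example.com/lead.jpg">
</head><body>
<p>Home | About</p>
<p>This is the first real paragraph of the example article body.</p>
<p>Share this</p>
<p>A second paragraph adds a little more text to the body.</p>
</body></html>
"""

if __name__ == "__main__":
    article = extract_article(SAMPLE_HTML)
    print(article["title"])
    print(article["meta_description"])
    print(article["top_image"])
    print(article["cleaned_text"])
```

A real extractor has to cope with pages that lack clean `<p>` markup or Open Graph tags, which is where scoring heuristics over the whole document come in; the sketch only shows the shape of the result.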