libextract
On the site since 12 December 2022, 09:19
Libextract: extract data from websites.
Libextract is a statistics-enabled data extraction library, written in Python, that works on HTML and XML documents. Derived from eatiht, its extraction algorithm rests on one simple assumption: data appear as collections of repetitive elements. You can read about the reasoning here.
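To make the "repetitive elements" assumption concrete, here is a minimal sketch of the idea. It is not libextract's actual algorithm or API. It parses well-formed markup with Python's standard `xml.etree.ElementTree` and, for every element, counts how often each child tag repeats. The element with the longest run of same-tagged children is taken as the likely data container. The function name `find_repetitive_container` and the sample page are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def find_repetitive_container(markup):
    """Hypothetical helper: return (element, child_tag, count) for the element
    whose children repeat a single tag most often."""
    root = ET.fromstring(markup)
    best = (None, None, 0)
    # root.iter() walks every element in the tree, root included.
    for parent in root.iter():
        # Frequency of each direct child tag under this element.
        counts = Counter(child.tag for child in parent)
        if not counts:
            continue  # leaf element: nothing repeats here
        tag, count = counts.most_common(1)[0]
        # Keep the element with the largest run of identical child tags.
        if count > best[2]:
            best = (parent, tag, count)
    return best


# Hypothetical sample page: a short nav bar and a longer result list.
PAGE = """<html><body>
  <div id="nav"><a>Home</a><a>About</a></div>
  <ul id="results">
    <li>First item</li>
    <li>Second item</li>
    <li>Third item</li>
  </ul>
</body></html>"""

container, tag, count = find_repetitive_container(PAGE)
print(container.get("id"), tag, count)  # results li 3
```

The three `<li>` children outnumber the two `<a>` links, so the list wins. Real-world HTML is often not well-formed XML, so a practical tool would need a tolerant HTML parser and more careful statistics than a single maximum count.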