Nerus

Nerus: a large synthetic Russian-language dataset annotated with morphology, syntax and named entities (on the site since December 22, 2022, 06:31)
Nerus is a large silver-standard Russian corpus annotated with POS tags, syntax trees and NER tags (PER, LOC, ORG). The markup contains some errors, but its overall quality is high; see the evaluation section. The corpus contains ~700K news articles from Lenta.ru. It was built with tools from the Natasha project: Razdel for sentence and token segmentation, and Slovnet BERT models for morphology, syntax and NER annotation. The markup is stored in the standard CoNLL-U format.
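CoNLL-U stores one token per line with ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), comment lines starting with `#`, and a blank line after each sentence. The minimal Python sketch below reads such markup using only the standard library. It assumes, without confirmation from this page, that the NER label is kept as a `Tag=...` entry in the MISC column using the BIO scheme. The toy sentence is hand-written for illustration and is not taken from the corpus, and `parse_conllu` and `entity_spans` are hypothetical helpers, not part of any Natasha library.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Token:
    id: int
    form: str
    upos: str
    feats: Dict[str, str]
    head: int      # 0 means the token is the root of the sentence
    deprel: str
    ner: str       # BIO tag, e.g. "B-PER", "I-ORG" or "O"


def parse_pairs(field: str) -> Dict[str, str]:
    """Parse a FEATS/MISC field like 'Case=Nom|Number=Sing' ('_' means empty)."""
    if field == "_":
        return {}
    pairs: Dict[str, str] = {}
    for item in field.split("|"):
        key, _, value = item.partition("=")
        pairs[key] = value
    return pairs


def parse_conllu(text: str) -> List[List[Token]]:
    """Split CoNLL-U text into sentences of Token objects."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in text.splitlines():
        if not line.strip():           # blank line closes a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):       # sentence/document metadata
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue                   # skip multiword tokens and empty nodes
        misc = parse_pairs(cols[9])
        current.append(Token(
            id=int(cols[0]), form=cols[1], upos=cols[3],
            feats=parse_pairs(cols[5]), head=int(cols[6]),
            deprel=cols[7], ner=misc.get("Tag", "O"),  # assumed MISC key
        ))
    if current:
        sentences.append(current)
    return sentences


def entity_spans(sentence: List[Token]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (label, text) entity spans."""
    spans: List[Tuple[str, str]] = []
    words: List[str] = []
    label = None
    for tok in sentence:
        prefix, _, tag = tok.ner.partition("-")
        if prefix == "B" or (prefix == "I" and tag != label):
            if words:
                spans.append((label, " ".join(words)))
            words, label = [tok.form], tag
        elif prefix == "I":
            words.append(tok.form)
        else:                          # "O" ends any open entity
            if words:
                spans.append((label, " ".join(words)))
            words, label = [], None
    if words:
        spans.append((label, " ".join(words)))
    return spans


# Hand-written toy sentence (not from Nerus); LEMMA is left as "_".
ROWS = [
    ["1", "Иван", "_", "PROPN", "_", "Case=Nom|Gender=Masc|Number=Sing", "2", "nsubj", "_", "Tag=B-PER"],
    ["2", "живёт", "_", "VERB", "_", "Number=Sing|Person=3|Tense=Pres", "0", "root", "_", "Tag=O"],
    ["3", "в", "_", "ADP", "_", "_", "4", "case", "_", "Tag=O"],
    ["4", "Москве", "_", "PROPN", "_", "Case=Loc|Gender=Fem|Number=Sing", "2", "obl", "_", "Tag=B-LOC"],
    ["5", ".", "_", "PUNCT", "_", "_", "2", "punct", "_", "Tag=O"],
]
SAMPLE = "# sent_id = toy_1\n" + "\n".join("\t".join(r) for r in ROWS) + "\n"

if __name__ == "__main__":
    sentence = parse_conllu(SAMPLE)[0]
    for tok in sentence:
        print(tok.id, tok.form, tok.upos, tok.head, tok.deprel, tok.ner)
    print(entity_spans(sentence))  # [('PER', 'Иван'), ('LOC', 'Москве')]
```

Because morphology, the dependency tree (HEAD/DEPREL) and the NER tags share one token table, a single pass over each sentence recovers all three layers of annotation; the real corpus files are large, so a production reader would stream them line by line rather than load the whole text.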