WikiText-2 dataset

WikiText-2, on the site since December 12, 2022, 11:46

The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia. The dataset is available under the Creative Commons Attribution-ShareAlike License. Compared to the preprocessed version of Penn Treebank (PTB), WikiText-2 is over 2 times larger and WikiText-103 is over 110 times larger.

