This is an updated version of cointegrated/rubert-tiny: a small Russian BERT-based encoder with high-quality sentence embeddings. This post in Russian gives more details.

The differences from the previous version include:

- a larger vocabulary: 83828 tokens instead of 29564;
- longer supported sequences: 2048 tokens instead of 512;
- sentence embeddings that approximate LaBSE more closely than before;
- meaningful segment embeddings (tuned on the NLI task);
- a focus on Russian only.
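
As a rough illustration of how the sentence embeddings might be obtained and compared, here is a minimal sketch using the Hugging Face `transformers` library. Several things here are assumptions rather than facts from the text: the Hub id `cointegrated/rubert-tiny2`, the choice of the first (`[CLS]`) token as the sentence vector, and the helper names `embed_cls` and `cosine_similarity`, which are hypothetical. The `max_length` default of 2048 follows the sequence limit listed for this version.

```python
import math

# Assumption: Hub id of the updated model; it is not stated in the text.
MODEL_NAME = "cointegrated/rubert-tiny2"


def embed_cls(texts, model_name=MODEL_NAME, max_length=2048):
    """Return L2-normalized [CLS] vectors for a list of strings.

    Assumption: the first-token hidden state is used as the sentence
    embedding; other pooling strategies (e.g. mean pooling) are possible.
    """
    # Imported lazily so cosine_similarity works without these packages.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    batch = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=max_length,
        return_tensors="pt",
    )
    with torch.no_grad():
        output = model(**batch)

    # Take the hidden state of the first token of every sequence.
    cls = output.last_hidden_state[:, 0, :]
    cls = torch.nn.functional.normalize(cls, dim=-1)
    return cls.tolist()


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (plain lists)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

With `torch` and `transformers` installed and network access available, something like `vectors = embed_cls(["привет мир", "hello world"])` followed by `cosine_similarity(vectors[0], vectors[1])` would give a similarity score for the two sentences. Loading the model inside the function keeps the sketch short; in real use the tokenizer and model would be loaded once and reused.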