Chuvash Bilingual Corpus


The corpus has two goals: to build a corpus of the Chuvash language and to prepare Chuvash-Russian parallel texts.

The Chuvash Bilingual Corpus was launched in 2016. It is part of the “Chuvash language laboratory” project and works toward that project's goals.

“Chuvash language laboratory” is a project created and developed on the initiative of activists and funded by them. Its goal is to bring the Chuvash language into computing and the digital sphere. It does not belong to any institution or organization and receives no government funding.

The corpus structure

The corpus has three levels: texts, which are divided into sentences, which in turn are divided into words.

Texts are classified by type (journalism, scientific articles, prose, poetry, laws, etc.) and by topic (culture, military, agriculture, technology, etc.). The author and source of each text are also recorded.
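To make the text, sentence, and word levels and the per-text metadata concrete, here is a minimal Python sketch. The class and field names (`Text`, `Sentence`, `text_type`, `topics`) and the naive regex-based segmentation are illustrative assumptions, not the corpus's actual schema or tokenizer.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Sentence:
    words: list[str]  # tokens in their original order


@dataclass
class Text:
    title: str
    text_type: str     # e.g. "prose", "poetry", "laws" (hypothetical field)
    topics: list[str]  # e.g. ["culture", "agriculture"] (hypothetical field)
    author: str
    source: str
    sentences: list[Sentence] = field(default_factory=list)


def split_into_sentences(raw: str) -> list[Sentence]:
    """Naive segmentation: split on . ! ? and keep runs of word characters.

    A real corpus needs a proper tokenizer; this is only illustrative.
    """
    sentences = []
    for chunk in re.split(r"[.!?]+", raw):
        words = re.findall(r"\w+", chunk)  # \w also matches Cyrillic letters
        if words:
            sentences.append(Sentence(words))
    return sentences


def make_text(title, text_type, topics, author, source, raw) -> Text:
    """Build a Text record whose sentences are segmented from raw input."""
    return Text(title, text_type, topics, author, source,
                split_into_sentences(raw))
```

For example, `make_text("Example", "prose", ["culture"], "Unknown", "Unknown", "Ачасем хулара. Кӗнеке!")` yields two sentences, the first containing the words `["Ачасем", "хулара"]`.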

The corpus can identify word roots; this feature is implemented with a Hunspell dictionary.
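Hunspell recognizes inflected forms by combining a dictionary of stems, each tagged with affix flags, with affix rules that specify what is stripped from and appended to a stem. The sketch below imitates that suffix-stripping lookup in plain Python. The two stems, the flags `P` and `L`, and the simplified plural and locative rules are hypothetical; they are not taken from the project's Hunspell dictionary, which in practice would be used through Hunspell itself.

```python
import re

# Hypothetical miniature dictionary in the spirit of a Hunspell .dic file:
# each stem lists the affix flags it accepts.
DICTIONARY = {
    "ача": {"P", "L"},   # "child"
    "хула": {"P", "L"},  # "city"
}

# Hypothetical, simplified suffix rules modelled on Hunspell SFX entries:
# (flag, text stripped from the stem, suffix appended, stem-end condition)
SUFFIX_RULES = [
    ("P", "", "сем", "."),       # plural
    ("L", "", "ра", "[аыуӑ]"),   # locative after a back vowel (simplified)
]


def stems(word: str) -> list[str]:
    """Return the dictionary stems from which `word` can be derived."""
    found = []
    if word in DICTIONARY:
        found.append(word)  # the word is itself a stem
    for flag, strip, suffix, condition in SUFFIX_RULES:
        if not word.endswith(suffix):
            continue
        # Undo the rule: remove the suffix and restore any stripped text.
        stem = word[: len(word) - len(suffix)] + strip
        # The stem must carry the rule's flag and satisfy its end condition.
        if flag in DICTIONARY.get(stem, set()) and re.search(condition + "$", stem):
            found.append(stem)
    return found
```

For example, `stems("ачасем")` returns `["ача"]` and `stems("хулара")` returns `["хула"]`, while a word not covered by the toy dictionary returns an empty list.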

Warning: there may be errors in the corpus texts, such as typos.

Using the corpus

The corpus is currently free to use. Search is basic for now, but you can look up a particular word in the texts, see how it is used, and check how often words occur.
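The word lookup and frequency features can be illustrated with a small sketch that works on sentences already split into word lists. The function names are hypothetical, and matching here is exact and case-insensitive; the corpus's own search may behave differently.

```python
from collections import Counter


def word_frequencies(sentences: list[list[str]]) -> Counter:
    """Count case-insensitive word occurrences across tokenized sentences."""
    return Counter(w.lower() for sentence in sentences for w in sentence)


def find_word(sentences: list[list[str]], target: str) -> list[str]:
    """Return each sentence containing `target`, joined back into a string."""
    target = target.lower()
    return [" ".join(s) for s in sentences
            if any(w.lower() == target for w in s)]
```

Because `word_frequencies` returns a `Counter`, calling `.most_common(10)` on the result lists the ten most frequent words.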

If you need more complex queries, please contact the corpus developers (

Corpus contributors

• Nicholay (Astahar) Plotnikov is the project head, a software developer, and the maintainer of the corpus website.

• Alexander Antonov is a software developer and machine translation specialist.

Text processing: Erbina Portnova, Marina Yakovleva, Svetlana Trofimova and others.

For more details about the corpus contributors, please visit the "Users" section.


The corpus receives assistance from:

• “Haval” association;

• Chuvash State Institute of Human Sciences;

• National Library of the Chuvash Republic;

• Institute of Education of the Chuvash Republic;

• as well as private individuals.
