CORPUS OF TATAR: CONCEPTION AND LINGUISTIC ASPECTS

Abstract: 

This paper discusses the conception of a Tatar language corpus. A model of the corpus is proposed, and the representation of linguistic information and the principles of morphological annotation of Tatar texts are reviewed. The problem of representativeness is examined as a separate issue, and a specific statistical approach to it is proposed. Corpus-building issues are analyzed on the basis of the characteristics of the language system.

Key words: 

Tatar language, Turkic languages, information technologies, corpus of Tatar, linguistic modeling, linguistic annotation in corpora, representativeness.
