"Brown" corpus of Bulgarian

xujiajin

Administrator
Staff member

http://dcl.bas.bg/Corpus/copyright_en.html

Features of the Bulgarian corpus.

Each corpus sample (corpus unit, text sample) is an excerpt (or several excerpts) from a text (or texts). Its length is fixed at about 2 000 words; the precise number of words varies because the adopted methodology keeps sentence boundaries intact. The term 'corpus sample' and its synonyms refer to any portion of text included in the corpus. The "Brown" Corpus of Bulgarian consists of 500 corpus samples totalling 1 001 286 words. Although the samples were intended to contain 2 000 or more words, 136 of them contain fewer than 2 000 words.
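As a rough illustration of why sample lengths vary, the sketch below cuts a sample at the first sentence boundary at or after a target word count, and computes the mean sample length implied by the totals quoted above. This is only one possible reading of "keeping sentence boundaries", not the corpus compilers' actual procedure; the function name `cut_sample`, the whitespace-based word count, and the naive punctuation-based sentence splitter are all assumptions made for the example.

```python
import re


def cut_sample(text, target_words=2000):
    """Collect whole sentences until at least target_words words are reached.

    Hypothetical helper: words are whitespace-separated tokens and sentences
    end at '.', '!' or '?' followed by whitespace (a deliberately naive rule).
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    sample, count = [], 0
    for sentence in sentences:
        sample.append(sentence)
        count += len(sentence.split())
        if count >= target_words:
            break  # stop at the first sentence boundary past the target
    return " ".join(sample), count


# Mean sample length implied by the reported figures (1 001 286 words, 500 samples).
mean_length = 1_001_286 / 500
```

Under this rule a sample would normally end slightly above the target, and only a source text shorter than the target would yield fewer words. The 136 shorter samples reported above suggest the real procedure differed in some way, for example by cutting at the last boundary before the target or by using short source texts.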