[Repost] Types of corpora

corpus4u

There are many different kinds of corpora. They can contain written or spoken (transcribed) language, modern or old texts, and texts from one language or several languages. The texts can be whole books, newspapers, journals, speeches, etc., or consist of extracts of varying length. The kinds of texts included and the way they are combined vary between corpora and corpus types.

'General corpora' consist of general texts, that is, texts that do not belong to a single text type, subject field, or register. An example of a general corpus is the British National Corpus. Some corpora contain texts that are sampled (chosen) from a particular variety of a language, for example, from a particular dialect or from a particular subject area. These corpora are sometimes called 'Sublanguage Corpora'.

Corpora can consist of texts in one language (or language variety) only, or of texts in more than one language. If the texts are the same in all languages, i.e. translations, the corpus is called a Parallel Corpus. A Comparable Corpus is a collection of "similar" text


http://www.essex.ac.uk/linguistics/clmt/w3c/corpus_ling/content/introduction2.html
 