Agile corpus creation

Haiyang Ai

Holger Voormann, Ulrike Gut
Citation Information: Corpus Linguistics and Linguistic Theory. Volume 4, Issue 2, Pages 235–251, ISSN (Online) 1613-7035, ISSN (Print) 1613-7027, November 2008
Published Online: 09/12/2008

In the past decades language corpora have become indispensable tools for linguistic research and the development of linguistic theory. However, it is not yet widely acknowledged that the quality of corpus-based research and theories depends crucially on the quality of the corpora, not only in terms of their content and size but especially as far as the accuracy and richness of the annotations are concerned. Neither has much systematic thought gone into the effectiveness of the traditional corpus creation process regarding this problem. This paper proposes a novel approach to corpus creation – agile corpus creation – that addresses the problem of simultaneously maximizing corpus size as well as the quality and quantity of manual and automatic annotations while minimizing the time and cost involved in corpus creation. The central aspects of agile corpus creation lie in the reorganization of the traditional linear and separate phases of corpus design, data collection, data annotation and corpus analysis and in the recognition of potential sources of errors during corpus creation.

Re: Agile corpus creation

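Building a corpus is no easy task; it often takes a great deal of manpower, material resources, and money. This article proposes a promising approach. So how exactly can a corpus be built in an "agile" way? Does any fellow forum member (C友) have this article? Please share it with everyone.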