[Help] Which senior scholar first put forward the concept of "语料库" (corpus) in China?

woshihuzi

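Junior Member
Four questions:
1. Who first put forward (or rather, introduced) the concept of "语料库" (corpus) in China?
2. Which was the earliest corpus in China?
3. Who first put forward the term "corpus" outside China?
4. Which was the first corpus outside China?

Thanks!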
 
The word "corpus" is of Latin origin and has been an English word for centuries.
Some believe that "corpus" has been used as a linguistic term since 1961, as marked by the completion of the earliest million-word computerized language datasets at Brown University (viz. the Brown Corpus) and in Europe (viz. the LOB Corpus).

Collections of running text compiled before 1961 have been jokingly dubbed corpora B.C. (before computers).
 
I can hardly find any clue as to the first mention of 语料库 in China. Prof. Yang Huizhong (with a Birmingham background) is no doubt a forerunner of English-focused corpus linguistics in China.

Scholars in Chinese (computational) linguistics, however, have been working on corpus linguistics for decades as well.

If we extend the concept of corpus linguistics to its field-linguistics tradition, many scholars (Chao Yuen Ren, for instance) can be included in this elite cohort.

Computerized language corpora, however, are latecomers to Chinese linguistics.
 
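Around the time of the New Culture Movement, the noted educator Chen Heqin (陈鹤琴), working for teaching purposes and on the basis of corpus statistics (he drew on six types of material totalling 550,000 Chinese characters and spent two to three years selecting 4,261 commonly used characters), compiled 《语体文应用字汇》, which was completed in 1925 and published by the Commercial Press in 1928.

This probably counts as the earliest corpus-based study in China.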
 
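Thank you, Mr. xujiajin. I understood your first two paragraphs in English, but could you tell me where they come from? I would like to read the original. Thanks.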
 
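Re: [Help] Which senior scholar first put forward the concept of "语料库" (corpus) in China?

Sampson, G. & D. McCarthy. (2006). Corpus Linguistics: Readings in a Widening Discipline. London: Continuum.

It is mentioned in this book. I seem to recall reading it in Kennedy (1998), though; I just flipped through that book and could not find it.

In fact, the phrase "before computers" has long been in use in computer science.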
 
Re: [Help] Which senior scholar first put forward the concept of "语料库" (corpus) in China?

Sampson, G. & D. McCarthy. (2006). Corpus Linguistics: Readings in a Widening Discipline. London: Continuum. Where can this book be downloaded? Thanks.
 
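Re: [Help] Which senior scholar first put forward the concept of "语料库" (corpus) in China?

Four questions:
3. Who first put forward the term "corpus" outside China?
Thanks!

Geoff Leech said yesterday that, as he recalls, Jan Aarts was the first to use the term, in 1982.

So I went to the library and found the book. Sharing the details here:

Aarts, Jan & Theo van den Heuvel. (1982). Grammars and Intuitions in Corpus Linguistics. In Stig Johansson (ed.). Computer Corpora in English Language Research (pp. 66-84). Bergen: Norwegian Computing Centre for the Humanities.

The very first sentences of the article read: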

"Corpus linguistics can only hope to play a significant role if it is capable of going beyond the stage of word counts and will be able to yield detailed information about - at least - the syntax of large corpora. It is not at all clear, however, what are the most efficient and linguistically interesting means to achieve this aim...."
 
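Re: [Help] Which senior scholar first put forward the concept of "语料库" (corpus) in China?

Thank you, Dr. Xu. This information is very useful to me as well.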
 
Re: [Help] Which senior scholar first put forward the concept of "语料库" (corpus) in China?

Very useful reference.

 
Re: [Help] Which senior scholar first put forward the concept of "语料库" (corpus) in China?

Very good.
Thanks for the useful information.
 
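Re: [Help] Which senior scholar first put forward the concept of "语料库" (corpus) in China?

Four questions:
1. Who first put forward (or rather, introduced) the concept of "语料库" (corpus) in China?

Now it gets even more interesting. According to the email from Jan Aarts that Leech forwarded, Jan Aarts's own answer is that he first mentioned "Corpus Linguistics" at the ICAME conference held at the University of Nijmegen in the Netherlands in 1983. The conference proceedings were published in 1984.

The book is:
Aarts, Jan and Willem Meijs (eds.). 1984. Corpus Linguistics: Recent Developments in the Use of Computer Corpora in English Language Research. Amsterdam: Rodopi.

Interestingly, he forgot that he had already used the term in another paper two years earlier (i.e. 1982).

This information also helps answer the original poster's first question. The 1984 volume includes a paper by Prof. Cheng Yumin (程雨民) of Fudan University (?), so presumably he was the first person in China to hear the term "corpus linguistics"? As a master's student I read Prof. Cheng's book 《英语语体学》 carefully, but I do not quite remember whether it mentions the concept of "corpus linguistics". If it does, perhaps the book is about to become a bestseller?! Haha!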
 