Chinese-language introduction to the Lancaster Corpus of Mandarin Chinese


xujiajin
2005-06-15, 01:07 AM
Chinese-language introduction to the Lancaster Corpus of Mandarin Chinese
http://forum.corpus4u.org/upload/forum/2005061501071124.doc

xujiajin
2005-06-15, 01:09 AM
1.0 Preface

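The Lancaster Corpus of Mandarin Chinese (LCMC) is a balanced corpus of modern Chinese. It was built by Dr. Xiao Zhonghua under the supervision of his teacher, Professor Tony McEnery, and a first version was completed in June 2003 after more than half a year of work. The project was undertaken by the Department of Linguistics at Lancaster University and funded by the UK Economic and Social Research Council (ESRC). LCMC is a corpus of written Chinese compiled strictly on the model of the Freiburg-LOB Corpus of British English (FLOB). Its completion supports corpus-based research on Chinese alone as well as contrastive Chinese-English (English-Chinese) studies.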

2.0 Overview of LCMC

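LCMC is a balanced corpus of written modern Chinese containing one million words (counted at a rate of 1.6 Chinese characters per English word). It was originally built as part of the ESRC-funded project Contrasting Tense and Aspect in English and Chinese. From the outset the plan was to make it a modern Chinese counterpart to FLOB and FROWN. The main motivation for building such a corpus was that, although many Chinese corpora already exist (Yang 2003), none of them was a balanced corpus that is completely free and open to the public.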
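
For readers who want to see what the 1.6-characters-per-word conversion means in practice, here is a minimal sketch. Only the 1.6 ratio comes from the text above. The function names are made up for illustration, and the sketch does not claim to reproduce how the LCMC compilers actually counted their samples.

```python
# Assumption: 1.6 Chinese characters count as one English word,
# as stated in the overview above. Everything else is illustrative.
CHARS_PER_WORD = 1.6


def chars_needed(word_target: int) -> int:
    """Chinese characters corresponding to a target size in English words."""
    return round(word_target * CHARS_PER_WORD)


def word_equivalents(char_count: int) -> int:
    """English-word equivalents for a given number of Chinese characters."""
    return round(char_count / CHARS_PER_WORD)


if __name__ == "__main__":
    # A one-million-word target corresponds to about 1.6 million characters.
    print(chars_needed(1_000_000))
    # Going the other way: 3,200 characters count as about 2,000 words.
    print(word_equivalents(3_200))
```

Both functions round to whole numbers, since a corpus size is normally reported as a count rather than a fraction.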

xiaoz
2005-06-16, 12:07 AM
Thanks, Jiajin. Really appreciate your cooperation.

hancunxin
2005-06-26, 10:35 AM
Where can I download it, or use it?

xiaoz
2005-06-26, 09:13 PM
Try it online
http://www.ling.lancs.ac.uk/corplang/lcmc/lcmc/order.htm

hancunxin
2005-07-01, 12:51 PM
Many thanks to xiaoz.

hancunxin
2005-07-01, 01:00 PM
I have tried it.

hancunxin
2005-07-07, 11:06 AM
Xiao, I have downloaded three components of LCMC (index, manual, character). If I want to make a concordance or search for a keyword, what should I do next?

xiaoz
2005-07-07, 09:22 PM
See the new posting on how to use Xaira with LCMC.

清风出袖
2005-07-20, 01:10 PM
Thanks a lot! Why can't the corpora built at home be released for wider use?

xiaoz
2005-07-20, 03:39 PM
Cannot agree more.

hancunxin
2005-07-20, 03:50 PM
Yes, I agree.

xujiajin
2005-07-20, 10:06 PM
Richard, I have a question, though. How do you handle copyright if you make the texts publicly available?

xiaoz
2005-07-20, 10:14 PM
Write to the copyright holders to ask for permission, and let them know that speaker identities (for spoken data) are made anonymous and that the data is used for academic research, not for any commercial purpose (or, if you do make money, that you will enter into a profit-sharing agreement). Most copyright holders I have contacted are very cooperative.

动态语法
2005-07-21, 11:57 PM
So here is the situation: If we want to use the indexed version of LCMC,
we need to stick to version 1.13 of Xaira. But for other corpora it'd be
preferable to use Xaira 1.14 given that it is the latest version?

xiaoz
2005-07-22, 12:35 AM
If you know how to index a corpus yourself, you can use the latest release of Xaira. But note that the Collocation link is broken in 1.14. We will release 1.15 in a couple of days.