View Full Version : How to obtain the original texts of a reference corpus


zyhope
2007-06-01, 09:29 PM
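I am looking for a spoken reference corpus, but I have not been able to extract the complete texts. The corpora available on the web can be queried online, but they seem to support only word frequencies, collocations and the like; there appears to be no way to compare them with an observed corpus or to run chi-square tests. Dear teachers, how can I get hold of the original texts of these spoken reference corpora?
Another question: if I want to find items such as yeah, yes, absolutely at turn transitions, how should I search so that I get exactly the instances at turn transitions, and not absolutely elsewhere in the talk?
These questions may be too basic, but I am really stuck. Any pointers would be much appreciated. Thank you!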

xiaoz
2007-06-01, 10:22 PM
The Santa Barbara Corpus of Spoken American English (SBCSAE) is a good resource for your purpose. You can download the corpus below (digital audio files are also included):
http://talkbank.org/data/Conversation/SBCSAE.zip

The recent paper by Prof Tao on "Absolutely" is also a good reference for your project:
http://eng.sagepub.com/cgi/reprint/35/1/5
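
Once you have the full texts, the comparison with your observed corpus that you asked about can be done with a chi-square test on a 2x2 table. Below is a minimal sketch in plain Python; the function name and the counts are made up for illustration and do not come from SBCSAE or any real corpus.

```python
def chi_square_2x2(freq_a, size_a, freq_b, size_b):
    """Pearson chi-square (no continuity correction) for one word in two corpora.

    freq_a, freq_b: occurrences of the word in corpus A and corpus B.
    size_a, size_b: total number of tokens in corpus A and corpus B.
    Assumes the word occurs at least once overall and that each corpus
    contains other tokens as well, so no expected value is zero.
    """
    observed = [
        [freq_a, size_a - freq_a],   # corpus A: the word, all other tokens
        [freq_b, size_b - freq_b],   # corpus B: the word, all other tokens
    ]
    total = size_a + size_b
    row_totals = [size_a, size_b]
    col_totals = [freq_a + freq_b, total - freq_a - freq_b]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical counts: "absolutely" 30 times in 1,000 tokens of the observed
# corpus versus 10 times in 1,000 tokens of the reference corpus.
score = chi_square_2x2(30, 1000, 10, 1000)
print(round(score, 2))
```

With one degree of freedom, a value above 3.84 is significant at the 0.05 level. The test is not reliable when any expected frequency is very small.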

armstrong
2007-06-01, 10:27 PM
Thanks a lot, Richard.

zyhope
2007-06-01, 10:30 PM
Thank you very much for your help.
I will give it a try.

corpora
2007-06-01, 11:42 PM
After opening it, wordbank shows a lot of strange characters. Is something wrong?

zyhope
2007-06-01, 11:47 PM
Yes, the downloaded files are in cha or xml format, and the markup inside is quite complicated. How should they be used?

xiaoz
2007-06-01, 11:54 PM
This version of SBCSAE uses the CHAT annotation format for use with CLAN, which can be downloaded from the CHILDES or TalkBank sites:
http://talkbank.org/software.html

You should of course get familiar with the symbols for encoding speech and prosodic features in SBCSAE, developed by Du Bois et al. I think there is a copy at this site, just search for it.

You can of course download a tool from the above link to convert the CHAT format to XML and then explore the XML version using Xaira.

You can also remove all annotations and keep only the transcripts of the spoken data if you can do a little bit of programming.
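
A rough sketch of that kind of programming, in Python. It assumes the usual CHAT layout: `@` lines are headers, `*SPEAKER:` lines are main tiers, `%` lines are dependent tiers, continuation lines start with a tab, and time bullets are wrapped in the control character `\x15`. The regular expressions cover only a few common codes, not the full transcription conventions, and the function names are my own. It also picks out words at the start of a new speaker's stretch of talk, which is only a crude proxy for turn transitions (overlaps and backchannels will need manual checking).

```python
import re

BULLET = re.compile(r"\x15[^\x15]*\x15")           # time bullets (assumed \x15-delimited)
BRACKET = re.compile(r"\[[^\]]*\]")                # bracketed codes such as [% note]
EVENT = re.compile(r"&\S+")                        # &=laugh and similar codes
PAUSE = re.compile(r"\((?:\.+|\d+(?:\.\d+)?)\)")   # (.) (..) (1.2)
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def read_tiers(lines):
    """Yield (speaker, text) per main tier, joining tab-indented continuations."""
    speaker, parts = None, []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("*"):
            if speaker is not None:
                yield speaker, " ".join(parts)
            head, _, rest = line.partition(":")
            speaker, parts = head[1:], [rest.strip()]
        elif line.startswith("\t") and speaker is not None:
            parts.append(line.strip())
        else:
            # An @ header, % dependent tier or blank line closes the main tier.
            if speaker is not None:
                yield speaker, " ".join(parts)
            speaker, parts = None, []
    if speaker is not None:
        yield speaker, " ".join(parts)


def clean(text):
    """Strip the annotation codes listed above and return the plain words."""
    for pattern in (BULLET, BRACKET, EVENT, PAUSE):
        text = pattern.sub(" ", text)
    return WORD.findall(text)


def speaker_change_initial(lines, targets):
    """Return (speaker, word) where a target word opens a new speaker's talk."""
    hits, previous = [], None
    for speaker, text in read_tiers(lines):
        words = clean(text)
        if not words:
            continue
        if speaker != previous and words[0].lower() in targets:
            hits.append((speaker, words[0].lower()))
        previous = speaker
    return hits


# Invented sample lines, not taken from SBCSAE.
sample = [
    "@Begin\n",
    "*JOHN:\tso you liked it ? \x150_1200\x15\n",
    "*MARY:\tabsolutely . \x151200_1800\x15\n",
    "%com:\tlaughter\n",
    "*MARY:\tit was &=laugh absolutely great (.) [% note] .\n",
    "\treally .\n",
    "*JOHN:\tyeah .\n",
    "@End\n",
]
print(speaker_change_initial(sample, {"absolutely", "yeah", "yes"}))
```

For a real file, pass an open file object instead of `sample`, for example `open("SBC001.cha", encoding="utf-8")` (the file name is hypothetical).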

xujiajin
2007-08-08, 08:52 PM
http://www.linguistics.ucsb.edu/faculty/bucholtz/transcription/materials.html