New online search interface for MICASE

laohong

Administrator
Staff member
#1
New online search interface for MICASE

Dear All,

The Michigan Corpus of Academic Spoken English (MICASE) is a
1.8-million-word corpus of transcripts from a range of different speech
events recorded at the University of Michigan. Since May 2002, the
English Language Institute at the University of Michigan has made the
corpus available online through a search interface that some of you may
be familiar with already. We are now launching a new online search
interface due to issues of technical obsolescence.

New MICASE online has some improved functionality. For example, the new
search interface provides simple descriptive statistics for each search
term, and the search results can be downloaded to the user's own
computer. The downloadable information includes metadata (such as the
name of the transcript, the speaker's academic role and gender) in a
tab-delimited format that can be further examined using a program like
Microsoft Excel.
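
For readers who would rather script this than open the file in Excel, a tab-delimited download could be tallied in Python roughly as follows. This is only a sketch: the column names (transcript, speaker_role, gender, match) and the sample rows are made up for illustration, since the actual layout of the downloaded file is not described here.

```python
import csv
import io
from collections import Counter

# Made-up sample standing in for a downloaded results file.
# The column names and values are assumptions, not the real MICASE layout.
SAMPLE_TSV = (
    "transcript\tspeaker_role\tgender\tmatch\n"
    "ADV700JU023\trole_A\tF\tword\n"
    "ADV700JU023\trole_B\tM\tword\n"
    "ADV700JU023\trole_A\tF\tword\n"
)


def count_by_column(tsv_text, column):
    """Count how often each value occurs in one column of tab-delimited text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row[column] for row in reader)


if __name__ == "__main__":
    for value, count in count_by_column(SAMPLE_TSV, "speaker_role").items():
        print(value, count)
```

For a real download, the same function can be fed the contents of the saved file, once the column names have been adjusted to match its header row.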

To explore MICASE online, simply go to http://micase.umdl.umich.edu
(this is the same URL as before).

We have also updated the MICASE Manual, which is available on our
website at http://www.lsa.umich.edu/eli/micase/index.htm (under "About
MICASE"). The chapter on MICASE online has been updated, with the
changes described in more detail.

I should also mention that, for users who wish to access the corpus data
through other means, the corpus transcripts are available in XML format
for a small fee. The sound files are also available for purchase
(although some of them are freely accessible online for listening at
http://www.lsa.umich.edu/eli/micase/Audio/index.htm).

Happy searching!
Annelie

Dr. Annelie Ädel
MICASE/MICUSP director
English Language Institute
University of Michigan
 

laohong

Administrator
Staff member
#2

laohong

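Administrator
Staff member
#4
Re: New online search interface for MICASE

The 152 MICASE transcripts used to be available as a single bundled download; now there is a fee. However, the XML version of any transcript that turns up in your search results can still be downloaded from the results. To bulk-download the whole corpus, a little cleverness is all it takes.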
 

laohong

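Administrator
Staff member
#5
Re: New online search interface for MICASE

Originally posted by 清风出袖:
Dearest Dr. Laohong: how can I download all of the corpus data??? Thanks, haha. Whenever you come to Nanjing, dinner is on me. Thank you, and I look forward to your reply!!! :D
^_^ I'll keep you in suspense for now. Has anyone else managed to download all 152 XML transcripts? If 清风 is buying dinner, everyone listening gets a share!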
 

armstrong

Senior member
#6
Re: New online search interface for MICASE

In that case, I'll treat Laohong to dinner too. If Laohong comes to Yunnan on holiday, I'll be the guide.
 
#7
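Re: New online search interface for MICASE

Haha :D :D Comrade Laohong, that should be about enough, right???!!! Both of them are waiting for you to show up and share the secret :D :D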
 

laohong

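Administrator
Staff member
#8
Re: New online search interface for MICASE

OK, now that the dinner plans are settled, it's time to end the suspense. Here is how I do it:

1. Open the Browse page, which lists all 152 transcripts:
http://quod.lib.umich.edu/cgi/c/corpus/corpus?c=micase&cc=micase&type=browse

2. Click the first transcript, ADV700JU023. A new page opens containing the link "Download entire transcript in XML". Right-click it and choose "Copy Shortcut" to get the following download address:
http://quod.lib.umich.edu/cgi/c/cor...case;action=downloadtranscript;id=ADV700JU023

3. Go back to the transcript list and click any other transcript; you can get its download address the same way. (If you stop reading here and start downloading, you will have to do this at least 152 times. Tiring, isn't it?)

4. Steps 2 and 3 reveal a pattern: the download address of each XML transcript is simply the URL below with the transcript's file name appended after the final equals sign:
http://quod.lib.umich.edu/cgi/c/corpus/corpus?c=micase;cc=micase;action=downloadtranscript;id=

5. So all you need to do is save the transcript list page as an HTML file, open it in EditPlus, and choose Search, Replace. Enter view=transcript in Find What and action=downloadtranscript in Replace With, then save the file.

6. Use a download manager such as Xunlei (迅雷) or Net Transport (影音传送带) to download in bulk. Taking Xunlei as an example: open the modified HTML file, right-click on the page, and choose "Download all links with Xunlei" (使用迅雷下载所有链接). This gets you all 152 XML transcripts within 3 minutes.

7. The sound files can be handled in the same way.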
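
For anyone who prefers a script to EditPlus, the URL pattern in step 4 and the search-and-replace in step 5 can be sketched in Python roughly as follows. This is only a sketch under assumptions: SAMPLE_HTML is a made-up stand-in for the saved list page, EXAMPLE000XX000 is a placeholder ID rather than a real transcript, and the links are assumed to be plain href attributes containing view=transcript, which the real page may not match exactly.

```python
import re

# URL prefix from step 4; the transcript ID is appended after "id=".
DOWNLOAD_PREFIX = (
    "http://quod.lib.umich.edu/cgi/c/corpus/corpus"
    "?c=micase;cc=micase;action=downloadtranscript;id="
)

# Made-up stand-in for the saved list page (an assumption, not the real markup).
# EXAMPLE000XX000 is a placeholder ID used only for illustration.
SAMPLE_HTML = (
    '<a href="http://quod.lib.umich.edu/cgi/c/corpus/corpus'
    '?c=micase;cc=micase;view=transcript;id=ADV700JU023">first</a>\n'
    '<a href="http://quod.lib.umich.edu/cgi/c/corpus/corpus'
    '?c=micase;cc=micase;view=transcript;id=EXAMPLE000XX000">second</a>\n'
    '<a href="http://quod.lib.umich.edu/">home</a>\n'
)


def download_url(transcript_id):
    """Build the download address for one transcript ID (step 4)."""
    return DOWNLOAD_PREFIX + transcript_id


def rewrite_links(html):
    """Apply the same substitution as the EditPlus replace in step 5."""
    return html.replace("view=transcript", "action=downloadtranscript")


def extract_download_links(html):
    """Return the href values that point at the download action."""
    hrefs = re.findall(r'href="([^"]+)"', html)
    return [h for h in hrefs if "action=downloadtranscript" in h]


if __name__ == "__main__":
    for link in extract_download_links(rewrite_links(SAMPLE_HTML)):
        print(link)
```

The script only rewrites and lists the links; fetching them is still left to a download tool, as in step 6.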
 
#9
Re: New online search interface for MICASE

Learned a new trick. Thanks, Comrade Laohong. Dinner is on me when you come to Nanjing, and I won't go back on my word, haha!!!
 
#10
Re: New online search interface for MICASE

Got it done, thanks Laohong. When you come to Yunnan, just let me know and everything is on me, ha! ha! ha! ha!
 

chrisyang

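Regular member
#11
Re: New online search interface for MICASE

Laohong's clever tricks flow like an endless river, and once they start there is no stopping them!! I'm learning from you!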
 
#12
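Re: New online search interface for MICASE

Thank you, master :)

I've downloaded everything there is to download, without missing a single file, sound files included.

Wishing you a pleasant trip to Yunnan and Nanjing, master! I'll send you a big gift pack.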
 

laohong

Administrator
Staff member
#13
Re: New online search interface for MICASE

Shh, keep it down! It wouldn't be good if the MICASE people heard; they'd blame me for leading people astray.
 
#16
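Re: New online search interface for MICASE

It seems that Laohong's method doesn't work directly for downloading the audio. Should the text after view= be changed to something else? Searching for transcript in EditPlus finds nothing. Could someone explain? Thanks.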
 
#17
Re: New online search interface for MICASE

Could anyone help? I'm a beginner in corpus linguistics. Where can I find the MICASE and BASE recordings for download? I'd be very grateful!
 