Future Directions in Corpus Linguistics

laohong · 2006-03-07

- Tony McEnery / Patricia Shaw

Video Summary

The following summary provides an overview of the discussion at the CASTA 2005 Forum on Corpus Linguistics, October 3rd, 2005. Video clips of the event are linked throughout the text. Where possible we have also provided links to PowerPoint materials and other references.

Introduction

Tony McEnery (Lancaster University) was introduced by John Newman, Chair of Linguistics at the University of Alberta. Tony McEnery has been working as a corpus linguist for many years. He is known for his Mandarin Chinese project, the results of which are available online. His most recently published book is on corpus-based language studies.

Lecture

McEnery started his talk by emphasizing the fact that the discussion about corpus linguistics would have been on a different scale 20 years ago. He gave numerous examples of earlier corpus linguistics experiences in the first section of his lecture, called "Yesterday". The collection of data was more difficult than it is now; by the mid-eighties a one-million-word corpus (video clip 1) was a great accomplishment.

The data at that time was used for simple purposes; there was no standardization of corpus markup schemes. The accessibility level of the data was low. Corpus linguists' tools were primitive; they were written in various programming languages and ran on different platforms. Automated annotation was available only for English. The use of these tools was also limited to describing the basic grammatical structure of the language. The results were impressive less for their research insight than as evidence that this kind of work could be done at all (video clip 2).

Finding data was extremely difficult. One couldn't easily get an electronic representation of textual data. The work involved manual access to the data, or one had to take whatever one could get one's hands on. Back then, there were only a few corpus linguists and they all knew each other. Corpus linguistics had limited dialogue with mainstream linguistics (video clip 3). The absence of standards (video clip 4) was a further limiting factor.

Tony McEnery also talked about the difficulties of becoming a corpus linguist and how much it demanded in terms of professional background and training (video clip 5).

In the second part of his lecture, called "Yesterday's Tomorrow", McEnery discussed the problems that corpus linguists had expected to solve, and how things turned out quite differently as the field went about solving them. He gave examples from his own work of how data and theory interact, and how linguistic theories must address the substantial evidence of real language use which corpora represent (video clip 6).

Corpus annotation is still not widely accessible, and corpus data is available for only a few languages. English-language corpus building might provide a good model for other languages (video clip 7).

Even carefully collected corpora risk embedding artificial language situations, because written texts have often passed through a process of selection and editing (video clip 8).

In his third section, called "Today", McEnery outlined some of the persistent problems in today's corpus linguistic world. On a positive note, data of many world languages are now more accessible. Some problems have disappeared on their own, while some new ones have been introduced by computer software (video clip 9).

Finally, under the heading of "Tomorrow", McEnery suggested that persistent corpus problems should be dealt with, and that software development must accelerate. With the arrival of XML and Unicode, the payoff of encoded corpus language data is clearer than ever. Finding a standardized format for working with all the languages of the world should be a research priority. And linguistic theory which is not corpus-based or corpus-referenced should be challenged constantly (video clip 10).

Response

Patricia Shaw, from the University of British Columbia, acted as our invited respondent. She shared her professional experience (video clip 11) with the world of corpus linguistics and talked about how the field is vital to the efforts of studying many spoken First Nations dialects for which we no longer have native speakers (video clip 12).

As a representative of the First Nations, Shaw believes that the documentation of these unique languages is extremely important, and must be based in the communities where the language is used (video clip 13).

Shaw ended by emphasizing that corpora and corpus-linguistic approaches are also essential to teaching the language (video clip 14).

NOTE:

1. The video clips will be uploaded to corpus4u's Gmail account later;

2. The PowerPoint presentation used in conjunction with this talk is available here:

http://forum.corpus4u.org/upload/forum/2006030714340942.pdf

laohong · 2006-03-07

The video clips were uploaded to the Gmail account a few minutes ago. BTW, the video clips are in mov format (playable with QuickTime Player).

Here is the link to the original post: http://tapor.ualberta.ca/CASTA2005/forumsummary/monday/index.html

In case the link does not work, please download the files from the Corpus4U Gmail account.

刘语料 · 2006-03-07

Thanks, laohong.

oscar3 · 2006-03-08

Re: Future Directions in Corpus Linguistics

What tool shall I use to put those video clips together as one single file?
Thanks.

Haiyang Ai · 2006-03-08

You don't have to put them together to be able to view them, I think.

Thanks, laohong, for sharing.

It's a pleasure to watch the video by VIP corpus linguist Tony McEnery.

oscar3 · 2006-03-08

Re: Future Directions in Corpus Linguistics

Thank you. I have just installed QuickTime on my computer.

laohong · 2006-03-08

Movie Joiner 3.35 is a good tool if you want to join several clips into a big one.

http://www.freedownloadscenter.com/Multimedia_and_Graphics/Video_and_Animation_Tools/Movie_Joiner.html

xujiajin · 2006-03-27

Laohong, thank you so much for sharing this with us.

jackzch · 2006-03-28

That's a very useful resource. THANKS!

jackzch · 2006-03-29

Re: Future Directions in Corpus Linguistics

Quoting laohong's post of 2006-3-7 14:59:34:
The video clips were uploaded to the Gmail account a few minutes ago. BTW, the video clips are in mov format (playable with QuickTime Player).

Sorry, I can't download the videos. Please show me how and where I can find the Gmail account you refer to. Actually, I don't know much about Gmail.

Haiyang Ai · 2006-03-29

Oscar3 has kindly set up a 2GB Gmail account for corpus4u users.
You can upload and download large ebooks and software there.
http://gmail.google.com
ID: corpus4u
Pass: www.corpus4u.com

corpusmeilin · 2008-01-10

Re: Future Directions in Corpus Linguistics

Very helpful~
