[Help] Defining MWEs (multiword expressions)

jinshan_wu

Regular member
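English MWEs (multiword expressions) already have a fairly clear definition. Is there an established definition for Chinese multiword expressions?
 
Timothy Baldwin defines an English multiword expression as an item that is:
1. decomposable into multiple simplex words
2. lexically, syntactically, semantically, pragmatically and/or statistically idiosyncratic
This definition is itself vague, and extending it to Chinese raises even more problems. For example, should “工作单位” (work unit, i.e. one's employer) be treated as a two-word combination or as a multiword expression?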
 
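My personal view:
“工作单位” should be a two-word combination.
I think MWEs can be looked at from two angles: one is semantic and functional completeness; the other is probability.
Of course, the two should be combined during the screening process.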
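One way to make the "probability" angle concrete is an association score such as pointwise mutual information (PMI). The sketch below is only an illustration under my own assumptions: the thread does not say which statistic is meant, the toy corpus is hand-segmented and invented, and `pmi` is a hypothetical helper name. It does not address the semantic and functional side, which still needs human judgment.

```python
import math
from collections import Counter

def pmi(sentences, w1, w2):
    """Pointwise mutual information of the adjacent pair (w1, w2), in bits.

    Probabilities are plain relative frequencies: unigrams over all tokens,
    bigrams over all adjacent pairs. Returns None if any count is zero.
    """
    unigrams = Counter()
    bigrams = Counter()
    for words in sentences:
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    if not bigrams[(w1, w2)] or not unigrams[w1] or not unigrams[w2]:
        return None
    p_xy = bigrams[(w1, w2)] / sum(bigrams.values())
    p_x = unigrams[w1] / sum(unigrams.values())
    p_y = unigrams[w2] / sum(unigrams.values())
    return math.log2(p_xy / (p_x * p_y))

# Toy, hand-segmented corpus (hypothetical data, for illustration only).
corpus = [
    ["请", "填写", "工作", "单位"],
    ["他", "的", "工作", "单位", "在", "北京"],
    ["他", "的", "朋友", "在", "北京"],
]
score_mwe = pmi(corpus, "工作", "单位")   # pair that always co-occurs here
score_other = pmi(corpus, "的", "工作")   # looser pair
```

On a corpus this small the scores mean very little; the point is only that a pair whose words nearly always occur together scores higher than a loose pair, and such a score could then be weighed against a manual judgment of semantic completeness.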
 
Re: [Help] Defining MWEs (multiword expressions)

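Quoting jinshan_wu, 2006-5-28 22:49:09:
Timothy Baldwin defines an English multiword expression as an item that is:
1. decomposable into multiple simplex words
2. lexically, syntactically, semantically, pragmatically and/or statistically idiosyncratic
This definition is itself vague, and extending it to Chinese raises even more problems. For example, should “工作单位” be treated as a two-word combination or as a multiword expression?
If the definition is "vague", why extend it to Chinese at all? Would Chinese linguistics really rather swallow foreign ideas whole, undigested?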
 
Re: [Help] Defining MWEs (multiword expressions)

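Quoting xujiajin, 2006-5-29 9:46:07:
My personal view:
“工作单位” should be a two-word combination.
I think MWEs can be looked at from two angles: one is semantic and functional completeness; the other is probability.
Of course, the two should be combined during the screening process.

May I ask the moderator how exactly the two should be combined? I would be glad to hear your view. Thanks!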
 
My approach has two steps:
1. First, use word frequency to obtain n-grams.
2. Then remove the nonsense noise.
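The two steps above can be sketched roughly as follows. This is a minimal illustration, not the poster's actual procedure: it assumes the text has already been segmented into words, the function names and the frequency threshold are hypothetical, and "noise" is approximated crudely as any n-gram that begins or ends with a stopword.

```python
from collections import Counter

def count_ngrams(sentences, n):
    """Count contiguous n-grams over pre-segmented sentences (lists of words)."""
    counts = Counter()
    for words in sentences:
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts

def frequent_ngrams(sentences, n=2, min_freq=2, stopwords=frozenset()):
    """Step 1: keep n-grams seen at least min_freq times.
    Step 2: drop candidates that begin or end with a stopword."""
    counts = count_ngrams(sentences, n)
    return {
        gram: freq
        for gram, freq in counts.items()
        if freq >= min_freq
        and gram[0] not in stopwords
        and gram[-1] not in stopwords
    }

# Toy, hand-segmented corpus (hypothetical data, for illustration only).
corpus = [
    ["请", "填写", "工作", "单位"],
    ["他", "的", "工作", "单位", "在", "北京"],
    ["他", "的", "朋友", "在", "北京"],
]
candidates = frequent_ngrams(corpus, n=2, min_freq=2, stopwords={"的", "在"})
print(candidates)
```

With this toy data, frequent but uninteresting pairs such as “他 的” and “在 北京” are removed by the stopword check, and only “工作 单位” survives. A stopword list is a blunt instrument, so real noise removal would likely still need manual review.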
 
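Precisely because the definition above does not fit Chinese, it needs to be revised so that it covers Chinese multiword expressions better, giving the English and Chinese multiword expressions in our research a common referent. I think multiword expressions should include set phrases (熟语) and common expressions (常用语). Set phrases are fixed phrases or sentences in the language whose structure cannot be freely changed or substituted, and they are semantically indivisible. They include idioms (成语), proverbs, two-part allegorical sayings (歇后语), technical terms, abbreviations, and so on. Common expressions are collocations that can be semantically decomposed but occur with high frequency in use. Their internal components are substitutable.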
 
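Dr Xu's method is workable for extracting common expressions, but removing the noise will be a hard problem. We would have to design a syntactic filter that automatically filters out word clusters that do not meet syntactic requirements. In addition, extracting Chinese n-word clusters also seems to be a problem. Does anyone know of software that handles this well? WordSmith, MLCT and ParaConc all seem unable to do it.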
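A syntactic filter of the kind described could, for instance, check each candidate's part-of-speech sequence against a list of allowed patterns. The sketch below is only one possible reading: it assumes the candidates have already been segmented and POS-tagged by some external tool, and the tag names, the pattern list and the function name are all hypothetical.

```python
def passes_syntactic_filter(tagged_ngram, allowed_patterns):
    """Return True if the n-gram's POS-tag sequence is one of the allowed patterns."""
    tags = tuple(tag for _, tag in tagged_ngram)
    return tags in allowed_patterns

# Hypothetical tagset: n = noun, v = verb, a = adjective,
# r = pronoun, u = particle, p = preposition.
ALLOWED = {("n", "n"), ("a", "n"), ("v", "n")}

# Hypothetical tagged candidates, e.g. the output of a frequency-based step.
tagged_candidates = [
    (("工作", "n"), ("单位", "n")),
    (("他", "r"), ("的", "u")),
    (("在", "p"), ("北京", "n")),
]
kept = [c for c in tagged_candidates if passes_syntactic_filter(c, ALLOWED)]
print(kept)
```

Which patterns to allow is a linguistic decision, not a programming one, and the filter is only as good as the segmentation and tagging that feed it, which is probably why general-purpose concordancers struggle with unsegmented Chinese text.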
 
There was a related discussion on the Corpora mailing list recently. I am reposting it here in the hope that it helps anyone interested.

Question:

> As a part of my M.A. thesis, I am investigating multi-word units
> (otherwise called recurrent sequences, word bundles, etc.) in non-native
> English speech.
>
> I remember having read somewhere that if such a string of words contains
> repeats (of the same words) and/or hesitation (like erm, mm, uhu) within
> it, it cannot be treated as a formula, for the very inclusion of
> repetition or hesitation proves the sequence was not processed as a
> whole but rather as a set of words.
>
> Unfortunately, I do not remember from which scholar the idea comes and I
> need it for justification of my very study.
>
> I'll appreciate any hints or direct reference to the source.
>
> Best regards,
>
> Joanna Jendryczka
>
> M.A. English Linguistics student
> A.Mickiewicz University Poznan, Poland
>


Answer:

Dear Joanna,

I don't know the source of the suggestion, but I don't think you should
assume that a sequence which would be treated as formulaic without a
hesitation in the middle of it is necessarily non-formulaic because of the
hesitation.
A lot, of course, depends on how you define formulaic sequences.
You might be interested in the following references which contain material
on pausing in relation to formulaic strings:

Erman, B. (forthcoming) Pauses as evidence of cognitive effort in
prefabricated and non-prefabricated structures. International Journal of
Corpus Linguistics.

Raupach, M. (1984) Formulae in second language speech production. In H. W.
Dechert, D. Möhle and M. Raupach (eds.) Second Language Production.
Tübingen: Gunter Narr, 114-137.

Wray, A. (2004) 'Here's one I prepared earlier': formulaic language learning
on television. In N. Schmitt (ed.) Formulaic Sequences: Acquisition,
Processing and Use. (Language Learning and Language Teaching 9). Amsterdam
and Philadelphia: John Benjamins, 249-268.


Best wishes,

Chris Butler
Honorary Professor, Centre for Applied Language Studies, University of Wales
Swansea, UK
 