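Re: Question: which values are needed to compute chi-square for colligation?
Sorry, the most I can offer is the account of how keyness is calculated, taken from the WordSmith 5.0 manual.
The following is from Mike Scott.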
.............................................................
How Key Words are Calculated
The "key words" are calculated by comparing the frequency of each word in the word-list of the text you're interested in with the frequency of the same word in the reference word-list. All words which appear in the smaller list are considered, unless they are in a stop list.
If "the" occurs, say, 5% of the time in the small word-list and 6% of the time in the reference corpus, it will not turn out to be "key", though it may well be the most frequent word. If the text concerns the anatomy of spiders, it may well turn out that the names of the researchers, and the items "spider", "leg", "eight", etc. may be more frequent than they would otherwise be in your reference corpus (unless your reference corpus only concerns spiders!).
To compute the "key-ness" of an item, the program therefore computes
its frequency in the small word-list
the number of running words in the small word-list
its frequency in the reference corpus
the number of running words in the reference corpus
and cross-tabulates these.
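
As a rough illustration of the cross-tabulation step, the four values above can be arranged in a 2 X 2 table. This is only a sketch in Python, not WordSmith's own code; the function names `contingency_table` and `expected_counts` are made up for this example, and the counts are invented to match the 5% versus 6% case for "the".

```python
def contingency_table(freq_small, total_small, freq_ref, total_ref):
    """Build a 2 x 2 table from the four values listed above.

    Rows: the word itself, then all other running words.
    Columns: the small word-list, then the reference corpus.
    """
    return [
        [freq_small, freq_ref],
        [total_small - freq_small, total_ref - freq_ref],
    ]


def expected_counts(table):
    """Expected cell counts if the word were equally frequent in both lists."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    return [[row * col / n for col in col_totals] for row in row_totals]


# Invented counts: "the" occurs 50 times in 1,000 running words (5%)
# and 600 times in 10,000 running words (6%).
table = contingency_table(50, 1000, 600, 10000)
expected = expected_counts(table)
```

Both statistical tests mentioned in the manual compare the observed table with the expected one.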
Statistical tests include:
the classic chi-square test of significance with Yates correction for a 2 X 2 table
Ted Dunning's Log Likelihood test, which gives a better estimate of keyness, especially when contrasting long texts or a whole genre against your reference corpus.
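
To make the two tests concrete, here is a minimal Python sketch of the textbook formulas for a 2 X 2 table. It is an assumption that WordSmith follows these formulas exactly, since the manual text above does not spell them out. The cell names a, b, c, d and both function names are my own: a and b are the word's frequency in the small word-list and in the reference corpus, c and d are the remaining running words in each. The log-likelihood here sums over all four cells; some keyness calculators use only the two cells for the word itself.

```python
import math


def chi_square_yates(a, b, c, d):
    """Chi-square for a 2 x 2 table with Yates' continuity correction."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a row or column is empty, so the test is undefined
    # The correction subtracts n/2 from |ad - bc|, but never below zero.
    diff = max(abs(a * d - b * c) - n / 2, 0)
    return n * diff ** 2 / denom


def log_likelihood(a, b, c, d):
    """Log-likelihood ratio: G2 = 2 * sum(O * ln(O / E)) over the four cells."""
    n = a + b + c + d
    if n == 0:
        return 0.0
    observed = [a, b, c, d]
    expected = [
        (a + b) * (a + c) / n,
        (a + b) * (b + d) / n,
        (c + d) * (a + c) / n,
        (c + d) * (b + d) / n,
    ]
    # Cells with an observed count of zero contribute nothing to the sum.
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


# Invented counts for "the": 50 in 1,000 running words vs 600 in 10,000.
chi2 = chi_square_yates(50, 600, 950, 9400)
g2 = log_likelihood(50, 600, 950, 9400)
```

With these invented counts the corrected chi-square comes to about 1.46, well under the 3.84 critical value for p < 0.05 with one degree of freedom, which agrees with the manual's remark that "the" would not turn out to be key.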