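Using experimental data from Chinese word naming and lexical decision tasks, these frequencies were compared with those from several existing word frequency lists; the comparison showed that they explain reaction times (RTs) best.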

Cai, Q. & Brysbaert, M. (in press). SUBTLEX-CH: Chinese Word and Character Frequencies Based on Film Subtitles. PLoS ONE.
It isn't online yet, so I've attached the final version of the manuscript. The abstract is pasted below as well.

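The word frequencies are in plain-text format. An easy way to use them: download and save the file, start Excel, open the file from within Excel, set Original data type to Delimited (the upper option), and set File Origin to Simplified Chinese. To load the file from other programs, see the figures in the paper, which describe the format.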
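For anyone who would rather skip Excel, here is a minimal Python sketch of the same import. It rests on assumptions that are not stated in the post: that the delimiter is a tab, that a Simplified Chinese codec such as `gb18030` decodes the file, and that the first row holds the column names. The function name `read_frequency_table` and the columns `Word` and `Count` are made up for illustration; check the figures in the paper for the real layout.

```python
import csv
import io


def read_frequency_table(raw_bytes, encoding="gb18030", delimiter="\t"):
    """Decode a delimited frequency list and return its rows as dicts.

    Assumptions (not confirmed by the post): Simplified Chinese encoding,
    tab delimiter, and a header row as the first line.
    """
    text = raw_bytes.decode(encoding)
    reader = csv.DictReader(
        io.StringIO(text),
        delimiter=delimiter,
        quoting=csv.QUOTE_NONE,  # treat quote characters as ordinary text
    )
    return list(reader)


# Tiny made-up sample standing in for the downloaded file.
sample = "Word\tCount\n的\t100\n我\t80\n".encode("gb18030")
rows = read_frequency_table(sample)
for row in rows:
    print(row["Word"], int(row["Count"]))
```

With the real download, you would pass the file's bytes instead of `sample`, for example by reading it with `open(path, "rb")`, where `path` is wherever you saved the file. If the file starts with extra summary lines before the header, those would need to be skipped first.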

Word frequency is the most important variable in language research. However, despite the growing interest in the Chinese language, there are only a few sources of word frequency measures available to researchers, and the quality is less than what researchers in other languages are used to.
Following recent work by New, Brysbaert, and colleagues in English, French and Dutch, we assembled a database of word and character frequencies based on a corpus of film and television subtitles (46.8 million characters, 33.5 million words). In line with what has been found in the other languages, the new word and character frequencies explain significantly more of the variance in Chinese word naming and lexical decision performance than measures based on written texts.
Our results confirm that word frequencies based on subtitles are a good estimate of daily language exposure and capture much of the variance in word processing efficiency. In addition, our database is the first to include information about the contextual diversity of the words and to provide good frequency estimates for multi-character words and the different syntactic roles in which the words are used. The word frequencies are freely available for research purposes.


Re: SUBTLEX-CH: Chinese character frequencies / word frequencies / PoS-tagged word frequencies

I've been searching for word frequency data for a long time. Thanks a lot!

---the idea of subtitles is brilliant!