[Help] How to calculate lexical density with software

jackie

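Junior Member
Hello experts, I want to calculate the lexical density of the texts in my corpus, using the formula: content-word tokens / total tokens * 100. The problem is that the corpus contains too many texts; processing them by hand would be exhausting, and not very accurate either :(
I have tried the methods offered by corpus tools, but my skills were not up to it and I got nowhere, so I would be grateful if someone could advise me on how to solve this. Many thanks!!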
 
Re: [Help] How to calculate lexical density with software

The short story is this: you first need to define what you mean by a "content-word token".
Once you have that definition, you can have your corpus POS tagged. On the basis of the POS
tagged corpus you can count your content-word tokens by searching for the POS tags that
you think belong to the content-word class. You can then calculate lexical density with
your formula.
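
The steps above can be sketched in Python roughly as follows. This is only a minimal sketch under several assumptions that are not from the thread: the corpus is already tagged in `word_TAG` form, the tags are Penn Treebank style, content words are taken to be tags starting with NN, VB, JJ or RB, and punctuation is left out of the total token count. The function name `lexical_density` and the tag list are illustrative; change them to match your own definition and tagset.

```python
# Tag prefixes treated as content words. This is an assumption:
# adjust it to your own tagset and your own definition of content words.
CONTENT_TAG_PREFIXES = ("NN", "VB", "JJ", "RB")


def lexical_density(tagged_text, content_prefixes=CONTENT_TAG_PREFIXES):
    """Return content-word tokens / total tokens * 100 for word_TAG text."""
    total = 0
    content = 0
    for item in tagged_text.split():
        # Split on the last underscore so words containing "_" still work.
        word, sep, tag = item.rpartition("_")
        if not sep or not word:
            continue  # skip anything that is not in word_TAG form
        if not any(ch.isalnum() for ch in word):
            continue  # skip punctuation tokens (a design choice)
        total += 1
        if tag.upper().startswith(content_prefixes):
            content += 1
    return content / total * 100 if total else 0.0


sample = "The_DT cat_NN sat_VBD on_IN the_DT mat_NN ._."
print(round(lexical_density(sample), 1))  # prints 50.0
```

For a whole corpus you would read each tagged file and pass its contents to the function. Note that with this simple prefix test, auxiliary verbs tagged VB* are counted as content words; if your definition excludes them, you would need a stop list or a finer tagset.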

There was a discussion about this a while ago...I think it's here:

http://www.corpus4u.com/forum_view.asp?forum_id=38&view_id=942&page=1

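Quoting jackie's post of 2006-1-3 23:38:18:
Hello experts, I want to calculate the lexical density of the texts in my corpus, using the formula: content-word tokens / total tokens * 100. The problem is that the corpus contains too many texts; processing them by hand would be exhausting, and not very accurate either :(
I have tried the methods offered by corpus tools, but my skills were not up to it and I got nowhere, so I would be grateful if someone could advise me on how to solve this. Many thanks!!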
 
Thank you, 动态语法:)
I have looked at the link you suggested and found it helpful. I will try out the suggestions.
 