[Help] Looking for a stemming program to extract word stems!
nijieqiong
2006-08-05, 08:12 PM
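I urgently need a ready-made stemming program. Stemming extracts the stem of a word: in English the same word appears in many different forms, and these need to be unified into a single concept, that is, a single stem. Any help would be much appreciated!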
xiaoz
2006-08-05, 08:32 PM
Porter Stemmer:
http://search.cpan.org/src/ULPFR/perlindex-1.502/lib/Text/English.pm
Stemmer in Snowball (Porter2):
http://www.snowball.tartarus.org/algorithms/english/stemmer.html
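To show the basic idea before reading either source, here is a minimal Python sketch of suffix stripping. It is not the Porter or Porter2 algorithm; the suffix list, the minimum stem length, and the name `naive_stem` are illustrative assumptions only.

```python
# Illustrative suffix stripper; NOT the Porter/Porter2 algorithm.
# SUFFIXES and MIN_STEM_LEN are assumptions chosen for this demo.
SUFFIXES = ("ing", "ed", "es", "s")
MIN_STEM_LEN = 3


def naive_stem(word: str) -> str:
    """Lowercase the word and strip the first matching suffix, if any."""
    w = word.lower()
    for suffix in SUFFIXES:
        # Only strip when enough of the word remains to be a plausible stem.
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM_LEN:
            return w[: -len(suffix)]
    return w


if __name__ == "__main__":
    for w in ["connect", "connected", "connecting", "connects"]:
        print(w, "->", naive_stem(w))
```

With this sketch, connect, connected, connecting and connects all reduce to connect, but running becomes runn. The Porter rules at the links above handle cases like that one properly.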
nijieqiong
2006-08-05, 08:50 PM
Thank you very much. Is the second link you posted source code? What language is it written in? I can't make sense of it at all.
xiaoz
2006-08-05, 09:27 PM
The language is Snowball, see
http://www.snowball.tartarus.org/
Snowball is a small string-handling language, and its name was chosen as a tribute to SNOBOL (Farber 1964, Griswold 1968; see the references at the end of the introduction), with which it shares the concept of string patterns delivering signals that are used to control the flow of the program. See the Snowball manual on this page.
xujiajin
2006-08-05, 09:30 PM
http://www.lexically.net/downloads/version4/e_lemma.zip
You may also be interested in an English lemma list made in 1998 by Yasumasa Someya which "currently contains 40,569 words (tokens) in 14,762 lemma groups. It is still far from complete, but I hope you find the list useful in preparing your own more complete lemma list. If you have any questions or comments about this lemma list, feel free to contact me (ysomeya@gol.com)"
nijieqiong
2006-08-06, 02:32 PM
Thank you all very much. I have found a program to download.
hancunxin
2006-08-07, 07:35 PM
Quoting nijieqiong, posted 2006-8-6 14:32:36:
Thank you all very much. I have found a program to download.
Could you share the software you downloaded?
for_happy
2006-08-21, 08:44 PM
Yes, could you share it?
xujiajin
2006-08-21, 09:53 PM
http://www.lexically.net/downloads/version4/e_lemma.zip
It is just a word list. Load the list in WordSmith settings and that is all you need to do.
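Outside WordSmith, a lemma list like this can also be used from a script. The Python sketch below builds a lookup from word form to lemma. It assumes each entry looks like `lemma -> form1,form2` and that comment lines start with `;`, so check the downloaded file first. The function names and the sample entries are hypothetical.

```python
from typing import Dict

# ASSUMPTION: entries look like "lemma -> form1,form2" and comment lines
# start with ";". Verify this against the actual file before relying on it.


def parse_lemma_list(text: str, arrow: str = "->") -> Dict[str, str]:
    """Build a lowercase word-form -> lemma lookup from lemma-list text."""
    lookup: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";") or arrow not in line:
            continue  # skip blanks, comments, and lines without an arrow
        lemma, forms = line.split(arrow, 1)
        lemma = lemma.strip().lower()
        lookup.setdefault(lemma, lemma)  # a lemma maps to itself
        for form in forms.split(","):
            form = form.strip().lower()
            if form:
                lookup.setdefault(form, lemma)  # first entry wins
    return lookup


def lemmatize(word: str, lookup: Dict[str, str]) -> str:
    """Return the lemma for a word, or the lowercased word if unknown."""
    return lookup.get(word.lower(), word.lower())


# Hypothetical sample entries, not copied from the real list.
SAMPLE = """\
; sample entries for illustration
be -> am,are,is,was,were,been,being
go -> goes,going,went,gone
"""

if __name__ == "__main__":
    table = parse_lemma_list(SAMPLE)
    for w in ["Went", "were", "corpus"]:
        print(w, "->", lemmatize(w, table))
```

Unlike a stemmer, this approach only covers forms that are listed, so unknown words are returned unchanged.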