Perl script for finding English compound words

xujiajin

Administrator
Staff member
I wrote a Perl program, find-compounds.pl, that finds the longest compound words in a text.
It is part of the Text-NSP package. The following link points to its description.

http://search.cpan.org/~tpederse/Text-NSP-1.21/bin/utils/find-compounds.pl


Suppose the original text contains "This is the new york city", and the compound word list contains:

new_york
new_york_city

find-compounds.pl finds the longest match, so after the compound words are replaced, the text is "This is the new_york_city".
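
To make the longest-match idea concrete, here is a minimal Python sketch. It is not the code of find-compounds.pl, only an illustration of greedy longest matching under some assumptions: tokens are split on whitespace, matching is case-sensitive, and compounds in the list are joined with underscores. The function name replace_compounds is made up for this example.

```python
def replace_compounds(text, compounds):
    """Join the longest listed compound starting at each token with underscores."""
    # Store each compound as a tuple of its tokens (assumes "_" as the joiner).
    known = {tuple(c.split("_")) for c in compounds}
    max_len = max((len(c) for c in known), default=1)
    tokens = text.split()
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, shrinking down to two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in known:
                out.append("_".join(tokens[i:i + n]))
                i += n
                break
        else:
            # No compound starts here, so keep the single token.
            out.append(tokens[i])
            i += 1
    return " ".join(out)


print(replace_compounds("This is the new york city", ["new_york", "new_york_city"]))
# prints: This is the new_york_city
```

Because the three-token candidate is tried before the two-token one, "new york city" is replaced as a whole rather than as "new_york city".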


The program takes as input a list, prepared in advance, of the compound words you are interested in.
The output is the text file with the compound words replaced. To pick out the sentences
that contain the compound words, you need to process the output text further. I hope this is helpful.
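
One possible way to do that further processing is sketched below in Python. It is only an illustration and not part of Text-NSP. It assumes that compounds in the output text are joined with underscores, and that sentences end in ".", "!" or "?" followed by whitespace, which is a naive rule. The function name sentences_with_compounds is hypothetical.

```python
import re


def sentences_with_compounds(text, compounds):
    """Return the sentences that contain at least one listed compound."""
    wanted = set(compounds)
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    hits = []
    for sentence in sentences:
        # \w matches underscores, so "new_york_city" stays one word.
        words = re.findall(r"\w+", sentence)
        if any(word in wanted for word in words):
            hits.append(sentence)
    return hits


replaced = "This is the new_york_city. It rains today. I like new_york."
for sentence in sentences_with_compounds(replaced, ["new_york", "new_york_city"]):
    print(sentence)
```

For real corpora, a proper sentence splitter would be a better choice than the regular expression used here.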

Thanks,
Ying


Quote from Corpora List
 