Apache OpenNLP

http://incubator.apache.org/opennlp/index.html

Welcome to Apache OpenNLP

The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text.

It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text processing services. OpenNLP also includes maximum entropy and perceptron based machine learning.
 
Re: Apache OpenNLP

OpenNLP can also be used from R (Java is required).

Here are my test results on Ubuntu:

1. R> install.packages(c("rJava")) # If package 'rJava' cannot be installed, run: sudo apt-get install openjdk-6-jdk

2. R> install.packages(c("openNLP","openNLPmodels.en"))

3. Suppose a file nlp.txt exists with the following content:

I would like to test the new package RGG (R Gui Generator). This package requires the installation of several other package.
One of them is rJava.
I installed sun-java6-jdk and run the R CMD javareconf but the installation still fails !
Do you have any ideas ?


4. R> txt <- readLines(file.choose()) # choose nlp.txt

5. R> library(openNLP)

6. R> txt2 <- sentDetect(txt, language = "en") # split into sentences
R> txt2
[1] " I would like to test the new package RGG (R Gui Generator). "
[2] "This package requires the installation of several other package. "
[3] "One of them is rJava. "
[4] "I installed sun-java6-jdk and run the R CMD javareconf but the installation still fails ! "
[5] "Do you have any ideas ? "


7. R> length(txt2)
[1] 5

8. R> txt3 <- tagPOS(txt2, language = "en")
R> txt3
[1] "I/PRP would/MD like/VB to/TO test/VB the/DT new/JJ package/NN RGG/NNP (R/NNP Gui/NNP Generator)./."
[2] "This/DT package/NN requires/VBZ the/DT installation/NN of/IN several/JJ other/JJ package./NN"
[3] "One/CD of/IN them/PRP is/VBZ rJava./VBG"
[4] "I/PRP installed/VBD sun-java6-jdk/JJ and/CC run/VB the/DT R/NN CMD/NN javareconf/NN but/CC the/DT installation/NN still/RB fails/VBZ !/."
[5] "Do/VBP you/PRP have/VBP any/DT ideas/NNS ?/."
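
The tagged strings above follow a word/TAG layout, so they can be split into a table with base R alone. Below is a minimal sketch, assuming every token carries its tag after the last slash; parse_tagged is a hypothetical helper name used for illustration and is not part of openNLP.

```r
# Hypothetical helper (not part of openNLP): turn one "word/TAG ..." string
# into a data frame with one row per token.
parse_tagged <- function(tagged) {
  tokens <- strsplit(tagged, " ", fixed = TRUE)[[1]]
  tokens <- tokens[nzchar(tokens)]  # drop empty pieces from repeated spaces
  # Split on the LAST slash so that words containing "/" stay intact.
  data.frame(
    word = sub("/[^/]*$", "", tokens),
    tag  = sub("^.*/", "", tokens),
    stringsAsFactors = FALSE
  )
}

# Sample input copied from sentence [5] of the tagPOS output above.
tagged <- "Do/VBP you/PRP have/VBP any/DT ideas/NNS ?/."
result <- parse_tagged(tagged)
print(result)
```

From there, something like table(result$tag) would give a quick count of each tag in the sentence.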
 