View full version: Stanford Parser online
xujiajin
2007-03-16, 08:56 PM
http://josie.stanford.edu:8080/parser/
laohong
2007-03-16, 09:37 PM
To follow up on Dr. Xu's recommendation, here is a list of corpus tools developed by the Stanford NLP Group:
The Stanford Parser
Java implementations of probabilistic natural language parsers, both highly optimized PCFG and dependency parsers, and a lexicalized PCFG parser.
Online parser demo at: http://josie.stanford.edu:8080/parser/
Download the full package (requires Java 5 or JDK1.5 to run) at: http://nlp.stanford.edu/downloads/StanfordParser-2006-06-11.tar.gz
The Stanford POS Tagger
A Java implementation of a maximum-entropy part-of-speech (POS) tagger
Download Stanford Tagger version 2006-05-21 (requires JDK 1.5.0 or above to run) at: http://nlp.stanford.edu/software/postagger-2006-05-21.tar.gz
The Stanford Named Entity Recognizer
A Java implementation of a Conditional Random Field sequence model, together with well-engineered features for Named Entity Recognition.
Download Stanford Named Entity Recognizer version 1.0
http://nlp.stanford.edu/software/stanford-ner-2007-01-29.tar.gz
Stanford Chinese Word Segmenter
A Java implementation of a CRF-based Chinese Word Segmenter
Download Stanford Chinese Segmenter version 2006-05-11 (requires JDK 1.5.0 or above to run) at: http://nlp.stanford.edu/software/StanfordChineseSegmenter-2006-05-11.tar.gz
The Stanford Classifier
A Java implementation of conditional loglinear model classification (a.k.a. maximum entropy or multiclass logistic regression models)
Download Stanford Classifier version 1.0
http://nlp.stanford.edu/software/StanfordClassifier.tar.gz
Tregex and Tsurgeon
A Java implementation of a Tgrep2-style utility for matching patterns in trees, and a tree-transformation utility built on top of this matching language.
Download Tregex version 1.2 at: http://nlp.stanford.edu/software/tregex.tar.gz
For more information about these tools, visit http://nlp.stanford.edu/software/index.shtml
laohong
2007-03-16, 10:14 PM
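This parser package is pretty good. It handles both Chinese and English, but the results can't be saved. I wonder whether Dr. Xu has tried it.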
armstrong
2007-03-16, 11:29 PM
When I load the parser, it shows "Could not load the parser. Out of memory".
Why?
The tagger cannot be used either.
I have installed the JDK.
laohong
2007-03-17, 01:15 PM
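I can finally write the parser's results straight to a text file. I just tested batch-processing the four sentences below (the first four sentences of Professor Wei's condolence message for Sinclair):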
We are shocked to hear that Professor John Sinclair has left us. Undoubtedly, the 13th of March 2007 is a saddest day to the world linguistics, Corpus Linguistics in particular. The gap left by the departure of this innovative thinker and distinguished linguist will be felt in the hearts of the researchers working along the lines he has set. In deepest sorrow, we, linguists at Shanghai Jiao Tong University, China, found that we cannot express with words our gratitude and respect to John.
The result was as follows:
Parsed 94 words in 4 sentences (13.73 wds/sec; 0.58 sents/sec).
Here is the detailed tree output for each sentence as well:
We are shocked to hear that Professor John Sinclair has left us.
(ROOT
(S
(NP (PRP We))
(VP (VBP are)
(ADJP (JJ shocked)
(S
(VP (TO to)
(VP (VB hear)
(SBAR (IN that)
(S
(NP (NNP Professor) (NNP John) (NNP Sinclair))
(VP (VBZ has)
(VP (VBN left)
(NP (PRP us)))))))))))
(. .)))
nsubj(shocked-3, We-1)
cop(shocked-3, are-2)
aux(hear-5, to-4)
xcomp(shocked-3, hear-5)
complm(left-11, that-6)
nn(Sinclair-9, Professor-7)
nn(Sinclair-9, John-8)
nsubj(left-11, Sinclair-9)
aux(left-11, has-10)
ccomp(hear-5, left-11)
dobj(left-11, us-12)
Undoubtedly, the 13th of March 2007 is a saddest day to the world linguistics, Corpus Linguistics in particular.
(ROOT
(S
(ADVP (RB Undoubtedly))
(, ,)
(NP
(NP (DT the) (NN 13th))
(PP (IN of)
(NP (NNP March) (CD 2007))))
(VP (VBZ is)
(NP
(NP (DT a) (JJ saddest) (NN day))
(PP (TO to)
(NP
(NP (DT the) (NN world) (NNS linguistics))
(, ,)
(NP
(NP (NNP Corpus) (NNP Linguistics))
(PP (IN in)
(NP (NN particular))))))))
(. .)))
advmod(day-11, Undoubtedly-1)
det(13th-4, the-3)
nsubj(day-11, 13th-4)
prep_of(13th-4, March-6)
num(March-6, 2007-7)
cop(day-11, is-8)
det(day-11, a-9)
amod(day-11, saddest-10)
det(linguistics-15, the-13)
nn(linguistics-15, world-14)
prep_to(day-11, linguistics-15)
nn(Linguistics-18, Corpus-17)
appos(linguistics-15, Linguistics-18)
prep_in(Linguistics-18, particular-20)
The gap left by the departure of this innovative thinker and distinguished linguist will be felt in the hearts of the researchers working along the lines he has set.
(ROOT
(S
(NP
(NP
(NP (DT The) (NN gap))
(VP (VBN left)
(PP (IN by)
(NP
(NP (DT the) (NN departure))
(PP (IN of)
(NP (DT this) (JJ innovative) (NN thinker)))))))
(CC and)
(NP (VBN distinguished) (NN linguist)))
(VP (MD will)
(VP (VB be)
(VP (VBN felt)
(PP (IN in)
(NP
(NP (DT the) (NNS hearts))
(PP (IN of)
(NP (DT the) (NNS researchers)))))
(S
(VP (VBG working)
(PRT (RP along))
(NP
(NP (DT the) (NNS lines))
(SBAR
(S
(NP (PRP he))
(VP (VBZ has)
(VP (VBN set)))))))))))
(. .)))
det(gap-2, The-1)
nsubjpass(felt-16, gap-2)
partmod(gap-2, left-3)
det(departure-6, the-5)
prep_by(left-3, departure-6)
det(thinker-10, this-8)
amod(thinker-10, innovative-9)
prep_of(departure-6, thinker-10)
amod(linguist-13, distinguished-12)
conj_and(gap-2, linguist-13)
aux(felt-16, will-14)
auxpass(felt-16, be-15)
det(hearts-19, the-18)
prep_in(felt-16, hearts-19)
det(researchers-22, the-21)
prep_of(hearts-19, researchers-22)
partmod(felt-16, working-23)
prt(working-23, along-24)
det(lines-26, the-25)
dobj(working-23, lines-26)
nsubj(set-29, he-27)
aux(set-29, has-28)
rcmod(lines-26, set-29)
In deepest sorrow, we, linguists at Shanghai Jiao Tong University, China, found that we cannot express with words our gratitude and respect to John.
(ROOT
(S
(PP (IN In)
(NP (JJS deepest) (NN sorrow)))
(, ,)
(NP
(NP (PRP we))
(, ,)
(NP
(NP (NNS linguists))
(PP (IN at)
(NP
(NP (NNP Shanghai) (NNP Jiao) (NNP Tong) (NNP University))
(, ,)
(NP (NNP China)))))
(, ,))
(VP (VBD found)
(SBAR (IN that)
(S
(NP (PRP we))
(VP (MD can) (RB not)
(VP (VB express)
(PP (IN with)
(NP (NNS words)))
(NP
(NP (PRP$ our) (NN gratitude)
(CC and)
(NN respect))
(PP (TO to)
(NP (NNP John)))))))))
(. .)))
amod(sorrow-3, deepest-2)
prep_in(found-16, sorrow-3)
nsubj(found-16, we-5)
appos(we-5, linguists-7)
nn(University-12, Shanghai-9)
nn(University-12, Jiao-10)
nn(University-12, Tong-11)
prep_at(linguists-7, University-12)
appos(University-12, China-14)
complm(express-21, that-17)
nsubj(express-21, we-18)
aux(express-21, can-19)
neg(express-21, not-20)
ccomp(found-16, express-21)
prep_with(express-21, words-23)
poss(gratitude-25, our-24)
dobj(express-21, gratitude-25)
conj_and(gratitude-25, respect-27)
prep_to(gratitude-25, John-29)
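The typed-dependency lines above follow a regular relation(head-index, dependent-index) pattern, so they are easy to post-process. The sketch below is only an illustration in Python, not part of the Stanford package. The names DEP_LINE and parse_dependency are made up here, and the code assumes each dependency sits on its own line exactly as printed above.

```python
import re

# Matches lines such as "nsubj(shocked-3, We-1)" or "prep_of(13th-4, March-6)".
DEP_LINE = re.compile(r"^(\w+)\((.+)-(\d+), (.+)-(\d+)\)$")


def parse_dependency(line):
    """Return (relation, head, head_index, dependent, dependent_index), or None.

    None is returned for anything that is not a dependency line,
    for example the bracketed tree lines.
    """
    match = DEP_LINE.match(line.strip())
    if match is None:
        return None
    relation, head, head_index, dependent, dependent_index = match.groups()
    return relation, head, int(head_index), dependent, int(dependent_index)
```

Calling parse_dependency on every line of the output file and dropping the None results would leave just the dependency tuples, which can then be counted or filtered by relation name.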
laohong
2007-03-17, 01:20 PM
Unfortunately, the tree structures look ugly once they are posted here, so please download the text file in the attachment instead.
When I load the parser, it shows "Could not load the parser. Out of memory".
Why?
The tagger cannot be used either.
I have installed the JDK.
I have the same problem as armstrong. Why?
laohong
2007-03-18, 12:38 PM
I have the same problem with armstrong, why?
Not sure exactly what caused the problem for you and Armstrong, as the error didn't occur here. Maybe you should tell us exactly what happened with your files.
Have you successfully loaded the parser window by double-clicking "lexparser-gui.bat", "stanford-parser-2006-06-11.jar" or "stanford-parser.jar"? If not, you may need to check whether your JDK is correctly installed.
If the parser window can be loaded, try typing in ONLY one or two sentences as a test. If you want to open an existing text, it must be in utf-8 format, and it is best to start with a text of only a few sentences. Then load the parser file (englishFactored.ser.gz for English text; chineseFactored.ser.gz for Chinese text). It may take a while to load because, as the documentation says:
"The current version of the parser requires Java 5 (JDK1.5 or above). The parser also requires plenty of memory (a minimum of 100Mb to run as a PCFG parser on sentences up to 40 words in length; typically around 500Mb of memory to be able to parse similarly long typical-of-newswire sentences using the factored model). "
Once the parser file is loaded, click a sentence in the text window and it will be highlighted in yellow. Then click Parse; you should see the result in the output window within a second or so.
xujiajin
2007-03-18, 01:21 PM
The same parser loading error on my computer. :(
laohong
2007-03-18, 01:35 PM
The same parser loading error on my computer. :(
Maybe your computer doesn't have enough RAM (mine has 1 GB). Alternatively, try it from the command line with "lexparser.bat input.txt >output.txt".
How much memory do I need to parse very long sentences?
Memory usage by the parser depends on a number of factors:
Memory usage expands roughly with the square of the sentence length. You may wish to set a -maxLength and to skip long sentences.
The factored parser requires several times as much memory as just running the PCFG parser, since it runs 3 parsers.
The command-line version of the parser currently loads the whole of a file into memory before parsing any of it. If your file is extremely large, splitting it into multiple files and parsing them sequentially will reduce memory usage.
A 64-bit application requires more memory than a 32-bit application (Java uses lots of pointers).
A larger grammar or POS tag set requires more memory than a smaller one.
Below are some statistics for 32-bit operation with the supplied englishPCFG and englishFactored grammars. We have parsed sentences as long as 234 words, but you need lots of RAM and patience.
Length (words)   PCFG      Factored
20               50 MB     250 MB
50               125 MB    600 MB
100              350 MB    2100 MB
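One way to act on the file-splitting advice above is sketched below in Python. It is only an illustration, not part of the parser: split_lines and split_file are hypothetical helper names, the ".partN.txt" naming is invented, and the sketch assumes the input has one sentence per line so that no sentence is cut in half.

```python
def split_lines(lines, chunk_size):
    """Group a list of lines into consecutive chunks of at most chunk_size lines."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def split_file(path, chunk_size, encoding="utf-8"):
    """Write the lines of path into numbered part files and return their names."""
    with open(path, encoding=encoding) as handle:
        lines = handle.readlines()
    names = []
    for number, chunk in enumerate(split_lines(lines, chunk_size), start=1):
        name = f"{path}.part{number}.txt"  # hypothetical naming scheme
        with open(name, "w", encoding=encoding) as out:
            out.writelines(chunk)
        names.append(name)
    return names
```

Each part file can then be parsed on its own, one after another, so that only one small file is held in memory at a time.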
xujiajin
2007-03-18, 02:03 PM
My machine has 1 GB of memory too. I reinstalled Java JDK 1.5, and again failed to load any of the parsers.
xujiajin
2007-03-18, 02:09 PM
Tried the command mode. The error log said there was no "server", and that something called JVM.dll was missing.
laohong
2007-03-18, 04:25 PM
Tried the command mode, error log said: no "server", something call JVM.dll was missing.
I'm using the latest version of the JDK (1.6.0), which can be downloaded from:
https://sdlc1e.sun.com/ECom/EComActionServlet;jsessionid=E40CF9A34D7396FED90FF9BEF430941F
If the above link doesn't work, try to find it at this web site: http://java.sun.com/javase/downloads/index.jsp
Click JDK 6 to go to the download page.
The file is about 53.16 MB in size for Windows Offline Installation, Multi-language version.
You may uninstall all previous versions of JDK or JRE first, then install this latest version. After installation completes, copy the folder named "server" from C:\Program Files\Java\jdk1.6.0\jre\bin\ to C:\Program Files\Java\jre1.6.0\bin\.
Then you need to set your Java path as follows:
Right-click the My Computer icon and choose Properties, Advanced, Environment Variables. In the System Variables box, find Path, click Edit, and add the following line (all in English) to the end of the text there:
;C:\Program Files\Java\jdk1.6.0\bin;C:\Program Files\Java\jre1.6.0\bin
Finally, reboot your machine to try the parser. The problem of "missing server jvm.dll" should be resolved.
xujiajin
2007-03-18, 04:55 PM
Thanks. Will try. It takes a while to download.
xujiajin
2007-03-18, 05:07 PM
It sucks. Registration required for downloading. I gave up.
armstrong
2007-03-18, 05:41 PM
It sucks. Registration required for downloading. I gave up.
I just downloaded JDK 1.6, and it did not require registration.
The parser can be loaded, but when I input a sentence of only ten Chinese characters, it begins to parse and then shows "error, perhaps the sentence is too long". Why?
Thanks for laohong's clear instructions.
laohong
2007-03-18, 06:31 PM
It sucks. Registration required for downloading. I gave up.
You should try the second link, which requires no registration:
http://java.sun.com/javase/downloads/index.jsp
BTW, the Sun server is very fast, so the download should take only a couple of minutes.
I just downloaded JDK 1.6, and it did not require registration.
The parser can be loaded, but when I input a sentence of only ten Chinese characters, it begins to parse and then shows "error, perhaps the sentence is too long". Why?
Yes, I have the same problem: when I try parsing a very short sentence, the parser warns "perhaps the sentence is too long".
laohong
2007-03-18, 11:42 PM
Yes, I have the same problem: when I try parsing a very short sentence, the parser warns "perhaps the sentence is too long".
Please make sure:
1. Put the parser program package in a folder whose name contains no Chinese characters;
2. Start the program by double-clicking lexparser-gui.bat (it is quite easy to get an "out of memory" error if you start it from either of the two jar files);
3. Load chineseFactored.ser.gz for Chinese text, and avoid doing anything else while waiting for the parser to load completely; otherwise you may get the "out of memory" error;
4. Under the Language tab, choose "Tokenized Simplified Chinese (utf-8)";
5. Input your Chinese sentence, leaving a space between words, e.g. 赵 先生 是 个 大学 老师 。 他 很 喜欢 写 文章 。
Good luck, guys!
armstrong
2007-03-19, 11:06 AM
Dr. Hong,
I followed your steps and got the result, but it can only process one sentence at a time. How can I process a whole text and output the result?
Thank you.
laohong
2007-03-19, 05:46 PM
how can I process a text and output the result?
thank you.
For English text:
Under DOS, go to the directory where the parser is located, then type the line below:
lexparser.bat input.txt >output.txt
Then press Enter to get your result.
For Chinese text:
First, you need to segment the input text (search for ICTCLAS in this forum if you don't have a segmenter). That is, convert 今天真热。 to 今天 真 热 。
Then save the segmented text in GB format (not UTF-8, which is used for the GUI/windows version).
Next, create a bat file by copying the following lines (between the two rows of equal signs) into Notepad, and save it as lexparserCh.bat in the same folder as your parser program:
=============================
@echo off
:: Runs the Chinese factored parser on a file, printing Penn trees and collapsed typed dependencies
:: usage: lexparserCh.bat fileToParse
java -server -mx800m -cp "stanford-parser.jar;" edu.stanford.nlp.parser.lexparser.LexicalizedParser -outputFormat "penn,typedDependenciesCollapsed" chineseFactored.ser.gz %1
=============================
Finally, go to the directory where the parser is located, and type the line below:
lexparserCh.bat inputCh.txt >outputCh.txt
Then press Enter to get your result.
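If your editor offers no GB option when saving, the re-encoding step can also be scripted. The following is a minimal Python sketch, not part of the parser package. The name convert_encoding is made up, and reading "GB format" as Python's gb18030 codec is an assumption (gb2312 or gbk may be what your setup expects).

```python
def convert_encoding(src_path, dst_path,
                     src_encoding="utf-8", dst_encoding="gb18030"):
    """Re-encode a text file, for example from UTF-8 to a GB encoding.

    The default target codec is an assumption; pass "gbk" or "gb2312"
    as dst_encoding if that is what the parser setup needs.
    """
    with open(src_path, encoding=src_encoding) as source:
        text = source.read()
    with open(dst_path, "w", encoding=dst_encoding) as target:
        target.write(text)
```

For example, convert_encoding("inputCh_utf8.txt", "inputCh.txt") would write a GB-encoded copy of a UTF-8 file (the first file name is hypothetical). If the source file starts with a byte-order mark, passing src_encoding="utf-8-sig" should drop it.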
armstrong
2007-03-19, 06:31 PM
Ok, thanks, Laohong, I will have a try.
armstrong
2007-03-20, 01:54 AM
Thanks, Dr. Hong. The following are a parsed English text and a parsed Chinese text.
(ROOT
(S
(NP
(NP (NNS Scores))
(PP (IN of)
(NP (NNS properties))))
(VP (VBP are)
(PP (IN under)
(NP (JJ extreme) (NN fire) (NN threat)))
(SBAR (IN as)
(S
(NP (DT a) (JJ huge) (NN blaze))
(VP (VBZ continues)
(S
(VP (TO to)
(VP (VB advance)
(PP (IN through)
(NP
(NP (NNP Sydney) (POS 's))
(JJ north-western) (NNS suburbs))))))))))
(. .)))
nsubj(are-4, Scores-1)
prep_of(Scores-1, properties-3)
amod(threat-8, extreme-6)
nn(threat-8, fire-7)
prep_under(are-4, threat-8)
mark(continues-13, as-9)
det(blaze-12, a-10)
amod(blaze-12, huge-11)
nsubj(continues-13, blaze-12)
advcl(are-4, continues-13)
aux(advance-15, to-14)
xcomp(continues-13, advance-15)
poss(suburbs-20, Sydney-17)
amod(suburbs-20, north-19)
prep_through(advance-15, suburbs-20)
(ROOT
(S
(NP (NNP Fires))
(VP (VBP have)
(ADVP (RB also))
(VP (VBN shut)
(PRT (RP down))
(NP
(NP (DT the) (JJ major) (NN road)
(CC and)
(NN rail) (NNS links))
(PP (IN between)
(NP (NNP Sydney)
(CC and)
(NNP Gosford))))))
(. .)))
nsubj(shut-4, Fires-1)
aux(shut-4, have-2)
advmod(shut-4, also-3)
prt(shut-4, down-5)
det(road-8, the-6)
amod(road-8, major-7)
dobj(shut-4, road-8)
nn(links-11, rail-10)
conj_and(road-8, links-11)
prep_between(road-8, Sydney-13)
conj_and(Sydney-13, Gosford-15)
(ROOT
(S
(NP
(NP (DT The) (JJ promotional) (NN stop))
(PP (IN in)
(NP (NNP Sydney))))
(VP (VBD was)
(NP (NN everything)
(S
(VP (TO to)
(VP (VB be)
(VP (VBN expected)
(PP (IN for)
(NP
(NP (DT a) (NNP Hollywood) (NN blockbuster))
(: -)
(NP
(NP (NNS phalanxes))
(PP (IN of)
(NP
(NP (NNS photographers))
(, ,)
(NP (DT a) (NN stretch) (NN limo)))))))
(PP
(PP (TO to)
(NP
(NP (DT a) (NN hotel))
(PP (IN across)
(NP (DT the) (NNP Quay)))))
(: -)
(CC but)
(PP (IN with)
(NP (CD one) (NN difference))))))))))
(. .)))
det(stop-3, The-1)
amod(stop-3, promotional-2)
nsubj(everything-7, stop-3)
prep_in(stop-3, Sydney-5)
cop(everything-7, was-6)
aux(expected-10, to-8)
auxpass(expected-10, be-9)
infmod(everything-7, expected-10)
det(blockbuster-14, a-12)
nn(blockbuster-14, Hollywood-13)
prep_for(expected-10, blockbuster-14)
dep(blockbuster-14, phalanxes-16)
prep_of(phalanxes-16, photographers-18)
det(limo-22, a-20)
nn(limo-22, stretch-21)
appos(photographers-18, limo-22)
det(hotel-25, a-24)
prep_to(with-31, hotel-25)
det(Quay-28, the-27)
prep_across(hotel-25, Quay-28)
num(difference-33, one-32)
prep_with(expected-10, difference-33)
(ROOT
(S
(NP
(NP (DT A) (NN line-up))
(PP (IN of)
(NP (NNS masseurs))))
(VP (VBD was)
(VP (VBG waiting)
(S
(VP (TO to)
(VP (VB take)
(NP (DT the) (NNS media))
(PP (IN in)
(NP (NN hand))))))))
(. .)))
det(line-2, A-1)
nsubj(waiting-6, line-2)
prep_of(line-2, masseurs-4)
aux(waiting-6, was-5)
aux(take-8, to-7)
xcomp(waiting-6, take-8)
det(media-10, the-9)
dobj(take-8, media-10)
prep_in(take-8, hand-12)
(ROOT
(S
(NP (NNP Never))
(VP (VBZ has)
(NP
(NP (DT the) (NN term))
(SBAR
(S (`` ``)
(S
(VP (VBG massaging)
(NP (DT the) (NNS media))))
('' '')
(VP (VBD seemed)
(ADJP (RB so) (JJ accurate)))))))
(. .)))
nsubj(has-2, Never-1)
det(term-4, the-3)
dobj(has-2, term-4)
dep(accurate-12, massaging-6)
det(media-8, the-7)
dobj(massaging-6, media-8)
cop(accurate-12, seemed-10)
advmod(accurate-12, so-11)
rcmod(term-4, accurate-12)
(ROOT
(IP
(PP (P 随着)
(NP (NN 住房) (NN 制度) (NN 改革)))
(PU ,)
(NP
(CP
(IP
(VP
(ADVP (AD 越来越))
(VP (VA 多))))
(DEC 的))
(NP (NN 城镇) (NN 居民)))
(VP
(VP (VV 拥有)
(NP
(DNP
(NP (PN 自己))
(DEG 的))
(NP (NN 房屋))))
(PU ,)
(CC 而且)
(VP
(ADVP (AD 大量))
(VP (VV 集中)
(PP (P 在)
(LCP
(NP (NN 住宅) (NN 小区))
(LC 内))))))
(PU 。)))
prep(拥有-11, 随着-1)
nmod(改革-4, 住房-2)
nmod(改革-4, 制度-3)
pobj(随着-1, 改革-4)
advmod(多-7, 越来越-6)
rcmod(居民-10, 多-7)
cpm(多-7, 的-8)
nmod(居民-10, 城镇-9)
nsubj(拥有-11, 居民-10)
assmod(房屋-14, 自己-12)
assm(自己-12, 的-13)
dobj(拥有-11, 房屋-14)
cc(拥有-11, 而且-16)
advmod(集中-18, 大量-17)
ccomp(拥有-11, 集中-18)
prep(集中-18, 在-19)
nmod(小区-21, 住宅-20)
lobj(内-22, 小区-21)
plmod(在-19, 内-22)
(ROOT
(IP
(NP
(DNP
(NP (NN 物) (NN 权) (NN 法))
(DEG 的))
(DP (DT 这)
(QP (CD 一)))
(NP (NN 规定)))
(PU ,)
(VP (VV 回答) (AS 了)
(NP
(NP
(ADJP (JJ 广大))
(NP (NN 群众)))
(DNP
(PP (P 关于)
(IP (PU “)
(VP
(LCP
(IP
(NP (NT 70年))
(NP
(ADJP (JJ 大))
(NP (NN 限)))
(VP (VV 到期)))
(LC 后))
(PU ,)
(NP
(DNP
(NP (PN 我们))
(DEG 的))
(NP (NN 住房)))
(ADVP (AD 怎么))
(VP (VV 办)))
(PU ”)))
(DEG 的))
(NP (NN 疑问))))
(PU 。)))
nmod(法-3, 物-1)
nmod(法-3, 权-2)
assmod(规定-7, 法-3)
assm(法-3, 的-4)
det(规定-7, 这-5)
det(这-5, 一-6)
nsubj(回答-9, 规定-7)
asp(回答-9, 了-10)
amod(群众-12, 广大-11)
nmod(疑问-28, 群众-12)
assmod(疑问-28, 关于-13)
tcomp(到期-18, 70年-15)
amod(限-17, 大-16)
nsubj(到期-18, 限-17)
tclaus(后-19, 到期-18)
lccomp(办-25, 后-19)
assmod(住房-23, 我们-21)
assm(我们-21, 的-22)
dobj(办-25, 住房-23)
advmod(办-25, 怎么-24)
clmpd(关于-13, 办-25)
assm(关于-13, 的-27)
dobj(回答-9, 疑问-28)
I tried to tag an English text with the Stanford tagger, but it doesn't work. Why?
laohong
2007-03-20, 09:51 AM
thanks,Dr.Hong. the following are a parsed english text and a chinese text......
Glad that you've managed to get it.
laohong
2007-03-20, 09:52 AM
I tried to tag an english text with standford tagger,but it doesn't work,why?
Please read the messages in this thread, and follow the instructions if possible.
armstrong
2007-03-20, 11:46 AM
But this thread is mainly about the Stanford parser, not the tagger.
laohong
2007-03-20, 09:28 PM
But this thread is mainly about Standford parser,not about the tagger.
It's quite similar actually. Anyway, first create a bat file by copying the following lines (between the two rows of equal signs) into Notepad:
=============================
@echo off
:: Tags input.txt using the pre-trained bidirectional model and writes output.txt
:: usage: postagger.bat (the input and output file names are fixed in the command below)
java -mx300m -classpath postagger-2006-05-21.jar edu.stanford.nlp.tagger.maxent.MaxentTagger -model wsj3t0-18-bidirectional/train-wsj-0-18 -file input.txt >output.txt
=============================
Next, save it as a plain text file named postagger.bat in the same folder as your Stanford POS Tagger program;
Then, save an English text as input.txt in the same folder as the Tagger and postagger.bat;
Finally, go to the folder where the Tagger, postagger.bat and input.txt are located, and double-click postagger.bat to get your result file, output.txt.
To tag another file, simply rename output.txt and change the content of input.txt.
Good luck!
armstrong
2007-03-20, 09:45 PM
Thanks again, Dr. Hong.
The following is a tagged text.
Zidane/NNP remains/VBZ best-loved/JJ French/JJ despite/IN head-butt/NN
Zinedine/NNP Zidane/NNP remains/VBZ France/NNP 's/POS best-loved/JJ personality/NN despite/IN his/PRP$ head-butt/NN against/IN Italy/NNP 's/POS Marco/NNP Materazzi/NNP in/IN the/DT 2006/CD soccer/NN World/NNP Cup/NNP final/JJ ,/, a/DT survey/NN showed/VBD on/IN Saturday/NNP ./.
Zidane/NNP came/VBD first/RB in/IN a/DT ranking/NN of/IN France/NNP 's/POS Top/NNP 50/CD personalities/NNS ,/, beating/VBG ex-tennis/NN champion/NN Yannick/NNP Noah/NNP who/WP came/VBD in/IN the/DT second/JJ place/NN ,/, and/CC leaving/VBG singers/NNS Charles/NNP Aznavour/NNP and/CC Johnny/NNP Hallyday/NNP ,/, as/IN well/RB as/IN actor/NN Gerard/NNP Depardieu/NNP behind/IN ./.
The/DT footballer/NN had/VBD already/RB come/VBN first/RB in/IN a/DT comparable/JJ survey/NN by/IN pollster/NN Ifop/NNP six/CD months/NNS ago/RB ./.
Socialist/JJ presidential/JJ candidate/NN Segolene/NNP Royal/NNP came/VBD in/IN 23rd/CD place/NN ,/, up/RB from/IN 49th/CD in/IN last/JJ July/NNP ,/, and/CC her/PRP$ likely/JJ conservative/NN challenger/NN for/IN next/JJ year/NN 's/POS election/NN ,/, Nicolas/NNP Sarkozy/NNP ,/, came/VBD 42nd/NNP in/IN the/DT Ifop/NNP poll/NN ./.
Sarkozy/NNP had/VBD not/RB appeared/VBN on/IN the/DT list/NN in/IN July/NNP ./.
Both/DT Royal/NNP and/CC Sarkozy/NNP were/VBD overtaken/VBN by/IN television/NN star/NN Nicolas/NNP Hulot/NNP ,/, who/WP has/VBZ threatened/VBN to/TO run/VB for/IN president/NN unless/IN mainstream/NN politicians/NNS do/VBP more/JJR for/IN the/DT environment/NN ./. He/PRP came/VBD third/JJ in/IN the/DT poll/NN of/IN 1,064/CD people/NNS ./.
Zidane/NNP was/VBD shown/VBN a/DT red/JJ card/NN 10/CD minutes/NNS before/IN the/DT end/NN of/IN extra/JJ time/NN in/IN the/DT July/NNP 9/CD final/JJ after/IN head/NN butting/VBG Materazzi/NNP in/IN the/DT chest/NN ./. That/DT ensured/VBD he/PRP missed/VBD the/DT penalty/NN shootout/NN that/WDT decided/VBD the/DT match/NN in/IN Italy/NNP 's/POS favor/NN ./.
The/DT incident/NN inspired/VBD the/DT summer/NN hit/VBD ``/`` Coup/NNP de/IN Boule/NNP ''/'' -LRB-/-LRB- Head/NNP Butt/NNP -RRB-/-RRB- in/IN France/NNP ,/, and/CC an/DT Italian/JJ designer/NN has/VBZ come/VBN up/RP with/IN a/DT line/NN of/IN sweatshirts/NNS with/IN two/CD stick/NN figures/NNS depicting/VBG the/DT incident/NN printed/VBN on/IN the/DT back/NN ./.
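Tagger output in this word/TAG layout is simple to load for counting or comparison. Here is a minimal Python sketch, offered only as an illustration. The function name split_tagged is made up, and it assumes tokens are separated by whitespace and that the tag follows the last slash in each token.

```python
from collections import Counter


def split_tagged(line):
    """Turn 'word/TAG word/TAG ...' into a list of (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # Split on the LAST slash so that a word containing "/" stays whole.
        word, slash, tag = token.rpartition("/")
        if slash:  # ignore any token that has no slash at all
            pairs.append((word, tag))
    return pairs


def tag_counts(line):
    """Count how often each tag occurs in one tagged line."""
    return Counter(tag for _word, tag in split_tagged(line))
```

Running tag_counts over each line of output.txt and summing the counters would give a tag frequency list for the whole file.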
armstrong
2007-03-20, 09:47 PM
By the way, how many tagsets are there in the Stanford POS Tagger?
armstrong
2007-03-20, 10:15 PM
It seems that Gototagger and the Stanford tagger share the same tagset, the Brill tagset.
The following is the same text tagged with Gototagger.
Zidane/NNP remains/VBZ best-loved/JJ French/JJ despite/IN head-butt/JJ
Zinedine/NNP Zidane/NNP remains/VBZ France's/NNP best-loved/JJ personality/NN despite/IN his/PRP$ head-butt/JJ against/IN Italy's/NNP Marco/NNP Materazzi/NNP in/IN the/DT 2006/CD soccer/NN World/NNP Cup/NNP final,/VBG a/DT survey/NN showed/VBD on/IN Saturday./NNP
Zidane/NNP came/VBD first/JJ in/IN a/DT ranking/NN of/IN France's/NNP Top/JJ 50/CD personalities,/NN beating/VBG ex-tennis/NN champion/NN Yannick/NNP Noah/NNP who/WP came/VBD in/IN the/DT second/JJ place,/NN and/CC leaving/VBG singers/NNS Charles/NNP Aznavour/NNP and/CC Johnny/NNP Hallyday,/NNP as/NNP well/RB as/IN actor/NN Gerard/NNP Depardieu/NNP behind./JJ /NN
The/DT footballer/NN had/VBD already/RB come/VB first/JJ in/IN a/DT comparable/JJ survey/NN by/IN pollster/NN Ifop/NNP six/CD months/NNS ago./RB /VBP
Socialist/NNP presidential/JJ candidate/NN Segolene/NNP Royal/NNP came/VBD in/IN 23rd/CD place,/NN up/IN from/IN 49th/JJ in/IN last/JJ July,/NNP and/CC her/PRP$ likely/JJ conservative/JJ challenger/NN for/IN next/JJ year's/NNS election,/VBP Nicolas/NNP Sarkozy,/NNP came/VBD 42nd/JJ in/IN the/DT Ifop/NNP poll./CD /NN
Sarkozy/NNP had/VBD not/RB appeared/VBN on/IN the/DT list/NN in/IN July./NNP /NN
Both/DT Royal/NNP and/CC Sarkozy/NNP were/VBD overtaken/VBN by/IN television/NN star/NN Nicolas/NNP Hulot,/NNP who/WP has/VBZ threatened/VBN to/TO run/VB for/IN president/NN unless/IN mainstream/NN politicians/NNS do/VBP more/JJR for/IN the/DT environment./JJ He/PRP came/VBD third/JJ in/IN the/DT poll/NN of/IN 1,064/CD people./CD /NN
Zidane/NNP was/VBD shown/VBN a/DT red/JJ card/NN 10/CD minutes/NNS before/IN the/DT end/NN of/IN extra/JJ time/NN in/IN the/DT July/NNP 9/CD final/JJ after/IN head/NN butting/VBG Materazzi/NNP in/IN the/DT chest./VBN That/DT ensured/VBD he/PRP missed/VBD the/DT penalty/NN shootout/NN that/IN decided/VBN the/DT match/NN in/IN Italy's/NNP favour./CD /NN
The/DT incident/NN inspired/VBD the/DT summer/NN hit/VBD "Coup/NN de/FW Boule"/NNP (Head/NNP Butt)/NNP in/IN France,/NNP and/CC an/DT Italian/JJ designer/NN has/VBZ come/VBN up/IN with/IN a/DT line/NN of/IN sweatshirts/NNS with/IN two/CD stick/NN figures/NNS depicting/VBG the/DT incident/NN printed/VBN on/IN the/DT back./CD
laohong
2007-03-20, 10:21 PM
It uses the Penn Treebank tagset. Here is the list of tags used:
http://ilk.uvt.nl/~zavrel/tagset.txt
armstrong
2007-03-20, 10:28 PM
Oh, I see. Thanks, laohong. You are always helpful and sincere, and I have learnt a lot from you.
How do I use the Stanford Classifier? I tried, but the two executable jar files cannot run. Why?
armstrong
2007-03-21, 11:17 PM
How do I use the Stanford Classifier? I tried, but the two executable jar files cannot run. Why?
Yes, I have the same problem.
laohong
2007-03-22, 11:06 AM
You guys are asking a multi-million-dollar question. Why do you need a classifier like that?
haha, laohong is asking as well!
sunlight
2007-03-29, 03:26 PM
Hi Laohong, I followed your instructions for the POS tagger and the English text was successfully tagged. Do you know how to tag a Chinese file? I searched the Stanford NLP website but couldn't find the relevant information. I would very much appreciate it if you could help me sort it out.
laohong
2007-03-29, 10:35 PM
The POS Tagger was trained for English texts, though it is said that you can train it to tag Chinese texts. However, that may be difficult for many of us. It would be better to use ICTCLAS_Win.exe to tag your Chinese texts. You can download it under "NLP Tools" in my online storage at:
http://corpuslaohong.ys168.com/
Password: corpus4u
Leave a message there after you get it.
If you do want to tag Chinese texts with Stanford tools, the Stanford Parser can also produce POS information for Chinese texts. Read my instructions on how to parse a Chinese text with the Stanford Parser in the earlier posts.
rusia
2007-04-09, 03:33 AM
Whenever I parse Chinese here, the output is garbled. I don't know what is going on...
laohong
2007-04-09, 09:48 AM
It is best to read this thread from the beginning, paying particular attention to posts 19 and 21.
see_how_much_I_love_you
2007-04-18, 02:01 PM
Thanks!
May I ask: what software can be used to search a corpus annotated with the Stanford Parser?
laohong
2007-04-18, 02:28 PM
Thanks! May I ask: what software can be used to search a corpus annotated with the Stanford Parser?
Tregex and Tsurgeon
A Java implementation of a Tgrep2-style utility for matching patterns in trees, and a tree-transformation utility built on top of this matching language.
http://nlp.stanford.edu/software/index.shtml
see_how_much_I_love_you
2007-04-18, 04:25 PM
Why can't I ever open it? The message is:
Failed to load Main-class manifest from C:\tregex-2005-11-23\tregex.jar
I definitely clicked tregex.jar from inside this very folder.
laohong
2007-04-18, 05:46 PM
Why can't I ever open it? The message is:
Failed to load Main-class manifest from C:\tregex-2005-11-23\tregex.jar
I definitely clicked tregex.jar from inside this very folder.
I haven't tried it yet. You may want to contact the developers directly to ask for instructions.
see_how_much_I_love_you
2007-04-20, 08:44 PM
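I asked, and waited two days, checking my mailbox whenever I had a free moment, but there was no reply. Was my question too basic for them to bother answering? Frustrating.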
teneyuan
2008-04-17, 04:34 PM
May I ask why I can only see the tree analysis and not the list of typed dependencies?
teneyuan
2008-04-17, 04:38 PM
prep(拥有-11, 随着-1)
nmod(改革-4, 住房-2)
nmod(改革-4, 制度-3)
pobj(随着-1, 改革-4)
advmod(多-7, 越来越-6)
rcmod(居民-10, 多-7)
cpm(多-7, 的-8)
nmod(居民-10, 城镇-9)
nsubj(拥有-11, 居民-10)
assmod(房屋-14, 自己-12)
assm(自己-12, 的-13)
dobj(拥有-11, 房屋-14)
cc(拥有-11, 而且-16)
advmod(集中-18, 大量-17)
ccomp(拥有-11, 集中-18)
prep(集中-18, 在-19)
nmod(小区-21, 住宅-20)
lobj(内-22, 小区-21)
plmod(在-19, 内-22)
How were these produced? The parser I downloaded cannot produce them.
xujiajin
2008-04-17, 08:49 PM
You need Java, and then you have to work in command mode, that is, at the DOS command line.
teneyuan
2008-04-17, 09:56 PM
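Thank you, all you helpful people. I have solved part of the problem.
Everything that can be parsed has now been parsed, but I still have one question. laohong explained how to fold English tokenization into the system, but how can Chinese word segmentation be folded in as well? It would be great if the Chinese Academy of Sciences POS tagging and word segmentation system could be merged in too, or rather not merged, but run together with it.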
jacky0152
2008-04-21, 01:04 AM
Why is my Parse button greyed out and unclickable?!
Solved!
jacky0152
2008-04-21, 01:15 AM
What is the difference between englishPCFG.ser and englishfactored.ser? What is the first one for? Please answer.
jacky0152
2008-04-21, 02:05 AM
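The parser problem is solved. It's just that the tree graph cannot be exported, only txt format... haha. There's no comparison.
One more question: this tagger has no bat file. Which file do I run?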
jacky0152
2008-04-21, 10:36 AM
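laohong, if I want to process several files at once, how should I edit the bat file? You showed the output for a single file, but what about multiple files?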
laohong
2008-04-21, 04:57 PM
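laohong, if I want to process several files at once, how should I edit the bat file? You showed the output for a single file, but what about multiple files?
Please read posts 19 through 28.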
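As one possible way to run the single-file command from those posts over a whole folder, here is a rough Python sketch. It is not from the Stanford package: build_jobs and run_jobs are hypothetical names, the ".parsed.txt" suffix is invented, and it assumes the script is started from the parser folder so that lexparser.bat (or lexparserCh.bat, passed as the script argument) can be found.

```python
import subprocess
from pathlib import Path


def build_jobs(folder, script="lexparser.bat"):
    """Pair every .txt file in folder with a command and an output file."""
    jobs = []
    for source in sorted(Path(folder).glob("*.txt")):
        if source.name.endswith(".parsed.txt"):
            continue  # skip results left over from an earlier run
        target = source.with_name(source.stem + ".parsed.txt")
        jobs.append(([script, str(source)], target))
    return jobs


def run_jobs(jobs):
    """Run each command in turn, saving its standard output to the paired file."""
    for command, target in jobs:
        with open(target, "wb") as out:
            # Plays the role of "lexparser.bat input.txt >output.txt".
            subprocess.run(command, stdout=out, check=True)
```

Calling run_jobs(build_jobs("texts")) would parse each file in a folder named texts (a made-up folder name) one after another, which also keeps memory use down compared with one huge input file.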
jacky0152
2008-04-26, 03:37 PM
What form of text does this software handle best? For example:
It was among these that Hinkle identified a photograph of Barco! For it seems that Barco, fancying himself a ladies' man (and why not after seven marriages?), had listed himself for Mormon Beard roles
or:
It was among these that Hinkle identified a photograph of Barco!
For it seems that Barco, fancying himself a ladies' man (and why not after seven
Is it enough for each sentence to be continuous, or must every sentence start on a new line?
laohong
2008-04-27, 12:37 PM
You should read the manual by yourself.
Maggie03003
2008-08-18, 10:29 PM
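May I ask, regarding the batch file for Chinese text given earlier in this thread: how should such a batch file be changed for English text? My understanding is that the output file would then contain the parsed text rather than the tree structure. Is that right? In the end I want to save the parsed file.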
laohong
2008-08-18, 11:56 PM
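May I ask: how should such a batch file be changed for English text? My understanding is that the output file would then contain the parsed text rather than the tree structure. Is that right? In the end I want to save the parsed file.
Read + experiment.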
huangjianking
2008-09-09, 02:40 AM
When you load the parser, choose Englishfactored.ser.gz or Chinesefactored.ser.gz; that is the parser. I tried it and it works. But how do I write the analysis results to a text file?
laohong
2008-09-09, 10:04 AM
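When you load the parser, choose Englishfactored.ser.gz or Chinesefactored.ser.gz; that is the parser. I tried it and it works. But how do I write the analysis results to a text file?
Please read the earlier posts.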
huangjianking
2008-09-18, 03:52 PM
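Please read the earlier posts.
I tried, and the generated output.txt is completely blank. I opened lexparser.bat in the text editor EditPlus and pasted the command line you gave earlier after the last line. Is that wrong?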
Maggie03003
2008-10-07, 08:28 PM
May I ask: if I want to hide all the brackets and tags that the parser generates, what software can do that?
lili58
2008-10-08, 01:57 PM
Then save the segmented text in GB format (not UTF-8, which is used for the GUI/windows version).
I don't see this GB mode in the drop-down menu of the txt Save As dialog...