How do I convert the Gotagger tagging format to CLAWS format? Perl code

刘语料

Banned user
One more question:
If I use WST 3.0 to search for the pattern (nouns modified by adjectives) in the text above, how should I set up the search?
Thank you.
 

xiaoz

Permanent super administrator
Staff member
<w POS="JJ">*</w> <w POS="NN">*</w>

returns all instances of JJ+NN
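For checking hit counts outside WordSmith, the same JJ+NN search can be sketched with a regular expression in Python. This is only a rough illustration, not how WST evaluates the query: the sample sentence is made up in the thread's `<w POS="...">` markup, and the names `sample` and `JJ_NN` are hypothetical.

```python
import re

# Hypothetical sample text in the same <w POS="..."> markup used in this thread.
sample = (
    '<w POS="AT">the</w> <w POS="JJ">widespread</w> <w POS="NN">interest</w> '
    '<w POS="IN">in</w> <w POS="AT">the</w> <w POS="NN">election</w>'
)

# [^<]* stands in for the "*" wildcard but cannot cross a "<",
# so each match stays inside one <w> element.
JJ_NN = re.compile(r'<w POS="JJ">([^<]*)</w>\s+<w POS="NN">([^<]*)</w>')

pairs = JJ_NN.findall(sample)
print(pairs)  # [('widespread', 'interest')]
```

Note that the pattern matches the tag NN exactly, so plural nouns tagged NNS are not counted.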
 

刘语料

Banned user
Dr. Xiao, perhaps there is a mistake: when I search for "<w POS="JJ">*", there are only twenty hits, but with "<w POS="JJ">*</w> <w POS="NN">*</w>" there are as many as 735 hits.
Some of the hits are shown below. What's wrong?
WordSmith Tools -- 2006-5-19 20:09:35

1 <w POS="IN">of</w> <w POS="AT">the</w> <w POS="NN">elect
2 <w POS=",">,</w> <w POS="AT">the</w> <w POS="NN">jury
3 <w POS="CS">that</w> <w POS="AT">the</w> <w POS="NN">City
4 <w POS="WDT">which</w> <w POS="BEDZ">was</w> <w POS="VBN">
5 <w POS="NP">Allen</w> <w POS="NP">Jr.</w> <w POS=".">.</w>
6 <w POS="NN">jury</w> <w POS="HVD">had</w> <w POS="BEN">
7 <w POS=",">,</w> <w POS="CC">but</w> <w POS="PPS">it<
8 <w POS="NN">jury</w> <w POS="VBD">said</w> <w POS=",">,
9 <w POS="NNS">jurors</w> <w POS="VBD">said</w> <w POS=".">
10 <w POS="NN">jury</w> <w POS="VBD">recommended</w> <w POS="
11 <w POS="AT">the</w> <w POS="NN">praise</w> <w POS="CC">
12 <w POS="CC">and</w> <w POS="NNS">administrators</w> <w
13 <w POS="PPO">them</w> <w POS="AT">the</w> <w POS="NP">
14 <w POS="NN">jury</w> <w POS="VBD">said</w> <w POS="PPS">
15 <w POS="NN">jury</w> <w POS="VBD">said</w> <w POS=",">,
16 <w POS="AT">the</w> <w POS="JJ">widespread</w> <w POS="NN">
17 <w POS="AT">the</w> <w POS="NN">State</w> <w POS="NN">Wel
18 <w POS="HVD">had</w> <w POS="JJ">over-all</w> <w POS="NN">
19 <w POS="NN">jury</w> <w POS="VBD">said</w> <w POS=",">,</w
20 <w POS="NP">Atlanta</w> <w POS="NN">Bar</w> <w POS="NN">
 

armstrong

Senior member
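In regular expressions, "*" is greedy and often matches far more than you intended. I suggest using Professor Barlow's Collocate software: once the POS tags are set up there, it finds exactly what you are looking for.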
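The greedy behaviour can be seen in a small Python sketch. It shows Python's regex engine, which may not match the way WordSmith treats its own "*" wildcard; the sample line and the variable names are hypothetical.

```python
import re

# Hypothetical two-word line in the thread's markup.
line = '<w POS="JJ">over-all</w> <w POS="NN">charge</w>'

# Greedy: ".*" runs to the LAST "</w>" on the line, swallowing the noun as well.
greedy = re.search(r'<w POS="JJ">.*</w>', line).group(0)

# Lazy: ".*?" stops at the FIRST "</w>", so only the adjective element is matched.
lazy = re.search(r'<w POS="JJ">.*?</w>', line).group(0)

print(greedy)
print(lazy)
```

Here `greedy` covers the whole line, while `lazy` is just the `<w POS="JJ">over-all</w>` element.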
 

xiaoz

Permanent super administrator
Staff member
If you use WordSmith, it's advisable to use the word_tag style instead of XML. WST does not handle markup languages very well.
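A minimal sketch of that conversion in Python, assuming every token is wrapped as `<w POS="TAG">word</w>` with no nested markup. The function name `xml_to_word_tag` and the sample sentence are hypothetical.

```python
import re

# One <w> element: group 1 is the POS tag, group 2 is the word itself.
W_ELEMENT = re.compile(r'<w POS="([^"]*)">([^<]*)</w>')


def xml_to_word_tag(text: str) -> str:
    """Rewrite each <w POS="TAG">word</w> element as word_TAG."""
    return W_ELEMENT.sub(lambda m: f"{m.group(2)}_{m.group(1)}", text)


tagged = xml_to_word_tag(
    '<w POS="AT">the</w> <w POS="JJ">widespread</w> <w POS="NN">interest</w>'
)
print(tagged)  # the_AT widespread_JJ interest_NN
```

Text outside the `<w>` elements, such as the spaces between tokens, is left unchanged.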
 