BFSU Stanford POS Tagger: A Graphical Interface Windows Version (free part-of-speech tagging tool)

Posted by xujiajin on 2011-02-11 in the "Corpus Annotation" forum

  1. xujiajin

    xujiajin Administrator Staff Member

    This GUI (Graphical User Interface) version of the Stanford POS Tagger was developed by Mr. Yunlong Jia, and designed by Dr. Jiajin Xu and Mr. Yunlong Jia.

    This tagger automatically assigns part-of-speech information to each word in the loaded English plain text(s). By default, the tagging result is saved after execution in the same directory as the source text(s), with the extension *.tag. To tag the texts, import your raw texts and then choose one of the pre-loaded tagging models.

    The tagger uses the Penn Treebank tag set, which is described in Treebank POS tagset.pdf in the program folder.

    Please note that the tool requires Java 1.5+ to be installed before you can analyze any texts.

    More information about the Stanford POS tagger is available at http://nlp.stanford.edu/software/tagger.shtml.

    Please cite the program as:

    Xu, Jiajin & Yunlong Jia. (2011). BFSU Stanford POS Tagger: A Graphical Interface Windows Version. Beijing: National Research Center for Foreign Language Education, Beijing Foreign Studies University.

    DOWNLOAD
    http://www.fleric.org.cn/pub/soft/BFSU_Stanford_POS_Tagger1.1.2.rar

    Also downloadable at http://ishare.iask.sina.com.cn/f/13470938.html


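    During the Spring Festival holiday, WilliamJia took the trouble to write a part-of-speech tagging tool, and we are sharing it with you right away.
    The tagger itself was developed by the NLP group at Stanford University, but its user interface is not user-friendly, so we built a Windows interface for it. Please install the Java Runtime Environment before running it.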
    http://www.java.com/zh_CN/download/

    http://www.xdowns.com/soft/6/56/2007/Soft_37451.html
     
  2. volfer

    volfer Moderator

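    Re: Free POS tagging tool BFSU Stanford POS Tagger 1.0: A Graphical Interface Windows Version

    Thank you, Dr. Xu, for sharing this so promptly!

    Thank you both for your hard work and for uploading so many good things to the forum. Downloaded and saved.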
     
  3. Re: Free POS tagging tool BFSU Stanford POS Tagger 1.0: A Graphical Interface Windows Version

    I wonder how accurate it is.
    It does not seem to recognize "corpora": the word is tagged NN rather than NNS.
     
  4. williamJia

    williamJia Open Corpus Project

    Re: Free POS tagging tool BFSU Stanford POS Tagger 1.0: A Graphical Interface Windows Version

    http://www.comp.leeds.ac.uk/amalgam/tagsets/upenn.html
     
  5. xujiajin

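    xujiajin Administrator Staff Member

    Re: Free POS tagging tool BFSU Stanford POS Tagger 1.0: A Graphical Interface Windows Version

    I searched a good deal of the literature but found no reported accuracy figures. Judging from actual tagging output, errors are rare.

    Also, the Stanford NLP group is a top-tier research institution, so its products are fairly trustworthy. The accuracy should be comparable to that of TreeTagger, and the two use the same tag set.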
     
  6. Re: Free POS tagging tool BFSU Stanford POS Tagger 1.0: A Graphical Interface Windows Version

    You two have contributed so many good things to this forum. Thank you both!
     
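  7. Re: Free POS tagging tool BFSU Stanford POS Tagger 1.0: A Graphical Interface Windows Version

    Thanks for sharing! In the new year, the Year of the Rabbit, I wish Dr. Xu a flourishing career and Yunlong rapid progress in his studies!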
     
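  8. Re: Free POS tagging tool BFSU Stanford POS Tagger 1.0: A Graphical Interface Windows Version

    Hello, Dr. Xu! I downloaded your POS tagging tool and built two tiny corpora that I would like to tag, but it seems to need tagging models. Could you tell me where to find them? I have looked for a long time without success...
    I am preparing a paper in this area and have never worked with corpora before... Thanks! :)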
     
  9. xujiajin

    xujiajin Administrator Staff Member

    Re: Free POS tagging tool BFSU Stanford POS Tagger 1.0: A Graphical Interface Windows Version

    The models are included in the archive.

    First load your texts with [Choose Texts], then click [Select Tagger] and pick either of the two models. I usually choose bidirectional-distsim-wsj-0-18.
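
For readers who want to script the same two steps (load a text, pick a model) without the GUI, here is a minimal Python sketch that calls the underlying Stanford tagger from the command line. It assumes the original Stanford distribution is unpacked locally with `stanford-postagger.jar` and a `models` folder; those paths, the helper names `build_tag_command` and `tag_file`, and the choice to replace `.txt` with `.tag` are illustrative assumptions, not a description of how the BFSU GUI works internally.

```python
import subprocess
from pathlib import Path


def build_tag_command(jar, model, text_file, heap="300m"):
    """Return the argument list for one tagging run of the Stanford tagger."""
    return [
        "java",
        f"-mx{heap}",                 # Java heap size for the tagger
        "-classpath", str(jar),
        "edu.stanford.nlp.tagger.maxent.MaxentTagger",
        "-model", str(model),         # e.g. models/left3words-wsj-0-18.tagger
        "-textFile", str(text_file),  # plain-text input file
    ]


def tag_file(jar, model, text_file):
    """Tag text_file and save the output next to it with a .tag extension.

    Assumption: the tagger prints the tagged text to standard output.
    """
    text_file = Path(text_file)
    out_path = text_file.with_suffix(".tag")
    result = subprocess.run(
        build_tag_command(jar, model, text_file),
        capture_output=True, text=True, check=True,
    )
    out_path.write_text(result.stdout, encoding="utf-8")
    return out_path
```

A call such as `tag_file("stanford-postagger.jar", "models/left3words-wsj-0-18.tagger", "sample.txt")` would then write `sample.tag`, provided Java and the Stanford files are actually present at those paths.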
     
  10. williamJia

    williamJia Open Corpus Project

    Re: Free POS tagging tool BFSU Stanford POS Tagger 1.0: A Graphical Interface Windows Version

    http://nlp.stanford.edu/software/pos-tagger-faq.shtml

    In applications, we nearly always use the left3words-wsj-0-18.tagger model, and we suggest you do too. It's nearly as accurate (96.97% accuracy vs. 97.32% on the standard WSJ22-24 test set) and is an order of magnitude faster.
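
To put the two accuracy figures in the FAQ quote into concrete terms, the small calculation below converts them into expected tagging errors for a corpus of 100,000 tokens. The corpus size is an arbitrary example, and the calculation assumes that the WSJ test-set accuracy carries over to your own texts, which it may not for other genres.

```python
def expected_errors(accuracy_percent, n_tokens):
    """Expected number of mis-tagged tokens at a given token-level accuracy."""
    return round((100.0 - accuracy_percent) / 100.0 * n_tokens)


n_tokens = 100_000  # hypothetical corpus size
left3words = expected_errors(96.97, n_tokens)
bidirectional = expected_errors(97.32, n_tokens)

# prints: 3030 2680 350
print(left3words, bidirectional, left3words - bidirectional)
```

Under these assumptions the faster left3words model would make roughly 350 more errors per 100,000 tokens than the bidirectional one.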
     
  11. xujiajin

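    xujiajin Administrator Staff Member

    Re: Free POS tagging tool BFSU Stanford POS Tagger 1.0: A Graphical Interface Windows Version

    Thanks to WilliamJia for the addition. That clears things up and is reassuring: accuracy can reach 97.32% (with the bidirectional model). The recommendation is to use the left3words-wsj-0-18.tagger model, which reaches 96.97%. In practice, the two models are almost equally accurate.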
     
  12. xujiajin

    xujiajin Administrator Staff Member

    Re: Free POS tagging tool BFSU Stanford POS Tagger 1.0: A Graphical Interface Windows Version

    Notes for the new version:


    BFSU Stanford POS Tagger: A Graphical Interface Windows Version

    About

    This GUI (Graphical User Interface) version of the Stanford POS Tagger was developed by Mr. Yunlong Jia, and designed by Dr. Jiajin Xu and Mr. Yunlong Jia.

    This tagger automatically assigns part-of-speech information to each word in the loaded English text(s), and generates output texts in any of three formats: word_POS, word/POS, and XML. The tagging result is saved in the same directory as the source text(s).
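
As an illustration of the two plain-text output formats, the sketch below splits tagged text into (word, tag) pairs. The function name `parse_tagged` is hypothetical, the sample sentences are made up, and the XML format is left out because its exact layout is not described in this thread.

```python
def parse_tagged(text, sep="_"):
    """Split tagged text into (word, tag) pairs.

    Each token is split at the LAST separator, so a word that itself
    contains the separator (e.g. "and/or/CC") stays intact.
    """
    pairs = []
    for token in text.split():
        word, found, tag = token.rpartition(sep)
        if found:
            pairs.append((word, tag))
        else:
            # No separator in this token: keep it and mark the tag as missing.
            pairs.append((token, None))
    return pairs


# word_POS format
print(parse_tagged("The_DT corpora_NNS are_VBP large_JJ ._."))
# word/POS format
print(parse_tagged("The/DT corpora/NNS are/VBP large/JJ ./.", sep="/"))
```

Both calls return the same list of pairs, which makes it easy to process either output format with the same downstream code.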

    To tag the texts, import your raw texts and then choose one of the pre-loaded tagging models. The left3words-wsj-0-18.tagger model is recommended: it reaches an accuracy of 96.97% and is about an order of magnitude faster. The other model, the bidirectional one, is slightly more accurate at 97.32% but slower (see also "Is your tagger slow?" at http://nlp.stanford.edu/software/pos-tagger-faq.shtml).

    The tagger uses the Penn Treebank tag set, which is described in Treebank POS tagset.pdf in the program folder.

    Please note that the tool requires Java 1.5+ to be installed before you can process any texts.

    More information about the Stanford POS tagger is available at http://nlp.stanford.edu/software/tagger.shtml.

    Please cite the program as:

    Xu, Jiajin & Yunlong Jia. (2011). BFSU Stanford POS Tagger: A Graphical Interface Windows Version. Beijing: National Research Center for Foreign Language Education, Beijing Foreign Studies University.

    BFSU Stanford POS Tagger: A Graphical Interface Windows Version is freeware. The software comes on an “as is” basis, and the authors will accept no liability for any damage that results from using the software.
     
  13. xujiajin

    xujiajin Administrator Staff Member

    Re: Free POS tagging tool BFSU Stanford POS Tagger 1.0: A Graphical Interface Windows Version

    I ran a test on the two texts in the samples folder: the left3words-wsj-0-18.tagger model was 2 times faster than the bidirectional-distsim-wsj-0-18.tagger model (a saving of 5 seconds).
     
  14. Re: Free POS tagging tool BFSU Stanford POS Tagger 1.0: A Graphical Interface Windows Version

    Thank you, Dr. Xu. I will try it right away... hehe.
     
  15. xujiajin

    xujiajin Administrator Staff Member

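  16. Re: BFSU Stanford POS Tagger: A Graphical Interface Windows Version (free part-of-speech tagging tool)

    I have only just started using corpora and am rather lost...

    After I open the interface, why can't I click any of the settings on the right? Also, why can't I select .txt files when I choose texts?

    Thank you!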
     
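  17. Re: BFSU Stanford POS Tagger: A Graphical Interface Windows Version (free part-of-speech tagging tool)

    One more thing: I noticed that the files in the models folder are all in .tagger or .prop format. What kind of format is this? What should I open them with, and how can I generate files in this format?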
     
  18. Re: BFSU Stanford POS Tagger: A Graphical Interface Windows Version (free part-of-speech tagging tool)

    Those are trained models; you should not modify them.
    You can train your own model on your own corpus.
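
Training is done with the original Stanford command-line tagger, not with this GUI. The sketch below only assembles a training command. The flags `-prop`, `-model` and `-trainFile` follow the README shipped with the Stanford distribution as best I recall it, so treat them as an assumption and check the README of your own copy. The file names are placeholders, and the helper `build_train_command` is hypothetical.

```python
def build_train_command(jar, prop_file, model_out, train_file):
    """Assemble a training command for the Stanford tagger.

    Assumption: flag names are as given in the README of the Stanford
    distribution; verify them against your version before running.
    """
    return [
        "java",
        "-classpath", str(jar),
        "edu.stanford.nlp.tagger.maxent.MaxentTagger",
        "-prop", str(prop_file),        # training properties (.prop file)
        "-model", str(model_out),       # where the trained .tagger file is written
        "-trainFile", str(train_file),  # hand-tagged training text
    ]


# Placeholder file names, for illustration only.
print(" ".join(build_train_command(
    "stanford-postagger.jar", "my.prop", "my-model.tagger", "my-training.txt")))
```

The printed line can be pasted into a command prompt once the placeholder names are replaced with real files.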
     
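  19. Re: BFSU Stanford POS Tagger: A Graphical Interface Windows Version (free part-of-speech tagging tool)

    I have now figured out how to open the txt files, and the tagging seems to have worked: a .tag file was generated. How do I open it?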
     
  20. Re: BFSU Stanford POS Tagger: A Graphical Interface Windows Version (free part-of-speech tagging tool)

    Oh! It can be set to open with Notepad. Thanks! I will come back if I have more questions!
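
Since a .tag file opens in Notepad, it can also be read by a short script. Here is a minimal Python sketch that counts how often each tag occurs in one file. It assumes the word_POS output format and UTF-8 encoding; the function name `tag_frequencies` and the file name in the usage note are hypothetical.

```python
from collections import Counter
from pathlib import Path


def tag_frequencies(tag_path, sep="_", encoding="utf-8"):
    """Count the POS tags in a tagged file (word_POS format by default)."""
    text = Path(tag_path).read_text(encoding=encoding)
    counts = Counter()
    for token in text.split():
        # Split at the last separator; tokens without one are skipped.
        word, found, tag = token.rpartition(sep)
        if found:
            counts[tag] += 1
    return counts
```

For example, `tag_frequencies("sample.tag").most_common(10)` would list the ten most frequent tags in a file named `sample.tag`, if such a file exists in the working directory.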