Repost: POS Tagging Software - a summary

xusun575

Senior Member
Looking for Automatic POS Tagging Software - a summary of responses
"Lam Yuen Wing, Peter" <ywlam_AT_kcrc.com>
Sat Feb 18 10:28:00 2006

--------------------------------------------------------------------------------
Dear all,

About six weeks ago, I asked for pointers on user-friendly POS taggers
that run under Windows and are able to tag and subcategorise words, e.g.
to tag adjectives and subcategorise them into predicates, attributes,
superlatives, participles, etc. I am grateful to the following members,
who have spent time writing me valuable advice. The following is a
summary of their responses:

Ted Pedersen tpederse_AT_d.umn.edu
Ted suggested trying GATE http://gate.ac.uk/, which includes a POS
tagger, and "is fairly easy to install and use (it is
Java based and runs on Windows, Linux, etc...)".

Alex Fang acfang_AT_cityu.edu.hk
Alex recommended AUTASYS, which runs under Windows. For more
information, please visit
http://www.phon.ucl.ac.uk/home/alex/project/tagging/tagging.htm.
AUTASYS provides subcategorisations and offers a choice of two tag
sets: ICE and LOB. It also has a lemmatisation module. It is
available for academic purposes only, at a one-off payment of 500
pounds sterling for a single-user licence, or 1,000 pounds for a
one-year site licence. AUTASYS tags 1.8 million words per minute, with
an estimated accuracy of 95%. Output can be in horizontal (passage
style) or vertical format.
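
To illustrate the difference between horizontal and vertical output, here
is a minimal Python sketch. It assumes a word_TAG convention with an
underscore separator and uses made-up example tags; the actual AUTASYS
output format is not described in the response, so the separator and the
function name horizontal_to_vertical are assumptions for illustration
only.

```python
def horizontal_to_vertical(tagged_text, sep="_"):
    """Convert running tagged text ('The_AT dog_NN') into one
    'word<TAB>tag' line per token.

    The word_TAG layout and the '_' separator are assumptions, not the
    documented output of any particular tagger.
    """
    lines = []
    for token in tagged_text.split():
        # Split on the LAST separator so words containing '_' survive.
        word, found, tag = token.rpartition(sep)
        if not found:
            # Token carries no tag at all: keep the word, leave the tag empty.
            word, tag = token, ""
        lines.append(f"{word}\t{tag}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(horizontal_to_vertical("The_AT dog_NN barks_VBZ"))
```

Vertical format (one token per line) is usually easier to load into a
spreadsheet or concordancer, while horizontal format keeps the passage
readable.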

Neil Millar kansaineil_AT_hotmail.com
Neil suggested trying Brill's Tagger, which is available free at
http://www.cs.jhu.edu/~brill/RBT1_14.tar.Z. The tagger runs on Windows
and is "easy to use".

Eric Atwell eric_AT_comp.leeds.ac.uk
Eric said the CLAWS system can be used over the WWW via the UCREL
website, http://www.comp.lancs.ac.uk/computing/research/ucrel/claws/,
which means it does not have to be run on a local UNIX machine.
There is a free trial service offering access to the latest version of
the tagger, CLAWS4:
http://www.comp.lancs.ac.uk/computing/research/ucrel/claws/trial.html

Paul Rayson rayson_AT_exchange.lancs.ac.uk
Paul advised that there are beta versions of CLAWS for Windows and
Linux, with one for Mac OS X to follow shortly. Trials may be available
on request.

Oliver Mason o.mason_AT_bham.ac.uk
Oliver suggested trying Qtag
(http://www.english.bham.ac.uk/staff/omason/software/qtag.html), which
is written in Java and thus runs on Windows.

SVMTool team jgimenez_AT_lsi.upc.edu
The SVMTool team said that at the TALP Research Center (Barcelona) they
have developed a general sequential tagger and applied it to the problem
of PoS tagging. It may be freely downloaded at:
http://www.lsi.upc.edu/~nlp/SVMTool/.

Models for English, Spanish and Catalan are available. Given annotated
data, it can also be trained for any language and any sequential
tagging problem (PoS tagging, NERC, chunking, etc.). The C++ version
exhibits a tagging speed of 10,000 words per second.

Atanas Chanev artanisz_AT_mail.bg
Atanas suggested trying the T'n'T tagger (by Thorsten Brants), which is
freely available after registration at
http://www.coli.uni-saarland.de/~thorsten/tnt/. Atanas said: "There
is a version for Windows and it has the most user friendly interface
among the taggers I have used. It is one of the currently most accurate
taggers".

A package of taggers that work under Linux can be found at
http://acopost.sourceforge.net/ (follow the sourceforge link). Most of
the Linux applications should work under Cygwin, a Linux-like
environment for Windows, which can be downloaded from the internet.

Another tagger is SVMTool (Jesús Giménez and Lluís Màrquez). Its
accuracy is similar to that of T'n'T for small amounts of training
data. There are C++ and Perl versions, and Perl can be downloaded for
free from www.activestate.com.

Svetlana Sheremetyeva linklana_AT_yahoo.com
Svetlana has developed FLAT (Flexible Language Acquisition Tool), which
is "extremely user friendly and can be tuned to any features". A
description can be found at http://lanaconsult.com.

Gerard Peregrin GerardPer_AT_aol.com
Gerard recommended trying the software at
http://www-nlp.stanford.edu/software/lex-parser.shtml, which is
written in Java.

Vlad Gojol gojol_AT_rnc.ro
Vlad suggested GojolParser, which is "a deep structure morpho-syntactic
analyzer".

Best
Peter Lam
PhD Student
The Hong Kong Polytechnic University
 