[ZT and Download] English lemma list

Posted by xiaoz on 2007-02-24 in the "Native Language Corpora" forum

  1. xiaoz

    xiaoz Permanent Super Administrator Staff Member

    I have put a zip file on my website (http://mcs.open.ac.uk/dh5368/). It
    contains a list of inflection-lemma mappings, a list of lemma-inflection
    mappings, and a file called singles.txt, which contains forms in the
    lexicon that could not be reduced.
    The data was extracted from the CUVPlus lexicon by running a lemmatising
    algorithm to reduce every entry in the lexicon and checking the
    resulting proposed lemmas against the lexicon.
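    As a rough illustration of the reduce-and-check idea described above,
    here is a minimal sketch. It is not the algorithm actually used to build
    these files: the suffix rules, the toy lexicon and the function names
    are all hypothetical, and tags are ignored. A form with no matching rule
    is treated as a base form, and a form whose candidates are all missing
    from the lexicon is set aside as unreduced.

```python
# Hypothetical suffix rules: (suffix to strip, replacement). Illustrative only,
# not the rules used to produce lemmas.txt.
SUFFIX_RULES = [
    ("ies", "y"),
    ("es", ""),
    ("s", ""),
    ("ing", ""),
    ("ing", "e"),
    ("ed", ""),
    ("ed", "e"),
    ("ly", ""),
]


def propose_lemmas(form):
    """Return candidate lemmas obtained by applying every matching suffix rule."""
    candidates = []
    for suffix, replacement in SUFFIX_RULES:
        if form.endswith(suffix) and len(form) > len(suffix):
            candidates.append(form[: -len(suffix)] + replacement)
    return candidates


def reduce_lexicon(lexicon):
    """Map each form to a lemma attested in the lexicon.

    Assumption: a form that no rule matches is treated as its own base form;
    a form with candidates but no attested candidate is reported as unreduced.
    """
    mappings = {}
    unreduced = []
    for form in sorted(lexicon):
        candidates = propose_lemmas(form)
        if not candidates:
            mappings[form] = form
            continue
        attested = [c for c in candidates if c in lexicon]
        if attested:
            mappings[form] = attested[0]
        else:
            unreduced.append(form)
    return mappings, unreduced


# Toy lexicon, invented for this example.
toy_lexicon = {"walk", "walks", "walking", "make", "making",
               "party", "parties", "only"}
mappings, unreduced = reduce_lexicon(toy_lexicon)
```

    With this toy lexicon, "only" ends up in the unreduced list because the
    candidate "on" is not in the lexicon, which mirrors the kind of adverb
    the post says lands in singles.txt.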
    The file lemmas.txt contains inflection-lemma mappings that were
    corroborated by the lexicon and inflect.txt contains the inverse
    mappings. These files include words that are already in base form.
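    To illustrate what "inverse mappings" means here, the following sketch
    turns an inflection-to-lemma table into a lemma-to-inflections table.
    The sample data and the function name are made up for the example, and
    nothing is assumed about the actual layout of lemmas.txt or inflect.txt.

```python
from collections import defaultdict


def invert_mapping(form_to_lemma):
    """Group inflected forms under their lemma (hypothetical helper)."""
    lemma_to_forms = defaultdict(list)
    # Sort so that each lemma's forms come out in a stable, alphabetical order.
    for form, lemma in sorted(form_to_lemma.items()):
        lemma_to_forms[lemma].append(form)
    return dict(lemma_to_forms)


# Invented sample; base forms map to themselves, as in the post.
sample = {"walk": "walk", "walks": "walk", "walking": "walk",
          "parties": "party"}
inverse = invert_mapping(sample)
```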
    The singles.txt file contains word forms that, judging by the tag,
    should be reducible but for which no proposed lemma could be found in
    the lexicon. Most are adverbs that have no adjective base form; many
    are non-count plural forms. There are also some (BNC) tagging errors,
    misspellings and rare word forms. I have included the BNC frequency for
    each entry from the lexicon, as most of the noise is of low frequency.
    Please note that, because the mappings were extracted from the lexicon,
    words not covered by CUVPlus do not appear in them.
    All the entries in the files are tagged using the C7 tagset.
    The data is a work in progress, but I believe it is pretty clean.
    If you decide to use the mapping tables please cite my PhD thesis - it
    is at Birkbeck College, University of London and due for submission
    later this year.

    Thank you,

    Download link: http://mcs.open.ac.uk/dh5368/lemmas.zip
  2. hyy


  3. hyy


    :) Hi, has anyone got the lemmas file posted here, and could you share it with me? The link does not seem to work now. Thanks to Dr. Xiao and Dave for providing the post. My email is flyingbird07@yeah.net

    Have a lovely weekend

  4. hyy


    Sorry, could anyone kindly share the downloaded list with me? Thanks