Lemma, Wordlist, Concordance

ghoti

Junior Member
Four questions about lemma lists, wordlists, and concordancing. Thanks in advance:

1. The e_lemma.txt (Ver. 1) compiled by Prof. Someya on September 1, 1998 is very popular in corpus research circles. But if a text contains the *ed (e.g. altered), *ing (e.g. altering), and *s (e.g. alters) forms of a word but not the root form itself (e.g. alter), I cannot get those forms lemmatized under the root alter. How can I solve this problem other than by manually deleting those inflections? cf.
http://www.corpus4u.org/archive/index.php/t-406.html
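To make the problem concrete, here is a minimal sketch of the behaviour I am after, written outside WordSmith. It assumes the lemma list has one entry per line in the form `headword -> form1,form2,...` with comment lines starting with `;` (my reading of the e_lemma.txt layout, which may not be exact). The function names `parse_lemma_list` and `lemmatized_counts` are made up for illustration. Each token is mapped to its headword whether or not the headword itself occurs in the text.

```python
from collections import Counter
import re

def parse_lemma_list(lines):
    """Build a form -> headword map from lines like 'alter -> altered,altering,alters'.

    The line format is an assumption about the lemma file, not a confirmed spec.
    """
    form_to_lemma = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines, comment lines, and anything without the arrow.
        if not line or line.startswith(";") or "->" not in line:
            continue
        head, forms = line.split("->", 1)
        head = head.strip().lower()
        form_to_lemma[head] = head
        for form in forms.split(","):
            form = form.strip().lower()
            if form:
                # Keep the first headword seen for an ambiguous form.
                form_to_lemma.setdefault(form, head)
    return form_to_lemma

def lemmatized_counts(text, form_to_lemma):
    """Count tokens under their headword; unknown tokens are counted as-is."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(form_to_lemma.get(tok, tok) for tok in tokens)

sample_lemmas = ["alter -> altered,altering,alters"]
sample_text = "She altered the plan. He alters it daily, and altering it is easy."
counts = lemmatized_counts(sample_text, parse_lemma_list(sample_lemmas))
print(counts["alter"])
```

In this sample the word alter never appears in the text, yet its three inflected forms are all counted under alter.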

2. I read
http://www.corpus4u.org/showthread.php?t=145
English lemma for use with WordSmith - Corpus4u Forum

and downloaded
http://www.lexically.net/downloads/v.../BNC_World.zip

After decompressing it, I found that it contains a *.lst file, which must be the wordlist of the BNC rather than an e_lemma.txt. How do I use this BNC_World.lst? How do I load BNC_World.lst into WordSmith 3?

3. I wish to obtain the defining vocabulary of some dictionaries, such as the Cambridge Advanced Learner's Dictionary, but I have only found *.lst files, which cannot be viewed as plain text. It is said that *.lst files can be viewed in Access, which I have not installed or tried. A greedier and more ambitious question: how can I extract all the English headwords from the electronic version of a dictionary, which is usually stored in a *.lst file? For academic purposes, I am very interested in the Collins COBUILD English Dictionary frequency wordlist, and I wish to obtain all the words marked with diamonds, preferably in txt format. But the COBUILD wordlist currently available on the Internet stops at ◆◆◆◇◇, without the ◆◆◇◇◇ and ◆◇◇◇◇ words. I guess the words marked ◆◆◆◆◆, ◆◆◆◆◇, and ◆◆◆◇◇ were not typed in manually one by one, although some words are absent, such as best, follow, high, and departure. cf.
http://bbs.adse.cn/read.php?tid=173
http://bbs.adse.cn/read.php?tid=13116

4. I have built a small corpus. Is there any free concordance software with which I can offer an online concordance service, that is, a search box on the Internet that returns concordance lines without giving out my original corpus?
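To show what I mean by returning results without the corpus, here is a rough sketch of the core lookup only. It is not a recommendation of any particular software, and the function name `kwic`, the `width` parameter, and the sample text are all invented for illustration. A real service would wrap something like this behind a web form so that users only ever see short context windows.

```python
import re

def kwic(text, keyword, width=30):
    """Return (left, match, right) context tuples for each whole-word hit."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    hits = []
    for m in pattern.finditer(text):
        # Only a window of `width` characters on each side is exposed.
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        hits.append((left, m.group(0), right))
    return hits

corpus_text = "The plan was altered twice. Nobody altered the budget."
for left, match, right in kwic(corpus_text, "altered", width=12):
    print(f"{left:>12} | {match} | {right}")
```

Since each query exposes only a few characters on either side of the hit, the full text stays on the server, although a determined user could still piece together passages from many queries.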
 