View full version: Guardian wordlist for WordSmith 3


xiaoz
2005-06-23, 03:28 AM
Guardian newspaper corpus (over 90 million words) wordlist for use with WordSmith version 3

http://www.ling.lancs.ac.uk/corplang/zipfiles/guardian.zip

xusun575
2005-07-25, 12:42 AM
Thanks a lot for your efforts. I still think this list would have been more helpful if it had been sorted differently and re-collected, as loads of the entries are not words but rather tokens. How would you consider it: a wordlist or a token list?

18 AAA'S 5
19 AAAAA 3
20 AAAAAGH 3
21 AAAAAH 4
22 AAAAH 7
23 AAAARGH 2
24 AAAGH 13
25 AAAH 13
26 AAAHH 5
27 AAAI 4
28 AAARGH 7
29 AAB 2
30 AABB 2
31 AAC 3
32 AACHEN 56

xiaoz
2005-07-25, 12:49 AM
These may not be words in a conventional sense (they may be labels), but they are nevertheless "words" as they do actually appear in texts. You can of course re-sort the list by frequency, for example using WordSmith WordList.
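
If WordSmith is not at hand, the same re-sorting can be sketched in a few lines of Python. This is only a rough illustration, and it assumes the list has been exported to plain text with three whitespace-separated columns (entry number, word, frequency), as in the excerpt quoted above. The function names are made up for this example, and the actual file in guardian.zip may be in a different format.

```python
def parse_wordlist(lines):
    """Parse 'number word frequency' lines into (word, frequency) pairs.

    Assumption: three whitespace-separated columns, as in the excerpt
    quoted in this thread. Lines that do not fit are skipped.
    """
    entries = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue
        number, word, freq = parts
        if not (number.isdigit() and freq.isdigit()):
            continue
        entries.append((word, int(freq)))
    return entries


def sort_by_frequency(entries):
    """Sort by descending frequency; ties are broken alphabetically."""
    return sorted(entries, key=lambda entry: (-entry[1], entry[0]))


# A few lines copied from the excerpt quoted earlier in the thread.
sample = """\
18 AAA'S 5
22 AAAH 7
24 AAAGH 13
25 AAAH 13
29 AAB 2
32 AACHEN 56
"""

ranked = sort_by_frequency(parse_wordlist(sample.splitlines()))
for word, freq in ranked:
    print(f"{word}\t{freq}")
```

Breaking ties alphabetically just keeps the output stable between runs; any other tie-breaking rule would do equally well.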

xusun575
2005-07-30, 12:53 PM
Quoting xiaoz's post of 2005-7-25 0:49:48:
These may not be words in a conventional sense (they may be labels), but they are nevertheless "words" as they do actually appear in texts. You can of course re-sort the list by frequency, for example using WordSmith WordList.


Aha, but yes! As the old saying goes, anything that appears in a text should be taken as a word. :-))