how to get an idiom list?
patricx
2005-08-01, 07:27 AM
Same as the title: how can I get an idiom list or phrase list with WS4?
tiger
2005-08-01, 10:03 AM
I'd like to ask the same question.
patricx
2005-08-01, 10:16 AM
It seems that few programs nowadays can search a corpus for idioms and phrases and build an idiom list the way they build a word list.
xiaoz
2005-08-01, 10:27 AM
Try the cluster function of WordSmith. You will get a list of useful formulaic expressions if you set a high MI score threshold (or a threshold on another statistical measure). If you have a very large corpus, also set a high minimum frequency (above 5, for example). See the WST manual for how to do this with Concord or Wordlist. Note that the procedure differs between WST3 and WST4.
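A rough sketch of the idea behind such a filter, restricted to two-word clusters for simplicity: count adjacent word pairs, drop those below a minimum frequency, then keep only pairs whose pointwise MI clears a threshold. The tokenizer, the function names and the default thresholds (`min_freq=5`, `min_mi=3.0`) are assumptions for illustration; this is not how WordSmith computes its scores internally.

```python
import math
from collections import Counter


def tokenize(text):
    # Very simple tokenizer (an assumption): lowercase, split on whitespace,
    # strip surrounding punctuation but keep internal apostrophes ("don't").
    tokens = []
    for raw in text.lower().split():
        tok = raw.strip('.,;:!?"()[]')
        if tok:
            tokens.append(tok)
    return tokens


def mi_bigrams(tokens, min_freq=5, min_mi=3.0):
    """Return (bigram, frequency, MI) for pairs passing both thresholds."""
    n = len(tokens)
    word_freq = Counter(tokens)
    bigram_freq = Counter(zip(tokens, tokens[1:]))
    results = []
    for (w1, w2), f in bigram_freq.items():
        if f < min_freq:
            continue  # minimum-frequency cut-off first
        # Pointwise MI: log2( f(w1 w2) * N / (f(w1) * f(w2)) )
        mi = math.log2(f * n / (word_freq[w1] * word_freq[w2]))
        if mi >= min_mi:
            results.append((f"{w1} {w2}", f, round(mi, 2)))
    # Highest MI first
    results.sort(key=lambda r: r[2], reverse=True)
    return results
```

The frequency cut-off matters because MI on its own favors rare pairs: two words that each occur only once, side by side, get a very high score.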
patricx
2005-08-01, 10:54 AM
I'll give it a try. If WS4 can identify idioms or phrases, I'd guess it must have a very large dictionary of English idioms and phrases. I think this is a very hard problem, much like the complexities of parsing Chinese words.
xiaoz
2005-08-01, 10:57 AM
No, no dictionary of idioms is used in this kind of work. It is based on statistics (co-occurrence frequencies, mutual information, etc.).
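For reference, one standard formulation of (pointwise) mutual information for a word pair x, y in a corpus of N words, with f(·) as raw frequencies. Tools differ in details such as the collocation span, so treat this as a sketch rather than the exact formula any particular program uses. The numbers in the worked example are hypothetical.

```latex
\mathrm{MI}(x,y) = \log_2 \frac{P(x,y)}{P(x)\,P(y)}
               \approx \log_2 \frac{f(x,y)\,N}{f(x)\,f(y)}

% Hypothetical example: f(x,y) = 30, f(x) = 100, f(y) = 200, N = 1\,000\,000
\mathrm{MI} = \log_2 \frac{30 \times 1\,000\,000}{100 \times 200}
            = \log_2 1500 \approx 10.55
```

If the two words were independent, they would be expected to co-occur only f(x)f(y)/N = 0.02 times, so 30 co-occurrences signal a strong association.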
patricx
2005-08-01, 11:09 AM
Could you please recommend some articles to read in this field?
I don't know how a computer identifies idioms like "the dos and don'ts", "die like a dog", "make ends meet"...
For example, if I have a couple of texts, I want to count how many idioms/phrases they contain and find out what they are. Then I could get an idiom list that looks like a word list.
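If you already have a list of idioms to look for (the dictionary-based route, as opposed to the statistical one), counting them in a text can be sketched as below. The function name and the sample idiom list are hypothetical; exact string matching like this misses inflected variants such as "made ends meet".

```python
import re
from collections import Counter


def count_idioms(text, idioms):
    """Count occurrences of each phrase from a given idiom list in text."""
    counts = Counter()
    lowered = text.lower()
    for idiom in idioms:
        # \b word boundaries stop "meet" from matching inside "meeting";
        # re.escape keeps apostrophes and other punctuation literal.
        pattern = r"\b" + re.escape(idiom.lower()) + r"\b"
        n = len(re.findall(pattern, lowered))
        if n:
            counts[idiom] = n
    return counts
```

Sorting the resulting counts by frequency gives an "idiom list" in the same shape as a word list.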
You have already recommended two articles:
Capturing phraseology in an online dictionary for advanced users of English as a second language: a response to user needs
Two quantitative methods of studying phraseology in English
[This post was edited by the author on 2005-08-01 at 11:18:14]
xiaoz
2005-08-01, 12:01 PM
Test your intuitions:
Do you know which 3-word sequences are most frequently used in English conversation?
Do you know which 3-word sequences are most frequently used in English in general?
You will find answers here shortly...
patricx
2005-08-01, 12:18 PM
Sorry, I have no idea. And I think the answer depends on the data the counts are computed from.
[This post was edited by the author on 2005-08-01 at 12:20:11]
xujiajin
2005-08-01, 01:53 PM
Please try kfNgram.
http://www.kwicfinder.com/kfNgram/kfNgramHelp.html
xujiajin
2005-08-01, 02:01 PM
Two quantitative methods of studying phraseology
http://www.corpus4u.com/forum_view.asp?view_id=583&forum_id=34
This paper explains clusters, lexical bundles, phrases, and idioms in the corpus-linguistics sense.
tiger
2005-08-01, 04:29 PM
I forgot to make use of MI in my search.
Thanks a lot.
xiaoz
2005-08-01, 08:05 PM
The most frequently used trigrams in English conversation. They are not necessarily idioms in the conventional sense, but they are useful pre-fabs:
I DON'T KNOW
I DON'T THINK
DO YOU WANT
A LOT OF
WHAT DO YOU
I MEAN I
DO YOU KNOW
A BIT OF
HAVE YOU GOT
YOU KNOW WHAT
YOU HAVE TO
YOU WANT TO
MM MM MM
YOU KNOW I
AND I SAID
DON'T KNOW WHAT
HAVE A LOOK
YEAH I KNOW
YOU'VE GOT TO
I DON'T WANT
BUT I MEAN
NO NO NO
DO YOU THINK
I SAID TO
BE ABLE TO
I THINK IT'S
A COUPLE OF
IT WAS A
TO DO IT
YOU KNOW THE
NO I DON'T
THAT'S WHAT I
TO HAVE A
ONE TWO THREE
I DON'T LIKE
ONE OF THE
WHAT ARE YOU
AT THE MOMENT
AND HE SAID
I THINK IT
I THINK I
TO GO TO
WHAT I MEAN
I WANT TO
WELL I DON'T
I'VE GOT A
AND IT WAS
I'M GOING TO
ONE OF THEM
THE END OF
I SAID I
IN THE MORNING
A LITTLE BIT
TWO THREE FOUR
AND SHE SAID
DON'T WANT TO
I SAID WELL
IF YOU WANT
I TELL YOU
I MEAN YOU
I USED TO
OH I SEE
IT IN THE
ALL THE TIME
AND I THOUGHT
I'VE GOT TO
I HAVEN'T GOT
KNOW WHAT I
BUT I DON'T
WHAT IS IT
CAN I HAVE
I MEAN IT'S
YOU KNOW AND
YOU KNOW YOU
YOU KNOW THAT
ARE YOU GOING
I THOUGHT IT
THOUGHT IT WAS
I DIDN'T KNOW
WHAT DID YOU
TELL YOU WHAT
THE OTHER ONE
I WAS GONNA
GO AND GET
THERE WAS A
YOU CAN GET
SOMETHING LIKE THAT
GO TO THE
HAVE TO GO
TERMS OF THE
YOU'LL HAVE TO
ED OUCS UPDATED
END USER LICENCE
PART OF THE
AND THEN YOU
I'LL HAVE TO
I KNOW BUT
YOU CAN HAVE
PUT IT IN
I HAD TO
OUT OF THE
THE OTHER DAY
I KNOW I
ONE OF THOSE
I THOUGHT YOU
AND THEN I
TO DO WITH
AND A HALF
YOU'VE GOT A
IN A MINUTE
I HAVE TO
WHEN I WAS
WELL I THINK
LOOK AT THAT
TO GO AND
USED TO BE
LOOK AT THE
patricx
2005-08-01, 08:13 PM
Thanks, Richard! But what do we use these frequent expressions for? Identifying idioms is not an easy job, I think.
xiaoz
2005-08-01, 08:39 PM
Language - especially spoken language - can be learnt not word by word, but pre-fab by pre-fab. That speeds up processing and improves fluency.
In statistically based lists, frequently used idioms are certainly covered (a little bit, a lot of, be able to, etc.), but such lists include many more useful items (well I think, oh I see, but I mean, etc.).
If you want a list of conventional idioms, a dictionary might be better. But some idioms gradually fall out of use (rain cats and dogs, hen-pecked husband) while new items become popular: a life cycle.
I found the lists extracted with Michael Barlow's Collocate more "idiom-like" than those from WST. Even better is IdomPrinciple, developed at Birmingham, but it is for in-house use only.
patricx
2005-08-01, 08:50 PM
It seems I should go back to the dictionary, then. Thanks, Richard.
Quoting xiaoz (2005-8-1 20:39:06):
I found the lists extracted using Michael Barlow's Collocate are more "idiom-like" than those from WST. Or even better is IdomPrinciple developed at Birmingham, which is for in-house use.
That's interesting. I've never compared them in this respect. I wonder why this would happen, given that both are based on string-frequency calculations.
Another term for the pre-fabs listed above is 'lexical bundles', which Biber and his associates use a lot (e.g. the Longman Grammar).
[This post was edited by the author on 2005-08-04 at 02:08:50]
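Lexical bundles are usually identified with two cut-offs rather than MI: a normalized frequency threshold and a dispersion requirement (the sequence must occur in several different texts). Figures often cited for the Longman Grammar are at least 10 occurrences per million words across at least five texts; the sketch below takes both as parameters, and the function name is hypothetical.

```python
from collections import Counter, defaultdict


def lexical_bundles(texts, n=3, per_million=10.0, min_texts=5):
    """texts: list of token lists, one per text. Returns (bundle, freq, n_texts)."""
    total_words = sum(len(t) for t in texts)
    freq = Counter()
    spread = defaultdict(set)  # n-gram -> indices of texts containing it
    for i, tokens in enumerate(texts):
        # zip over n shifted copies of the token list yields every n-gram
        for gram in zip(*(tokens[j:] for j in range(n))):
            freq[gram] += 1
            spread[gram].add(i)
    bundles = []
    for gram, f in freq.items():
        norm = f * 1_000_000 / total_words  # frequency per million words
        if norm >= per_million and len(spread[gram]) >= min_texts:
            bundles.append((" ".join(gram), f, len(spread[gram])))
    bundles.sort(key=lambda b: b[1], reverse=True)
    return bundles
```

The dispersion check is what keeps one speaker's or author's pet phrase, repeated many times in a single text, out of the list.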
Another thing: from a purely mechanical point of view, this is one of the places where you are better off using tags to generate a list of interest, if the quick clusters/bundles don't fit the bill.
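A small sketch of the tag-based route, assuming the corpus is already POS-tagged and stored as (word, tag) pairs. The simplified tag set ("DET", "ADJ", "NOUN", ...) and the function name are hypothetical; a real tagger's tag set would be used instead.

```python
from collections import Counter


def tag_pattern_ngrams(tagged, pattern):
    """Count word sequences whose tag sequence equals `pattern`."""
    n = len(pattern)
    counts = Counter()
    for i in range(len(tagged) - n + 1):
        window = tagged[i:i + n]
        # Compare only the tags; the words become the counted item.
        if tuple(tag for _, tag in window) == tuple(pattern):
            counts[" ".join(word.lower() for word, _ in window)] += 1
    return counts
```

Because the pattern constrains grammatical shape, the output avoids fragments like "IT IN THE" that pure string-frequency lists tend to include.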
tiger
2005-08-06, 12:04 PM
Quoting xiaoz (2005-8-1 10:27:37):
Try the cluster function of WordSmith. You will get a list of useful formulaic expressions if you set the a great score of MI (or other statistical measures). If you have a large large corpus, also set set a great minimum frequency (above 5 for example). See WST manual for how to do this using Concord or Wordlist. Note that the ways of doing such things are different in WST3 and WST4.
I know how to search for clusters with a minimum frequency, but how can I get a cluster list by setting an MI score in WordSmith 3?
[This post was edited by the author on 2005-08-06 at 12:29:13]
[This post was edited by the author on 2005-08-06 at 14:12:48]
xiaoz
2005-08-06, 07:53 PM
Adjust settings for Index.
tiger
2005-08-07, 09:25 AM
I see. Thank you.