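Using AntConc for Chinese concordance, wordlist, N-gram

Re: Using AntConc for Chinese concordance, wordlist, N-gram

Indeed. From 1.0 to 3.0, AntConc has gone from ugly duckling to white swan. Kudos, and thanks to everyone who contributed.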
 
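Re: Using AntConc for Chinese concordance, wordlist, N-gram

Quoting seanxpq (2006-3-30 13:52:42):
Heh, I really can't hold back any longer: everyone is finally interested in ANTCONC, the software I have been recommending so strongly, right?
When C-friends confer, true feeling abides; only when the sand is sifted away does the gold appear.
 
Let me answer with a couple of lines of my own.

The humble ant proves fit for the wordsmith's trade; tufts of fur gathered together at last become a mighty force.

Note: the "ant" (蚍蜉 in the original) stands for Ant, the head word of AntConc; the "wordsmith" (词匠) is WordSmith. This line is meant to thank Laurence Anthony for his generosity.

I also hope that through the Corpus4u space we can all, "as with cutting and filing, as with carving and polishing :)", gather fur into a coat and sand into a tower, and so advance corpus research in China.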
 
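Re: Using AntConc for Chinese concordance, wordlist, N-gram

Quoting xujiajin (2006-3-30 19:22:07):
Let me answer with a couple of lines of my own.

The humble ant proves fit for the wordsmith's trade; tufts of fur gathered together at last become a mighty force.

Note: the "ant" (蚍蜉 in the original) stands for Ant, the head word of AntConc; the "wordsmith" (词匠) is WordSmith. This line is meant to thank Laurence Anthony for his generosity.

I also hope that through the Corpus4u space we can all, "as with cutting and filing, as with carving and polishing :)", gather fur into a coat and sand into a tower, and so advance corpus research in China.

Kudos!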
 
Re: Using AntConc for Chinese concordance, wordlist, N-gram

Quoting laohong (2006-3-30 11:55:18):
Quoting 动态语法 (2006-3-29 15:28:58):
... I have had numerous discussions with him about code names; apparently this is the best that can be done at this point...

Basically, my test showed that this tiny program works very well with Chinese texts, though it is a pity that the KWIC concordance lines are not nicely presented. Can you also ask him to add an option for saving the concordance result? Something similar to Wconcord's "Save with delimiters":
2006033011454810.jpg


With the delimiters saved, the concordance result looks as follows:
2006033011465757.jpg


Then we can use regular expressions to replace every "|" with a Tab, and every "[" with a Tab followed by "[". The result can then be opened in Excel as three columns. Re-sorting in Excel is of course quite easy.
2006033011534855.jpg

So my understanding is that you want some delimiter characters in the result file so that you
can process it with a GREP program and eventually export the result to Excel. I asked him to make it
possible to center the search term in the line, which he said could be done easily. If this
happens I think it would meet your need. That is, if the search term is centered
there is usually a tab character before and after the search term, so you don't need the
| -> TAB replacement step. You could still use a GREP program to replace the sequence
'TAB SEARCH_TERM TAB' with whatever you like and export
the data to whatever program you want. As far as I can tell, once the result is in
a fixed format (e.g. TAB SEARCH_TERM TAB), a lot of things become possible.
(As for the [ ] characters, those are even easier to replace with any 'search and replace'
mechanism.)
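
For anyone who prefers a script to a GREP tool, the two replacements laohong describes might look roughly like this in Python. This is only a sketch: the sample line is made up, since the real Wconcord layout is visible only in the screenshots, and the function name `delimited_to_tsv` is my own.

```python
def delimited_to_tsv(line: str) -> str:
    """Turn one delimiter-saved concordance line into tab-separated fields."""
    # Step 1: every "|" becomes a tab.
    line = line.replace("|", "\t")
    # Step 2: every "[" gets a tab in front of it.
    return line.replace("[", "\t[")


# Hypothetical sample line, NOT taken from real Wconcord output.
sample = "this is not [the]| best idea"
fields = delimited_to_tsv(sample).split("\t")
print(fields)
```

With a line of that assumed shape the result splits into three fields (left context, bracketed search term, right context), which is what lets Excel open it as three columns. A text that itself contains "|" or "[" would produce extra columns, so it is worth checking the field count.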

A little bit of history: the multilingual/UNICODE capability was added in v. 3.0. Version 3.1 is now
vastly better than 3.0, but the encoding names are still a bit confusing.
 
A problem I found when concordancing with multiple search words:
When I used "the|a|an" as the search term:
2006033114240419.jpg


"idea" and "and" also showed up in the search results.
2006033114243378.jpg
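
The same effect can be reproduced with an ordinary regular expression engine. The sketch below uses Python's `re` module purely as an illustration; whether AntConc interprets the search term in exactly this way is an assumption on my part. An unanchored alternation matches the letters anywhere inside a word, while word boundaries (`\b`) restrict it to whole words.

```python
import re

words = "an idea and the plan".split()

# Unanchored alternation: "a" is found inside "idea", "and" and "plan" too.
loose = re.compile(r"the|a|an")
loose_hits = [w for w in words if loose.search(w)]

# \b on both sides forces each alternative to be a complete word.
strict = re.compile(r"\b(?:the|a|an)\b")
strict_hits = [w for w in words if strict.search(w)]

print(loose_hits)
print(strict_hits)
```

In this toy sentence the loose pattern hits every word, and the bounded pattern hits only "an" and "the".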
 
Re: Using AntConc for Chinese concordance, wordlist, N-gram

Your best bet is to take the WordList -> Concordance route:

Go to Tool preferences -> WordList Pref. and select
"Use Specific Words Listed Below", then enter the words through either Add Words or From File.
You'll get the word list information; click on any word in the list and you will see the
corresponding concordance lines.
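
The idea behind this route (match whole tokens against a fixed list, then jump from each word to its lines) can be sketched in a few lines of Python. This is not how AntConc is implemented; `kwic_index`, the `width` parameter and the sample sentence are all hypothetical names and data chosen for illustration.

```python
from collections import defaultdict


def kwic_index(tokens, targets, width=3):
    """Map each target word to its keyword-in-context lines."""
    index = defaultdict(list)
    for i, tok in enumerate(tokens):
        # Whole-token comparison, so "idea" can never match the target "a".
        if tok.lower() in targets:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            index[tok.lower()].append(f"{left}\t{tok}\t{right}")
    return index


tokens = "An idea and the plan need a name".split()
index = kwic_index(tokens, {"the", "a", "an"})
for word in sorted(index):
    print(word, len(index[word]))
    for line in index[word]:
        print("   ", line)
```

Because the comparison is done token by token, words like "idea" and "and" are never picked up, which is why the word-list route sidesteps the substring problem.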
 
Re: Using AntConc for Chinese concordance, wordlist, N-gram

Quoting xujiajin (2006-3-31 14:22:57):
A problem I found when concordancing with multiple search words:
When I used "the|a|an" as the search term:
2006033114240419.jpg


"idea" and "and" also showed up in the search results.
2006033114243378.jpg
A blank will make a difference. Try again, changing your search term from "the|a|an" to "the |a |an ", that is, with a blank after each search word.
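
A quick way to see what the trailing blank does and does not exclude is to try it in a plain regex engine. The check below uses Python's `re` as a stand-in, so treat it as a sketch; AntConc's own matching may differ.

```python
import re

words = ["the", "a", "an", "idea", "and", "plan"]

# Each search word followed by a blank, as suggested.
trailing = re.compile(r"the |a |an ")
# Same, but the word must also start at a word boundary.
bounded = re.compile(r"\b(?:the|a|an) ")

# Append a blank to each word, as it would appear mid-sentence.
trailing_hits = [w for w in words if trailing.search(w + " ")]
bounded_hits = [w for w in words if bounded.search(w + " ")]

print(trailing_hits)
print(bounded_hits)
```

In this test the blank removes "and", but words that merely end in "a" or "an" (such as "idea" and "plan") still match unless the start of the word is constrained as well.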
 
Re: Using AntConc for Chinese concordance, wordlist, N-gram

Quoting 动态语法 (2006-3-31 14:38:31):
Your best bet is to take the WordList -> Concordance route:
Go to Tool preferences -> WordList Pref. and select
"Use Specific Words Listed Below", then enter the words through either Add Words or From File.
You'll get the word list information; click on any word in the list and you will see the
corresponding concordance lines.

Yeah. This route, a detour though, leads to concordances of multiple search terms. In this case, Wordlist serves as a pointer to Concordance. In file-based concordancing results, however, the search words of a category can be viewed together in one window, not in separate ones.
 
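Re: Using AntConc for Chinese concordance, wordlist, N-gram

Quoting xujiajin (2006-3-30 19:22:07):
Let me answer with a couple of lines of my own.

The humble ant proves fit for the wordsmith's trade; tufts of fur gathered together at last become a mighty force.

Note: the "ant" (蚍蜉 in the original) stands for Ant, the head word of AntConc; the "wordsmith" (词匠) is WordSmith. This line is meant to thank Laurence Anthony for his generosity.

Thanks to Laurence Anthony, and thanks also to all the enthusiastic friends who introduced this excellent software to C-friends in China!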
 
Re: Using AntConc for Chinese concordance, wordlist, N-gram

Quoting 动态语法 (2006-3-31 1:47:48):
So my understanding is that you want some delimiter characters in the result file so that you can process it with a GREP program and eventually export the result to Excel. I asked him to make it possible to center the search term in the line, which he said could be done easily. If this happens I think it would meet your need. That is, if the search term is centered there is usually a tab character before and after the search term, so you don't need the | -> TAB replacement step. You could still use a GREP program to replace the sequence 'TAB SEARCH_TERM TAB' with whatever you like and export the data to whatever program you want. As far as I can tell, once the result is in a fixed format (e.g. TAB SEARCH_TERM TAB), a lot of things become possible.

Yes, you are right. I think it's better to have TABs inserted: 'TAB SEARCH_TERM TAB'. It's good that so many things become possible with it.
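
As one example of what a fixed 'TAB SEARCH_TERM TAB' layout would allow, here is a sketch that reads such lines and re-sorts them by search term and then by right context, much as one would do with columns in Excel. The three sample lines are invented; no real AntConc export is being reproduced here.

```python
import csv
import io

# Hypothetical export: left context, TAB, search term, TAB, right context.
export = (
    "we looked at\tthe\tplan again\n"
    "she had\tan\tidea\n"
    "it was\tthe\tbest option\n"
)

# QUOTE_NONE keeps any double quotes in the text as ordinary characters.
reader = csv.reader(io.StringIO(export), delimiter="\t", quoting=csv.QUOTE_NONE)
rows = list(reader)

# Sort by search term first, then by the right-hand context.
rows.sort(key=lambda r: (r[1].lower(), r[2].lower()))

for left, term, right in rows:
    print(f"{left:>15}  {term:^5}  {right}")
```

Sorting on `r[0]` instead would group the lines by left context.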
 
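Re: Using AntConc for Chinese concordance, wordlist, N-gram

Quoting xusun575 (2006-3-29 12:04:04):
Laohong, what treasure of a tool do you use for Chinese word segmentation?

The earlier examples were segmented with ICTCLAS. Please search this site for the related automatic segmentation and POS tagging tools: SegTag, ICTCLAS, NEUCSP, Hylanda, WinAT, etc.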
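
To show why segmentation comes first, here is a small sketch of building a word list and bigram counts from text that has already been segmented. It assumes space-separated tokens with an optional trailing "/tag" part-of-speech label; that format and the sample sentence are assumptions for illustration, not verified output from ICTCLAS or any of the tools listed.

```python
from collections import Counter


def strip_tag(token: str) -> str:
    """Drop a trailing '/tag' if present (assumed word/POS format)."""
    return token.rsplit("/", 1)[0] if "/" in token else token


# Hypothetical segmenter output: space-separated tokens with POS tags.
segmented = "我/r 喜欢/v 语料库/n 研究/v ,/w 我/r 喜欢/v 语料库/n 。/w"

tokens = [strip_tag(t) for t in segmented.split()]
wordlist = Counter(tokens)                   # word frequency list
bigrams = Counter(zip(tokens, tokens[1:]))   # adjacent token pairs (2-grams)

print(wordlist.most_common(3))
print(bigrams.most_common(2))
```

Without the spaces inserted by a segmenter, a whitespace-based tokenizer would treat each unbroken run of Chinese characters as a single "word", so word lists and N-grams would be of little use.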
 
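Strange, why won't AntConc run on my computer? Double-clicking the .exe icon does nothing. Could some expert help me out? Also, my other computer can run version 3.1.2.0, but 3.1.302 hits the same problem there. What is going on?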
 