
[Recommended] Yoshikoder: Free content analysis software


yinghuang
2006-07-05, 10:32 PM
Yoshikoder is a cross-platform multilingual content analysis program developed as part of the Identity Project at Harvard's Center for International Affairs.
Yoshikoder allows you to load documents, construct and apply content analysis dictionaries, examine keywords-in-context, and perform basic content analyses, in any language.
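To make "keywords-in-context" (KWIC) concrete, here is a minimal Python sketch of a KWIC listing. It is not Yoshikoder's code. It assumes whitespace-separated tokens, and the `kwic` function and its `width` parameter (context size in tokens) are hypothetical. Chinese text would first need word segmentation.

```python
def kwic(text: str, keyword: str, width: int = 3) -> list[tuple[str, str, str]]:
    """List (left context, keyword, right context) for each occurrence of keyword.

    Tokens are whitespace-separated and matched case-insensitively; punctuation
    is not stripped, so "sat," would not match "sat".
    """
    tokens = text.split()
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            # Up to `width` tokens on each side of the hit.
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


if __name__ == "__main__":
    sample = "the cat sat on the mat and the dog sat on the rug"
    for left, kw, right in kwic(sample, "sat", width=2):
        print(f"{left:>12} [{kw}] {right}")
```

Aligning the keyword in a centre column, as the print loop does, is what makes a concordance easy to scan.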

In more detail: Yoshikoder works with text documents, whether in plain ASCII, Unicode (e.g. UTF-8), or a national encoding (e.g. Big5 for Chinese). You can construct, view, and save keywords-in-context. You can write content analysis dictionaries using Perl-style regular expressions. Yoshikoder provides summaries of documents, either as word frequency tables or according to a content analysis dictionary. You can also compare documents by their word frequency profiles or with respect to a content analysis dictionary. Yoshikoder's native file format is XML, so dictionaries and keyword-in-context files are non-proprietary and human-readable.
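A rough Python sketch of what applying a regex-based content analysis dictionary means: each category holds a list of patterns, and every token that matches any pattern in a category adds one to that category's count. The category names, the patterns, and the `apply_dictionary` helper are hypothetical illustrations, not Yoshikoder's dictionary format or matching rules. Python's `re` syntax is close to, but not identical to, Perl's.

```python
import re

# Hypothetical dictionary: category -> Perl-style regular expressions.
# Categories and patterns are made up for illustration.
DICTIONARY = {
    "economy": [r"econom\w*", r"trade", r"tax(es)?"],
    "security": [r"war", r"terror\w*", r"militar\w*"],
}


def apply_dictionary(text: str, dictionary: dict[str, list[str]]) -> dict[str, int]:
    """Count tokens matching each category.

    A token adds 1 to a category if it fully matches any of that category's
    patterns; the same token may count toward several categories.
    """
    compiled = {cat: [re.compile(p) for p in pats] for cat, pats in dictionary.items()}
    counts = {cat: 0 for cat in dictionary}
    # Simple tokenizer: runs of word characters in lower-cased text.
    for token in re.findall(r"\w+", text.lower()):
        for cat, patterns in compiled.items():
            if any(p.fullmatch(token) for p in patterns):
                counts[cat] += 1
    return counts


if __name__ == "__main__":
    doc = "The economy and trade policy; war and terrorism dominate. Taxes rose."
    print(apply_dictionary(doc, DICTIONARY))
```

Comparing two documents with respect to a dictionary then amounts to comparing these category counts, preferably as proportions of each document's token count so that documents of different lengths are comparable.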

Yoshikoder is open-source software, released under the GNU General Public License (GPL). This licensing implies, among other things, that Yoshikoder is free for academic use.

http://people.iq.harvard.edu/~wlowe/CCA.html

[This post was edited by 动态语法 on 2006-07-05 at 22:40:54]

tiger
2006-07-05, 11:28 PM
Thanks!

seanxpq
2006-07-06, 10:22 AM
Nice. Thanks!

xujiajin
2006-07-06, 08:57 PM
It looks interesting, but I don't know how to get started with the tool.

yinghuang
2006-07-06, 10:33 PM
How to get it started:
(1) Download and install the Windows executable of Yoshikoder from http://people.iq.harvard.edu/~wlowe/CCA.html.
(2) Download and install the J2SE SDK (Java), also available at this website, if you haven't installed it already.
(3) Run it.
(4) Open a document, as shown below.
http://forum.corpus4u.org/upload/forum/2006070622250313.gif
(5) Set the encoding (UTF-8 or Big5, depending on the document) and a font that can display Chinese, as illustrated below.
http://forum.corpus4u.org/upload/forum/2006070622285382.gif
(6) Click Report: on document, as shown below.
http://forum.corpus4u.org/upload/forum/2006070622323631.gif
(7) See the reported results.
http://forum.corpus4u.org/upload/forum/2006070622353173.gif
(8) Enjoy the results.
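Steps (5) to (7) boil down to decoding the file with the right encoding and then counting tokens. Below is a minimal Python sketch of that idea, not Yoshikoder's implementation. The helper names are hypothetical, and the tokenizer is a deliberate simplification: runs of ASCII letters/digits count as words, and each CJK ideograph counts as one token, since proper Chinese word segmentation is out of scope here. Yoshikoder's own tokenization may differ.

```python
import re
from collections import Counter

# Simplifying assumption: a "word" is a run of ASCII letters/digits, and each
# CJK ideograph (U+4E00 to U+9FFF) counts as its own token.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+|[\u4e00-\u9fff]")


def decode_document(raw: bytes, encoding: str) -> str:
    """Decode raw file bytes with an explicitly chosen encoding, e.g. 'utf-8' or 'big5'.

    A wrong choice often raises UnicodeDecodeError or yields garbled text,
    which is why the encoding has to be set before reporting.
    """
    return raw.decode(encoding)


def word_frequencies(text: str) -> Counter:
    """Return a frequency table of tokens, lower-casing ASCII words."""
    return Counter(tok.lower() for tok in TOKEN_RE.findall(text))
```

For example, `word_frequencies(decode_document(open("doc.txt", "rb").read(), "big5")).most_common(20)` would list the 20 most frequent tokens of a Big5-encoded file (the file name is only an example).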


[This post was edited by the author on 2006-07-06 at 22:42:19]

xujiajin
2006-07-07, 08:56 AM
Thank you for the screen shots.
The concordance feature is not its strongest part.
What I am interested in is its so-called "content analysis" (as I see it, semantic analysis and beyond).

tiger
2006-07-08, 10:59 PM
That's the same question I'd like to ask.

yinghuang
2006-07-09, 01:33 AM
I suspect the software cannot do semantic analysis and beyond as you hope. For an overview of content analysis, especially of what software can do, see Stemler (2001): http://pareonline.net/getvn.asp?v=7&n=17.