Search results

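  1.

    What is R software?

    Re: What is R software? R has made a strong showing in data mining (including text mining) and should not be underestimated; in 2012 it became the leader among the various languages. For the statistics, see http://www.kdnuggets.com/2012/05/top-analytics-data-mining-big-data-software.html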
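  2.

    R natural language processing modules

    Re: R natural language processing modules R is really impressive: its packages also let you do Chinese word segmentation directly, so it is well worth learning. Commonly used Chinese word segmentation packages for R include rmmseg4j (based on the Maximum Matching algorithm), rsmartcn (a simplified version of ICTCLAS with part-of-speech tagging removed), and rpaoing (Java-based). 1. rsmartcn wget http://download.r-forge.r-project.org/src/contrib/rsmartcn_0.1-0.tar.gz RStudio -> Tools -> Install Packages (install...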
  3.

    R natural language processing modules

    Re: R natural language processing modules Yes, I tried WordNet and it works very well. Here are my test results on Ubuntu: 1. Install WordNet: sudo apt-get install wordnet 2. R> install.packages("wordnet") 3. R> library(wordnet) R> synonyms("friend","NOUN") [1] "acquaintance" "admirer" "ally" "booster" "champion" "friend" [7]...
  4.

    Apache OpenNLP

    Re: Apache OpenNLP OpenNLP can also be used from R (Java is required). Here are my test results on Ubuntu: 1. R> install.packages(c("rJava")) # if package ‘rJava’ cannot be installed, run sudo apt-get install openjdk-6-jdk 2. R> install.packages(c("openNLP","openNLPmodels.en")) 3. Suppose a file nlp.txt exists: I would like to test the new package RGG (R...
  5.

    One perspective on corpus linguistics (repost)

    Re: One perspective on corpus linguistics (repost) I think the debate on meaning/frequency is not necessarily irreconcilable. It should not be too absurd to assert that meaning is closely related to frequency (if we only look at the moment when we define we are in love). If we take the viewpoint that meaning is essentially a...
  6.

    One perspective on corpus linguistics (repost)

    Re: One perspective on corpus linguistics (repost) Statistical MT practitioners rely very little on rules, and alas, they happen to be the most successful people in the MT field so far. But corpus linguistics does have a role to play in computer-aided human translation. So the news is not too bad. To me, not only information we...
  7.

    Cornell Movie-Dialogs Corpus (English)

    Re: Cornell Movie-Dialogs Corpus (English) The page http://www.cs.cornell.edu/~cristian/Chameleons_in_imagined_conversations.html has a link, [Featured on Nature.com], where the work was covered as a news story.
  8.

    Cornell Movie-Dialogs Corpus (English)

    Re: Cornell Movie-Dialogs Corpus (English) Thanks for the recommendation. The findings were even reported in Nature; it seems a research question has to be interesting enough.
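  9.

    A roundup of corpus-related books available in bookstores (Corpus books at a glance)

    Re: A roundup of corpus-related books available in bookstores (Corpus books at a glance) A new book on regular expressions, with reasonable depth and good coverage of Chinese text processing: 余晟 (Yu Sheng). 2012. 正则指引. 电子工业出版社 (Publishing House of Electronics Industry).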
  10.

    Yacsi: Another ICTCLAS 2012 GUI

    Re: Yacsi: Another ICTCLAS 2012 GUI Take the sentence “彭谨向记者表示,此事应当由警方调查,先入为主认定路边烤肉有问题不太合适。” as an example. The ICT (Institute of Computing Technology) level-1 tagging is: 彭谨/n 向/p 记者/n 表示/v ,/w 此事/r 应当/v 由/p 警方/n 调查/v ,/w 先入为主/v 认定/v 路边/s 烤肉/n 有/v 问题/n 不/d 太/d 合适/a 。/w whereas the ICT level-2 tagging is: 彭谨/nr 向/p 记者/n 表示/v ,/wd 此事/r 应当/v 由/p 警方/n 调查/vn ,/wd 先入为主/vl 认定/v 路边/s 烤肉/n 有/vyou 问题/n...
  11.

    New Release: StringNet 3.0 at nav.stringnet.org

    Re: New Release: StringNet 3.0 at nav.stringnet.org Pretty good. Could be a help for ESL students.
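  12.

    Yacsi: Another ICTCLAS 2012 GUI

    Re: Re: Yacsi: Another ICTCLAS 2012 GUI YACSI was developed with VC6 on Chinese Windows XP. I don't know whether it works out of the box on English (or other-language) Windows systems, so testing is welcome. If it does not run in a non-Chinese environment, you may need to add CP936 support under "Regional and Language Options" in the Control Panel.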
  13.

    Yacsi: Another ICTCLAS 2012 GUI

    Re: Yacsi: Another ICTCLAS 2012 GUI Thanks for your interest. But I can't recall YACSI being able to generate a "character list". Can you give me more detail regarding your problem? Also, YACSI is not intended to be all-inclusive. To make a frequency list of "words", there are many good tools out...
  14.

    BFSU Sentence Collector 1.0: a corpus-based tool for extracting English example sentences

    Re: BFSU Sentence Collector 1.0: a corpus-based tool for extracting English example sentences An anecdote: before I was sure the .dat file was encrypted the way I had guessed, I did something more laborious. I used UltraEdit to look at the hex values of COLEN.dat and made a word list of all the hex numbers (yes, with AntConc); then I made a...
  15.

    BFSU Sentence Collector 1.0: a corpus-based tool for extracting English example sentences

    Re: BFSU Sentence Collector 1.0: a corpus-based tool for extracting English example sentences Originally, I planned to use a database as the backend to make a tool like SC, but I dropped the idea very soon: when most of the functionality is already there in SC, why bother reinventing the wheel? Isn't all good science built upon the...
  16.

    BFSU Sentence Collector 1.0: a corpus-based tool for extracting English example sentences

    Re: BFSU Sentence Collector 1.0: a corpus-based tool for extracting English example sentences Data structures lie at the heart of any significant programming. For SC, that means the index files. Unfortunately, they are encrypted. If you open COLEN.dat, you will see gibberish that looks something like Chinese. But the search results tell us that the...
  17.

    BFSU Sentence Collector 1.0: a corpus-based tool for extracting English example sentences

    Re: BFSU Sentence Collector 1.0: a corpus-based tool for extracting English example sentences Dr. Xu, thanks for your generosity and permission. Please understand that it is out of love for Sentence Collector that I peeped into its nooks and crannies. Great news for us all. ------------------------- Next I will talk about my...
  18.

    BFSU Sentence Collector 1.0: a corpus-based tool for extracting English example sentences

    Re: BFSU Sentence Collector 1.0: a corpus-based tool for extracting English example sentences I will, but only with permission from Dr. Xu and Mr. Jia. Programming is time-consuming and labor-intensive; it would be offensive to disclose anything against their wishes.
  19.

    BFSU Sentence Collector 1.0: a corpus-based tool for extracting English example sentences

    Re: BFSU Sentence Collector 1.0: a corpus-based tool for extracting English example sentences NB: If you want to use the COLEN corpus, please set COLEN.idx=1 in the index_list.ini file. If you want to turn the Europarl09 corpus off, please set Europarl09.idx=0 in the index_list.ini file.
  20.

    BFSU Sentence Collector 1.0: a corpus-based tool for extracting English example sentences

    Re: BFSU Sentence Collector 1.0: a corpus-based tool for extracting English example sentences Hi all, I've managed to create an index file compatible with BFSU Sentence Collector 1.0. The original English text is taken from part of the 2009 data of the Europarl parallel corpus. The index file is created to facilitate better use of...
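
Result 2 above mentions that rmmseg4j is based on the Maximum Matching algorithm. For readers unfamiliar with the idea, here is a minimal Python sketch of plain forward maximum matching. It is not the rmmseg4j implementation, which may well use further disambiguation rules; the function name `forward_max_match`, the `max_len` parameter, and the toy dictionary are all made up for illustration.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy forward maximum matching (illustrative sketch only).

    At each position, take the longest substring (up to max_len characters)
    found in the dictionary; fall back to a single character otherwise.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy dictionary; a real segmenter ships a much larger lexicon.
TOY_DICT = {"此事", "应当", "警方", "调查", "记者", "表示"}

print(forward_max_match("此事应当由警方调查", TOY_DICT))
```

Characters that never appear in the dictionary (such as 由 here) come out as single-character tokens, which is the usual fallback in this family of methods.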
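
Result 19 above describes switching corpora on and off by editing `NAME.idx=0/1` entries in index_list.ini. A small Python sketch of that edit follows. It assumes the file consists of plain `key=value` lines, which the post does not confirm, and the helper name `set_index_flags` is hypothetical; it works on text only, so back up the real file before applying anything like it.

```python
def set_index_flags(ini_text, flags):
    """Return ini_text with `<corpus>.idx=<0|1>` lines updated.

    `flags` maps a corpus name (e.g. "COLEN") to True/False.
    Assumes plain key=value lines; entries not yet present are appended.
    """
    remaining = dict(flags)
    out = []
    for line in ini_text.splitlines():
        key, sep, _ = line.partition("=")
        name = key.strip()
        corpus = name[:-len(".idx")] if name.endswith(".idx") else None
        if sep and corpus in remaining:
            out.append(f"{name}={1 if remaining.pop(corpus) else 0}")
        else:
            out.append(line)  # leave unrelated lines untouched
    for corpus, enabled in remaining.items():
        out.append(f"{corpus}.idx={1 if enabled else 0}")
    return "\n".join(out) + "\n"


# Made-up starting content for the demonstration.
sample = "COLEN.idx=0\nEuroparl09.idx=1\n"
print(set_index_flags(sample, {"COLEN": True, "Europarl09": False}))
```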