Search results

Question about LOCNEC and SWECCL

I assume that you mean LOCNESS. You could search this forum for related posts. BTW, LOCNESS is a corpus of written English.
- ArthurW
- Post #3
- 2016-12-10
- Forum: Spoken Corpora
LancsBox: graphical collocation analysis

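http://corpora.lancs.ac.uk/lancsbox/ LancsBox is a Java-based tool developed in 2015 by Vaclav Brezina and colleagues at Lancaster University. It extracts the collocates of a given word from a corpus and displays them as a dynamic graph arranged by collocation strength. It can then expand selected collocates to show their own collocates, which makes it easy to see the collocates two words share. The display is very intuitive and vivid. The tool also includes a fairly simple KWIC concordancer.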
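To give a rough idea of what "collocation strength" means, here is a minimal Python sketch of one common association measure, mutual information (MI), computed over a symmetric window with a window-adjusted expected frequency. The function name, window size, and frequency cut-off are assumptions for illustration; LancsBox offers several association measures, and this is not its implementation.

```python
import math
from collections import Counter


def collocates_mi(tokens, node, window=3, min_freq=2):
    """Rank collocates of `node` by mutual information (MI).

    Co-occurrences are counted within +/- `window` tokens of each occurrence
    of `node`; the expected frequency is adjusted for the window span.
    """
    n = len(tokens)
    freq = Counter(tokens)
    observed = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        # Count every token in the window around this occurrence of the node.
        for j in range(max(0, i - window), min(n, i + window + 1)):
            if j != i:
                observed[tokens[j]] += 1
    span = 2 * window
    scores = {}
    for coll, o in observed.items():
        if o < min_freq:  # drop rare pairs, whose MI is unreliable
            continue
        expected = freq[node] * freq[coll] * span / n
        scores[coll] = math.log2(o / expected)
    # Strongest collocates first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


tokens = "drink strong tea . drink strong coffee . strong tea is good".split()
print(collocates_mi(tokens, "strong", window=1))
# [('drink', 1.0), ('tea', 1.0)]
```

On a real corpus the token list would come from a tokenizer, and a larger `min_freq` is advisable because MI strongly favours rare words.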
- ArthurW
- Topic
- 2016-12-10
- collocation analysis lancsbox
- Replies: 0
- Forum: Programming and Tool Development
How to build a reference corpus from part of the BNC texts

You can go to https://cqpweb.lancs.ac.uk/ and export a specified part of the BNC.
- ArthurW
- Post #3
- 2016-12-10
- Forum: Corpus Research Practice
Help: a regular expression for finding verbs converted into nouns

A POS tagger only tags the surface part of speech. It cannot tag the conversion process, because that process is implicit.
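What a tagger does leave behind is surface evidence, for example the same word form tagged as a noun in one place and as a verb in another. A minimal sketch, assuming tokens in `word_TAG` format with CLAWS C5-style tags (noun tags starting with `N`, verb tags with `V`); the function name is hypothetical. It only lists noun/verb homographs: whether a form is really a conversion still has to be judged by hand, and there is no lemmatization (`walks` and `walk` count as different forms).

```python
import re
from collections import defaultdict

# Matches tokens such as "walk_NN1" or "walk_VVB" (word, underscore, tag).
TOKEN = re.compile(r"(\S+?)_([A-Z0-9]+)")


def noun_verb_homographs(tagged_text):
    """Return word forms that occur both with a noun tag and a verb tag."""
    first_letters = defaultdict(set)
    for word, tag in TOKEN.findall(tagged_text):
        # Assumed tagset convention: N... = noun, V... = verb.
        first_letters[word.lower()].add(tag[0])
    return sorted(w for w, letters in first_letters.items() if {"N", "V"} <= letters)


text = ("We_PNP walk_VVB daily_AV0 ._PUN A_AT0 walk_NN1 helps_VVZ ._PUN "
        "Help_NN1 arrived_VVD")
print(noun_verb_homographs(text))  # ['walk']
```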
- ArthurW
- Post #6
- 2016-12-10
- Forum: Corpus Research Practice
Corpus search

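Corpus tools are just computer programs. They know nothing about language and can only search by surface form. Metaphor operates at the semantic level and has no clear correspondence with form, so in principle corpus tools cannot retrieve metaphors. I suggest doing a literature search to see the many views on automatic identification of metaphorical expressions.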
- ArthurW
- Post #2
- 2016-12-10
- Forum: Corpus Search
SegmentAnt 1.10 (three Chinese segmenters + user dictionary)

In October this year I confirmed this bug with Laurence, and he promptly released version 1.1.1, which fixes the problem.
- ArthurW
- Post #5
- 2016-12-10
- Forum: Corpus Annotation
Processing classical Chinese texts for corpus work

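User asking is right: there is no usable word segmentation tool for classical Chinese, and the main approach at present is to treat each character as a unit and insert a space between characters. However, characters in classical Chinese are highly polysemous and ambiguous, so TTR is hard to compute meaningfully, and even when computed its value is very limited. As for parallelism in the translations, that depends on sentence structure and on whether you define it at the sentence level or the phrase level. If at the sentence level, you can first do sentence splitting and then search selectively for particular parallel structures. I also suspect you want to search the original and the translation side by side, which requires alignment. I suggest taking a look at AntPConc.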
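A minimal Python sketch of the character-as-unit approach described above: keep only Han characters, join them with spaces, and compute a character-level TTR. The Unicode ranges (CJK Unified Ideographs plus Extension A) and the choice to drop punctuation are assumptions; characters from other extension blocks would be lost, and the function names are hypothetical.

```python
import re

# CJK Unified Ideographs (U+4E00-U+9FFF) and Extension A (U+3400-U+4DBF).
# Everything else (punctuation, Latin letters, digits, spaces) is dropped.
HAN = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]")


def space_characters(text):
    """Treat each Han character as a token and separate tokens with spaces."""
    return " ".join(HAN.findall(text))


def char_ttr(text):
    """Type-token ratio computed over characters rather than words."""
    chars = HAN.findall(text)
    return len(set(chars)) / len(chars) if chars else 0.0


print(space_characters("學而時習之，不亦說乎？"))  # 學 而 時 習 之 不 亦 說 乎
print(char_ttr("子曰子曰"))  # 0.5
```

Since each character type is counted once regardless of which sense or word it belongs to, this TTR measures character variety rather than vocabulary, which is why its interpretive value is limited.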
- ArthurW
- Post #5
- 2016-12-10
- Forum: Chinese Corpora
The BNC I have is entirely in XML format. How do I use it properly?

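An MSI installer for Xaira 1.26 can be downloaded from SourceForge. Searching requires an index; the index files are very complex and take up a lot of space, but for searching the BNC it is worth it. Alternatively, you can search the BNC online at cqpweb.lancs.ac.uk.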
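If you only need word-level data rather than a full search engine, the XML can also be read directly. A minimal sketch using Python's standard `xml.etree.ElementTree.iterparse`, assuming BNC XML Edition markup in which each word is a `<w>` element carrying `c5` (POS tag) and `hw` (headword) attributes; the function name is hypothetical, and the sample below is a made-up fragment in that style.

```python
import io
import xml.etree.ElementTree as ET


def bnc_words(source):
    """Yield (word, c5_tag, headword) for every <w> element in a BNC XML file.

    `source` is a file path or a binary file object. Assumes markup such as
    <w c5="NN1" hw="cat" pos="SUBST">cat </w>.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "w":
            # Word forms carry a trailing space in this markup, so strip it.
            yield (elem.text or "").strip(), elem.get("c5"), elem.get("hw")
            elem.clear()  # free the element's contents once it has been read


if __name__ == "__main__":
    sample = (b'<bncDoc><s n="1"><w c5="AT0" hw="the" pos="ART">The </w>'
              b'<w c5="NN1" hw="cat" pos="SUBST">cat </w><c c5="PUN">.</c></s></bncDoc>')
    for word, tag, headword in bnc_words(io.BytesIO(sample)):
        print(word, tag, headword)
```

Passing a file path instead of `io.BytesIO` processes a real BNC text file; punctuation (`<c>` elements) is skipped in this sketch.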
- ArthurW
- Post #8
- 2016-12-10
- Forum: FAQ
[Corpus release] Tibetan Folk Tales Corpus

TIBETAN FOLK TALES CORPUS Source: http://www.sacred-texts.com/asia/tft Compiler: Jiayue Wang Time: 9 December 2016 The texts were extracted from web pages downloaded from the website. Each line that begins with a hashtag (#) indicates the webpage and its relative path in the...
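The header convention makes it possible to split the corpus file back into pages. A minimal sketch, assuming (since the description above is cut off) that each `#` line starts a new page and the following non-blank lines are that page's text up to the next `#` line; the function name and the example paths are hypothetical.

```python
def split_by_page(lines):
    """Group corpus lines into {page_path: text}.

    Assumes every line starting with '#' names the page whose text follows.
    Blank lines are dropped; lines before the first header are ignored.
    A repeated header would overwrite the earlier page in this sketch.
    """
    pages = {}
    current = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            current = line[1:].strip()
            pages[current] = []
        elif current is not None and line.strip():
            pages[current].append(line)
    return {path: "\n".join(body) for path, body in pages.items()}


sample = ["# tft/page1.htm\n", "Once upon a time\n", "\n",
          "there lived a king.\n", "# tft/page2.htm\n", "Another tale.\n"]
print(split_by_page(sample))
```

With a real file, `with open(path, encoding="utf-8") as f: pages = split_by_page(f)` works as well, since a file object yields lines.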
- ArthurW
- Topic
- 2016-12-09
- english corpus tibetan folk tales corpus
- Replies: 0
- Forum: Corpus Collection
What software is good for analyzing a large amount of Japanese text?

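As armstrong said, with a little configuration AntConc can search unsegmented Chinese and Japanese text directly. MLCT (Multilingual Corpus Tool) can do this too. Searching alone is actually not hard; anything beyond that depends on how you want to analyze the data.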
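Because Japanese and Chinese are written without spaces between words, the simplest search treats the text as a string of characters and matches the query as a substring, with no segmentation at all. A minimal Python sketch of such a character-based KWIC; the function name and the `width` of context characters are illustrative, and this is not how AntConc or MLCT implement their search.

```python
def kwic(text, query, width=5):
    """Character-based KWIC: find every occurrence of `query` as a substring
    and return (left context, match, right context) tuples."""
    hits = []
    start = text.find(query)
    while start != -1:
        end = start + len(query)
        left = text[max(0, start - width):start]
        right = text[end:end + width]
        hits.append((left, query, right))
        start = text.find(query, start + 1)  # continue after this hit
    return hits


for left, hit, right in kwic("今日は雨です。明日も雨でしょう。", "雨", width=2):
    print(f"{left:>2} [{hit}] {right}")
```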
- ArthurW
- Post #6
- 2016-12-09
- Forum: Corpora and Language Research
Looking for corpus exercises

This textbook has good exercises at the end of each chapter: McEnery, T. and Hardie, A. 2011. Corpus Linguistics: Method, Theory and Practice. Cambridge: CUP.
- ArthurW
- Post #2
- 2016-12-09
- Forum: Introduction to Corpus Linguistics
[Corpus release] Buddhist Sacred-Texts Corpus

Copyright note about the public-domain texts at sacred-texts.com: http://www.sacred-texts.com/cnote.htm
- ArthurW
- Post #2
- 2016-12-09
- Forum: Corpus Collection
[Corpus release] Buddhist Sacred-Texts Corpus

BUDDHIST SACRED TEXTS CORPUS Source: http://www.sacred-texts.com/bud Compiler: Jiayue Wang Time: 8 December 2016 The texts were extracted from web pages downloaded from the website. Each line that begins with a hashtag (#) indicates the webpage and its relative path in the...
- ArthurW
- Topic
- 2016-12-09
- buddhist texts english corpus raw text
- Replies: 1
- Forum: Corpus Collection
Annotating terms and special symbols

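Are you tagging words or phrases? At the phrase level, CLAWS or TreeTagger can't do it; you would need semantic analysis. In fact, most analyzers are based on word lists and term lists: the computer has to know in advance which items are terms before it can tag them correctly. Many people in NLP are currently researching multiword expressions (MWEs).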
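The point that the computer must know the terms in advance can be illustrated with a dictionary-based tagger: given a term list, it marks the longest matching multiword sequence at each position. A minimal greedy longest-match sketch; the function name, the `TERM` label, and joining matched tokens with underscores are hypothetical conventions, and real MWE recognition also has to handle inflection and discontinuous expressions.

```python
def tag_terms(tokens, term_list):
    """Greedy longest-match lookup of (multiword) terms in a token list.

    Returns (token_or_joined_term, label) pairs: label "TERM" for matches
    from `term_list`, None for everything else. Matching is case-insensitive.
    """
    terms = {tuple(t.lower().split()) for t in term_list}
    max_len = max((len(t) for t in terms), default=1)
    out, i = [], 0
    while i < len(tokens):
        # Try the longest possible span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tok.lower() for tok in tokens[i:i + n])
            if candidate in terms:
                out.append(("_".join(tokens[i:i + n]), "TERM"))
                i += n
                break
        else:  # no term starts here
            out.append((tokens[i], None))
            i += 1
    return out


sentence = "The corpus uses part of speech tagging and word sense disambiguation".split()
terms = ["part of speech", "part of speech tagging", "word sense disambiguation"]
print(tag_terms(sentence, terms))
```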
- ArthurW
- Post #2
- 2016-12-08
- Forum: Corpus Annotation
Is there an English corpus of Buddhist texts?

Hi, go to http://www.sacred-texts.com/bud/ and you can build one yourself.
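Building such a corpus comes down to downloading the pages and keeping their text. A minimal sketch using only Python's standard library (`urllib.request` and `html.parser`); the class and function names are hypothetical. It keeps all visible text, including navigation links and other boilerplate, and when crawling you should check the site's terms and avoid sending requests too quickly.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: &amp; etc. arrive decoded
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def fetch_page(url):
    """Download a page; falls back to UTF-8 if the server declares no charset."""
    with urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


# Usage (requires network access):
# text = html_to_text(fetch_page("http://www.sacred-texts.com/bud/"))
```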
- ArthurW
- Post #2
- 2016-12-08
- Forum: Corpus Collection
Call for social media corpora

Hi Kayee, Laurence Anthony and Claire Hardaker's tool FireAnt can be used to extract and analyse Twitter data. Check it out: http://www.laurenceanthony.net/software/fireant/
- ArthurW
- Post #4
- 2016-12-08
- Forum: Corpora and Language Research
How to generate a word list for my research corpus with AntConc

As far as I know, there is no word segmentation system for classical Chinese yet.
- ArthurW
- Post #7
- 2016-11-29
- Forum: Corpora and Foreign Language Teaching
jusText - removing boilerplate from web pages

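When you extract text from web pages to build a corpus, you will find that the pages contain a lot of repetitive content such as copyright notices, ads, headers, and footers. These are clearly not what users of such corpora care about, and I ran into this problem myself recently. Because building this kind of corpus usually involves a very large number of pages, removing this material by hand is not realistic. I recommend jusText, a tool implemented in Python that effectively removes the impurities from your corpus and keeps your complexion glowing. http://corpus.tools/wiki/Justext Quick start: wget -O page.html http://planet.python.org/ justext -s English...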
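To show the kind of reasoning such a tool applies, here is a heavily simplified, stdlib-only heuristic loosely inspired by criteria jusText uses (paragraph length and stopword density). It is not jusText's algorithm, which also considers link density and the classification of neighbouring paragraphs; the stopword list, thresholds, and function names below are illustrative assumptions. For real corpus work, use jusText itself.

```python
import re

# A tiny illustrative stopword list; real tools ship full per-language lists.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "that",
             "for", "on", "with", "as", "are", "was", "this", "by"}


def keep_paragraph(text, min_chars=70, min_stop_ratio=0.30):
    """Keep long paragraphs with many stopwords (typical of running prose);
    short or stopword-poor blocks (menus, footers) count as boilerplate."""
    words = re.findall(r"[a-z']+", text.lower())
    if len(text) < min_chars or not words:
        return False
    ratio = sum(w in STOPWORDS for w in words) / len(words)
    return ratio >= min_stop_ratio


def strip_boilerplate(paragraphs):
    return [p for p in paragraphs if keep_paragraph(p)]


page = [
    "Copyright 2016 All rights reserved",
    "Home | About | Contact",
    "The texts were extracted from the web pages and the boilerplate was "
    "removed in a second step, as is common in the construction of web corpora.",
]
print(strip_boilerplate(page))
```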
- ArthurW
- Topic
- 2016-11-18
- Replies: 0
- Forum: Programming and Tool Development
How to get access to corpora from abroad

cqpweb.lancs.ac.uk hosts many corpora.
- ArthurW
- Post #12
- 2016-11-18
- Forum: Chinese Learner English Corpora
Asking the experts for help with using AntConc

I don't know of a good tool for this other than MS Excel, which handles sampling quite well, provided that this Excel feature is installed (search the Internet for how to do it). As far as I know, AntConc can't do this sampling for you.
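For readers comfortable with a little scripting, random sampling of concordance lines is also easy in Python. A minimal sketch, assuming the concordance has been saved to a plain-text file with one hit per line; the function name is hypothetical. Passing a fixed `seed` makes the sample reproducible, and the sampled lines keep their original order.

```python
import random


def sample_lines(lines, k, seed=None):
    """Draw k lines at random without replacement, preserving original order."""
    if k > len(lines):
        raise ValueError("sample size exceeds number of lines")
    rng = random.Random(seed)  # independent generator, reproducible with a seed
    chosen = sorted(rng.sample(range(len(lines)), k))
    return [lines[i] for i in chosen]


# Usage with a saved concordance file (file name is hypothetical):
# with open("concordance.txt", encoding="utf-8") as f:
#     hits = f.read().splitlines()
# for line in sample_lines(hits, 100, seed=2016):
#     print(line)
```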
- ArthurW
- Post #2
- 2016-11-07
- Forum: FAQ
