Parallel image text corpus of Chinglish 开心译站

xujiajin

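Walking down the street today, it occurred to me that we could borrow the Wikipedia model: a corpus of Chinglish images paired with text, built collaboratively by everyone and free for everyone to use.
1. The collection targets bilingual signs, advertisements, product instructions, captions and subtitles, and so on. Everything is accepted, whether the translation is right or wrong.
2. Each submission must include an image file and a text file. The image and the text share the same file name, e.g. yonghegong.jpg corresponds to yonghegong.txt. Please name files consistently using full Hanyu Pinyin spelling.
3. After you upload, we will tidy up the submissions and then post them.
4. The txt file should contain the following text: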
<header>
<place>Near the entrance of Beijing Yonghe Gong</place>
<date>2005-04-02</date>
<creator>xujiajin</creator>
<contact>ustcxujj@sina.com</contact>
</header>

<text ID="yonghegong">
<p language="Chinese">参观雍和宫向南150米</p>
<p language="English">Visiting Yonghe Gong towards south 150 metres</p>
<p language="Chinese">严禁携带超长香进入雍和宫</p>
</text>
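
For anyone who wants to process these files automatically, here is a minimal sketch of how one entry might be read in Python. It is only an illustration, not part of the proposal: the function name `parse_entry` and the wrapping `<entry>` element are my own assumptions, added because the file has two top-level elements (`<header>` and `<text>`) and so is not a single-rooted XML document by itself. It also assumes the text contains no unescaped `&` or `<` characters.

```python
import xml.etree.ElementTree as ET


def parse_entry(raw: str) -> dict:
    """Parse one corpus .txt entry into a plain dictionary."""
    # Assumption: wrap the two top-level elements in a hypothetical
    # <entry> root so the standard XML parser accepts the text.
    root = ET.fromstring("<entry>" + raw + "</entry>")
    header = root.find("header")
    text = root.find("text")
    return {
        "id": text.get("ID"),
        "place": header.findtext("place"),
        "date": header.findtext("date"),
        "creator": header.findtext("creator"),
        # One (language, sentence) pair per <p> element, in file order.
        "lines": [
            (p.get("language"), (p.text or "").strip())
            for p in text.findall("p")
        ],
    }


sample = """<header>
<place>Near the entrance of Beijing Yonghe Gong</place>
<date>2005-04-02</date>
<creator>xujiajin</creator>
</header>

<text ID="yonghegong">
<p language="Chinese">参观雍和宫向南150米</p>
<p language="English">Visiting Yonghe Gong towards south 150 metres</p>
</text>"""

entry = parse_entry(sample)
for language, sentence in entry["lines"]:
    print(language, sentence)
```

Keeping the `language` attribute on every `<p>` is what makes it possible to pull out the Chinese and English sides separately later.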
2005080814154994.jpg
 
<header>
<place>Lobby of Beijing Airport Garden Hotel</place>
<date>2005-01-07</date>
<creator>xujiajin</creator>
<contact>ustcxujj@sina.com</contact>
</header>

<text ID="yonghegong">
<p language="Chinese">贵重物品请在行李部寄存</p>
<p language="English">Valuble goods should be stored in luggage depar(t)ment</p>
</text>
http://www.corpus4u.org/upload/forum/2005080814230715.txt

2005080814312225.jpg
 
I do hope this ongoing collaborative effort will produce a usable parallel corpus.
 
Re: Parallel image text corpus of Chinglish

It is really a good idea. However, I would also like to know how you will make it work. Where will you store the parallel image text corpus? You won't store everything on this website, will you? And how will we search the accumulated material for a specific item later on?
 
We will categorize the texts and images and merge all the texts into one big file for download, which will be searchable, of course.

Since the image files share their file names with the texts, we can easily locate the corresponding images as well. We will also compress the images into a single archive for download.
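
To make the idea concrete, here is a rough sketch of the merging and searching steps in Python. Everything in it is an assumption for illustration only: the function names `merge_texts` and `search`, the folder layout (all .txt and .jpg files side by side in one folder), and the plain keyword match. It relies only on the naming rule that each text file has an image with the same base name.

```python
from pathlib import Path
import tempfile


def merge_texts(folder: Path, merged: Path) -> list[str]:
    """Concatenate every .txt entry into one file.

    Returns the base names of entries that have no matching .jpg.
    """
    # Collect the entries first, skipping the output file itself in case
    # it lives in the same folder.
    entries = sorted(
        p for p in folder.glob("*.txt") if p.resolve() != merged.resolve()
    )
    missing = []
    with merged.open("w", encoding="utf-8") as out:
        for txt in entries:
            out.write(txt.read_text(encoding="utf-8").strip() + "\n\n")
            if not txt.with_suffix(".jpg").exists():
                missing.append(txt.stem)
    return missing


def search(folder: Path, keyword: str) -> list[tuple[str, str]]:
    """Return (text file, image file) names for entries containing keyword."""
    hits = []
    for txt in sorted(folder.glob("*.txt")):
        if keyword.lower() in txt.read_text(encoding="utf-8").lower():
            hits.append((txt.name, txt.with_suffix(".jpg").name))
    return hits


if __name__ == "__main__":
    # Tiny demonstration in a temporary folder with one made-up entry.
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "yonghegong.txt").write_text(
            '<p language="English">Visiting Yonghe Gong towards south 150 metres</p>',
            encoding="utf-8",
        )
        (folder / "yonghegong.jpg").write_bytes(b"")
        print(search(folder, "yonghe gong"))
```

A plain keyword match is the simplest thing that could work; a proper concordancer could replace `search` later without changing the file layout.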

Any other good ideas for improving the usability of the corpus?
 
Re: Parallel image text corpus of Chinglish

If nothing else, this would be a fun collection for sure!
 
It would be a great idea to compile this type of corpus for the benefit of Chinese translators.
 
Parallel image text corpus of Chinglish

A spam email advertisement: turning trash into treasure --

            -高楼小吊车-

 高楼小吊机是诞生于世纪之初的专利产品,它体积小、重量轻,
一个人自行车就能携带;它吊的高、速度快,30层楼房8分钟就能打来回;它便于安
装,起重大,容易操作效率高。它的问世是当今建筑装潢行业机械上料的重大突破,
从此将结束高空装潢材料依靠人工搬运的历史。它为劳动者极大的减轻了劳动强度,
最大地创造了经济效益;它投资小、见效快、零风险、高回报。高楼小吊机采用220
伏单相电源,使用极为方便,是装潢公司、安装公司、搬家公司以及搬运工人的实用
小型机械,又是大型建筑工程代替塔吊、龙门吊的新型设备。它能用于机械厂、家电
厂、食品厂的生产装配线,也能用于楼顶防水混凝土现浇工程。它可做为修理部门、仓
储库房的新型工具,也可进入家庭吊运物品、存放水果、晒粮等等。

      small investment high profit
“Portable Crane” is patent which is born in beginning of century.
It is small in bulk, low in weight, and a person can carry it by
bike. It has high-hang, high-speed and promote goods to 30 floors
back and forth just taking 8minutes. Also it is easy in arrange,
heavy in hold, easy in operate and get high ability. Its appearance
is a grand burst of upholster building, so it enormous lessen the
carriers' working strength. It has small-investment, fast-act,
zero-risk and high-profit.
“Portable Crane” used 220v unidirectional power source, so it is
mini-machine by company of upholster, arrange, movement and carriers.
And it is also replace the new machine: “Long Men” crane or “Tower”
crane! It can be used in produce assembler of mechanical factory, white
goods-factory and food factory, also can used in damp proof and
concrete. Look on as new tool of repair department and storage, the
“Portable Crane” can lift goods, lay up fruits, shine upon breadstuffs
and so on.
 