Posted by xujiajin on 2012-10-20 in the "Multilingual Corpora" (多语种语料库) discussion board
The tool is quite simple and intuitive; I don't think any help or manual is needed to get it to work.
I tried the software. It did not work!
It is not as easy as it looks! I used two test files.
The source txt file looks like this:
The target txt file looks like this:
I loaded the English txt file into Corpus 1 and the Chinese txt file into Corpus 2.
Then I searched for "read": zero hits!
What is a line break, anyway?! I heard it is "\n".
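A backslash on its own is not a line break. In most programming languages "\n" is the escape sequence for the line-feed (LF) character; Windows text files usually end each line with carriage return plus line feed ("\r\n", CRLF), and some old Mac files use a bare "\r" (CR). The manual excerpt does not say which convention the tool expects, so the minimal Python sketch below (the function names are hypothetical, not part of the tool) simply reports which line endings a file actually contains, which can help when a search unexpectedly returns nothing.

```python
from __future__ import annotations

from pathlib import Path


def count_line_endings(data: bytes) -> dict[str, int]:
    """Count CRLF, bare LF and bare CR line endings in raw file bytes."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # LF characters not preceded by CR
    cr = data.count(b"\r") - crlf  # CR characters not followed by LF
    return {"CRLF": crlf, "LF": lf, "CR": cr}


def line_endings_in_file(path: str) -> dict[str, int]:
    # Read raw bytes so Python does not translate line endings on the way in
    return count_line_endings(Path(path).read_bytes())
```

If every count is zero, the whole file is a single line as far as any line-based tool is concerned. A mix of conventions in one file is also worth fixing before building a corpus.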
"Building a parallel corpus (from two or more aligned raw text files)
Step 1: Click on the File->Build/Edit menu option. The "Corpus Builder" dialog box will appear.
Step 2: Select "Corpus 1" and load your first raw text file. Each line of this file should be aligned with each line of your other raw text files. (A 'line' is a string of text with a line break at the end.)
Step 3: Click on the "Display Name" entry box, and give your first corpus an appropriate name (e.g. English Corpus, Target
Step 4: Click on "Corpus 2" and repeat steps 2 and 3
Step 5: Click 'Update Corpus' to build an internal database of the parallel corpus that you have built."
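The alignment rule in Step 2 (line n of one file corresponds to line n of the other) can be checked before the files are loaded into the tool. The minimal Python sketch below is an illustration under stated assumptions, not how the tool works internally: the function names are hypothetical, and both files are assumed to be UTF-8 (with or without a BOM). A Chinese file saved as GBK would need `encoding="gbk"` instead.

```python
from __future__ import annotations

import re
from pathlib import Path


def load_lines(path: str, encoding: str = "utf-8-sig") -> list[str]:
    # "utf-8-sig" decodes UTF-8 whether or not the file starts with a BOM.
    # str.splitlines() treats LF, CRLF and CR (among others) as line breaks.
    return Path(path).read_text(encoding=encoding).splitlines()


def pair_lines(source: list[str], target: list[str]) -> list[tuple[str, str]]:
    # Pair line i of the source with line i of the target; counts must match.
    if len(source) != len(target):
        raise ValueError(
            f"not aligned: source has {len(source)} lines, target has {len(target)}"
        )
    return list(zip(source, target))


def search(pairs: list[tuple[str, str]], word: str) -> list[tuple[int, str, str]]:
    # Return (line number, source line, target line) for every source line
    # containing `word` as a whole word, ignoring case.
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return [(i, s, t) for i, (s, t) in enumerate(pairs, start=1) if pattern.search(s)]
```

For example, `search(pair_lines(load_lines("english.txt"), load_lines("chinese.txt")), "read")` (file names hypothetical) either lists the aligned matches or raises a `ValueError` showing how far apart the two line counts are.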