Posted by xudekuan on 2006-08-25 in the "Multilingual Corpora" forum
10.3 Heading search --- NOT IMPLEMENTED
Some corpora (such as the British National Corpus) are structured in such a
way that they are divided into a series of headings and texts. The heading,
which contains information about a text, is followed by the text itself. The
HEADING SEARCH enables the user to select which texts are to be searched,
based on the presence of particular specifications in the heading. Thus if the
headings contain an annotation about the sex of the authors, then it should
be possible to search only those texts written by female authors. For
example, the pattern entered in the heading search might be author=fem.
To perform a heading search, you first check the HEADINGS/CONTEXTS box
(towards the bottom of the ADVANCED SEARCH dialogue box) and then
choose EDIT. In the HEADINGS/CONTEXTS search dialogue box that arises,
choose EDIT again. A HEADINGS CONFIGURATION dialogue box appears in which
you can ADD (i) a heading name, (ii) the begin and end tags that delimit the
heading, and (iii) the tags that indicate the beginning and ending of the text.
Once the HEADINGS CONFIGURATION has been specified, the PATTERN to be
searched for in the heading (e.g., author=fem) can be entered in the
HEADINGS/CONTEXTS search dialogue box. All the heading information is now
in place and so it just remains to enter the search query itself.
This is another kind of context search. It allows us to search for a word (say
victory) used in texts written by female authors. Once the headings have
been specified as above, a search can be initiated. The program locates the
heading and looks for the pattern specified (e.g., author=fem). If the pattern
is found, then the text following the heading is searched for the string
specified in the search text box (e.g., victory). If the pattern author=fem is
not found in the header, then the program ignores the corresponding body
of text and checks the next header for the presence of author=fem, and so
on until the entire corpus has been searched.
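The logic described in the manual excerpt above can be sketched outside ParaConc in a few lines of Python. This is only an illustration of the described behaviour, not ParaConc's own code (the manual marks the feature as NOT IMPLEMENTED). The function name heading_search, the tags <head> and <text>, and the sample corpus are all hypothetical.

```python
import re

def heading_search(corpus, head_start, head_end, text_start, text_end,
                   heading_pattern, query):
    """Return (heading, line) pairs for lines containing `query`,
    looking only at texts whose heading contains `heading_pattern`."""
    # One unit = a heading followed by its text, delimited by the given tags.
    unit = re.compile(
        re.escape(head_start) + r"(.*?)" + re.escape(head_end)
        + r"\s*" + re.escape(text_start) + r"(.*?)" + re.escape(text_end),
        re.DOTALL,
    )
    hits = []
    for match in unit.finditer(corpus):
        heading, body = match.group(1), match.group(2)
        if heading_pattern not in heading:
            continue  # pattern absent: ignore this text, check the next heading
        for line in body.splitlines():
            if query in line:
                hits.append((heading.strip(), line.strip()))
    return hits

# Hypothetical two-text corpus; the tag names are assumptions for illustration.
sample = (
    "<head>author=fem year=1990</head>\n"
    "<text>\nA hard-won victory.\nNothing here.\n</text>\n"
    "<head>author=male year=1991</head>\n"
    "<text>\nAnother victory.\n</text>\n"
)
results = heading_search(sample, "<head>", "</head>", "<text>", "</text>",
                         "author=fem", "victory")
print(results)
```

Only the first text is searched here, because the second heading does not contain author=fem. Note that the heading test is a plain substring match, so author=fem would also match a heading that says author=female.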
I followed the instructions several times, but could not get any search results.
Have you managed to get results?
I didn't try. But did you see the words NOT IMPLEMENTED? I guess this function might not be enabled in the current version.
More questions concerning the use of ParaConc: I was puzzled that ParaConc could not recognize my own tags, such as <S> and </S> or <P> and </P>. I had changed the Tag settings in the File menu, but it did not work. Do the texts processed by ParaConc need to be clean texts that are already aligned at the sentence level? Alignment at the sentence level needs a lot of hand-editing. Any suggestions for this? According to Xie Jiacheng's article, he achieved alignment at the paragraph level, with <P> and </P> as his tags. But I can only use delimited segments. I have tried adding spaces between the characters and using <seg> and </seg> markers.
Pls paste some of your texts here for diagnosis. And pls also check your character encoding.
Which version of paraconc did you use, build269 or the demo version?
You may need to read my earlier post regarding this question at:
Thanks to Dr. Xu and Dr. Hong. Unfortunately my version is Paraconc-BETA (Version 1.0 Build 233). 2003.5.28 22:30:30. Is it the demo version?
The sample texts are as follows:
<CHAPTER21>第 二 十 一 章 财 主 小 姐 引 起 的 争 吵 </CHAPTER21>
<P> <S>一 个 女 孩 子 有 了 施 瓦 滋 小 姐 一 般 的 能 耐 ，谁 能 够 不 爱 呢 ？奥 斯 本 老 先 生 心 里 有 个 贪 高 好
胜 的 梦 想 ，全 得 靠 她 才 能 实 现 。</S> <S>他 拿 出 十 二 分 的 热 忱 ，和 颜 悦 色 的 鼓 励 女 儿 们 和 年 轻 女 财 主
交 朋 友 。</S> <S>他 说 做 父 亲 的 看 见 女 儿 交 了 那 么 合 适 的 朋 友 ，真 从 心 里 喜 欢 出 来 。</S> </P>
<P> <S>他 对 萝 达 小 姐 说 ：“亲 爱 的 小 姐 ，你 一 向 看 惯 伦 敦 西 城 贵 族 人 家 的 势 派 ，他 们 排 场 大 ，
品 级 高 ，我 们 住 在 勒 塞 尔 广 场 的 人 家 寒 薄 得 很 ，不 能 跟 他 们 比 。</S> <S>我 的 两 个 女 儿 是 粗 人 ，不 过
不 贪 小 便 宜 ，心 倒 是 好 的 。</S> <S>她 们 对 你 的 交 情 很 深 ，这 是 她 们 的 光 彩 ——嗳 ，她 们 的 光 彩 。</S> <S>我
自 己 呢 ，也 是 个 直 心 直 肠 子 ，本 本 分 分 的 买 卖 人 。</S> <S>我 人 是 老 实 的 ，令 尊 生 前 商 业 上 的 朋 友 ，
赫 尔 格 和 白 洛 克 ，也 是 我 的 朋 友 ，我 一 向 很 尊 敬 他 们 ；对 于 我 的 为 人 ，这 两 位 可 以 保 证 的 。</S> <S>
我 们 家 里 全 是 实 心 眼 儿 ，倒 也 能 够 相 亲 相 爱 ，和 气 过 日 子 ，算 得 上 有 体 统 的 人 家 。</S> <S>你 来 看 看
就 知 道 了 。</S> <S>我 们 都 是 粗 人 ，吃 的 也 是 粗 茶 淡 饭 ，不 过 倒 是 真 心 的 欢 迎 你 来 ，亲 爱 的 萝 达 小
姐 ，——请 让 我 叫 你 萝 达 ，因 为 我 满 心 里 真 喜 欢 你 ，真 的 ！我 是 直 爽 人 ，老 实 告 诉 你 ，我 喜
欢 你 。</S> <S>拿 杯 香 槟 来 ！赫 格 斯 ，跟 施 瓦 滋 小 姐 斟 杯 香 槟 。”</S> </P>
<P> <S>Love may be felt for any young lady endowed with such qualities as Miss Swartz possessed; and a great dream of ambition entered into old Mr. Osborne's soul, which she was to realize.</S> <S>He encouraged, with the utmost enthusiasm and friendliness, his daughters' amiable attachment to the young heiress, and protested that it gave him the sincerest pleasure as a father to see the love of his girls so well disposed.</S> </P>
<P> <S>"You won't find," he would say to Miss Rhoda, "that splendour and rank to which you are accustomed at the West End, my dear Miss, at our humble mansion in Russell Square.</S> <S>My daughters are plain, disinterested girls, but their hearts are in the right place, and they've conceived an attachment for you which does them honour--I say, which does them honour.</S> <S>I'm a plain, simple, humble British merchant--an honest one, as my respected friends Hulker and Bullock will vouch, who were the correspondents of your late lamented father.</S> <S>You'll find us a united, simple, happy, and I think I may say respected, family--a plain table, a plain people, but a warm welcome, my dear Miss Rhoda--Rhoda, let me say, for my heart warms to you, it does really.</S> <S>I'm a frank man, and I like you.</S> <S>A glass of Champagne!</S> <S>Hicks, Champagne to Miss Swartz."</S> </P>
I forgot to paste the chapter title of the English version. Sorry.
<CHAPTER21>CHAPTER XXI A Quarrel About an Heiress</CHAPTER21>
Several problems in your files:
1. Don't use two or more kinds of tags, because that confuses the tool; use a single tag as the sentence marker. So please remove <P> and </P> from your texts. If you do want to keep that information, put them after <S> and before </S>.
2. Each Chinese sentence and each English sentence should be on a separate line. That is, one sentence per line, starting with <S> and ending with </S>.
3. The total numbers of Chinese sentences and English sentences should be the same (otherwise the alignment will be mismatched or some sentences will be left unaligned). However, including the chapter title, you have 13 sentences in the Chinese text and only 10 in the English text.
I reformatted your two texts, and they can be searched with ParaConc without problems (no matter which version). You may want to look at the reformatted texts attached.
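For anyone who wants to do this reformatting by script rather than by hand, here is a minimal Python sketch of the three points above. It is not the script used for the attached files. The function name one_sentence_per_line is made up for illustration, and treating the <CHAPTERn> title as one more <S> line is an assumption based on the title being counted as a sentence.

```python
import re

# Matches <S>...</S> and <CHAPTERn>...</CHAPTERn>; <P> and </P> are simply skipped.
SEGMENT = re.compile(r"<(S|CHAPTER\d+)>(.*?)</\1>", re.DOTALL)

def one_sentence_per_line(raw):
    """Return each sentence (and chapter title) as one '<S>...</S>' line."""
    lines = []
    for match in SEGMENT.finditer(raw):
        # Collapse line wraps and runs of spaces inside a sentence to one space.
        text = " ".join(match.group(2).split())
        lines.append("<S>" + text + "</S>")
    return lines

# Shortened, hypothetical samples in the same markup as the pasted texts.
zh = ("<CHAPTER21>第 二 十 一 章</CHAPTER21>\n"
      "<P> <S>一 个 女 孩 子\n有 了 能 耐 。</S> <S>他 说 。</S> </P>")
en = ("<CHAPTER21>CHAPTER XXI</CHAPTER21>\n"
      "<P> <S>Love may be felt.</S> </P>")

zh_lines = one_sentence_per_line(zh)
en_lines = one_sentence_per_line(en)

# The two files must contain the same number of segments before loading.
if len(zh_lines) != len(en_lines):
    print(f"Mismatch: {len(zh_lines)} Chinese vs {len(en_lines)} English segments")
```

The script only reports a mismatch in the counts; deciding which sentences to merge or split so that the two sides line up still has to be done by hand.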
Thanks, Dr. Hong. But I have tried the two files. After I have loaded them, I could only use the Delimited Segments. If I choose the Align format as Start/Stop Tags, and choose to configure the start and stop tags as S and /S, it reports "Access violation at address 00476 in module 'Conc.exe'. Read of address FFFFFFFF8" two times and the two files cannot be loaded. Sorry to trouble you.
You have to type in the FULL start and stop tags in the Align Format Start/Stop Tags options area, that is, Start Tag: <S>, Stop Tag: </S>. By the way, leave the Attribute text box blank. It's wrong to type in only S and /S.
Try with the files I provided in my last reply.
Thanks, Dr. Hong. I have tried other versions. It works well with my own tagged files.
Glad that you've managed to solve your own problems!
My version is Build 629. The original Build 233 doesn't work well. Build 233 could only process delimited segments, because the defaults in the search options include "<", ">", "/" and "\". They make the programme report errors. Just delete them. If you want them back, the next time you launch the programme you can return to the original state by choosing the "Return to default" option.
I have tried your two texts. It only shows the first two English sentences.
The procedures are as follows:
Load two texts
Align format: Start/stop Tags
Start Tag: <seg>
Stop Tag: </seg>
Press OK twice
Text Search Window:
Enter pattern to search for: <mzy>*</mzy>
Click the Display menu in the toolbar and choose to "suppress" the "normal tags", and then you can see the two sentences without tags.
I think you need to make sure that the defaults under Options in the text search pop-up window are changed: delete "<", ">" and "/" there.
I've added reputation for you as a small token of thanks! Hug and kiss you!
Of course, the demo version can do it well. Today I downloaded the demo version of ParaConc Version 1.0 (Build 270). The two programmes at hand are almost the same: the demo version's folder is 1.31 M and Build 269's is 1.33 M, and the two executable files are very similar in size, 1350 KB for Build 269 and 1368 KB for Build 270. I think the display capacity may differ, but the rest is the same. Anyway, I haven't had time to try them out thoroughly. I am also new to this tool.
I have seen your post about your ambition to study A Dream of Red Mansion. I do appreciate your determination. But alignment is always the most difficult and painstaking task. Although some programmes can do this job, hand-editing is a must.
On this website there is a downloadable manual for ParaConc, but I cannot find it now. You can try to find it. If you cannot find the manual but need it, I can send it to the Gmail account of corpus4u or directly to your email address.