Corpus-based research on COMPLY

This is my homework. I hope everyone can give me some feedback. Thanks!

A corpus-based dictionary entry of the verb COMPLY


Vocabulary items are important for language learners. To learn a vocabulary item well, English learners have to learn not only its form (pronunciation and spelling) but also its grammar and collocation (Ur, 1991).

Dictionaries, for their part, are an important source of information for language learners. With advances in technology, online dictionaries are easily accessible to English learners. Consider the following example from the Longman Dictionary of Contemporary English Online (LDOCE), which is a corpus-based dictionary:

com·ply past tense and past participle complied, present participle complying, third person singular complies [intransitive] formal
to do what you have to do or are asked to do [→ compliance, compliant]
comply with
Failure to comply with the regulations will result in prosecution.
The newspaper was asked by federal agents for assistance and agreed to comply.

(Longman Dictionary of Contemporary English Online, 2008)

In the above entry, the definition, phonetic transcriptions, inflected forms and verb type (intransitive) of COMPLY are included, together with two sample sentences. Nevertheless, while the entry tells learners that COMPLY often occurs with ‘with’, does the verb have other frequent collocates? And does it show any tendency to occur with a particular semantic grouping of words, that is, a semantic preference (Adolphs, 2006:56)? Answers to these questions, which would help learners better understand the meaning and usage of COMPLY, are not available in this dictionary entry.

Before computerized corpora were available to linguists, the study of collocation relied heavily on intuition. While some examples of collocation can be detected intuitively, such as time is consumed or computer programs run (Greenbaum 1974:83, cited in McEnery, Xiao & Tono 2006:83), intuition is deemed by many linguists to be a poor guide to collocation. For example, Partington (1998:18) observes that there is no total agreement among native speakers on which collocations are acceptable and which are not. Krishnamurthy (2000: 32-33) also argues that since each native speaker has only a partial knowledge of the language, speakers have prejudices and preferences regarding collocation. In light of this, Hunston (2002:68) points out that it is more reliable to measure collocation statistically, and therefore a corpus is essential. Stubbs (2001:53) shares a similar view and points out that a corpus can reveal this information across many speakers’ intuition and usage, to which individual speakers have no access. Moreover, determining the frequency of co-occurrence without a corpus is a daunting task. Corpus techniques are therefore essential for finding out the collocation information of COMPLY.

This project therefore aims to investigate the collocational behaviour of COMPLY in the British National Corpus (BNC) and to find out whether COMPLY has other important frequent collocates. Furthermore, the semantic preference of COMPLY, which can be established from collocation information (because semantic preference is highly collocational (McEnery, Xiao & Tono 2006:59)), will also be investigated. In the following, the motivation for choosing the word COMPLY is stated first. Then the research questions are set out, and the methodology used to tackle them is described in detail. The research findings and an enriched version of the entry for ‘COMPLY’ are then presented, followed by a discussion of the results and the potential limitations of the research.

Motivation for the choice of the word COMPLY

According to the examination report of the Hong Kong Advanced Level Examination: Use of English (2005:page), which provides feedback on exam performance so that English teachers and students can pay special attention to the common errors made by Hong Kong students, candidates failed to show an understanding of the meaning and usage of the verb COMPLY: in one multiple-choice question, 63% of them failed to choose the correct answer. This suggests that the word COMPLY is particularly difficult for Hong Kong students. Therefore, this verb is chosen for investigation.

Research questions

In order to provide more collocation information of COMPLY to learners, the following research questions should be addressed:

1) What are the frequent collocates of COMPLY? (i.e. patterns of words with habitual co-occurrence (McEnery, Xiao & Tono 2006:81))
2) What is the semantic preference of COMPLY? (i.e. the semantic grouping with habitual co-occurrence (Adolphs, 2006: 56))

These parameters for describing the nature of COMPLY are chosen because they help learners to understand the usage and meaning of COMPLY fully (Ur, 1991), and because such parameters are often not open to intuition alone.


The British National Corpus is chosen for this study. The BNC is a well-known general corpus which comprises 100,106,008 words, organized in 4124 written texts and transcripts of speech in modern British English (McEnery, Xiao & Tono 2006:59). This corpus is chosen for three reasons. First, it is a corpus of modern British English, which is one of the target varieties for most English learners in Hong Kong. Second, the BNC is designed to be representative of the language. The written section (90%) includes a wide range of genres such as regional and national newspapers, specialist periodicals and journals for all ages and interests, academic books and popular fiction, published and unpublished letters and memoranda, as well as school and university essays (McEnery, Xiao & Tono 2006:59). The spoken section (10%) includes 863 transcripts of informal conversation selected from respondents of different ages, from various regions and from all social classes in a balanced way, together with spoken language collected in many different contexts, ranging from formal business meetings to radio shows (McEnery, Xiao & Tono 2006:60). All this makes the BNC a balanced corpus for its purpose of representing modern British English in general. Third, as I do not have access to the corpus originally used by Longman dictionary-makers (the Longman Corpus Network), the BNC is used instead. Since both the BNC and the Longman corpus are large, balanced and represent the same type of English in roughly the same time frame (McEnery, Xiao & Tono 2006: 225), they should provide similar data on collocation.

Several steps are involved in this project. First, the word COMPLY and its inflected forms (i.e. complies, complied, complying) are searched for in the BNC. The collocational patterning of COMPLY is then established. The BNC query software offers various statistics for this purpose, such as mutual information (MI), Z-score, log-log, log-likelihood and MI3. However, for pedagogical purposes, rare collocates (those with very low frequency yet high collocational strength) are of little interest, since commonly used collocates are more important for language learners. Therefore, MI3 is used here, as this measure does not put much emphasis on rare words (McEnery, Xiao & Tono 2006: 217). Collocates which show strong collocational strength by this measure are chosen for inclusion in the dictionary entry. After that, one contextualized sentence is chosen for each of the collocates and included in the dictionary entry as well.
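To make the contrast between MI and MI3 concrete, here is a rough sketch in Python. It uses the commonly cited formulas MI = log2(O/E) and MI3 = log2(O^3/E), where O is the observed co-occurrence count and E the count expected by chance. Whether the BNC query software uses exactly these variants is an assumption, as are the function names and the span of 8 words. The corpus size (100,106,008), the node frequency (2078) and the co-occurrence count for with (1747) come from this study; the collocates' overall corpus frequencies and the rare word are invented for illustration.

```python
import math

def expected(node_freq, coll_freq, corpus_size, span):
    """Co-occurrences expected if node and collocate were independent."""
    return node_freq * coll_freq * span / corpus_size

def mi(observed, node_freq, coll_freq, corpus_size, span=8):
    """Mutual information: log2 of observed over expected."""
    return math.log2(observed / expected(node_freq, coll_freq, corpus_size, span))

def mi3(observed, node_freq, coll_freq, corpus_size, span=8):
    """MI3: cubing the observed count gives frequent pairs more weight."""
    return math.log2(observed ** 3 / expected(node_freq, coll_freq, corpus_size, span))

CORPUS_SIZE = 100_106_008   # BNC size reported in this study
NODE_FREQ = 2078            # instances of COMPLY reported in this study

# (observed co-occurrences, collocate frequency in the whole corpus)
frequent = (1747, 650_000)  # 'with': 1747 is from the study, 650_000 is hypothetical
rare = (3, 20)              # an entirely hypothetical rare collocate

mi_freq = mi(frequent[0], NODE_FREQ, frequent[1], CORPUS_SIZE)
mi_rare = mi(rare[0], NODE_FREQ, rare[1], CORPUS_SIZE)
mi3_freq = mi3(frequent[0], NODE_FREQ, frequent[1], CORPUS_SIZE)
mi3_rare = mi3(rare[0], NODE_FREQ, rare[1], CORPUS_SIZE)

print(f"MI : frequent={mi_freq:.2f}  rare={mi_rare:.2f}")
print(f"MI3: frequent={mi3_freq:.2f}  rare={mi3_rare:.2f}")
```

With these illustrative numbers, MI ranks the rare word above the frequent one, while MI3 reverses the order, which is the property that motivates the choice of MI3 for a learner-oriented entry.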

In addition, the semantic preference of COMPLY is established on the basis of the collocation information. The claim is further supported through a qualitative analysis of the concordance output. Because of the high number of instances of COMPLY (2078 in total), the concordance output is randomly thinned to 200 lines. More instances (50 at a time) are then investigated in the same way until no new patterns can be found.
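The thinning-and-extension procedure can be sketched as follows. This is only an illustration of the logic: the function `thin_until_saturated`, the `find_patterns` callback and the toy concordance lines are all hypothetical, and the actual analysis was carried out by reading the lines in the BNC query software, not with this code.

```python
import random

def thin_until_saturated(lines, find_patterns, first=200, step=50, seed=0):
    """Take `first` random lines, then inspect `step` more at a time
    until a batch shows no pattern that has not been seen already."""
    rng = random.Random(seed)           # fixed seed so the sample can be reproduced
    pool = list(lines)
    rng.shuffle(pool)
    sample, rest = pool[:first], pool[first:]
    seen = set(find_patterns(sample))
    while rest:
        batch, rest = rest[:step], rest[step:]
        new = set(find_patterns(batch)) - seen
        sample.extend(batch)            # the batch has been inspected either way
        if not new:                     # saturation reached: stop extending
            break
        seen |= new
    return sample, seen

def head_noun(lines):
    """Toy pattern finder: treat the last word of each line as its pattern."""
    return {line.split()[-1] for line in lines}

# Toy data standing in for the 2078 concordance lines.
nouns = ["regulations", "rules", "requirements", "instructions", "standards"]
toy_lines = [f"failed to comply with the {nouns[i % 5]}" for i in range(2078)]

sample, seen = thin_until_saturated(toy_lines, head_noun)
print(len(sample), sorted(seen))
```

In this toy run the first 200 lines already contain every pattern, so one further batch of 50 is inspected and the procedure stops.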


Collocations of COMPLY

In the collocation database of COMPLY (see appendix 2 for the top 50 collocates ranked by MI3), the preposition with ranks first and has a very high frequency as a collocate (1747). In addition, fail and its related forms occur frequently as collocates too, with a total of 255 instances (failure 111, failing 34, failed 54, fails 30 and fail 26). Also, the collocate not, which ranks sixth, is mostly used as a negation marker. This shows that COMPLY often occurs in negative clauses with meanings similar to ‘fail/failure to comply’. Lastly, the deontic auxiliary must (ranked 13th) and its close equivalents has to and have to (ranked 30th and 46th respectively) also collocate strongly with COMPLY.
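The total for fail is obtained by pooling the counts of its separate word forms. A minimal sketch of that grouping step is given below; the counts are the ones reported above, while the form-to-lemma table and the function name are hypothetical (a real study would rely on the corpus tool's lemma search or a lemmatiser).

```python
from collections import Counter

# Collocate frequencies for the forms of FAIL, as reported in this study.
form_counts = {"failure": 111, "failing": 34, "failed": 54, "fails": 30, "fail": 26}

# Hypothetical lookup table mapping each word form to its lemma.
LEMMA_OF = {form: "FAIL" for form in form_counts}

def group_by_lemma(counts, lemma_of):
    """Add up the frequencies of all word forms that share a lemma."""
    totals = Counter()
    for form, n in counts.items():
        totals[lemma_of.get(form, form.upper())] += n
    return totals

totals = group_by_lemma(form_counts, LEMMA_OF)
print(totals["FAIL"])  # 255
```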

The definite article the, on the other hand, ranks fourth. A closer look at its position of occurrence reveals that it mostly occurs two positions to the right of the node COMPLY (N+2) (see appendix 3), where it is used as a definite determiner for the noun that follows. A qualitative analysis of those concordance lines reveals that the nouns that follow are either definite or previously mentioned, such as:

Under the Caravan Sites Act of 1968 all local authorities are required to find sites for gypsies "residing in or resorting to" their own area. But many authorities have either not been able to find suitable sites or have not complied with the ruling. (A30 71)

in which the two underlined phrases are co-referential. This information is helpful for improving the definition of COMPLY.
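The positional check on the (its tendency to sit at N+2) amounts to tallying how far a collocate stands from the node in each line. The sketch below shows one way this could be done; the function name, the window of four words and the crude whitespace tokenisation are assumptions, and the three sentences are taken from examples quoted in this essay rather than from the full concordance.

```python
from collections import Counter

NODE_FORMS = {"comply", "complies", "complied", "complying"}

def position_profile(sentences, collocate, window=4):
    """Count how often `collocate` occurs at each offset from the node
    (negative = left of the node, positive = right of the node)."""
    profile = Counter()
    for sentence in sentences:
        tokens = [t.strip(".,").lower() for t in sentence.split()]
        for i, tok in enumerate(tokens):
            if tok not in NODE_FORMS:
                continue
            for offset in range(-window, window + 1):
                j = i + offset
                if offset != 0 and 0 <= j < len(tokens) and tokens[j] == collocate:
                    profile[offset] += 1
    return profile

examples = [
    "Failure to comply with the regulations will result in prosecution.",
    "have not complied with the ruling.",
    "Employers must comply with provisions covering a wide variety of other matters",
]
profile = position_profile(examples, "the")
print(profile)  # Counter({2: 2})
```

In the first two sentences the falls two words to the right of the node, and the third contains no the within the window, so the only offset recorded is +2.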

However, although to ranks second in the list, a closer look at the concordance lines reveals that in most cases to is an infinitive marker rather than a preposition following the intransitive verb COMPLY. This is a typical feature of verbs in general and is thus of no significance here.

Semantic preference of COMPLY

In the collocation database of COMPLY, noun collocates related to law, regulations, instructions, standards and requirements rank high in the list. A qualitative analysis of the 200 randomly thinned concordance lines (see appendix 1 for all 200 lines) supports this observation. The spans of the concordance lines show that the verb COMPLY has a strong tendency to be followed by with plus a noun referring to law, regulations, instructions, standards or requirements (or to have such a noun as the subject of the clause) (118 out of 200 instances, or 59%), as illustrated in the examples below.

C11 and packs an all-new 3.5-litre flat 12 to [COMPLY] with new regulations governing the renamed World Sportscar
have failed in the course of the inquiry to [COMPLY] with the requirements of natural justice. It may
to determine what further action was needed to [COMPLY] with the regulations.
floors are regularly shaped, for example, and [COMPLY] with fire regulations and travel distances could dictate
are unable to answer multiple queries that do not [COMPLY] with these instructions.
from the Essex Training and Enterprise Council to [COMPLY] with new laws.
are unable to answer multiple queries that do not [COMPLY] with these instructions.
£84m and exceptional gains totalling £178m to [COMPLY] with new accounting standards -- there will be exceptional
industries. To stabilise its currency and [COMPLY] with EC law will require big sacrifices (possibly 10,000
relevance to deciding whether a decision-maker ought to [COMPLY] with the rules of natural justice or to the availability
office dealing with a captain whose cargo did not [COMPLY] with his list of instructions.
the administration, both direct and indirect, of [COMPLY] with those procedural requirements. An example of
Environmental Management] will be considered to [COMPLY] with the regulation, subject to specific requirements.
have found more than half the sample failing to [COMPLY] with statutory requirements. The DTI visits examine
books of a trust and in respect of his failure to [COMPLY] with the provisions of Rule 4 (1) (a)
in which the work is undertaken), have to [COMPLY] with the regulations from January 1993 onwards, and all
can be fined up to £2,000 if they fail to [COMPLY] with this legislation."

Yet it should be noted that there may be more instances of this pattern that cannot be identified in the KWIC (keyword in context) view when using the Zurich… and can only be revealed by looking at the wider context, such as concordance line 25:

Under the Regulations, workstations introduced on or after 1 January 1993 will have to conform to the Regulations immediately, but existing workstations or those installed on or before 31 December 1992 will have until the end of 1996 to comply (CBT 3102)

Therefore, the actual proportion of instances in which COMPLY occurs with nouns referring to law, regulations, instructions, standards and requirements will be higher than 59%.
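The 59% figure is simply 118 matching lines out of the 200 inspected. For readers who want to approximate the manual classification automatically, a rough sketch follows. The word list `RULE_NOUNS`, the six-word span and the function names are hypothetical stand-ins for the judgement applied by hand in this study. As the workstation example shows, a check of this kind on the right-hand context alone will miss cases where the relevant noun appears earlier in the text.

```python
# Hypothetical word list approximating the semantic set identified in this study.
RULE_NOUNS = {
    "law", "laws", "legislation", "regulation", "regulations", "rule", "rules",
    "instruction", "instructions", "standard", "standards",
    "requirement", "requirements", "provision", "provisions",
}

def has_rule_noun(right_context, span=6):
    """True if one of the first `span` words after the node is in RULE_NOUNS."""
    words = [w.strip(".,;:\"'()").lower() for w in right_context.split()[:span]]
    return any(w in RULE_NOUNS for w in words)

def percentage(matches, total):
    """Share of matching lines, as a percentage."""
    return 100 * matches / total

# Right-hand contexts taken from concordance lines quoted in this essay.
right_contexts = [
    "with new regulations governing the renamed World Sportscar",
    "with the requirements of natural justice. It may",
    "with fire regulations and travel distances could dictate",
    "with his list of instructions.",
    "with EC law will require big sacrifices",
    "",  # node at the end of the sentence, e.g. '... agreed to comply.'
]

matched = sum(has_rule_noun(rc) for rc in right_contexts)
print(matched, "of", len(right_contexts))
print(percentage(118, 200))  # 59.0
```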

Discussion of the results

The findings on semantic preference generally support the definition offered in LDOCE: COMPLY is often used with nouns denoting what someone has to do (obligation) or is asked to do (request). In addition, the high ranking of the collocate with is presumably the reason the makers of LDOCE included it in the entry. However, the entry can be further improved by adding other frequent collocates with high MI3 scores. The collocates that should be included are fail (with its inflected forms and failure), not, requirement, regulations, rules, provision, must and request, with illustrative examples provided so that learners can understand the contexts in which COMPLY occurs with these collocates.

The proposed improved version of the dictionary entry for COMPLY should look like this:

com·ply past tense and past participle complied, present participle complying, third person singular complies [intransitive] formal

to do what you have to do or are asked to do, which is often definite [→ compliance, compliant]
Common collocates
with
• The newspaper was asked by federal agents for assistance and agreed to comply.

• Failure to comply with the regulations can result in a £2,000 fine or six month prison sentence. (AJD 272)
• If the employer failed to comply with an order to reinstate, the maximum special award is the greater of £20,100 or 156 weeks' pay without there being any limit on a week's pay. (J6N 1515)
• It is vital to comply with legal requirements before embalming (HU1 960)
• It is possible to build a new flue, either inside or outside, but there will be certain regulations to comply with. (CG5 813)
• Any entries received after the closing date will be disqualified, as well as any received mutilated, illegible, altered, incomplete or not complying with the rules. (G2F 1338)
• It will be necessary to comply with these provisions where the business is being sold, rather than where the company holding the tenancy is being sold (J6N 367)
must/has to
• Employers must comply with provisions covering a wide variety of other matters, such as health and safety, maternity, race and sex discrimination and trade union activities (HAJ 131)
(The filenames are just for reference and verification and will not be included in the entry)

This entry is considerably enriched with collocational information, and because it is intended for an online dictionary, limited space is not a concern.

However, this entry is not without problems. First, while data from corpora are claimed to be authentic, Widdowson (2000) argues that corpus data are authentic only in a very limited sense. He argues that the notion of authenticity concerns the relationship between a particular piece of discourse and the response that it triggers in its immediate audience. This means that when texts are taken out of their contexts, stored in a corpus and used as sample sentences in a dictionary entry, they become decontextualized, and their authenticity of purpose is destroyed by their use with an unintended audience, namely language learners in this case. Nonetheless, corpus data are at least more authentic than invented sentences.

Second, the examples extracted directly from the concordance output may be too difficult for learners of English because they contain many difficult vocabulary items. Learners may be distracted from the learning goals by the difficult vocabulary and context (Adolphs, 2006: 107). For this reason, sample sentences in some corpus-based dictionaries such as LDOCE are not taken entirely from corpora: these dictionaries do not strictly follow a policy of ‘authentic examples only’ and use rewritten corpus examples whenever they consider it necessary (McEnery, Xiao & Tono 2006: 209). This makes it possible to control the level of vocabulary and grammar knowledge required to understand the dictionary entry. However, it can be argued that learners should be given the opportunity to engage with the type of language that they are likely to encounter when they come across the word COMPLY (Adolphs, 2006: 108). Therefore, naturally occurring data should be used in the entry. Furthermore, if the corpus data are modified too much, they are no better than invented examples in terms of authenticity, which defeats one of the major objectives of using corpus data: to expose learners to naturally occurring language in use (Adolphs, 2006: 108).

Third, the underlying assumption that “more frequent collocates will be more useful for learners” may not be true. While some corpus linguists like Goethals believe that frequency is “a measure of probability of usefulness” (2003: 424), Kennedy (1998: 290) argues that the most frequent is not necessarily the most important and that frequency should be only one of the criteria used for the choice. There might be some collocates which are very expressive but infrequent. Unfortunately, corpus techniques as used here can only provide information on collocates of high frequency.


To conclude, this study has shown that the information provided in the LDOCE entry for COMPLY is not sufficient for learners to acquire the meaning and usage of the verb. Collocation information, which is deemed to be important for any vocabulary item, has therefore been obtained through corpus techniques, and the dictionary entry for COMPLY has been enriched with information on frequent collocates and semantic preference. Controversial issues such as the authenticity of corpus data, the amendment of corpus data to suit learners’ needs and the notion that ‘more frequent means more useful’ have also been discussed.


• Adolphs, S. (2006). Introducing Electronic Text Analysis: A Practical Guide for Language and Literary Studies. London: Routledge.
• Hunston, S. (2002). Corpora in Applied Linguistics. Cambridge: Cambridge University Press.
• McEnery, T., Xiao, R. & Tono, Y. (2006). Corpus-Based Language Studies: An Advanced Resource Book. London: Routledge.
• Goethals, M. (2003). EET: the European English Teaching vocabulary list. Practical Applications in Language and Computers. Frankfurt: Peter Lang.
• Kennedy, G. (1998). An Introduction to Corpus Linguistics. London: Longman.
• Krishnamurthy, R. (2000). ‘Collocation: from silly ass to lexical sets’. In C. Heffer, H. Sauntson and G. Fox (eds.), Words in Context: A Tribute to John Sinclair on his Retirement. Birmingham: University of Birmingham.
• Partington, A. (1998). Patterns and Meanings. Amsterdam: John Benjamins.
• Stubbs, M. (2001). On inference theories and code theories: corpus evidence for semantic schemas.
• Ur, P. (1991). A Course in Language Teaching: Practice and Theory. Cambridge: Cambridge University Press.
• Widdowson, H. (2000). On the limitations of linguistics applied. Applied Linguistics 21/1.