Corpus-based Language Studies: An advanced resource book
Publication details
Authors: Tony McEnery, Richard Xiao and Yukio Tono
Publisher: Routledge, London/New York
Series: Routledge Applied Linguistics (RAL) Series
Date available: The end of 2005
Unique features of this book
The corpus-based approach to linguistic analysis and language teaching has come to prominence over the past two decades. This book seeks to bring readers up to date with the latest developments in corpus-based language studies. In comparison with the existing introductory books in corpus linguistics, Corpus-based Language Studies is unique in a number of ways.
First, this is a book which covers the ‘how to’ as well as the ‘why’. In approaching ‘how to’, we obviously have to focus on specific concordance packages and corpora which are currently available. However, our aim is to embrace a range of corpora and packages, thereby, we hope, offsetting any problems caused by corpora being withdrawn or software being radically changed. It is the ‘how to’ focus which in large part makes this book stand out from other available volumes. This book includes six case studies, each exploring a particular research question using specific tools. This is where the reader learns how to do corpus linguistics, as the process of investigating the data using the package(s) concerned is spelt out step by step, using text and screenshots. Thus by the end of each case study, a corpus has been introduced, the reader has learnt how to use a retrieval package and some research questions have been explored. Readers are then encouraged to explore a related research question using the same corpus data, tools and techniques. As well as explaining ‘how to’, the book also addresses ‘why’. While we may expect the reader to consult other books on corpus linguistics, we want this book, for two distinct reasons, to explore what one may do with corpus data and why one should want to do it. Firstly, and obviously, if we want the reader to be able to ‘become’ a corpus linguist having read the book, we clearly have to explain the rationale for corpus-based studies, and to use the case studies both to exemplify the worth of corpus linguistics and to demonstrate the features of the packages concerned. Secondly, we want this book to tie in much more closely with linguistic theory than previous books in corpus linguistics have done. Our goal is to engage corpus linguistics with research questions and linguistic theory in increasing depth and intensity as the book progresses.
Second, this is a book which engages with a range of approaches to the use of corpus data, with each case study centring on a different approach. This sets it apart from existing books in corpus linguistics, which typically focus on one major approach while paying little or no attention to others. After reading this volume, readers are expected to understand when and how to combine these approaches with other methodologies.
Finally, this is a book which is more focused on multilingual corpus linguistics than available corpus books. While this volume is concerned mainly with English corpus linguistics, we also cover issues in multilingual corpus linguistics, and have one case study focusing on a language other than English.
An overview of the book
SECTION A: INTRODUCTION
The corpus-based approach to linguistic analysis and language teaching has come to prominence over the past two decades. This book seeks to bring readers up to date with the latest developments in corpus-based language studies. The book is intended as an advanced resource book. This means that, by reading this book, readers will not only become familiar with the basic approach of corpus linguistics but will also learn how to do corpus linguistics through a series of case studies.
Section A of this book sets the scene for corpus-based language studies by focusing on the theoretical aspects of corpus linguistics and introducing key concepts in the field. This section is broken into ten units, each focusing either on a key concept in corpus linguistics or on a practical issue that may face the corpus builder or user.
Unit 1 introduces corpus linguistics and answers questions such as ‘What is a corpus?’ and ‘Why is a corpus-based approach important?’. Unit 2 is concerned with such issues as representativeness, balance and sampling, while units 3 and 4 discuss corpus markup and annotation respectively. In unit 5 we introduce the multilingual dimension of corpus linguistics. Unit 6 seeks to raise readers’ statistical awareness, an awareness which is essential in corpus-based language studies. Unit 7 introduces publicly available, well-known and influential corpora while unit 8 considers the important decisions and practical issues one may face when constructing a corpus. Unit 9 deals with copyright issues in corpus building. Finally, unit 10 explores the use of corpora in language studies.
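To give a flavour of the kind of statistic unit 6 is concerned with, here is a minimal Python sketch (not taken from the book) of the log-likelihood measure widely used in corpus linguistics to compare a word’s frequency across two corpora; the counts at the end are invented purely for illustration.

import math

def log_likelihood(freq1, freq2, size1, size2):
    # G2 statistic for a word observed freq1 times in a corpus of size1
    # tokens and freq2 times in a corpus of size2 tokens.
    expected1 = size1 * (freq1 + freq2) / (size1 + size2)
    expected2 = size2 * (freq1 + freq2) / (size1 + size2)
    g2 = 0.0
    if freq1 > 0:
        g2 += freq1 * math.log(freq1 / expected1)
    if freq2 > 0:
        g2 += freq2 * math.log(freq2 / expected2)
    return 2 * g2

# Hypothetical figures: 120 occurrences in a one-million-word corpus versus
# 40 occurrences in a one-million-word reference corpus.
print(round(log_likelihood(120, 40, 1_000_000, 1_000_000), 2))  # about 41.9

The higher the score, the less plausible it is that the frequency difference between the two corpora is due to chance; this is the sort of reasoning unit 6 aims to make explicit.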
SECTION B: EXTENSION
Section A introduced some important concepts in corpus linguistics. We also briefly considered the use of corpora in a range of areas of language studies. In this section, readers will get an opportunity to read excerpts from published material which go into a number of research areas in more depth. The excerpts presented in this section have been selected carefully using a number of criteria. The primary criterion is the originality, importance and influence of the paper in the area of study. The second criterion is its current relevance. Given the second criterion, it is unsurprising that, with a few exceptions, the majority of the papers in this section were published in or after 1998. The final criterion is a pragmatic one: some papers, while interesting, simply did not fit well with the overall design of the book. We are fully aware that a book of this size cannot possibly include all of the publications which meet the above criteria. Also, the recentness of the material included here can be viewed as an advantage or a disadvantage, depending upon one’s viewpoint. Those who view it as a disadvantage might argue that the book is wanting in historical background. Nevertheless, it can also reasonably be argued that a focus on current research is as important as historical depth. Readers interested in the historical dimension of corpus linguistics should look to Biber, Conrad and Reppen (1998), Kennedy (1998), and McEnery and Wilson (2001), which have already covered much of the history of corpus analysis. Readers can also refer to McCarthy and Sampson (2004) for an anthology of important publications in corpus linguistics, including papers from its early years.
The excerpts selected using the above criteria are designed to help readers understand a number of key concepts in corpus linguistics and to bring them up to date with the latest developments in corpus-based language studies. They are also selected to familiarize readers with a particular area of study so that they will be ready to explore the case studies in Section C. Note that, in order to save space in this book, the excerpts are presented without notes or references. Readers are advised to refer to the original publications for these. We would also like to remind readers that the terminology used in each excerpt may differ slightly from that adopted in this book. At no point, however, does this slight variation interfere with the general argument presented.
This section consists of two parts. Part 1 ‘Important and controversial issues’ (units 11 and 12) discusses further some important or controversial issues in corpus linguistics introduced in Section A, namely corpus representativeness and balance (unit 11), and the pros and cons of the corpus-based approach (unit 12). Part 2 ‘Corpus linguistics in action’ (units 13 to 16) presents corpus-based studies in some of the areas we considered in Section A including, for example, lexical and grammatical studies (unit 13), language variation (unit 14), contrastive and diachronic studies (unit 15), and finally language teaching and learning (unit 16).
SECTION C: EXPLORATION
Having introduced the key concepts in corpus linguistics and presented excerpts from published material, we now want to engage readers in a series of case studies. These case studies investigate research questions in some of the areas of linguistic analysis introduced in Section A and further discussed in Section B. Each case study starts with an overview of the background knowledge needed for the study and a brief description of the corpus data used. Then it explores, together with the reader, a particular research question using specific tools (a corpus exploration tool and/or a statistics package). This is where the reader learns how to ‘do’ corpus linguistics, as the process of investigating the data using the package(s) concerned will be spelt out step by step, using text and screenshots. Thus by the end of each case study, a corpus has been introduced, the reader has learnt how to use a retrieval package and some research questions have been explored. Readers are then encouraged to explore a related research question using the same corpus data, tools and techniques. Readers can visit the authors’ companion website given in the Appendix for details of the availability of corpora and tools used in these case studies.
This section consists of six case studies. Case study 1 explores the area of pedagogical lexicography on the basis of the BNC (World Edition), using BNCWeb. The focus of this study is on collocation analysis: the study seeks to describe the collocation patterns of sweet in the BNC and integrate that information into a description of a dictionary entry. Case study 2 uses four corpora of the Brown family to explore the potential factors that may influence a language user’s choice of a full or bare infinitive after HELP; these include language variety (British English vs. American English), language change (English in the early 1960s and the early 1990s) and a range of syntactic conditions (e.g. an intervening nominal phrase, a preceding infinitive marker and the passive). This case study also introduces MonoConc Pro and SPSS. Case study 3 uses WordSmith version 4 and the Japanese component of the Longman Learners’ Corpus to study the second language acquisition of English grammatical morphemes. Case study 4 uses the metadata encoded in the BNC (version 2), covering demographic features such as user age, gender and social class, and textual features such as register, publication medium and domain, to explore these dimensions of variation and discover a general pattern of swearing (more specifically, the use of the word FUCK) in modern British English. This case study demonstrates how to use BNCWeb to make complex queries and provides readers with an opportunity to practise using SPSS. Case study 5 compares two approaches to genre analysis, Biber’s (1988) multi-feature/multi-dimensional analysis and Tribble’s (1999) use of the keyword function of WordSmith, through a comparison of speech and conversation in American English. This study introduces some advanced functions of WordSmith version 3. The final case study uses parallel and comparable corpora of English and Chinese to examine the effect of domain, text type and translation upon aspect marking in Chinese. This study also introduces parallel concordancing.
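As an indication of what the collocation analysis in case study 1 involves, the following is a minimal Python sketch of span-based collocate extraction scored with mutual information. It is not the procedure used in the book, which relies on BNCWeb; it assumes a hypothetical plain-text corpus file named corpus.txt and a deliberately crude regular-expression tokenizer.

import math
import re
from collections import Counter

def collocates(tokens, node, span=4, min_freq=5):
    # Collect words co-occurring with the node within +/- span tokens and
    # score each by mutual information: log2(observed / expected).
    freq = Counter(tokens)
    n = len(tokens)
    co = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            co.update(tokens[max(0, i - span):i])
            co.update(tokens[i + 1:i + 1 + span])
    scores = {}
    for word, observed in co.items():
        if observed < min_freq or word == node:
            continue
        expected = freq[node] * freq[word] * 2 * span / n
        scores[word] = math.log2(observed / expected)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# corpus.txt stands in for a real corpus; BNCWeb works directly on the BNC.
with open('corpus.txt', encoding='utf-8') as handle:
    tokens = re.findall(r"[a-z']+", handle.read().lower())

for word, mi in collocates(tokens, 'sweet')[:10]:
    print(word, round(mi, 2))

BNCWeb offers a choice of association measures; mutual information, as used in this sketch, tends to favour low-frequency collocates, which is why a minimum co-occurrence frequency threshold is applied.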
We would remind readers that, for each case study, alternative versions of the study covering most concordance packages are available on our companion website. Note also that if any of the results obtained by readers do not match those given here, they should check the website for an update.