Microsoft Research Paraphrase Corpus

Haiyang Ai

This document provides some information about the creation of the corpus, along with results of the annotation effort. If you use the corpus in your research, we would appreciate your citing one or both of the following papers, which give some details of our work on paraphrase and our data annotation efforts. (A paper describing in detail how this corpus was created is currently in progress.) We are continuing to tag data, and hope to release a larger version of this corpus to the research community in the future.

Quirk, C., C. Brockett, and W. B. Dolan. 2004. Monolingual Machine Translation for Paraphrase Generation. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, Barcelona, Spain.

Dolan, W. B., C. Quirk, and C. Brockett. 2004. Unsupervised Construction of Large Paraphrase Corpora: Exploiting Massively Parallel News Sources. In Proceedings of COLING 2004, Geneva, Switzerland.


By installing, copying, or otherwise using this Software, found at, you agree to be bound by the terms of this MSR-SSLA. If you do not agree, do not install, copy, or use the Software. The Software is protected by copyright and other intellectual property laws and is licensed, not sold.