[Corpus Release] Tibetan Folk Tales Corpus

Posted by ArthurW on 2016-12-09 in the "Corpus Collection" forum

  1. TIBETAN FOLK TALES CORPUS

    Source: http://www.sacred-texts.com/asia/tft
    Compiler: Jiayue Wang
    Time: 9 December 2016

    The texts were extracted from web pages downloaded from the website.
    Each line that begins with a hash sign (#) identifies the source web
    page and gives its relative path on the website.
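
    As an illustration of the layout described above, here is a minimal
    Python sketch that groups the corpus text under its "#" header lines.
    It assumes that everything after the "#" on a header line is the
    relative path; the function names, the file name mentioned in the
    comment, and the sample paths are hypothetical and are not taken from
    the corpus itself.

```python
from typing import Dict, List, Optional


def split_corpus(text: str) -> Dict[str, str]:
    """Group corpus lines under the most recent '#' header line.

    Assumption: the text after '#' on a header line is the relative path.
    """
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        if line.startswith("#"):
            # A header line starts a new section keyed by its path.
            current = line[1:].strip()
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    # Join each section's lines and trim surrounding blank lines.
    return {path: "\n".join(lines).strip() for path, lines in sections.items()}


def read_corpus(filename: str) -> Dict[str, str]:
    """Read a UTF-8 corpus file, e.g. a hypothetical 'tft_corpus.txt'."""
    with open(filename, encoding="utf-8") as handle:
        return split_corpus(handle.read())


# Hypothetical sample in the assumed layout.
sample = (
    "# tft/page_a.htm\n"
    "Once upon a time...\n"
    "\n"
    "# tft/page_b.htm\n"
    "A second tale.\n"
)
sections = split_corpus(sample)
```

    Lines that appear before the first header are ignored in this sketch,
    which may or may not suit the actual file.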

    The corpus was created in a Linux environment and is encoded in UTF-8
    with Unix-style line endings (LF).

    Notes:
    1. A small portion of the text was extracted from "index" and other
    web pages, which contain website comments etc. rather than Buddhist
    texts.
    2. Although text extraction was done in the order of filenames, e.g.
    ami01.txt > ami02.txt > ami03.txt, some texts may occasionally be
    out of order.
    3. Use of the corpus data is restricted to non-commercial purposes.
    4. The corpus can be freely re-distributed, provided the readme file
    is kept in the package.
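
    Regarding note 2, if the order of the texts matters, one possible
    approach is to re-sort the page names with a natural (number-aware)
    sort. The sketch below is only an illustration: the helper name
    natural_key and the sample names are hypothetical, and it is not the
    procedure used to build the corpus.

```python
import re
from typing import List, Union


def natural_key(name: str) -> List[Union[int, str]]:
    """Split a name into text and number parts so that 2 sorts before 10."""
    parts = re.split(r"(\d+)", name)
    return [int(part) if part.isdigit() else part.lower() for part in parts]


# Hypothetical page names in a scrambled order.
names = ["ami10.txt", "ami2.txt", "ami01.txt"]
ordered = sorted(names, key=natural_key)
```

    A plain string sort would place "ami10.txt" before "ami2.txt"; the
    number-aware key avoids that when names are not zero-padded.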

    -------------------------------------
    Jiayue Wang arthur0421[AT]163.com
    College of Foreign Studies
    Guangxi University for Nationalities
    Nanning 530006
    China
    -------------------------------------