“Following Bernardini (2015:529), I extracted the same 13 POS patterns which she hypothesised to be common in English. All the word forms which fit the patterns above were automatically extracted from the EnOr and EnSI components of the SIREN corpus. Since the interpreted component is 17% bigger than the original, the search of EnSI yielded more hits than that of EnOr. To compensate for the disparity, the longer list of wordforms was randomly trimmed to the length of the shorter one. After that duplicates were removed from both lists, resulting in two lists of POS chain types characteristic, respectively, of interpreted and comparable non-interpreted English.”
The passage above is the description of this method given in the paper "Collocations in non-interpreted and simultaneously interpreted English: a corpus study".
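The extraction-and-balancing procedure in the quoted passage can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's implementation. Each corpus component is assumed to be already POS-tagged as (word form, tag) pairs. The tag names and the three entries in `PATTERNS` are hypothetical placeholders; the 13 patterns from Bernardini (2015) are not reproduced here. The function names `extract_hits` and `balance_and_type` are also invented for illustration.

```python
import random
from typing import Sequence

# Assumption: each corpus component is a sequence of (word form, POS tag) pairs.
Token = tuple[str, str]

# Hypothetical, illustrative patterns only; NOT the 13 patterns from Bernardini (2015).
PATTERNS: set[tuple[str, ...]] = {
    ("ADJ", "NOUN"),
    ("NOUN", "NOUN"),
    ("VERB", "DET", "NOUN"),
}


def extract_hits(tokens: Sequence[Token], patterns: set[tuple[str, ...]]) -> list[str]:
    """Return every word-form sequence (a hit) whose tag sequence matches a pattern."""
    hits: list[str] = []
    lengths = sorted({len(p) for p in patterns})
    for i in range(len(tokens)):
        for n in lengths:
            window = tokens[i : i + n]
            if len(window) < n:
                break  # near the end of the corpus; longer patterns cannot fit either
            if tuple(tag for _, tag in window) in patterns:
                # Assumption: word forms are case-folded so "Strong economy" == "strong economy".
                hits.append(" ".join(word.lower() for word, _ in window))
    return hits


def balance_and_type(
    hits_a: list[str], hits_b: list[str], seed: int = 0
) -> tuple[set[str], set[str]]:
    """Randomly trim the longer hit list to the shorter one's length, then remove duplicates."""
    rng = random.Random(seed)  # fixed seed makes the random trim reproducible
    n = min(len(hits_a), len(hits_b))
    if len(hits_a) > n:
        hits_a = rng.sample(hits_a, n)  # sampling without replacement
    if len(hits_b) > n:
        hits_b = rng.sample(hits_b, n)
    # Deduplication turns the token lists (hits) into type lists.
    return set(hits_a), set(hits_b)


if __name__ == "__main__":
    # Toy stand-ins for the non-interpreted (EnOr) and interpreted (EnSI) components.
    en_or = [("a", "DET"), ("strong", "ADJ"), ("economy", "NOUN")]
    en_si = [("the", "DET"), ("strong", "ADJ"), ("economy", "NOUN"),
             ("growth", "NOUN"), ("rate", "NOUN")]
    or_types, si_types = balance_and_type(extract_hits(en_or, PATTERNS),
                                          extract_hits(en_si, PATTERNS))
    print(or_types, si_types)
```

Following the order in the quoted description, trimming is applied to the hit lists (tokens) before duplicates are removed, so both components contribute the same number of hits before being reduced to POS chain types.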