Thanks a lot!
But so far, the number of tokens I have counted in COLSEC (753,169) is still much higher than the figure given in the book by 杨惠中 (Yang Huizhong) and 卫乃兴 (Wei Naixing) (723,299).
I have excluded the <> tags and the full stops (.). Are there any other symbols I should exclude?
That result is not surprising, because different programs may use different counting algorithms. They have different settings for whether special characters are allowed inside words. For example, is each of the following one word or two: I'll, can't, gonna, so-called?
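To see how much the counting policy alone can matter, here is a minimal Python sketch. It assumes that tags are anything enclosed in <...> and that punctuation such as "." and "," is simply never matched as a word. The sample line and the names `SAMPLE`, `WORD_KEEP_JOINED`, `WORD_LETTERS_ONLY` and `count_tokens` are made up for illustration; the line is not taken from COLSEC, and neither policy is claimed to be the one behind the book's figure.

```python
import re

# Hypothetical sample line; COLSEC's real markup may look different.
SAMPLE = "<s1> I'll say it's a so-called test . <s2> We can't stop , gonna go ."

# Assumption: a markup tag is anything enclosed in angle brackets.
TAG_RE = re.compile(r"<[^>]*>")

# Policy A: apostrophes and hyphens may join letters inside a word,
# so "I'll", "it's", "can't" and "so-called" each count as one token.
WORD_KEEP_JOINED = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")

# Policy B: only unbroken runs of letters are words, so "I'll" becomes
# "I" + "ll" and "so-called" becomes "so" + "called".
WORD_LETTERS_ONLY = re.compile(r"[A-Za-z]+")


def count_tokens(text: str, word_re: "re.Pattern[str]") -> int:
    """Replace <...> tags with spaces, then count matches of word_re.

    Punctuation such as "." and "," is never matched, so it is not counted.
    """
    stripped = TAG_RE.sub(" ", text)
    return len(word_re.findall(stripped))


if __name__ == "__main__":
    print("policy A:", count_tokens(SAMPLE, WORD_KEEP_JOINED))
    print("policy B:", count_tokens(SAMPLE, WORD_LETTERS_ONLY))
```

On this made-up line the two policies give 11 and 15 tokens respectively, even though the tags and punctuation are handled identically. Over a whole corpus, choices like these can shift the total noticeably, so only the conventions documented for the book can tell you which ones produced its count.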