newmm tokenization
Wannaphong Phatthiyaphaibun edited this page Dec 14, 2020 · 3 revisions
newmm is a code name for the next maximal matching engine in PyThaiNLP; it is not the real name of a word tokenizer engine. It is the default engine of pythainlp.word_tokenize. The name has pointed to different engines over time, and newmm is currently the onecut engine.
- multi_cut (PyThaiNLP 1.4 - 1.5): Thai word segmentation with maximum matching. The original source code is by Korakot Chaovavanich. It is now the mm engine in PyThaiNLP.
- onecut (PyThaiNLP 1.6 - now): Dictionary-based maximal matching word segmentation, constrained by Thai Character Cluster (TCC) boundaries. Created by Korakot Chaovavanich.
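To illustrate the idea behind dictionary-based maximal matching with boundary constraints, here is a minimal sketch. It is not the PyThaiNLP implementation: the function name maximal_matching, the boundaries parameter (a stand-in for TCC boundaries), and the toy Latin-letter dictionary are all assumptions made for this example. It picks the segmentation with the fewest dictionary words, and only lets words start or end at allowed positions.

```python
def maximal_matching(text, dictionary, boundaries=None):
    """Toy sketch: split text into the fewest dictionary words.

    boundaries is a hypothetical stand-in for TCC boundaries: an iterable
    of character offsets where a word may start or end. None allows every
    offset. Returns None if no full segmentation exists.
    """
    n = len(text)
    if boundaries is None:
        allowed = set(range(n + 1))
    else:
        allowed = set(boundaries) | {0, n}

    max_len = max(map(len, dictionary), default=1)
    inf = float("inf")

    # best[i] = (token count, start offset of last token) for text[:i]
    best = [(inf, -1)] * (n + 1)
    best[0] = (0, -1)

    for end in range(1, n + 1):
        if end not in allowed:
            continue
        for start in range(max(0, end - max_len), end):
            if start not in allowed or best[start][0] == inf:
                continue
            if text[start:end] in dictionary:
                count = best[start][0] + 1
                if count < best[end][0]:
                    best[end] = (count, start)

    if best[n][0] == inf:
        return None

    # Walk back through the stored start offsets to recover the tokens.
    tokens = []
    i = n
    while i > 0:
        start = best[i][1]
        tokens.append(text[start:i])
        i = start
    return tokens[::-1]


words = {"ab", "abc", "cde", "d", "e"}
print(maximal_matching("abcde", words))
print(maximal_matching("abcde", words, boundaries={3, 4}))
```

Without constraints, the sketch prefers the two-word split over the three-word split that a greedy longest-prefix match would produce. With boundaries restricted to offsets 3 and 4, the two-word split is no longer permitted, so the result changes. Unknown words are not handled here; a real engine needs a fallback for text that the dictionary does not cover.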