Model Cards
These model cards contain technical details of the models developed and used in PyThaiNLP.
Model Details
- Developer: Wannaphong Phatthiyaphaibun
- Model date: 2020-10-03
- Model version: 0.2
- Used in PyThaiNLP version: 2.2.4+
- Filename: ~/pythainlp-data/cls-v0.2.crfsuite
- GitHub: https://github.com/PyThaiNLP/pythainlp/pull/479
- Model type: CRF (conditional random field)
- License: CC0
Intended Use
- Segmenting Thai text into clauses (units larger than a word but smaller than a sentence).
- Not suitable for other languages or for text outside the news domain.
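The model assigns each word token one of the tags B_CLS, I_CLS or E_CLS (the rows of the results table below). Assuming these mark the beginning, inside and end of a clause, the following minimal sketch turns a tag sequence back into clauses. The function name `group_clauses` and its handling of malformed sequences are illustrative assumptions, not PyThaiNLP's code, and the tags in the example are written by hand rather than produced by the model.

```python
from typing import List


def group_clauses(tokens: List[str], tags: List[str]) -> List[List[str]]:
    """Group word tokens into clauses using B_CLS / I_CLS / E_CLS tags.

    Assumes B_CLS opens a clause, I_CLS continues it and E_CLS closes it.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    clauses: List[List[str]] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B_CLS" and current:
            # A new clause begins before the previous one was closed.
            clauses.append(current)
            current = []
        current.append(token)
        if tag == "E_CLS":
            clauses.append(current)
            current = []
    if current:
        # Keep trailing tokens that were never closed by E_CLS.
        clauses.append(current)
    return clauses


# Hand-written tags for an already word-segmented sentence.
words = ["ฉัน", "กิน", "ข้าว", "แล้ว", "ไป", "ทำงาน"]
tags = ["B_CLS", "I_CLS", "E_CLS", "B_CLS", "I_CLS", "E_CLS"]
print(group_clauses(words, tags))
```

The input is a list of words, which is why word segmentation has to happen before this model is applied.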
Factors
- Based on known problems in Thai natural language processing.
Metrics
- Evaluation metrics include precision, recall, and F1-score.
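Per tag, precision = TP / (TP + FP), recall = TP / (TP + FN), and the F1-score is their harmonic mean, 2PR / (P + R); for B_CLS in the table below, 2 × 0.90 × 0.94 / (0.90 + 0.94) ≈ 0.92. A minimal sketch that computes these token-level scores from flat lists of gold and predicted tags follows. It is a simplified illustration with a hypothetical function name, not necessarily how the reports on this page were produced.

```python
from collections import Counter
from typing import Dict, List, Tuple


def per_class_prf(gold: List[str], pred: List[str]) -> Dict[str, Tuple[float, float, float, int]]:
    """Return {tag: (precision, recall, f1, support)} for token-level tags."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp: Counter = Counter()
    fp: Counter = Counter()
    fn: Counter = Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # the predicted tag p was wrong here
            fn[g] += 1  # the gold tag g was missed here
    support = Counter(gold)  # number of gold occurrences per tag
    report: Dict[str, Tuple[float, float, float, int]] = {}
    for tag in sorted(set(gold) | set(pred)):
        precision = tp[tag] / (tp[tag] + fp[tag]) if tp[tag] + fp[tag] else 0.0
        recall = tp[tag] / (tp[tag] + fn[tag]) if tp[tag] + fn[tag] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[tag] = (precision, recall, f1, support[tag])
    return report


gold = ["B_CLS", "I_CLS", "E_CLS", "B_CLS", "I_CLS", "E_CLS"]
pred = ["B_CLS", "I_CLS", "I_CLS", "B_CLS", "I_CLS", "E_CLS"]
for tag, (p, r, f, s) in per_class_prf(gold, pred).items():
    print(f"{tag:6} {p:.2f} {r:.2f} {f:.2f} {s}")
```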
Training Data
LST20 Corpus train set (news domain)
Evaluation Data
LST20 Corpus test set (news domain)
Quantitative Analyses
                 precision    recall  f1-score   support
B_CLS                 0.90      0.94      0.92     16111
E_CLS                 0.90      0.94      0.92     15947
I_CLS                 0.99      0.97      0.98    169565
micro avg             0.97      0.97      0.97    201623
macro avg             0.93      0.95      0.94    201623
weighted avg          0.97      0.97      0.97    201623
samples avg           0.94      0.94      0.94    201623
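The averaged rows weight tags differently: the macro average is the plain mean over tags, while the weighted average weights each tag by its support. The sketch below recomputes both from the per-tag rows above; the helper names are illustrative. Because those rows are rounded to two decimals, recomputed values can drift from the reported ones: here the weighted precision comes out at about 0.976 rather than the reported 0.97.

```python
from typing import Dict, Tuple

Row = Tuple[float, float, float, int]  # (precision, recall, f1, support)

# Per-tag rows copied from the clause segmentation table above.
rows: Dict[str, Row] = {
    "B_CLS": (0.90, 0.94, 0.92, 16111),
    "E_CLS": (0.90, 0.94, 0.92, 15947),
    "I_CLS": (0.99, 0.97, 0.98, 169565),
}


def macro_avg(table: Dict[str, Row], col: int) -> float:
    """Unweighted mean of column `col` over all tags."""
    return sum(r[col] for r in table.values()) / len(table)


def weighted_avg(table: Dict[str, Row], col: int) -> float:
    """Mean of column `col`, weighting each tag by its support."""
    total = sum(r[3] for r in table.values())
    return sum(r[col] * r[3] for r in table.values()) / total


for name, col in (("precision", 0), ("recall", 1), ("f1-score", 2)):
    print(f"{name:9}  macro={macro_avg(rows, col):.3f}  weighted={weighted_avg(rows, col):.3f}")
```

I_CLS dominates the support, so the weighted average sits close to the I_CLS scores, while the macro average gives the rarer boundary tags equal say.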
Ethical Considerations
None identified.
Caveats and Recommendations
- Input must be word-segmented before it is passed to this model.
- Thai text only
Model Details
- Developer: Chonlapat Patanajirasit
- Model date: 2020-05-09
- Model version: 1.0
- Used in PyThaiNLP version: 2.2+
- Filename: pythainlp/corpus/sentenceseg_crfcut.model
- GitHub: https://github.com/vistec-AI/crfcut
- Model type: CRF (conditional random field)
- License: CC0
Intended Use
- Segmenting Thai text into sentences.
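A CRF segmenter treats sentence segmentation as sequence labelling: each token gets a feature dictionary built from itself and its neighbours, and the CRF predicts one label per token (for example, hypothetical labels `E` for a token that ends a sentence and `I` otherwise). The sketch below builds such window features. The feature names, the window size and the use of space tokens as a cue (Thai commonly separates sentences with spaces) are assumptions for illustration, not CRFCut's actual feature set. Lists of per-token dictionaries like these are the input format used by toolkits such as sklearn-crfsuite.

```python
from typing import Dict, List, Union

Feature = Union[str, bool]


def token_features(tokens: List[str], i: int, window: int = 2) -> Dict[str, Feature]:
    """Build a hypothetical feature dict for token i from a +/- `window` context."""
    features: Dict[str, Feature] = {"bias": True}
    for offset in range(-window, window + 1):
        j = i + offset
        if 0 <= j < len(tokens):
            tok = tokens[j]
            features[f"word[{offset}]"] = tok
            # Spaces are a strong boundary cue in Thai text (assumed feature).
            features[f"is_space[{offset}]"] = tok.isspace()
        else:
            # Position falls before the start or after the end of the sequence.
            features[f"pad[{offset}]"] = True
    return features


tokens = ["ฉัน", "กิน", "ข้าว", " ", "เขา", "ไป", "ทำงาน"]
X = [token_features(tokens, i) for i in range(len(tokens))]  # one dict per token
y = ["I", "I", "I", "E", "I", "I", "E"]  # hypothetical labels: E ends a sentence
print(X[3])
```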
Factors
- Based on known problems in Thai natural language processing.
Metrics
- Evaluation metrics include precision, recall, and F1-score.
Training Data
Not specified.
Evaluation Data
Not specified.
Quantitative Analyses
Not specified.
Ethical Considerations
None identified.
Caveats and Recommendations
- Thai text only
Model Details
- Developer: Wannaphong Phatthiyaphaibun
- Model date: 2020-05-21
- Model version: 1.4
- Used in PyThaiNLP version: 2.2+
- Filename: ~/pythainlp-data/thai-ner-1-4.crfsuite
- Model type: CRF (conditional random field)
- License: CC0
- GitHub (Thai NER 1.4 data and training notebook): https://github.com/wannaphong/thai-ner/tree/master/model/1.4
Intended Use
- Named-entity tagging for Thai.
- Not suitable for other languages or for text outside the news domain.
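The tags in the results table below follow a B-/I- scheme per entity type (B-PERSON, I-PERSON, and so on). A minimal sketch for collecting entity spans from such a tag sequence follows. It assumes an `O` tag for tokens outside any entity and treats an `I-` tag that does not continue the open entity as the start of a new one; both choices and the function name `bio_to_entities` are illustrative, not PyThaiNLP's API.

```python
from typing import List, Optional, Tuple


def bio_to_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity_type, text) spans from B-/I-/O tags."""
    entities: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Emit the open entity, if any, and reset the state.
        nonlocal current_type, current_tokens
        if current_type is not None:
            # Thai tokens are joined without a separator; spaces are their own tokens.
            entities.append((current_type, "".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and tag[2:] == current_type:
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open entity.
            flush()
            if tag.startswith("I-"):
                current_type, current_tokens = tag[2:], [token]
    flush()
    return entities


tokens = ["ผม", "ไป", "เชียงใหม่", "วันที่", " ", "15", " ", "มกราคม"]
tags = ["O", "O", "B-LOCATION", "B-DATE", "I-DATE", "I-DATE", "I-DATE", "I-DATE"]
print(bio_to_entities(tokens, tags))
```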
Factors
- Based on known problems in Thai natural language processing.
Metrics
- Evaluation metrics include precision, recall, and F1-score.
Training Data
ThaiNER 1.3 Corpus train set
Evaluation Data
ThaiNER 1.3 Corpus test set
Quantitative Analyses
                 precision    recall  f1-score   support
B-DATE                0.92      0.86      0.89       375
I-DATE                0.94      0.94      0.94       747
B-EMAIL               1.00      1.00      1.00         5
I-EMAIL               1.00      1.00      1.00        28
B-LAW                 0.71      0.56      0.62        43
I-LAW                 0.74      0.70      0.72       154
B-LEN                 0.96      0.93      0.95        29
I-LEN                 0.98      0.94      0.96        69
B-LOCATION            0.88      0.77      0.82       864
I-LOCATION            0.86      0.73      0.79       852
B-MONEY               0.98      0.85      0.91       105
I-MONEY               0.96      0.95      0.95       239
B-ORGANIZATION        0.90      0.78      0.84      1166
I-ORGANIZATION        0.84      0.77      0.81      1338
B-PERCENT             1.00      0.97      0.99        34
I-PERCENT             1.00      0.96      0.98        51
B-PERSON              0.96      0.82      0.88       676
I-PERSON              0.94      0.92      0.93      2424
B-PHONE               1.00      0.72      0.84        29
I-PHONE               0.96      0.92      0.94        78
B-TIME                0.87      0.73      0.79       172
I-TIME                0.94      0.83      0.88       336
B-URL                 0.89      1.00      0.94        24
I-URL                 0.96      1.00      0.98       371
B-ZIP                 1.00      1.00      1.00         4
micro avg             0.91      0.84      0.87     10213
macro avg             0.93      0.87      0.89     10213
weighted avg          0.91      0.84      0.87     10213
samples avg           0.17      0.17      0.17     10213
Ethical Considerations
None identified.
Caveats and Recommendations
- Thai text only
Model Details
- Developer: Wannaphong Phatthiyaphaibun
- Model date: 2018-05-15
- Model version: 1.0
- Used in PyThaiNLP version: 1.7+
- Filename: pythainlp/corpus/pos_orchid_perceptron.json
- Model type: perceptron
- License: CC0
- Training notebook: https://github.com/PyThaiNLP/pythainlp_notebook/blob/master/postag/train_orchid_postag_pythainlp.ipynb
Intended Use
- Part-of-speech tagging for Thai.
- Not suitable for other languages or for domains other than that of the ORCHID corpus.
Factors
- Based on known problems in Thai natural language processing.
Metrics
- Evaluation metrics include precision, recall, and F1-score.
Training Data
ORCHID corpus
Evaluation Data
ORCHID corpus
Quantitative Analyses
No evaluation results available.
Ethical Considerations
None identified.
Caveats and Recommendations
- Thai word tokens only (the input must be segmented into words first).
Model Details
- Developer: Wannaphong Phatthiyaphaibun
- Model date: 2020-08-11
- Model version: 0.2.3
- Used in PyThaiNLP version: 2.2.5+
- Filename: pythainlp/corpus/pos_lst20_perceptron-v0.2.3.json
- Model type: perceptron
- License: CC0
- Training notebook: https://github.com/PyThaiNLP/pythainlp_notebook/blob/master/postag/train_lst20_pythainlp.ipynb
Intended Use
- Part-of-speech tagging for Thai.
- Not suitable for other languages or for domains other than that of the LST20 corpus.
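This card, like the ORCHID card above, only states that the tagger is a perceptron model. To illustrate the general technique, here is a toy greedy perceptron tagger trained on three hand-made sentences. The feature set, the absence of weight averaging, and the class and method names are assumptions, not PyThaiNLP's implementation; the tags PR (pronoun), VV (verb) and NN (noun) come from the LST20 tag set in the table below. As with the real models, the input is a list of Thai word tokens.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple


class PerceptronTagger:
    """Toy greedy perceptron POS tagger (no weight averaging)."""

    def __init__(self) -> None:
        # weights[feature][tag] -> contribution of that feature to that tag's score
        self.weights: DefaultDict[str, DefaultDict[str, float]] = defaultdict(
            lambda: defaultdict(float)
        )
        self.tags: List[str] = []

    @staticmethod
    def features(words: List[str], i: int, prev_tag: str) -> List[str]:
        word = words[i]
        prev_word = words[i - 1] if i > 0 else "<s>"
        return ["bias", f"word={word}", f"suffix={word[-2:]}",
                f"prev_word={prev_word}", f"prev_tag={prev_tag}"]

    def predict(self, feats: List[str]) -> str:
        scores: Dict[str, float] = {t: 0.0 for t in self.tags}
        for f in feats:
            for tag, w in self.weights.get(f, {}).items():
                scores[tag] += w
        # Highest score wins; ties go to the alphabetically last tag.
        return max(self.tags, key=lambda t: (scores[t], t))

    def train(self, data: List[Tuple[List[str], List[str]]], epochs: int = 5) -> None:
        self.tags = sorted({t for _, gold in data for t in gold})
        for _ in range(epochs):
            for words, gold_tags in data:
                prev = "<s>"
                for i, gold in enumerate(gold_tags):
                    feats = self.features(words, i, prev)
                    guess = self.predict(feats)
                    if guess != gold:
                        # Reward the correct tag, penalise the wrong guess.
                        for f in feats:
                            self.weights[f][gold] += 1.0
                            self.weights[f][guess] -= 1.0
                    prev = gold  # condition on the gold history while training

    def tag(self, words: List[str]) -> List[Tuple[str, str]]:
        result: List[Tuple[str, str]] = []
        prev = "<s>"
        for i, word in enumerate(words):
            t = self.predict(self.features(words, i, prev))
            result.append((word, t))
            prev = t  # condition on the previous prediction at tagging time
        return result


train_data = [
    (["ฉัน", "กิน", "ข้าว"], ["PR", "VV", "NN"]),
    (["เขา", "ดื่ม", "น้ำ"], ["PR", "VV", "NN"]),
    (["ฉัน", "ดื่ม", "น้ำ"], ["PR", "VV", "NN"]),
]
tagger = PerceptronTagger()
tagger.train(train_data)
print(tagger.tag(["เขา", "กิน", "ข้าว"]))
```

Practical perceptron taggers usually average the weights over all updates, which tends to generalise better than the raw final weights used in this sketch.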
Factors
- Based on known problems in Thai natural language processing.
Metrics
- Evaluation metrics include precision, recall, and F1-score.
Training Data
LST20 Corpus Train set
Evaluation Data
LST20 Corpus Test set
Quantitative Analyses
                 precision    recall  f1-score   support
AJ                    0.90      0.87      0.88      4403
AV                    0.88      0.79      0.83      6722
AX                    0.95      0.94      0.95      7556
CC                    0.94      0.97      0.95     17613
CL                    0.87      0.85      0.86      3739
FX                    0.99      0.99      0.99      6918
IJ                    1.00      0.25      0.40         4
NG                    1.00      1.00      1.00      1694
NN                    0.97      0.98      0.98     58568
NU                    0.98      0.98      0.98      6256
PA                    0.88      0.89      0.88       194
PR                    0.88      0.85      0.86      2139
PS                    0.94      0.93      0.94     10886
PU                    1.00      1.00      1.00     37973
VV                    0.95      0.97      0.96     42586
XX                    0.00      0.00      0.00        27
accuracy                                  0.96    207278
macro avg             0.88      0.83      0.84    207278
weighted avg          0.96      0.96      0.96    207278
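This report has a single accuracy row, whereas the NER report above lists micro averages. When every token has exactly one gold tag and one predicted tag and all tags are counted, micro-averaged precision, recall and F1 all equal accuracy, because each wrong prediction is pooled as exactly one false positive (for the predicted tag) and one false negative (for the gold tag). The sketch below checks this on toy data; the helper names are illustrative. The NER report's micro precision (0.91) and recall (0.84) differ, which would be consistent with a tag such as `O` being excluded from the averaged labels (the report lists no `O` row). That is an inference, not documented here; the second example shows the effect.

```python
from typing import List, Optional, Set, Tuple


def accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def micro_prf(gold: List[str], pred: List[str],
              labels: Optional[Set[str]] = None) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1, pooled over `labels` (default: all tags)."""
    if labels is None:
        labels = set(gold) | set(pred)
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == p and g in labels)
    fp = sum(1 for g, p in pairs if g != p and p in labels)
    fn = sum(1 for g, p in pairs if g != p and g in labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = ["PR", "VV", "NN", "PU", "NN", "VV"]
pred = ["PR", "VV", "VV", "PU", "NN", "NN"]
# With all tags counted, the four numbers are equal (up to float rounding).
print(accuracy(gold, pred), micro_prf(gold, pred))

# Leaving a tag such as "O" out of the averaged labels breaks the equality.
ner_gold = ["B-PERSON", "I-PERSON", "O", "O"]
ner_pred = ["B-PERSON", "O", "O", "O"]
print(accuracy(ner_gold, ner_pred), micro_prf(ner_gold, ner_pred, {"B-PERSON", "I-PERSON"}))
```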
Ethical Considerations
None identified.
Caveats and Recommendations
- Thai word tokens only (the input must be segmented into words first).