
PyTorch BERT implementation for additional experiments


jhnlee/pytorch-bert-korean

Latest commit: e08311f · Jan 10, 2020 (30 commits)

PyTorch BERT Pretraining / Fine-tuning

A PyTorch BERT trainer built on Hugging Face Transformers

Requirements

  • Python 3.6
  • PyTorch 1.12
  • CUDA 10.0
  • TensorFlow 1.14 (for TensorBoard)
  • pytorch_transformers
  • gluonnlp >= 0.6.0
  • apex (for mixed-precision training)
  • Flask (for the web API)

Pretrained Korean BERT Model (ETRI or SKT)

Create a directory named pretrained_model with one subdirectory per model, laid out as below:

pretrained_model
├── etri
│   ├── bert_config.json
│   ├── pytorch_model.bin
│   ├── tokenization.py
│   └── vocab.korean.rawtext.list
└── skt
    ├── bert_config.json
    ├── pytorch_model.bin
    ├── tokenizer.model
    └── vocab.json
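Before training, it can help to confirm that the directory matches the tree above. The following is a minimal sketch, assuming exactly the file names shown in the tree; the helper check_pretrained_dir and the EXPECTED_FILES mapping are hypothetical and not part of the repository.

```python
from pathlib import Path

# Hypothetical mapping of pretrained types to the files listed in the tree above
EXPECTED_FILES = {
    "etri": ["bert_config.json", "pytorch_model.bin", "tokenization.py", "vocab.korean.rawtext.list"],
    "skt": ["bert_config.json", "pytorch_model.bin", "tokenizer.model", "vocab.json"],
}


def check_pretrained_dir(root, pretrained_type):
    """Return the expected files missing from root/pretrained_type (empty list if complete)."""
    if pretrained_type not in EXPECTED_FILES:
        raise ValueError(f"unknown pretrained_type: {pretrained_type!r}")
    model_dir = Path(root) / pretrained_type
    return [name for name in EXPECTED_FILES[pretrained_type] if not (model_dir / name).is_file()]


if __name__ == "__main__":
    missing = check_pretrained_dir("pretrained_model", "etri")
    if missing:
        print("missing files:", ", ".join(missing))
    else:
        print("pretrained_model/etri looks complete")
```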

Datasets

Datasets should be CSV files with two columns named 'Sentence' and 'Emotion'.
Alternatively, you can modify the code below in datasets.py to fit your own dataset:

...
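# datasets.py, lines 50-58 (method of the dataset class)
import pandas as pd  # imported at the top of datasets.py

def get_data(self, file_path):
    data = pd.read_csv(file_path)
    corpus = data['Sentence']
    label = None
    try:
        # Map each emotion name to its integer index
        label = [self.label2idx[l] for l in data['Emotion']]
    except KeyError:
        # No 'Emotion' column (e.g. unlabeled test data) or an unrecognized label
        pass
    return corpus, label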
...
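For reference, a standalone loader that mirrors get_data might look like the sketch below. It uses the standard csv module instead of pandas so it runs without extra dependencies. The function name load_emotion_csv and the label order in LABEL2IDX are assumptions (the repository defines its own label2idx); the seven emotion names are the ones reported in the results table.

```python
import csv
from typing import Dict, List, Optional, Tuple

# Assumed label order; the repository's own label2idx may differ.
LABEL2IDX: Dict[str, int] = {
    "공포": 0,  # fear
    "놀람": 1,  # surprise
    "분노": 2,  # anger
    "슬픔": 3,  # sadness
    "중립": 4,  # neutral
    "행복": 5,  # happiness
    "혐오": 6,  # disgust
}


def load_emotion_csv(
    file_path: str, label2idx: Dict[str, int] = LABEL2IDX
) -> Tuple[List[str], Optional[List[int]]]:
    """Read the 'Sentence' column and, if present, map 'Emotion' to label ids."""
    with open(file_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    corpus = [row["Sentence"] for row in rows]
    try:
        labels = [label2idx[row["Emotion"]] for row in rows]
    except KeyError:
        # Missing 'Emotion' column (unlabeled data) or an unknown label
        labels = None
    return corpus, labels
```

A labeled file needs a header row of Sentence,Emotion; an unlabeled test file can contain only the Sentence column, in which case the labels come back as None.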

Usage

For masked language model (MLM) pretraining:

$ python train_mlm.py\
        --pretrained_type="etri"
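For context, masked language model pretraining in the original BERT recipe selects about 15% of the tokens as prediction targets; of those, 80% are replaced by the mask token, 10% by a random token, and 10% are left unchanged. The sketch below illustrates that standard recipe on a plain list of token ids. It is not taken from train_mlm.py, whose masking details may differ, and mask_tokens and its parameters are hypothetical.

```python
import random
from typing import Iterable, List, Optional, Tuple

# Label for positions that are not predicted; -100 is the default
# ignore_index of PyTorch's CrossEntropyLoss.
IGNORE_INDEX = -100


def mask_tokens(
    token_ids: List[int],
    mask_id: int,
    vocab_size: int,
    special_ids: Iterable[int],
    mlm_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Return (inputs, labels) following the 80/10/10 BERT masking scheme."""
    rng = rng or random.Random()
    special = set(special_ids)
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(inputs)
    for i, tok in enumerate(token_ids):
        # Never mask special tokens such as [CLS] and [SEP]
        if tok in special or rng.random() >= mlm_prob:
            continue
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_id  # 80%: replace with the mask token
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random token
        # remaining 10%: keep the original token
    return inputs, labels
```

Because unselected positions carry the label -100, they contribute nothing to the loss, so only the selected tokens are trained on.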

For text classification:

$ python train_classification.py\
        --pretrained_type="etri"

For classification after further MLM pretraining:

$ python train_classification.py\
        --pretrained_model_path=".../best_model.bin"
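When an MLM checkpoint is reused for classification, typically only the encoder weights carry over: the MLM prediction head is dropped and the classification head starts from scratch. A minimal sketch, assuming Hugging Face-style parameter names (encoder parameters prefixed with bert., the MLM head with cls.); the helper encoder_weights is hypothetical, and how train_classification.py actually loads best_model.bin may differ.

```python
from typing import Dict, TypeVar

T = TypeVar("T")


def encoder_weights(state_dict: Dict[str, T], prefix: str = "bert.") -> Dict[str, T]:
    """Keep only parameters whose names start with the encoder prefix (drops the MLM head)."""
    return {name: value for name, value in state_dict.items() if name.startswith(prefix)}


# Toy checkpoint with placeholder strings instead of tensors
mlm_checkpoint = {
    "bert.embeddings.word_embeddings.weight": "W_emb",
    "bert.encoder.layer.0.attention.self.query.weight": "W_q",
    "cls.predictions.bias": "b_mlm",
}
print(sorted(encoder_weights(mlm_checkpoint)))
```

With real tensors, the filtered dict would typically be passed to model.load_state_dict(..., strict=False), so that the freshly initialized classifier parameters are allowed to be missing from the checkpoint.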

Use the --fp16 flag for mixed-precision training (requires apex); --fp16_opt_level sets the Apex AMP optimization level:

$ python train_classification.py\
        --fp16\
        --fp16_opt_level="O1"

Inference

$ python test.py\
    --pretrained_model_path="./data/korean_single_test.csv" 

After inference, the results are saved to the /result folder:

  • /result/test_result.csv : predicted label for test data
  • /result/test_result.png : confusion matrix for test data
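The accuracy, per-class F1 scores, and confusion matrix can all be derived from the true and predicted labels. The sketch below shows one way to compute them in plain Python; it is an illustration, not necessarily what test.py does (a library such as scikit-learn may be used instead), and the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]) -> List[List[int]]:
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_f1(matrix: List[List[int]], labels: Sequence[str]) -> Dict[str, float]:
    """F1 = 2TP / (2TP + FP + FN) for each class."""
    scores = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fp = sum(row[i] for row in matrix) - tp  # predicted as this class, but wrong
        fn = sum(matrix[i]) - tp                 # this class, predicted as another
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def accuracy(matrix: List[List[int]]) -> float:
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


labels = ["행복", "슬픔"]  # happiness, sadness
m = confusion_matrix(["행복", "행복", "슬픔", "슬픔"], ["행복", "슬픔", "슬픔", "슬픔"], labels)
print(m)  # [[1, 1], [0, 2]]
print(accuracy(m))  # 0.75
```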

Result

Overall

Test set (3,859 samples)
Accuracy 57.69%
Macro F1 56.84%

F1 score for each emotion

Emotion F1
공포 (fear) 60.00%
놀람 (surprise) 57.49%
분노 (anger) 54.60%
슬픔 (sadness) 62.64%
중립 (neutral) 44.21%
행복 (happiness) 81.88%
혐오 (disgust) 37.04%
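Macro F1 is the unweighted mean of the per-class F1 scores, so the overall figure can be reproduced from the per-emotion table:

```python
# Per-emotion F1 scores (%) from the table above
f1_scores = {
    "공포": 60.00,  # fear
    "놀람": 57.49,  # surprise
    "분노": 54.60,  # anger
    "슬픔": 62.64,  # sadness
    "중립": 44.21,  # neutral
    "행복": 81.88,  # happiness
    "혐오": 37.04,  # disgust
}
macro_f1 = sum(f1_scores.values()) / len(f1_scores)
print(f"Macro F1: {macro_f1:.2f}%")  # Macro F1: 56.84%
```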

Confusion matrix

Simple Web Application with Flask

$ python app.py
Screenshots: sad case and happy case
