BERT finetuning (classification + language model) #637

Merged: 14 commits, Sep 27, 2019

Conversation


@Bycob Bycob commented Sep 10, 2019

  • Add training with pytorch backend
  • Finetune and test BERT-based text classifiers
  • Finetune language models using word masking

<!> When tracing models, use PyTorch 1.3.1 and the latest transformers (formerly pytorch-transformers):

pip3 install torch==1.3.1 transformers

Added parameters

  • Text Input Connector

| Parameter | Type | Optional | Default | Description |
| --- | --- | --- | --- | --- |
| ordered_words | bool | yes | false | Word-based processing with positional information |
| wordpiece_tokens | bool | yes | false | If true, the vocabulary may contain partial words, so a word can be split into multiple tokens |
| punctuation_tokens | bool | yes | false | Treat each punctuation sign as a token (if false, punctuation is stripped from the input) |
  • Torch MLLib

| Parameter | Type | Optional | Default | Description |
| --- | --- | --- | --- | --- |
| self_supervised | string | yes | "" | Self-supervised mode: "mask" for masked language model [TODO add option: "next" = next token prediction for GPT-2?] |
| embedding_size | int | yes | 768 | Embedding size for NLP models |
| freeze_traced | bool | yes | false | Freeze the traced part of the net during finetuning (e.g. for classification) |
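
For reference, here is a sketch of where these new parameters sit in a service-creation call, in the spirit of the curl examples further down in this PR. It uses Python with requests purely as an illustration; the host, repository path and nclasses value are placeholders, and not every combination of parameters shown here is necessarily valid.

import requests

# Illustrative only: the new input-connector and mllib parameters in context.
service_body = {
    "description": "BERT classifier with the new parameters",
    "mllib": "torch",
    "model": {"repository": "./classif_training/"},   # placeholder path
    "parameters": {
        "input": {
            "connector": "txt",
            "ordered_words": True,       # word-based processing with positional information
            "wordpiece_tokens": True,    # vocabulary may contain partial words ("##" pieces)
            "punctuation_tokens": True,  # keep punctuation signs as tokens
            "sequence": 512
        },
        "mllib": {
            "template": "bert",
            "self_supervised": "",       # "" = supervised, "mask" = masked language model
            "embedding_size": 768,       # embedding size of the NLP model
            "freeze_traced": False,      # freeze the traced part of the net during finetuning
            "nclasses": 20,
            "finetuning": True,
            "gpu": True
        }
    },
    "type": "supervised"
}

requests.put("http://localhost:8080/services/torch_bert_training", json=service_body)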


Bycob commented Sep 10, 2019

TODO before merge

  • Improve TxtInputConnector support for unsupervised/self-supervised datasets (i.e. without classes)
  • Fix TxtInputConnector regenerating the vocab during finetuning
  • Option to choose whether or not to freeze the BERT weights
  • Add models to the trace_pytorch_transformers script
  • Add unit tests for classification training and masked lm training

Then

  • Interrupt training from the platform
  • Block predict while training
  • Best model / Best metrics
  • Resume training with correct data (iteration count, remaining iterations)
  • Save template and model parameters, so that we don't have to specify them again when recreating the service
  • Have "template" use a model from an external source instead of the model already in the repository
  • f1 sparse for masked LM testing
  • Export DistilBERT and other models (once the pytorch-transformers bug is solved)
  • Add vocabulary generation with BPE if finetuning = false
  • Implement perplexity as a measure for language models (BERT & GPT-2); see the sketch below
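
For the perplexity item, a minimal sketch of the usual definition, perplexity = exp(mean per-token cross-entropy), assuming per-token negative log-likelihoods are already available; the function name and inputs are illustrative only, not part of the backend:

import math

# Minimal sketch: perplexity as exp of the mean per-token negative log-likelihood.
def perplexity(token_nlls):
    """token_nlls: per-token negative log-likelihoods, in nats."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))

print(perplexity([2.0, 1.5, 3.0]))  # mean NLL ~2.17 nats -> perplexity ~8.7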


Bycob commented Sep 10, 2019

Example: Finetune a classification model

  • Trace the pretrained PyTorch BERT model
pip3 install --user transformers
mkdir classif_training
./trace_pytorch_transformers.py bert --output-dir classif_training --vocab --verbose
  • Run dede
  • Start the service
curl -X PUT "http://localhost:8080/services/torch_bert_training" -d '{
    "description": "News20 classification service using BERT",
    "mllib": "torch",
    "model": {
        "repository": "./classif_training/"
    },
    "parameters": {
        "input": {
            "connector": "txt",
            "ordered_words": true,
            "wordpiece_tokens": true,
            "punctuation_tokens": true,
            "sequence": 512
        },
        "mllib": {
            "template":"bert",
            "nclasses": 20,
            "finetuning":true,
            "gpu": true
        }
    },
    "type": "supervised"
}
'
  • Train the model on the News20 dataset
curl -X POST "http://localhost:8080/train" -d '{
    "service": "torch_bert_training", 
    "parameters": { 
         "mllib": {
            "solver": {
              "iterations":3000,
              "test_interval":250,
              "base_lr":1e-5,
              "iter_size":4,
              "snapshot":250,
              "solver_type":"ADAM"
            },
            "net": {
              "batch_size":8,
              "test_batch_size":4
            }
        },
        "input": {
            "shuffle":true
        },
        "output": {
            "measure":["f1", "mcll", "acc", "cmdiag", "cmfull"]
        }
    }, 
    "data": ["/opt/data/news20/train/", "/opt/data/news20/test/"]
}
'
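
Since the training call above is asynchronous, the job is usually monitored by polling the training endpoint. Below is a minimal Python sketch that sends the same body as the curl call and then polls, assuming the standard DeepDetect training API (POST /train to launch, GET /train with service and job to poll); the response field names used here (head.job, head.status, body.measure) are my assumption of the usual API and may need adjusting.

import time
import requests

DEDE = "http://localhost:8080"

train_body = {
    "service": "torch_bert_training",
    "parameters": {
        "mllib": {
            "solver": {"iterations": 3000, "test_interval": 250, "base_lr": 1e-5,
                       "iter_size": 4, "snapshot": 250, "solver_type": "ADAM"},
            "net": {"batch_size": 8, "test_batch_size": 4}
        },
        "input": {"shuffle": True},
        "output": {"measure": ["f1", "mcll", "acc", "cmdiag", "cmfull"]}
    },
    "data": ["/opt/data/news20/train/", "/opt/data/news20/test/"]
}

# Launch the asynchronous training job (same body as the curl call above).
job = requests.post(DEDE + "/train", json=train_body).json()
job_id = job["head"]["job"]  # assumed location of the job id in the response

# Poll until the job is no longer running.
while True:
    status = requests.get(DEDE + "/train",
                          params={"service": "torch_bert_training", "job": job_id}).json()
    print(status["head"].get("status"), status.get("body", {}).get("measure"))
    if status["head"].get("status") != "running":
        break
    time.sleep(30)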


Bycob commented Sep 17, 2019

Example: Finetune language model

  • Trace the pretrained PyTorch BERT model
pip3 install --user transformers
mkdir lm_training
./trace_pytorch_transformers.py bert -vo lm_training --vocab
  • Run dede
  • Start the service
curl -X PUT "http://localhost:8080/services/torch_bert_lm" -d '{
    "description": "BERT language model finetuning on News20 ",
    "mllib": "torch",
    "model": {
        "repository": "./lm_training/"
    },
    "parameters": {
        "input": {
            "connector": "txt",
            "ordered_words": true,
            "wordpiece_tokens": true,
            "punctuation_tokens": true,
            "sequence": 512
        },
        "mllib": {
            "template":"bert",
            "self_supervised":"mask",
            "finetuning": true,
            "gpu": true
        }
    },
    "type": "supervised"
}
'
  • Train the model on the News20 dataset
curl -X POST "http://localhost:8080/train" -d '{
    "service": "torch_bert_lm", 
    "parameters": { 
         "mllib": {
            "solver": {
              "iterations":3000,
              "test_interval":250,
              "base_lr":1e-5,
              "iter_size":8,
              "snapshot":250,
              "solver_type":"ADAM"
            },
            "net": {
              "batch_size":4,
              "test_batch_size":4
            }
        },
        "input": {
            "shuffle":true,
            "test_split":0.03
        },
        "output": {
            "measure":["acc", "acc-5"]
        }
    }, 
    "data": ["/opt/data/news20/train/"]
}
'
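
For context, the "mask" self-supervised mode follows the masked language model idea from BERT. The sketch below shows the standard BERT masking recipe (mask 15% of the tokens: 80% [MASK], 10% random token, 10% unchanged) in plain Python; it is only an illustration of the idea and not necessarily identical to what the torch backend implements.

import random

def mask_tokens(token_ids, vocab_size, mask_id, mask_prob=0.15):
    """Return (masked inputs, labels); label -1 means 'not masked, ignore in the loss'."""
    inputs, labels = list(token_ids), []
    for i, tok in enumerate(token_ids):
        if random.random() < mask_prob:
            labels.append(tok)  # the model must predict the original token here
            r = random.random()
            if r < 0.8:
                inputs[i] = mask_id                       # 80%: replace with [MASK]
            elif r < 0.9:
                inputs[i] = random.randrange(vocab_size)  # 10%: replace with a random token
            # else: 10% keep the original token unchanged
        else:
            labels.append(-1)
    return inputs, labels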

@beniz beniz requested a review from fantes September 18, 2019 14:17

beniz commented Sep 25, 2019

Is the vocabulary matched against the wildcard tokens of BERT (the '##' tokens)? From the line

if ((it = _vocab.find(word)) != _vocab.end())

it seems that the vocab is matched only exactly; is my understanding correct?


beniz commented Sep 25, 2019

* Add vocabulary generation with BPE if finetuning = false

What would be needed to train a model from scratch with a new vocabulary?


beniz commented Sep 25, 2019

"save_period":250,

Please replace this with snapshot_interval, which is the default name in the DD API.


beniz commented Sep 25, 2019

"width": 512

The txt input connector already defines sequence for the max character sequence size; maybe we'd like to use the same name here.


beniz commented Sep 25, 2019

Example: Finetune a classification model

This fails for me, with error:

{"status":{"code":500,"msg":"InternalError","dd_code":1007,"dd_msg":"\narguments for call are not valid:\n  \n  for operator aten::mean(Tensor self) -> Tensor:\n  expected at most 1 arguments but found 3 positional arguments.\n 

and server log:

[2019-09-25 06:50:06.470] [api] [info] Running DeepDetect HTTP server on 10.10.77.61:8501
[2019-09-25 06:50:09.859] [torch_bert_training] [info] loaded vocabulary of size=30522
[2019-09-25 06:50:11.417] [torch_bert_training] [info] Loading ml model from file /data1/beniz/torch_models/news20/bert-pretrained.pt.
[2019-09-25 06:50:12.942] [torch_bert_training] [error] service creation call failed
[2019-09-25 06:50:12.944] [api] [error] 10.10.77.61 "PUT /services/torch_bert_training" 500 3120

I believe this is due to the attention head of the traced model.

@BynaryCobweb can you make sure that ./trace_pytorch_transformers.py bert --output-dir classif_training --vocab --verbose is the right command? If I copy the bert-pretrained.pt from one of your trained news20 models, it seems to work. Your model has a different size (510MB) than the one I obtain by tracing with your command (514MB), and the attention head input size differs.

Then I get this error:

[2019-09-25 07:24:03.593] [api] [error] {"code":500,"msg":"InternalError","dd_code":500,"dd_msg":"Libtorch error:isTuple() ASSERT FAILED at /home/beniz/projects/deepdetect/dev/deepdetect/build_bert/pytorch/src/pytorch/torch/include/ATen/core/ivalue.h:246, please report a bug to PyTorch. (toTuple at /home/beniz/projects/deepdetect/dev/deepdetect/build_bert/pytorch/src/pytorch/torch/include/ATen/core/ivalue.h:246)\nframe #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) + 0x66 (0x7f2d32266cd6 in /home/beniz/projects/deepdetect/dev/deepdetect/build_bert/pytorch/src/pytorch/torch/lib/libc10.so)\nframe #1: c10::IValue::toTuple() const & + 0x222 (0x76f122 in ./dede)\nframe #2: dd::TorchModule::forward(std::vector<c10::IValue, std::allocatorc10::IValue >) + 0xa1d (0x75977d in ./dede)\nframe #3: dd::TorchLib<dd::TxtTorchInputFileConn, dd::SupervisedOutput, dd::TorchModel>::train(dd::APIData const&, dd::APIData&) + 0xd40 (0x776790 in ./dede)\nframe #4: std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result, std::__future_base::_Result_base::_Deleter>, std::_Bind_simple<dd::MLService<dd::TorchLib, dd::TxtTorchInputFileConn, dd::SupervisedOutput, dd::TorchModel>::train_job(dd::APIData const&, dd::APIData&)::{lambda()#1} ()>, int> >::_M_invoke(std::_Any_data const&) + 0xbd (0x69fbad in ./dede)\nframe #5: std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>, bool) + 0x29 (0x686719 in ./dede)\nframe #6: + 0xea99 (0x7f2d39b4ca99 in /lib/x86_64-linux-gnu/libpthread.so.0)\nframe #7: std::__future_base::_State_baseV2::_M_set_result(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>, bool) + 0x95 (0x6878b5 in ./dede)\nframe #8: std::thread::_Impl<std::_Bind_simple<std::__future_base::_Async_state_impl<std::_Bind_simple<dd::MLService<dd::TorchLib, dd::TxtTorchInputFileConn, dd::SupervisedOutput, dd::TorchModel>::train_job(dd::APIData const&, dd::APIData&)::{lambda()#1} ()>, int>::_Async_state_impl(dd::MLService<dd::TorchLib, dd::TxtTorchInputFileConn, dd::SupervisedOutput, dd::TorchModel>::train_job(dd::APIData const&, dd::APIData&)::{lambda()#1} (&&)())::{lambda()#1} ()> >::_M_run() + 0x5a (0x68abaa in ./dede)\nframe #9: + 0xb8c80 (0x7f2d2de1cc80 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)\nframe #10: + 0x76ba (0x7f2d39b456ba in /lib/x86_64-linux-gnu/libpthread.so.0)\nframe #11: clone + 0x6d (0x7f2d2d36041d in /lib/x86_64-linux-gnu/libc.so.6)\n"}


Bycob commented Sep 25, 2019

Is the vocabulary matched against the wildcard tokens of BERT (the '##' tokens)? From the line

if ((it = _vocab.find(word)) != _vocab.end())

it seems that the vocab is matched only exactly; is my understanding correct?

Tokenization (including wordpiece) is done in src/txtinputfileconn.cc:349 (see also: WordpieceTokenizer::append_input()). Here we just convert the previously found tokens to ids.
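
To make the '##' handling concrete, here is a minimal sketch of greedy longest-match wordpiece tokenization against a vocabulary containing '##' continuation pieces. It is only an illustration of the usual algorithm, not the exact logic of WordpieceTokenizer::append_input().

# Minimal sketch of greedy longest-match wordpiece tokenization.
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces are prefixed
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk_token]  # no sub-piece matched: the whole word is unknown
        tokens.append(piece)
        start = end
    return tokens

# Example with a toy vocabulary:
vocab = {"finet", "##uning", "fine", "##tun", "##ing"}
print(wordpiece_tokenize("finetuning", vocab))  # ['finet', '##uning']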


Bycob commented Sep 25, 2019

* Add vocabulary generation with BPE if finetuning = false

What would be needed to train a model from scratch with a new vocabulary?

The model has to be traced without initial weights:

./trace_pytorch_transformers.py bert --vocab --verbose --not-pretrained

When using a traced model, the size of the vocabulary is fixed (as well as the sequence length), so I may add an option to the script to choose the vocabulary size.

Then the vocabulary should be generated; I think the best option is to use Byte Pair Encoding (RoBERTa uses BPE for its vocabulary). A BPE tokenizer may be required.

Apart from this, the training process should be similar to finetuning, with different hyperparameters.
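
For reference, a minimal sketch of learning BPE merges from word frequencies, in the spirit of Sennrich et al.; the input frequencies and the number of merges are placeholders, and this is only an illustration of what a vocabulary-generation step could look like.

from collections import Counter

# Minimal sketch of learning BPE merges from word frequencies.
def learn_bpe(word_freqs, num_merges):
    # Represent each word as a tuple of symbols, starting from characters.
    words = {tuple(w): f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in words.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        # Apply the merge to every word.
        merged = {}
        for symbols, freq in words.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            merged[tuple(out)] = freq
        words = merged
    return merges

# Toy example: frequent pairs get merged first.
print(learn_bpe({"lower": 5, "low": 7, "newer": 3}, num_merges=4))

The learned merges would then be applied to the training corpus to produce the fixed-size vocabulary expected by the traced model.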

@Bycob Bycob force-pushed the bert_training branch 3 times, most recently from ba54cba to 7061257 on September 25, 2019 16:00
@beniz beniz changed the base branch from master to bert_training September 26, 2019 15:09