Bert sequence labelling #78
Conversation
I will also add an optional CRF activation layer as an alternative to the current softmax layer (which is not well suited to sequence labelling).
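To illustrate why a CRF layer helps over a per-token softmax: the CRF scores whole tag sequences, so the decoded tag at each position depends on transition scores between adjacent tags, not only on per-token emission scores. This is not DeLFT's actual implementation, just a minimal pure-Python sketch of the Viterbi decode a CRF performs at prediction time (`viterbi_decode` and its arguments are illustrative names):

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag sequence.

    emissions:   one list of per-tag scores per token.
    transitions: transitions[p][t] = score of moving from tag p to tag t.
    Unlike an independent softmax per token, each step's choice is
    coupled to the previous tag through the transition scores.
    """
    num_tags = len(emissions[0])
    score = list(emissions[0])   # best score of any path ending in each tag
    back = []                    # back-pointers for path recovery
    for em in emissions[1:]:
        new_score, pointers = [], []
        for t in range(num_tags):
            best_prev = max(range(num_tags),
                            key=lambda p: score[p] + transitions[p][t])
            new_score.append(score[best_prev] + transitions[best_prev][t] + em[t])
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)
    # follow back-pointers from the best final tag
    best_last = max(range(num_tags), key=lambda t: score[t])
    path = [best_last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

In training, the CRF's log-likelihood is maximized over the same sequence scores; the transition matrix is what lets the model learn, e.g., that `I-PER` cannot follow `B-LOC`.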
CoNLL 2003 NER

CoNLL 2003 NER
After painful tuning of the hyperparameters on the dev set, this is the best I get with BERT-base+CRF on CoNLL 2003 NER:
nerTagger.py

@@ -422,6 +531,10 @@ def annotate(output_format,

if __name__ == "__main__":

    architectures = ['BidLSTM_CRF', 'BidLSTM_CNN_CRF', 'BidLSTM_CNN_CRF', 'BidGRU_CRF', 'BidLSTM_CNN', 'BidLSTM_CRF_CASING',
                     'bert-base-en', 'bert-base-en', 'scibert', 'biobert']
I noticed there are two 'bert-base-en'
Oops, the second one should be bert-large-en!
I'm adding the results for the quantities model and for the superconductors model (run with bert-base-en); using BERT or SciBERT is actually not bringing any improvement.

Quantities:

Superconductors:
Another thing I noticed here is that the model config of bert-base-en has some strange values, see below:
Thanks @lfoppiano! What is the "normal" you're comparing with? For sequence labeling, BERT-base indeed gives results similar to BidLSTM-CRF with GloVe in general (but only after tuning parameters), so this is in line with what is usually observed. I saw with the recognition of software mentions that SciBERT was much better on scientific text than the normal BERT, but it still scores significantly lower than ELMo+BidLSTM-CRF (minus 1-2 f-score). In contrast, for classification, SciBERT gives the best results on "scholar" texts. About the config,
Add BERT architecture for sequence labelling.
As noted here, the original CoNLL-2003 NER results reported by the Google Research paper are not reproducible, by far, and they probably reported token-level metrics instead of entity-level metrics (as done by conlleval and previous works). In general, generic transformer pre-trained models appear to perform poorly on information extraction and NER tasks (whether with fine-tuning or as contextual embedding features), compared to ELMo.
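To make the token-level vs. entity-level distinction concrete: conlleval-style entity-level scoring only counts an entity as correct when its type and its full boundaries match, which is stricter than counting correct per-token tags. A small illustrative helper (`bio_to_spans` is a hypothetical name, not DeLFT or conlleval code) showing how BIO tags are grouped into entity spans:

```python
def bio_to_spans(tags):
    """Convert a BIO tag sequence into (type, start, end_exclusive) spans.
    Entity-level scoring compares these spans between gold and prediction;
    a partially recovered entity counts as fully wrong."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or (tag.startswith("I-") and etype != tag[2:]):
            if start is not None:          # close the previous entity
                spans.append((etype, start, i))
            start, etype = i, tag[2:]      # open a new entity
        elif tag == "O":
            if start is not None:
                spans.append((etype, start, i))
            start, etype = None, None
    if start is not None:                  # entity running to the end
        spans.append((etype, start, len(tags)))
    return spans

gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "O",     "O", "B-LOC"]
# Token-level: 3 of 4 tags correct.
# Entity-level: only 1 of 2 gold entities matched, since ("PER", 0, 2)
# was recovered as ("PER", 0, 1) with a wrong boundary.
```

This is why reporting token-level metrics inflates NER scores relative to conlleval's entity-level precision/recall/F1.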
Still, it's a good exercise, and using scibert/biobert for scientific text achieves very good and faster results, even compared to ELMo+BidLSTM-CRF.

Similarly to the usage of BERT for text classification in DeLFT, we use a data generator to feed BERT when predicting (instead of the file-based input function of the original BERT implementation), and avoid reloading the whole TF graph for each batch. This was made possible by the FastPredict class in model.py, which is adapted from https://github.com/marcsto/rl/blob/master/src/fast_predict2.py by Marc Stogaitis.

Using an Nvidia GeForce 1080 GPU, we can process around 1000 tokens per second with this approach, which is 3 times faster than BiLSTM-CRF+ELMo, but 30 times slower than a BiLSTM-CRF (and 100 times slower than what we get with a Wapiti CRF model on a modern workstation ;).