Update to transformers 2.3.0 & Add ALBERT #990
Conversation
Hello @HaokunLiu! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
You can repair most issues by installing and running black.
Comment last updated at 2020-01-24 22:15:09 UTC
@@ -57,15 +57,15 @@ def test_moses(self):
        ]

        aligner_fn = retokenize.get_aligner_fn("transfo-xl-wt103")
-       tas, tokens = zip(*(aligner_fn(sent) for sent in self.text))
-       tas, tokens = list(tas), list(tokens)
+       token_aligners, tokens = zip(*(aligner_fn(sent) for sent in self.text))
I may have written this variable name in the first place, but I find it hard to understand now, so I changed it to the full name.
So it turns out ALBERT really is better than RoBERTa. I ran some experiments on CoLA and RTE.
I didn't use the exact same hyperparameters, but the results are close to the dev-set results reported in the paper.
The inputs for RTE and CoLA are fairly short, so they don't cause any memory problems. But since albert-xxlarge has 4x the hidden size of roberta-large and half the number of layers, it will have to use smaller batch sizes on tasks with longer inputs (see the rough estimate below).
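As a rough back-of-the-envelope sketch (hidden sizes and layer counts from the model configs; this ignores attention maps, optimizer state, and sequence length, so it's only a proxy):

```python
# Per-token activation footprint scales roughly with hidden_size * num_layers.
roberta_large = 1024 * 24    # hidden=1024, 24 layers
albert_xxlarge = 4096 * 12   # hidden=4096, 12 layers

print(albert_xxlarge / roberta_large)  # 2.0 -> roughly half the batch size fits
```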
Comparing the RoBERTa tokenizer in pytorch_transformers ('ĠBerlin', 'Ġand', 'ĠMunich') with the RoBERTa tokenizer in transformers ('Ber', 'lin', 'Ġand', 'ĠMunich') on QAMR.
Yes, the new one seems very counter-intuitive, but that's what Huggingface finally settled on. Some people found the new behavior marginally better on NER. huggingface/transformers#1196
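For reference, a minimal sketch of the difference, assuming transformers 2.3 and the roberta-large vocab (the key change is that the new tokenizer no longer adds a prefix space to the first word by default):

```python
from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-large")

# New transformers behavior: no prefix space on the first word, so
# sentence-initial "Berlin" splits differently from mid-sentence words.
print(tokenizer.tokenize("Berlin and Munich"))
# ['Ber', 'lin', 'Ġand', 'ĠMunich']

# A literal leading space reproduces the old pytorch_transformers behavior.
print(tokenizer.tokenize(" Berlin and Munich"))
# ['ĠBerlin', 'Ġand', 'ĠMunich']
```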
Taking a look now...
@HaokunLiu, I'm starting to review now. Here are the validation items we discussed:
# All the supported input_module values from huggingface transformers
# input_modules mapped to the same string share vocabulary
input_module_to_pretokenized = {
input_module_to_pretokenized -> transformer_input_module_to_tokenizer_id
Actually, don't merge this now. I found that our current implementation of many tasks relies on the assumption that tokenizing each word independently and concatenating the results gives the same output as tokenizing the full sentence (see the sketch below). The affected tasks may include CCG, ReCoRD, and WiC. I need to run some more tests, and possibly make some changes to these tasks.
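A minimal sketch of the assumption that breaks, assuming the transformers 2.3 RoBERTa tokenizer (the example sentence and token pieces are illustrative):

```python
from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-large")

sentence = "Berlin and Munich"

# Tokenizing the full sentence keeps the "Ġ" space marker on every
# word except the first.
full = tokenizer.tokenize(sentence)
# e.g. ['Ber', 'lin', 'Ġand', 'ĠMunich']

# Tokenizing word-by-word and concatenating drops the markers entirely,
# so the pieces no longer match the full-sentence tokenization.
per_word = [t for w in sentence.split() for t in tokenizer.tokenize(w)]

assert full != per_word  # the assumption these tasks relied on fails
```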
The first column is the updated roberta-large model, where all tasks use lr=5e-6 and dropout=0.2; the second column is the average result over three random seeds using the previously found "optimal" learning rate and dropout rate; the third column is the best result we got during hyperparameter search. Most results are on par with our previous results. Some are marginally lower, which I think is understandable since it's just a single hyperparameter setting. The exception is WSC, which seems very unstable.
This looks OK to me: on tasks where performance with this updated transformers code is below the level reported in your "Initial hyper-parameter search in Taskmaster" column, performance with your updated transformers code is above (or very close to) the performance in your "Final baseline in Taskmaster" column (and I understand that these final baselines are the result of multiple runs).
Looks good to me. Thanks for providing the results of your performance and regression tests.
Thanks @HaokunLiu for running the additional performance validations. Merging.
* fix roberta tokenization error
* update transformers
* update alignment func
* trim input_module
* update lm head
* update albert special tokens
* input_module_to_pretokenized -> transformer_input_module_to_tokenizer_id
* update ccg alignment
* fix wic retokenize
* update wic docstring, remove unnecessary condition
* refactor record task to avoid tokenization problem

Co-authored-by: Sam Bowman <bowman@nyu.edu>
Important notice
You will need to change PYTORCH_PRETRAINED_BERT_CACHE to HUGGINGFACE_TRANSFORMERS_CACHE in your own environment settings (see the sketch below).
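For example, a minimal sketch of the rename (the cache path is a placeholder; you can equivalently export the variable in your shell profile):

```python
import os

# Old (pytorch_transformers 1.0):
# os.environ["PYTORCH_PRETRAINED_BERT_CACHE"] = "/path/to/model_cache"

# New (transformers 2.3) -- same path, new variable name:
os.environ["HUGGINGFACE_TRANSFORMERS_CACHE"] = "/path/to/model_cache"
```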
Updates
This PR updates jiant from pytorch_transformers 1.0 to transformers 2.3. The major changes are summarized in the commit list above.
Related Issues
#972 #920 #730
Other
Transformers 2.3 introduced a new feature: AutoModel, AutoTokenizer, etc. This could be used to simplify some code, but it is not really necessary right now, so I didn't do it.
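For illustration, a minimal sketch of what that would look like (the model name here is just an example):

```python
from transformers import AutoModel, AutoTokenizer

# One code path covers BERT, RoBERTa, ALBERT, XLNet, ...: the Auto classes
# dispatch on the pretrained model name.
tokenizer = AutoTokenizer.from_pretrained("albert-xxlarge-v2")
model = AutoModel.from_pretrained("albert-xxlarge-v2")
```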