
CamembertForCausalLM #6577

Merged: 4 commits merged into huggingface:master from patil-suraj:camembert-enc-dec on Aug 21, 2020

Conversation

patil-suraj (Contributor)

This PR adds CamembertForCausalLM by subclassing RobertaForCausalLM, so that it can be used with the EncoderDecoderModel.

@patrickvonplaten
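
The change itself is essentially a thin subclass. A minimal sketch of the pattern (module paths follow the 2020-era transformers layout; the docstring is illustrative):

```python
from transformers.configuration_camembert import CamembertConfig
from transformers.modeling_roberta import RobertaForCausalLM


class CamembertForCausalLM(RobertaForCausalLM):
    """CamemBERT with a causal LM head, usable as the decoder in an
    EncoderDecoderModel.

    All weights and the forward pass are inherited from
    RobertaForCausalLM; overriding config_class is enough for CamemBERT
    checkpoints and vocabulary to resolve correctly.
    """

    config_class = CamembertConfig
```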

patil-suraj (Contributor, Author)

@patrickvonplaten I didn't add any tests, since this subclasses RobertaForCausalLM, which is already tested.

codecov bot commented Aug 18, 2020

Codecov Report

Merging #6577 into master will increase coverage by 1.14%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #6577      +/-   ##
==========================================
+ Coverage   79.18%   80.33%   +1.14%     
==========================================
  Files         156      156              
  Lines       28129    28132       +3     
==========================================
+ Hits        22275    22599     +324     
+ Misses       5854     5533     -321     
Impacted Files                               Coverage Δ
src/transformers/__init__.py                 99.28% <ø> (ø)
src/transformers/modeling_auto.py            78.73% <ø> (ø)
src/transformers/modeling_camembert.py       100.00% <100.00%> (ø)
src/transformers/tokenization_albert.py      28.84% <0.00%> (-58.66%) ⬇️
src/transformers/modeling_tf_bert.py         69.06% <0.00%> (-29.32%) ⬇️
src/transformers/modeling_tf_gpt2.py         71.84% <0.00%> (-23.17%) ⬇️
src/transformers/modeling_bert.py            88.26% <0.00%> (-0.17%) ⬇️
src/transformers/generation_tf_utils.py      86.46% <0.00%> (+0.25%) ⬆️
src/transformers/configuration_utils.py      96.59% <0.00%> (+0.68%) ⬆️
src/transformers/modeling_tf_utils.py        87.29% <0.00%> (+0.97%) ⬆️
... and 9 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 12d7624...1e68a94.

patil-suraj (Contributor, Author)

Hi @patrickvonplaten, could you take a look? Thanks!

JetRunner (Contributor) left a comment

I'm a little curious why this works. Camembert and Roberta are trained by masking, so technically they're not very good for causal language modeling, right?

patil-suraj (Contributor, Author)

Yes, that's right. But this is intended for the EncoderDecoder model class, which can use a pre-trained encoder for seq2seq tasks.
See PRs #6411 and #6538. Also, Patrick shared this paper, which explores this idea and shows good results on seq2seq tasks: https://arxiv.org/pdf/1907.12461.pdf
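
As a concrete illustration, a sketch of the intended usage (camembert-base is the public CamemBERT checkpoint; the rest is the standard EncoderDecoderModel API):

```python
from transformers import CamembertTokenizer, EncoderDecoderModel

# Warm-start a seq2seq model from two pretrained CamemBERT checkpoints.
# With this PR, the decoder side can be loaded as CamembertForCausalLM;
# its cross-attention layers are randomly initialized, so the model
# still needs fine-tuning on a seq2seq task before it is useful.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "camembert-base", "camembert-base"
)
tokenizer = CamembertTokenizer.from_pretrained("camembert-base")
```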

JetRunner (Contributor) commented Aug 20, 2020

Thanks for the explanation! Yes, it makes perfect sense to use these models for seq2seq, but I still think we should add a note somewhere in the documentation to explain this.

patil-suraj (Contributor, Author)

Definitely! I'll update the docs to explicitly state that this is intended for seq2seq and might not perform well on causal language modeling alone. I'll also link the paper on the EncoderDecoder doc page, so people will know what to choose.

patrickvonplaten (Contributor) left a comment

Great, thanks @patil-suraj!

patrickvonplaten (Contributor)

Btw, @patil-suraj, did you start doing some Roberta2Roberta experiments? I wanted to start running some experiments next week and wanted to check whether you already had interesting results.

patrickvonplaten merged commit d0e42a7 into huggingface:master on Aug 21, 2020
patil-suraj (Contributor, Author)

@patrickvonplaten Just started one experiment for question generation, and I'm going to run CNN/DM after that; I'll let you know the results.
Since Roberta has a bos token, what's the decoder_start_token_id for Roberta2Roberta: the bos or the pad token?

patil-suraj deleted the camembert-enc-dec branch on August 21, 2020, 15:19
patrickvonplaten (Contributor)

In my experiments with bert2bert, I used the same token for the encoder bos and the decoder bos, but it's up to you! bos makes more sense to me in the case of Roberta2Roberta.
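
For reference, a sketch of how that choice would be wired up for Roberta2Roberta (illustrative only, not part of this PR):

```python
from transformers import EncoderDecoderModel, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "roberta-base", "roberta-base"
)

# Use bos as the decoder start token, per the suggestion above; eos and
# pad are set alongside it so generate() and loss masking behave sensibly.
model.config.decoder_start_token_id = tokenizer.bos_token_id
model.config.eos_token_id = tokenizer.eos_token_id
model.config.pad_token_id = tokenizer.pad_token_id
```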

patil-suraj (Contributor, Author)

Okay, thanks Patrick!

Zigur pushed a commit to Zigur/transformers that referenced this pull request on Oct 26, 2020:
* added CamembertForCausalLM

* add in __init__ and auto model

* style

* doc