
CamembertForCausalLM #6577

Merged: 4 commits merged into huggingface:master from patil-suraj:camembert-enc-dec on Aug 21, 2020

Conversation

patil-suraj (Contributor)

This PR adds CamembertForCausalLM by subclassing RobertaForCausalLM, so that it can be used with the EncoderDecoderModel.

@patrickvonplaten
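
The change itself is essentially a thin subclass. A minimal sketch of the pattern (module paths follow the 2020-era transformers layout; the docstring is illustrative):

```python
from transformers.configuration_camembert import CamembertConfig
from transformers.modeling_roberta import RobertaForCausalLM


class CamembertForCausalLM(RobertaForCausalLM):
    """CamemBERT with a causal LM head, usable as the decoder in an
    EncoderDecoderModel.

    All weights and the forward pass are inherited from
    RobertaForCausalLM; overriding config_class is enough for CamemBERT
    checkpoints and vocabulary to resolve correctly.
    """

    config_class = CamembertConfig
```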

patil-suraj (Contributor, Author)

@patrickvonplaten I didn't add any tests, since this subclasses RobertaForCausalLM, which is already tested.

codecov bot commented Aug 18, 2020

Codecov Report

Merging #6577 into master will increase coverage by 1.14%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #6577      +/-   ##
==========================================
+ Coverage   79.18%   80.33%   +1.14%     
==========================================
  Files         156      156              
  Lines       28129    28132       +3     
==========================================
+ Hits        22275    22599     +324     
+ Misses       5854     5533     -321     
Impacted Files                               Coverage Δ
src/transformers/__init__.py                 99.28% <ø> (ø)
src/transformers/modeling_auto.py            78.73% <ø> (ø)
src/transformers/modeling_camembert.py       100.00% <100.00%> (ø)
src/transformers/tokenization_albert.py      28.84% <0.00%> (-58.66%) ⬇️
src/transformers/modeling_tf_bert.py         69.06% <0.00%> (-29.32%) ⬇️
src/transformers/modeling_tf_gpt2.py         71.84% <0.00%> (-23.17%) ⬇️
src/transformers/modeling_bert.py            88.26% <0.00%> (-0.17%) ⬇️
src/transformers/generation_tf_utils.py      86.46% <0.00%> (+0.25%) ⬆️
src/transformers/configuration_utils.py      96.59% <0.00%> (+0.68%) ⬆️
src/transformers/modeling_tf_utils.py        87.29% <0.00%> (+0.97%) ⬆️
... and 9 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 12d7624...1e68a94.

patil-suraj (Contributor, Author)

Hi @patrickvonplaten, could you take a look? Thanks!

JetRunner (Contributor) left a comment

I'm a little curious why this works. Camembert and Roberta are trained by masking, so technically they're not very good for causal language modeling, right?

patil-suraj (Contributor, Author)

Yes, that's right. But this is intended for the EncoderDecoder model class, which can use a pre-trained encoder for seq2seq tasks.
See PRs #6411 and #6538. Also, Patrick shared this paper, which explores this idea and shows good results on seq2seq tasks: https://arxiv.org/pdf/1907.12461.pdf
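
As a concrete illustration, a sketch of the intended usage (camembert-base is the public CamemBERT checkpoint; the rest is the standard EncoderDecoderModel API):

```python
from transformers import CamembertTokenizer, EncoderDecoderModel

# Warm-start a seq2seq model from two pretrained CamemBERT checkpoints.
# With this PR, the decoder side can be loaded as CamembertForCausalLM;
# its cross-attention layers are randomly initialized, so the model
# still needs fine-tuning on a seq2seq task before it is useful.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "camembert-base", "camembert-base"
)
tokenizer = CamembertTokenizer.from_pretrained("camembert-base")
```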

JetRunner (Contributor) commented Aug 20, 2020

Thanks for the explanation! Yes, it makes perfect sense to use these models for seq2seq, but I still think we should add a note somewhere in the documentation to explain this.

patil-suraj (Contributor, Author)

Definitely! I'll update the docs to explicitly state that this is intended for seq2seq and might not perform well on causal language modeling alone. I'll also link the paper on the EncoderDecoder doc page, so people will know what to choose.

patrickvonplaten (Contributor) left a comment

Great, thanks @patil-suraj!

patrickvonplaten (Contributor)

Btw, @patil-suraj, did you start doing some Roberta2Roberta experiments? I wanted to start running some experiments next week and wanted to check whether you already had interesting results.

patrickvonplaten merged commit d0e42a7 into huggingface:master on Aug 21, 2020
patil-suraj (Contributor, Author)

@patrickvonplaten Just started one experiment for question generation, and I'm going to run CNN/DM after that; I'll let you know the results.
Since Roberta has a bos token, what's the decoder_start_token_id for Roberta2Roberta: the bos or the pad token?

patil-suraj deleted the camembert-enc-dec branch on August 21, 2020, 15:19
patrickvonplaten (Contributor)

In my experiments with bert2bert, I used the same token for the encoder bos and the decoder bos, but it's up to you! bos makes more sense to me in the case of Roberta2Roberta.
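
For reference, a sketch of how that choice would be wired up for Roberta2Roberta (illustrative only, not part of this PR):

```python
from transformers import EncoderDecoderModel, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "roberta-base", "roberta-base"
)

# Use bos as the decoder start token, per the suggestion above; eos and
# pad are set alongside it so generate() and loss masking behave sensibly.
model.config.decoder_start_token_id = tokenizer.bos_token_id
model.config.eos_token_id = tokenizer.eos_token_id
model.config.pad_token_id = tokenizer.pad_token_id
```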

patil-suraj (Contributor, Author)

Okay, thanks Patrick!

Zigur pushed a commit to Zigur/transformers that referenced this pull request on Oct 26, 2020:
* added CamembertForCausalLM

* add in __init__ and auto model

* style

* doc