CamembertForCausalLM #6577
Conversation
@patrickvonplaten didn't add any tests since it subclasses RobertaForCausalLM.
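For context, a minimal sketch of what such a subclass might look like (the import paths assume the repo's 3.x module layout at the time, and the docstring wording is illustrative):

```python
from transformers.configuration_camembert import CamembertConfig
from transformers.modeling_roberta import RobertaForCausalLM


class CamembertForCausalLM(RobertaForCausalLM):
    """CamemBERT with a causal LM head, for use as a decoder.

    All forward logic is inherited from RobertaForCausalLM; only the
    config class differs, which is why the existing Roberta tests
    already cover this model's behaviour.
    """

    config_class = CamembertConfig
```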
Codecov Report
@@            Coverage Diff             @@
##           master    #6577      +/-   ##
==========================================
+ Coverage   79.18%   80.33%   +1.14%
==========================================
  Files         156      156
  Lines       28129    28132       +3
==========================================
+ Hits        22275    22599     +324
+ Misses       5854     5533     -321
Continue to review full report at Codecov.
Hi @patrickvonplaten, could you take a look? Thanks!
I'm a little curious why this works. Camembert and Roberta are trained with masked language modeling, so technically they're not very good for causal language modeling, right?
Yes, right. But this is intended for the EncoderDecoderModel.
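To make that intended usage concrete, here is a minimal sketch of warm-starting a seq2seq model from CamemBERT checkpoints (the pretrained weights are only a warm start, and the model still needs seq2seq fine-tuning before it produces anything sensible):

```python
from transformers import CamembertTokenizer, EncoderDecoderModel

# Warm-start an encoder-decoder with CamemBERT on both sides; the
# decoder checkpoint is loaded through CamembertForCausalLM and
# configured as a decoder with cross-attention to the encoder.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "camembert-base", "camembert-base"
)
tokenizer = CamembertTokenizer.from_pretrained("camembert-base")

inputs = tokenizer("Le camembert est délicieux.", return_tensors="pt")
# Passing labels trains the decoder's causal LM head on the target
# sequence; here the source sentence doubles as a dummy target.
outputs = model(
    input_ids=inputs.input_ids,
    decoder_input_ids=inputs.input_ids,
    labels=inputs.input_ids,
)
```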
Thanks for the explanation! Yes, it makes perfect sense to use these models for seq2seq, but I still think maybe we should add a note somewhere in the docs to explain this?
Definitely! Will update the docs to explicitly state that this is intended for seq2seq and might not perform well on just causal modelling. Will also link the paper in
Great, thanks @patil-suraj!
Btw, @patil-suraj, did you start doing some Roberta2Roberta experiments? I wanted to start running some experiments next week, so I wanted to check if you already had some interesting results.
@patrickvonplaten Just started one experiment for QG and going to run CNN/DM after that; will let you know the results.
In my experiments with bert2bert, I used the same token for encoder bos and decoder bos, but it's up to you!
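In code, that choice could look like this (a minimal sketch; CamemBERT's bos token is <s>, and decoder_start_token_id is the token generate() begins decoding from):

```python
from transformers import CamembertTokenizer, EncoderDecoderModel

tokenizer = CamembertTokenizer.from_pretrained("camembert-base")
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "camembert-base", "camembert-base"
)

# Use the same token (<s>) as both encoder bos and decoder bos, as in
# the bert2bert experiments; pad_token_id is set so batched generation
# can pad finished sequences.
model.config.decoder_start_token_id = tokenizer.bos_token_id
model.config.pad_token_id = tokenizer.pad_token_id
```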
Okay, thanks Patrick!
* added CamembertForCausalLM
* add in __init__ and auto model
* style
* doc
This PR adds CamembertForCausalLM by subclassing RobertaForCausalLM, so that it can be used with the EncoderDecoderModel. @patrickvonplaten