Extending Encoder Decoder to GPT-2 #4961

Closed
satwikkottur opened this issue Jun 12, 2020 · 14 comments

@satwikkottur commented Jun 12, 2020

Adding GPT-2 initialization for the EncoderDecoder model, as pointed out in the comment quoted below.

Currently, only Bert works as a decoder. We might add GPT2 in a couple of weeks. Note that no model has cross-attention layers if it is not already an encoder-decoder model (like Bart or T5) and in this case it does not make sense to use the encoder-decoder wrapper. The model is initialized with random weights for the cross attention layers which will have to be fine-tuned. I agree, that this should be made clearer in the documentation!

Originally posted by @patrickvonplaten in #4517 (comment)
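For context, a minimal sketch of the EncoderDecoder usage that is currently supported (BERT as both encoder and decoder); the checkpoint name is illustrative, and the decoder's cross-attention weights are freshly initialized, so they still need fine-tuning:

```python
from transformers import BertTokenizer, EncoderDecoderModel

# Illustrative checkpoint; any BERT-style checkpoint works the same way.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# BERT is loaded twice, once as encoder and once as decoder. The decoder's
# cross-attention layers are not part of the pretrained checkpoint, so they
# are randomly initialized and must be fine-tuned.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(
    input_ids=inputs.input_ids,
    decoder_input_ids=inputs.input_ids,
    labels=inputs.input_ids,
)
# Depending on the transformers version, the loss is outputs.loss or outputs[0].
```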

@patrickvonplaten patrickvonplaten self-assigned this Jun 12, 2020
@patrickvonplaten (Contributor)

It's on the roadmap :-)

@satwikkottur (Author)

Thank you! Looking forward to it :)

@djw1809 commented Jun 17, 2020

Hi - I've actually been working on this myself for the past couple of days; should I submit a PR when finished?

@patrickvonplaten (Contributor)

That'd be great!

@djw1809 commented Jun 21, 2020

Will do - likely sometime this week.

@MaveriQ commented Jul 2, 2020

@djw1809 Any update on the PR? :)

@iliemihai

@patrickvonplaten Hello Patrick, I am watching the EncoderDecoder model in transformers with much interest :) Any updates on supporting GPT-2 with EncoderDecoder?

@djw1809 commented Jul 8, 2020 via email

@patrickvonplaten (Contributor)

@djw1809 - also feel free to open a PR with the unfinished code already, so that I can take a look early on and help you :-)

@patrickvonplaten (Contributor)

Working on it now. Also linking this PR: #4483

@Squire-tomsk commented Aug 20, 2020

@patrickvonplaten Hello Patrick.
As I see from 1d6e71e, the current cross-attention implementation assumes that the encoder has the same hidden size as GPT-2. I have an encoder with hidden size 512 and want to combine it with GPT-2 medium, which has hidden size 1024. I have already done this with Fairseq and now want to do the same with Hugging Face. Could you update your solution to support any encoder hidden size?
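(For illustration only, not a built-in mechanism of the library: one generic workaround for such a size mismatch is a learned linear projection from the encoder's hidden size to the decoder's, applied to the encoder outputs before they reach the decoder's cross-attention. The sizes below match the 512/1024 example above; everything else is hypothetical.)

```python
import torch
import torch.nn as nn

# Hypothetical sizes from the example above: a 512-dim encoder feeding
# GPT-2 medium, whose hidden size is 1024.
ENC_HIDDEN, DEC_HIDDEN = 512, 1024

# Learned projection that maps encoder states into the decoder's space.
enc_to_dec_proj = nn.Linear(ENC_HIDDEN, DEC_HIDDEN)

# Dummy encoder output of shape (batch, source_length, ENC_HIDDEN).
encoder_hidden_states = torch.randn(2, 16, ENC_HIDDEN)

# Projected states now match the decoder's hidden size and could be passed
# as `encoder_hidden_states` to a decoder with cross-attention.
projected = enc_to_dec_proj(encoder_hidden_states)
print(projected.shape)  # torch.Size([2, 16, 1024])
```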

@patrickvonplaten (Contributor)

Hey @Squire-tomsk,

I see what you mean - this would mean adding a new config param for each model that has cross-attention... is this common practice? It would be great if you could open a new issue for that :-)

@Squire-tomsk

Done #6645

@stale bot commented Oct 21, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Oct 21, 2020
stale bot closed this as completed Nov 1, 2020