Extending Encoder Decoder to GPT-2 #4961

Closed
satwikkottur opened this issue Jun 12, 2020 · 14 comments

@satwikkottur commented Jun 12, 2020

Adding GPT-2 initialization for the EncoderDecoder model, as pointed out in the comment quoted below.

Currently, only Bert works as a decoder. We might add GPT2 in a couple of weeks. Note that no model has cross-attention layers if it is not already an encoder-decoder model (like Bart or T5) and in this case it does not make sense to use the encoder-decoder wrapper. The model is initialized with random weights for the cross attention layers which will have to be fine-tuned. I agree, that this should be made clearer in the documentation!

Originally posted by @patrickvonplaten in #4517 (comment)
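For context, a minimal sketch of the EncoderDecoder usage that is currently supported (BERT as both encoder and decoder); the checkpoint name is illustrative, and the decoder's cross-attention weights are freshly initialized, so they still need fine-tuning:

```python
from transformers import BertTokenizer, EncoderDecoderModel

# Illustrative checkpoint; any BERT-style checkpoint works the same way.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# BERT is loaded twice, once as encoder and once as decoder. The decoder's
# cross-attention layers are not part of the pretrained checkpoint, so they
# are randomly initialized and must be fine-tuned.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(
    input_ids=inputs.input_ids,
    decoder_input_ids=inputs.input_ids,
    labels=inputs.input_ids,
)
# Depending on the transformers version, the loss is outputs.loss or outputs[0].
```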

@patrickvonplaten patrickvonplaten self-assigned this Jun 12, 2020
@patrickvonplaten (Contributor)

It's on the roadmap :-)

@satwikkottur (Author)

Thank you! Looking forward to it :)

@djw1809 commented Jun 17, 2020

Hi - I've actually been working on this myself for the past couple of days; should I submit a PR when finished?

@patrickvonplaten (Contributor)

That'd be great!

@djw1809 commented Jun 21, 2020

Will do - likely sometime this week.

@MaveriQ commented Jul 2, 2020

@djw1809 Any update on the PR? :)

@iliemihai

@patrickvonplaten Hello Patrick, I am watching the EncoderDecoder model in transformers with much interest :) Any updates on supporting GPT-2 with EncoderDecoder?

@djw1809 commented Jul 8, 2020 via email

@patrickvonplaten (Contributor)

@djw1809 - also feel free to open a PR with the unfinished code already, so that I can take a look early on and help you :-)

@patrickvonplaten (Contributor)

Working on it now. Also linking this PR: #4483

@Squire-tomsk commented Aug 20, 2020

@patrickvonplaten Hello Patrick.
As I see from 1d6e71e, the current cross-attention implementation assumes that the encoder has the same hidden size as GPT-2. I have an encoder with hidden size 512 and want to combine it with GPT-2 medium, which has hidden size 1024. I have already done this with Fairseq and now want to do the same with Hugging Face. Could you update your solution to support any encoder hidden size?
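(For illustration only, not a built-in mechanism of the library: one generic workaround for such a size mismatch is a learned linear projection from the encoder's hidden size to the decoder's, applied to the encoder outputs before they reach the decoder's cross-attention. The sizes below match the 512/1024 example above; everything else is hypothetical.)

```python
import torch
import torch.nn as nn

# Hypothetical sizes from the example above: a 512-dim encoder feeding
# GPT-2 medium, whose hidden size is 1024.
ENC_HIDDEN, DEC_HIDDEN = 512, 1024

# Learned projection that maps encoder states into the decoder's space.
enc_to_dec_proj = nn.Linear(ENC_HIDDEN, DEC_HIDDEN)

# Dummy encoder output of shape (batch, source_length, ENC_HIDDEN).
encoder_hidden_states = torch.randn(2, 16, ENC_HIDDEN)

# Projected states now match the decoder's hidden size and could be passed
# as `encoder_hidden_states` to a decoder with cross-attention.
projected = enc_to_dec_proj(encoder_hidden_states)
print(projected.shape)  # torch.Size([2, 16, 1024])
```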

@patrickvonplaten (Contributor)

Hey @Squire-tomsk,

I see what you mean - this would mean adding a new config param for each model that has cross-attention... is this common practice? It would be great if you could open a new issue for that :-)

@Squire-tomsk

Done #6645

@stale bot commented Oct 21, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Oct 21, 2020
stale bot closed this as completed Nov 1, 2020