
[Q] LT-SFT and enc-dec models? #2

Open
adamwawrzynski opened this issue Jun 28, 2022 · 3 comments

@adamwawrzynski commented Jun 28, 2022

I'm wondering whether this method should (theoretically) work with encoder-decoder models. Have you tried training such models with the code from this repository? I'm interested in applying this approach to the T5 model.

@AlanAnsell (Collaborator)

Hi Adam, I've done some experiments on BART with LT-SFT and I can confirm that it works, so I'm pretty sure T5 should work as well. I think you should be able to use LotteryTicketSparseFineTuner without modification, although the boilerplate code in the example scripts will likely require some adjustment for generative models. Note that, as with the BERT-style models, you should generally decouple the input and output embedding matrices and freeze the output embeddings to achieve good performance.
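
A minimal sketch of the decoupling-and-freezing step described above, using the standard Hugging Face API (the T5 checkpoint name is illustrative, and this is not code from the repository):

```python
import torch
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("t5-base")

# Give the LM head its own copy of the weight matrix so it is no longer
# tied to the input embeddings (model.shared).
model.config.tie_word_embeddings = False
model.lm_head.weight = torch.nn.Parameter(model.lm_head.weight.detach().clone())

# Freeze the now-independent output embeddings; the input embeddings
# remain trainable for sparse fine-tuning.
model.lm_head.weight.requires_grad = False
```

When building the trainer, the frozen lm_head should then also be left out of the set of maskable parameters passed to LotteryTicketSparseFineTuner.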

@adamwawrzynski (Author)

@AlanAnsell thank you for the quick reply. Could you share the scripts from your BART experiments? They would be a great starting point for further experimentation and for adaptation to the T5 architecture.

@AlanAnsell (Collaborator)

Unfortunately I can't share those experiments with you right now, but I generally expect that the adaptation shouldn't be too difficult. For example, for BART I replaced DataCollatorForLanguageModeling with the DataCollatorForDenoisingTasks I found here: https://github.com/morganmcg1/rotobart/blob/main/data_collator.py.
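
A minimal sketch of that collator swap (the constructor arguments are assumptions; check data_collator.py in the linked repository for the collator's actual fields):

```python
from transformers import AutoTokenizer

# DataCollatorForDenoisingTasks lives in the rotobart file linked above;
# copy data_collator.py into your project before importing from it.
from data_collator import DataCollatorForDenoisingTasks

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")

# Before (BERT-style masked LM):
# from transformers import DataCollatorForLanguageModeling
# data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

# After (BART-style denoising pre-training):
data_collator = DataCollatorForDenoisingTasks(tokenizer=tokenizer)
```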
