
Training Scripts masked LM #263

Open
friesel opened this issue Aug 10, 2021 · 5 comments

Comments

@friesel

friesel commented Aug 10, 2021

Do you intend to publish the training scripts for the masked LM as well?

@diegolascasas
Contributor

Hi, can you specify which project you're directing your question to?

@friesel
Author

friesel commented Aug 11, 2021

Sorry, my question is directed at the Perceiver IO project team.

In the NLP world, pretrained models are often English-only or cover "all the world's languages". Many users, however, need inference in a specific non-English language and have one or two GPUs rather than TPU pods, so for them it is most efficient to pretrain only in the language they actually need inference in. For both pretraining and fine-tuning, it would therefore be great to have the scripts you used to pretrain the masked LM.

Thx

@fding
Collaborator

fding commented Aug 12, 2021

Hi, thanks for your interest in Perceiver IO. We do not plan on open-sourcing the training scripts for the masked LM, because the script is heavily tied to our internal infrastructure for training these models at scale. We have released an example training pipeline for ImageNet, as well as the exact configuration we used for language modeling from bytes (in the language modeling colab), which will hopefully be useful if you wish to train a new language model from scratch for other languages.

Do let us know if you have any further questions or if you encounter any issues trying to replicate our work!
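For anyone adapting the released pipeline to byte-level masked-LM pretraining, the data side reduces to BERT-style corruption of raw UTF-8 byte IDs. Below is a minimal, hypothetical NumPy sketch of batch preparation; the vocabulary layout, mask rate, and function name are assumptions for illustration, not the actual colab or training-script code.

```python
# Hypothetical sketch (not the actual Perceiver IO colab / training code):
# BERT-style masking over raw UTF-8 bytes for masked-LM pretraining
# in a single target language. Vocab layout and mask rate are assumptions.
import numpy as np

PAD_ID, MASK_ID = 256, 257      # 0-255 are raw bytes; two extra special IDs (assumed)
MASK_RATE = 0.15                # BERT-style masking fraction (assumed)

def make_mlm_batch(texts, seq_len=2048, seed=0):
    """Encode strings to UTF-8 byte IDs and mask a random subset of positions."""
    rng = np.random.default_rng(seed)
    inputs = np.full((len(texts), seq_len), PAD_ID, dtype=np.int32)
    targets = inputs.copy()
    loss_mask = np.zeros((len(texts), seq_len), dtype=bool)
    for i, text in enumerate(texts):
        ids = np.frombuffer(text.encode("utf-8")[:seq_len], dtype=np.uint8)
        targets[i, :len(ids)] = ids
        inputs[i, :len(ids)] = ids
        masked = rng.random(len(ids)) < MASK_RATE   # positions the loss is computed on
        inputs[i, :len(ids)][masked] = MASK_ID      # corrupt the masked positions
        loss_mask[i, :len(ids)] = masked
    return inputs, targets, loss_mask

# Example: build one batch from non-English text
inputs, targets, loss_mask = make_mlm_batch(["Dies ist ein Beispielsatz."])
```

The model's cross-entropy loss would then be computed only where loss_mask is True, with the original byte IDs in targets as labels.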

@friesel
Author

friesel commented Aug 16, 2021

Thanks for the guidance. I will get my head around the ImageNet pipeline and try to adapt it to the NLP case.

@codedecde

Hi @fding,
Would it be possible to share some of the TensorBoard logs for the byte-level LM pretraining, and/or specifics on the final MLM loss the models converge to (something similar to google-research/electra#3)? I am trying to replicate the byte-level experiments, so these logs would be really useful as a reference.
Thank you!

diegolascasas removed their assignment Oct 18, 2021