Hi,

Firstly, thanks for the paper (lots of interesting insights) and for sharing the code. I wanted to try out how RoBERTa would perform when coupled with NSP. I know NSP was discarded in the original paper, but I wanted to see how it would work out for my use case.

Did you try RoBERTa with NSP by any chance? I see that the code defaults to masked LM, and I'm not able to find any RoBERTa checkpoint trained with the NSP objective on the HF model hub or in the fairseq repo. If none exists, I'd have to see how I can perform the pre-training myself. Could you please share instructions for pre-training on BookCorpus and Wikipedia? The paper mentions it, so it would be helpful if you could share the setup.
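For context, here is a rough sketch of what I had in mind for adding an NSP head on top of RoBERTa with HuggingFace Transformers. The `RobertaWithNSP` class and the head are my own names, not anything from this repo or an official API; it just follows BERT's NSP convention (label 0 = is next sentence, 1 = random sentence):

```python
# Sketch only: a binary NSP head on top of RoBERTa's first-token state.
# Class/variable names are mine, not from this repo.
import torch
import torch.nn as nn
from transformers import RobertaModel, RobertaTokenizer

class RobertaWithNSP(nn.Module):
    def __init__(self, model_name="roberta-base"):
        super().__init__()
        self.roberta = RobertaModel.from_pretrained(model_name)
        # Binary classifier, following BERT's NSP convention:
        # 0 = "is next sentence", 1 = "is a random sentence".
        self.nsp_head = nn.Linear(self.roberta.config.hidden_size, 2)

    def forward(self, input_ids, attention_mask, nsp_labels=None):
        outputs = self.roberta(input_ids=input_ids, attention_mask=attention_mask)
        # Use the <s> (CLS-equivalent) token's hidden state as the summary.
        cls_state = outputs.last_hidden_state[:, 0]
        logits = self.nsp_head(cls_state)
        loss = None
        if nsp_labels is not None:
            loss = nn.CrossEntropyLoss()(logits, nsp_labels)
        return loss, logits

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaWithNSP()

# RoBERTa has no segment embeddings, so the pair is just joined with </s></s>.
enc = tokenizer("The cat sat on the mat.", "It then fell asleep.",
                return_tensors="pt")
loss, logits = model(enc["input_ids"], enc["attention_mask"],
                     nsp_labels=torch.tensor([0]))
```

For actual pre-training I assume I'd add this loss to the MLM loss the way BERT does, and build sentence-pair examples from the corpus with 50% random negatives, but I'd still need the data-preparation and training recipe for BookCorpus and Wikipedia.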
Thanks again for the contribution.