Add DistilRoberta Model to OSS (cherry picked commit) #1998

rshraga · 2022-12-01T15:48:10Z

Summary:
This diff adds a DistilRoBERTa model to torchtext OSS.

This model is a distilled version of the full RoBERTa-base model. Its weights are taken from Hugging Face: https://huggingface.co/distilroberta-base

The state dict is loaded and modified to work with the internal Roberta implementation here: https://www.internalfb.com/intern/anp/view/?id=2794739
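
For readers without access to the internal notebook, the kind of state-dict conversion described above can be sketched roughly as follows. This is a minimal illustration, not the actual conversion code: the prefix pairs in `HYPOTHETICAL_PREFIX_MAP` and the helper name `remap_state_dict` are assumptions made up for this example, and the real key mapping between the Hugging Face checkpoint and the torchtext RoBERTa implementation may differ.

```python
from typing import Any, Dict

# Hypothetical prefix pairs, for illustration only. The real mapping lives in
# the conversion notebook referenced above and is not reproduced here.
HYPOTHETICAL_PREFIX_MAP: Dict[str, str] = {
    "roberta.embeddings.word_embeddings.": "encoder.transformer.token_embedding.",
    "roberta.encoder.layer.": "encoder.transformer.layers.layers.",
}


def remap_state_dict(source: Dict[str, Any], prefix_map: Dict[str, str]) -> Dict[str, Any]:
    """Return a copy of `source` with key prefixes renamed according to `prefix_map`.

    Values (normally tensors) are passed through untouched; keys that match
    no prefix are kept under their original name.
    """
    remapped: Dict[str, Any] = {}
    for key, value in source.items():
        new_key = key
        for old_prefix, new_prefix in prefix_map.items():
            if key.startswith(old_prefix):
                # Swap only the prefix and keep the rest of the key as-is.
                new_key = new_prefix + key[len(old_prefix):]
                break
        if new_key in remapped:
            # Two source keys collapsing onto one target would silently drop weights.
            raise ValueError(f"duplicate target key after remapping: {new_key}")
        remapped[new_key] = value
    return remapped
```

The remapped dictionary would then typically be passed to the target module's `load_state_dict`, ideally with strict key checking left on so that any missing or unexpected keys surface immediately.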

Comparison of DistilRoBERTa to RoBERTa-base on the GLUE benchmark, as reported here: https://github.com/huggingface/transformers/blob/main/examples/research_projects/distillation/README.md
DistilRoBERTa reaches 95% of RoBERTa-base's performance on GLUE while being twice as fast and 35% smaller.

Reviewed By: Nayef211

Differential Revision: D41590601

fbshipit-source-id: 394d10c45bbee5d2e71e14e30edf9b1a9d9380e6

rshraga · 2022-12-01T17:42:49Z

check failures look unrelated

joecummings · 2022-12-01T19:26:18Z

I think you need to rebase on main. The failures are related to a fix that went in yesterday, @rshraga.

Nayef211

LGTM!

facebook-github-bot added the cla signed label Dec 1, 2022

rshraga requested a review from Nayef211 December 1, 2022 16:39

rshraga requested review from abhinavarora and joecummings December 1, 2022 18:26

Roman Shraga and others added 2 commits December 1, 2022 14:58

Merge branch 'distilroberta' of github.com:pytorch/text into distilroberta

dd96be0

Nayef211 approved these changes Dec 1, 2022

rshraga merged commit 1020fae into main Dec 1, 2022

rshraga deleted the distilroberta branch December 1, 2022 21:54
