Fix the documentation checkpoint for xlm-roberta-xl #28567

Merged 2 commits on Jan 18, 2024
@@ -47,7 +47,7 @@
 
 logger = logging.get_logger(__name__)
 
-_CHECKPOINT_FOR_DOC = "xlm-roberta-xlarge"
+_CHECKPOINT_FOR_DOC = "facebook/xlm-roberta-xl"
 _CONFIG_FOR_DOC = "XLMRobertaXLConfig"
 
 XLM_ROBERTA_XL_PRETRAINED_MODEL_ARCHIVE_LIST = [
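The `_CHECKPOINT_FOR_DOC` value is substituted into the generated usage examples in the model docstrings, so it has to be an id that actually resolves on the Hugging Face Hub. A minimal sanity check of the corrected id (not part of the diff, and note that `facebook/xlm-roberta-xl` is a roughly 3.5B-parameter checkpoint, so a large download) might look like this:

```python
# Quick sanity check that the corrected doc checkpoint resolves on the Hub;
# the old "xlm-roberta-xlarge" id does not exist, so doc examples built from
# it could not be copy-pasted and run.
from transformers import AutoTokenizer, XLMRobertaXLModel

tokenizer = AutoTokenizer.from_pretrained("facebook/xlm-roberta-xl")
model = XLMRobertaXLModel.from_pretrained("facebook/xlm-roberta-xl")

inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```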
Contributor Author:
As an aside, XLM_ROBERTA_XL_PRETRAINED_MODEL_ARCHIVE_LIST already contains the valid checkpoint name, and the old invalid checkpoint is not in the list. If the CI tests checked whether _CHECKPOINT_FOR_DOC is in *_ARCHIVE_LIST for each model, this kind of typo would be prevented.
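A minimal sketch of that check, assuming a transformers version that still defines the per-model `*_PRETRAINED_MODEL_ARCHIVE_LIST` constants; the hard-coded module list is only illustrative, and a real CI script would discover the modeling modules automatically:

```python
import importlib

# Illustrative module list; a real check would walk src/transformers/models/
# and import every modeling_*.py module it finds.
MODULES_TO_CHECK = [
    "transformers.models.xlm_roberta_xl.modeling_xlm_roberta_xl",
]


def check_doc_checkpoints():
    failures = []
    for module_name in MODULES_TO_CHECK:
        module = importlib.import_module(module_name)
        checkpoint = getattr(module, "_CHECKPOINT_FOR_DOC", None)
        if checkpoint is None:
            continue
        # Collect every *_PRETRAINED_MODEL_ARCHIVE_LIST defined in the module.
        archive_lists = [
            value
            for name, value in vars(module).items()
            if name.endswith("_PRETRAINED_MODEL_ARCHIVE_LIST") and isinstance(value, list)
        ]
        # Flag modules whose doc checkpoint is missing from their archive list(s).
        if archive_lists and not any(checkpoint in lst for lst in archive_lists):
            failures.append(f"{module_name}: {checkpoint!r} is not in its archive list")
    if failures:
        raise ValueError("Doc checkpoint mismatches:\n" + "\n".join(failures))


if __name__ == "__main__":
    check_doc_checkpoints()
```

Models that legitimately document a different checkpoint (the special cases mentioned below) could be handled with an explicit allowlist.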

Collaborator:
Feel free to add a check in this or a follow-up PR! I'm not 100% sure this will hold for all the models in the library; there might be some special cases where a different checkpoint is used for the sake of a doc example.

Contributor Author:
Yeah, I wasn't sure whether I should change this specific model's checks, since all 100+ models follow a coding template.

I actually went through every transformers text model over the last week (to add them to ONNX TurnkeyML and ONNX Model Zoo) and they all had correct checkpoint references except for this one. I would be interested in adding the check described above in a future PR if this kind of problem became more prevalent.

Collaborator:
Sounds good. Thanks for all of your work on this!

@@ -653,7 +653,7 @@ def _init_weights(self, module):
 
 
 @add_start_docstrings(
-    "The bare XLM-RoBERTa-xlarge Model transformer outputting raw hidden-states without any specific head on top.",
+    "The bare XLM-RoBERTa-XL Model transformer outputting raw hidden-states without any specific head on top.",
     XLM_ROBERTA_XL_START_DOCSTRING,
 )
 class XLMRobertaXLModel(XLMRobertaXLPreTrainedModel):

@@ -833,7 +833,7 @@ def forward(
 
 
 @add_start_docstrings(
-    """XLM-RoBERTa-xlarge Model with a `language modeling` head on top for CLM fine-tuning.""",
+    """XLM-RoBERTa-XL Model with a `language modeling` head on top for CLM fine-tuning.""",
     XLM_ROBERTA_XL_START_DOCSTRING,
 )
 class XLMRobertaXLForCausalLM(XLMRobertaXLPreTrainedModel):

@@ -990,7 +990,7 @@ def _reorder_cache(self, past_key_values, beam_idx):
 
 
 @add_start_docstrings(
-    """XLM-RoBERTa-xlarge Model with a `language modeling` head on top.""", XLM_ROBERTA_XL_START_DOCSTRING
+    """XLM-RoBERTa-XL Model with a `language modeling` head on top.""", XLM_ROBERTA_XL_START_DOCSTRING
 )
 class XLMRobertaXLForMaskedLM(XLMRobertaXLPreTrainedModel):
     _tied_weights_keys = ["lm_head.decoder.weight", "lm_head.decoder.bias"]

@@ -1081,7 +1081,7 @@ def forward(
 
 
 class XLMRobertaXLLMHead(nn.Module):
-    """XLM-Roberta-xlarge Head for masked language modeling."""
+    """XLM-RoBERTa-XL Head for masked language modeling."""
 
     def __init__(self, config):
         super().__init__()

@@ -1109,7 +1109,7 @@ def _tie_weights(self):
 
 @add_start_docstrings(
     """
-    XLM-RoBERTa-xlarge Model transformer with a sequence classification/regression head on top (a linear layer on top
+    XLM-RoBERTa-XL Model transformer with a sequence classification/regression head on top (a linear layer on top
     of the pooled output) e.g. for GLUE tasks.
     """,
     XLM_ROBERTA_XL_START_DOCSTRING,

@@ -1203,7 +1203,7 @@ def forward(
 
 @add_start_docstrings(
     """
-    XLM-Roberta-xlarge Model with a multiple choice classification head on top (a linear layer on top of the pooled
+    XLM-RoBERTa-XL Model with a multiple choice classification head on top (a linear layer on top of the pooled
     output and a softmax) e.g. for RocStories/SWAG tasks.
     """,
     XLM_ROBERTA_XL_START_DOCSTRING,

@@ -1294,7 +1294,7 @@ def forward(
 
 @add_start_docstrings(
     """
-    XLM-Roberta-xlarge Model with a token classification head on top (a linear layer on top of the hidden-states
+    XLM-RoBERTa-XL Model with a token classification head on top (a linear layer on top of the hidden-states
     output) e.g. for Named-Entity-Recognition (NER) tasks.
     """,
     XLM_ROBERTA_XL_START_DOCSTRING,

@@ -1405,7 +1405,7 @@ def forward(self, features, **kwargs):
 
 @add_start_docstrings(
     """
-    XLM-Roberta-xlarge Model with a span classification head on top for extractive question-answering tasks like SQuAD
+    XLM-RoBERTa-XL Model with a span classification head on top for extractive question-answering tasks like SQuAD
     (a linear layers on top of the hidden-states output to compute `span start logits` and `span end logits`).
     """,
     XLM_ROBERTA_XL_START_DOCSTRING,