Adds a note to resize the token embedding matrix when adding special … #11120

Merged 2 commits on Apr 7, 2021
8 changes: 7 additions & 1 deletion src/transformers/tokenization_utils_base.py
@@ -825,7 +825,13 @@ def add_special_tokens(self, special_tokens_dict: Dict[str, Union[str, AddedToken
         special tokens are NOT in the vocabulary, they are added to it (indexed starting from the last index of the
         current vocabulary).
 
-        Using : obj:`add_special_tokens` will ensure your special tokens can be used in several ways:
+        .. Note::
+            When adding new tokens to the vocabulary, you should make sure to also resize the token embedding matrix of
+            the model so that its embedding matrix matches the tokenizer.
+
+            In order to do that, please use the :meth:`~transformers.PreTrainedModel.resize_token_embeddings` method.
+
+        Using :obj:`add_special_tokens` will ensure your special tokens can be used in several ways:
 
         - Special tokens are carefully handled by the tokenizer (they are never split).
         - You can easily refer to special tokens using tokenizer class attributes like :obj:`tokenizer.cls_token`. This
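For context, a minimal sketch of the workflow the added note describes. The checkpoint name (`gpt2`) and the `<special>` token are illustrative choices, not part of this PR:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint; any pretrained model/tokenizer pair works the same way.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# New special tokens are appended after the last index of the current vocabulary.
num_added = tokenizer.add_special_tokens({"additional_special_tokens": ["<special>"]})

# Without this step, the model's embedding matrix is smaller than the tokenizer's
# vocabulary, and encoding the new token would index out of range.
if num_added > 0:
    model.resize_token_embeddings(len(tokenizer))
```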