Commit

Adds a note to resize the token embedding matrix when adding special tokens (huggingface#11120)

* Adds a note to resize the token embedding matrix when adding special tokens

* Remove superfluous space
LysandreJik authored and Iwontbecreative committed Jul 15, 2021
1 parent f9f7753 commit 2ab5187
Showing 1 changed file with 7 additions and 1 deletion.
8 changes: 7 additions & 1 deletion src/transformers/tokenization_utils_base.py
@@ -825,7 +825,13 @@ def add_special_tokens(self, special_tokens_dict: Dict[str, Union[str, AddedToken
         special tokens are NOT in the vocabulary, they are added to it (indexed starting from the last index of the
         current vocabulary).

-        Using : obj:`add_special_tokens` will ensure your special tokens can be used in several ways:
+        .. Note::
+            When adding new tokens to the vocabulary, you should make sure to also resize the token embedding matrix of
+            the model so that its embedding matrix matches the tokenizer.
+
+            In order to do that, please use the :meth:`~transformers.PreTrainedModel.resize_token_embeddings` method.
+
+        Using :obj:`add_special_tokens` will ensure your special tokens can be used in several ways:

         - Special tokens are carefully handled by the tokenizer (they are never split).
         - You can easily refer to special tokens using tokenizer class attributes like :obj:`tokenizer.cls_token`. This
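The note added by this commit warns about a real failure mode: `add_special_tokens` appends new token ids after the last index of the current vocabulary, so a model whose embedding matrix still has the old number of rows would index out of bounds on the new ids until `resize_token_embeddings` is called. Below is a minimal, dependency-free sketch of that invariant using a toy vocabulary and embedding matrix; the names `vocab`, `embeddings`, and the `<ent>` tokens are illustrative stand-ins, not the transformers API (the real calls are `tokenizer.add_special_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`):

```python
import random

def make_embedding_matrix(vocab_size, dim):
    """Toy embedding matrix: one row of `dim` floats per vocabulary entry."""
    return [[random.random() for _ in range(dim)] for _ in range(vocab_size)]

# Toy vocabulary standing in for a pretrained tokenizer's vocab.
vocab = {"[PAD]": 0, "[CLS]": 1, "[SEP]": 2, "hello": 3, "world": 4}
embeddings = make_embedding_matrix(len(vocab), dim=8)

# Mimics add_special_tokens: new tokens are appended, indexed starting
# from the last index of the current vocabulary.
for token in ["<ent>", "</ent>"]:
    if token not in vocab:
        vocab[token] = len(vocab)

# The embedding matrix is now too small: looking up a new token id
# would index past the last row of the matrix.
assert len(vocab) > len(embeddings)

# Mimics resize_token_embeddings(len(tokenizer)): grow the matrix with
# freshly initialized rows so every token id maps to an embedding row.
while len(embeddings) < len(vocab):
    embeddings.append([random.random() for _ in range(8)])

assert len(embeddings) == len(vocab)
```

In the real library the newly added rows are randomly initialized as well, which is why fine-tuning after adding special tokens is generally expected.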