Remove static pretrained maps from the library's internals #29112
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
```diff
@@ -52,8 +43,6 @@ class {{cookiecutter.camelcase_modelname}}Tokenizer(BertTokenizer):

     vocab_files_names = VOCAB_FILES_NAMES
-    pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
-    max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
     pretrained_init_configuration = PRETRAINED_INIT_CONFIGURATION

 {%- elif cookiecutter.tokenizer_type == "Based on BART" %}
 from ...utils import logging
```
The line with

```python
PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
    "{{cookiecutter.checkpoint_identifier}}": 1024,
}
```

should also be removed!
Correct! This needs to be adapted, as well as the "new model addition" script that relies on these. Also, there are 5k failing tests that I need to fix before I un-draft it :)
Good luck!
```diff
@@ -53,8 +44,6 @@ class {{cookiecutter.camelcase_modelname}}TokenizerFast(BertTokenizerFast):

     vocab_files_names = VOCAB_FILES_NAMES
-    pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
-    max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
     pretrained_init_configuration = PRETRAINED_INIT_CONFIGURATION
     slow_tokenizer_class = {{cookiecutter.camelcase_modelname}}Tokenizer

 {%- elif cookiecutter.tokenizer_type == "Based on BART" %}
```
Let's also clean up the BART-based template.
And the Standalone one.
Fixes distilgpt2 tokenization. Previously, we only used the fallback configuration if there was no `tokenizer_config.json` in the model repo. These files are now being added to some repos in the context of removing dependencies on transformers' internals, as in this PR: huggingface/transformers#29112. But only keys removed from the hardcoded rules are being added, to minimize potential breaking changes. We now use the fallback config if `tokenizer_config.json` exists, no tokenizer class is specified, and we do have a fallback config for this architecture.
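A minimal sketch of that decision rule, under assumed names (`resolve_tokenizer_config` and its arguments are illustrative, not the actual tokenizers-library API):

```python
# Hypothetical sketch of the fallback-selection rule described above;
# names and shapes are assumptions, not the library's real API.
from typing import Optional


def resolve_tokenizer_config(
    tokenizer_config: Optional[dict],  # parsed tokenizer_config.json, if present
    fallback_config: Optional[dict],   # hardcoded fallback for this architecture
) -> Optional[dict]:
    # Old behavior: fall back only when tokenizer_config.json is absent.
    if tokenizer_config is None:
        return fallback_config
    # New behavior: also fall back when the file exists but names no
    # tokenizer class, provided we actually have a fallback to use.
    if "tokenizer_class" not in tokenizer_config and fallback_config is not None:
        return fallback_config
    return tokenizer_config
```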
casual G.O.A.T. = @LysandreJik
Force-pushed from 2ce0712 to dbedf2b.
Updated the base to move some of the refactor to that PR: #29534
Force-pushed from e66a3f9 to 9dd29e4.
Force-pushed from dbedf2b to cec1eef.
Force-pushed from 5f314cf to fbfb0e9.
Why deprecate vs remove?
Deprecate for 2 months and then simply remove.
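For context, one common way to deprecate module-level constants without removing them is a PEP 562 module-level `__getattr__`. This is only a hedged sketch of the idea, not the PR's actual `_archive_maps.py` implementation; the map name and contents are examples:

```python
# Sketch: deprecating a module-level archive map via PEP 562.
import warnings

_BERT_PRETRAINED_MODEL_ARCHIVE_LIST = [
    "google-bert/bert-base-uncased",
    "google-bert/bert-large-uncased",
]


def __getattr__(name):
    # Called only for attributes not found through normal lookup, so the
    # rest of the module keeps working without triggering warnings.
    if name == "BERT_PRETRAINED_MODEL_ARCHIVE_LIST":
        warnings.warn(
            f"{name} is deprecated and will be removed; find checkpoints "
            "directly on the Hugging Face Hub instead.",
            FutureWarning,
        )
        return _BERT_PRETRAINED_MODEL_ARCHIVE_LIST
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```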
Force-pushed from 9e70d94 to 3be48a5.
It should be good for review for the brave one @amyeroberts @ArthurZucker. Failing tests are unrelated and are also failing on `main`.
First part
TODO for me: we need to make sure the tests add some pre-trained models to test!
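A sketch of what such a test could look like (the checkpoint and test name are placeholders, not taken from this PR):

```python
# Hypothetical regression test: loading a real pre-trained checkpoint
# must keep working once the static maps are gone.
from transformers import AutoTokenizer


def test_pretrained_tokenizer_loads_without_static_maps():
    tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
    # Basic sanity check that the loaded tokenizer is functional.
    assert tokenizer("hello world")["input_ids"]
```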
src/transformers/models/deprecated/retribert/tokenization_retribert_fast.py (resolved)
```python
from ..deprecated._archive_maps import (  # noqa: F401, E402
    DPR_CONTEXT_ENCODER_PRETRAINED_MODEL_ARCHIVE_LIST,  # noqa: F401, E402
    DPR_QUESTION_ENCODER_PRETRAINED_MODEL_ARCHIVE_LIST,  # noqa: F401, E402
    DPR_READER_PRETRAINED_MODEL_ARCHIVE_LIST,  # noqa: F401, E402
)
```
Suggested change:

```diff
-from ..deprecated._archive_maps import (  # noqa: F401, E402
-    DPR_CONTEXT_ENCODER_PRETRAINED_MODEL_ARCHIVE_LIST,  # noqa: F401, E402
-    DPR_QUESTION_ENCODER_PRETRAINED_MODEL_ARCHIVE_LIST,  # noqa: F401, E402
-    DPR_READER_PRETRAINED_MODEL_ARCHIVE_LIST,  # noqa: F401, E402
-)
+from ..deprecated._archive_maps import DPR_CONTEXT_ENCODER_PRETRAINED_MODEL_ARCHIVE_LIST, DPR_QUESTION_ENCODER_PRETRAINED_MODEL_ARCHIVE_LIST, DPR_READER_PRETRAINED_MODEL_ARCHIVE_LIST  # fmt: skip  # noqa: F401, E402
```
if that works
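(`# fmt: skip` is the pragma black recognizes for leaving a single line unformatted, so the collapsed one-line import should survive automatic restyling.)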
I prefer to keep it as-is, as I'll be looking for the suffix when removing these imports.
...utter-template-{{cookiecutter.modelname}}/to_replace_{{cookiecutter.lowercase_modelname}}.py (outdated, resolved)
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Force-pushed from a75bb8e to 2e8ef4b.
Remove static pretrained maps from the library's internals (huggingface#29112)

* [test_all] Remove static pretrained maps from the library's internals
* Deprecate archive maps instead of removing them
* Revert init changes
* [test_all] Deprecate instead of removing
* [test_all] PVT v2 support
* [test_all] Tests should all pass
* [test_all] Style
* Address review comments
* Update src/transformers/models/deprecated/_archive_maps.py
* Update src/transformers/models/deprecated/_archive_maps.py
* [test_all] trigger tests
* [test_all] LLAVA
* [test_all] Bad rebase

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>