
Remove static pretrained maps from the library's internals #29112

Merged · 13 commits into main · Mar 25, 2024

Conversation

LysandreJik (Member)

No description provided.

@LysandreJik (Member Author) commented Feb 19, 2024

Before this ungodly PR gets merged, I need to check that every checkpoint referenced here behaves the same once its pretrained map has been removed.

I'll link the PRs opened as a result in this comment.
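
For context, the static maps this PR removes are module-level constants hard-coded in each model's tokenizer and configuration modules, roughly of the following shape (an illustrative sketch only; the URL and size below are assumptions rather than values copied from the library):

# Hard-coded checkpoint -> vocab-file map formerly defined next to each tokenizer class
PRETRAINED_VOCAB_FILES_MAP = {
    "vocab_file": {
        "google-bert/bert-base-uncased": "https://huggingface.co/google-bert/bert-base-uncased/resolve/main/vocab.txt",
    }
}

# Hard-coded maximum input size per checkpoint
PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
    "google-bert/bert-base-uncased": 512,
}

Once these constants are gone, the same information has to come from each checkpoint's own files on the Hub, which is why every repo listed below needed a PR of its own.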

🟣: merged
🟢: open
🔴: closed
🟡: not open yet

Repos with PRs opened

ALBERT

🟣 albert/albert-base-v1#2
🟣 albert/albert-base-v2#6
🟣 albert/albert-large-v1#2
🟣 albert/albert-large-v2#3
🟣 albert/albert-xlarge-v1#2
🟣 albert/albert-xlarge-v2#2
🟣 albert/albert-xxlarge-v1#3
🟣 albert/albert-xxlarge-v2#3

BERT

🟣 google-bert/bert-base-cased#10
🟣 google-bert/bert-base-cased-finetuned-mrpc#2
🟣 google-bert/bert-base-chinese#16
🟣 google-bert/bert-base-german-cased#5
🟣 google-bert/bert-base-german-dbmdz-cased#4
🟣 google-bert/bert-base-german-dbmdz-uncased#4
🟣 google-bert/bert-base-multilingual-cased#5
🟣 google-bert/bert-base-multilingual-uncased#5
🟣 google-bert/bert-base-uncased#62
🟣 google-bert/bert-large-cased#3
🟣 google-bert/bert-large-cased-whole-word-masking#2
🟣 google-bert/bert-large-cased-whole-word-masking-finetuned-squad#2
🟣 google-bert/bert-large-uncased#3
🟣 google-bert/bert-large-uncased-whole-word-masking#3
🟣 google-bert/bert-large-uncased-whole-word-masking-finetuned-squad#4

CamemBERT

🟢 almanach/camembert-base#7

CTRL

🟣 Salesforce/ctrl#4

DistilBERT

🟣 distilbert/distilgpt2#11
🟣 distilbert/distilroberta-base#4

GPT-2

🟣 openai-community/gpt2#80
🟣 openai-community/gpt2-large#7
🟣 openai-community/gpt2-medium#13
🟣 openai-community/gpt2-xl#9

OpenAI GPT

🟣 openai-community/openai-gpt#6

RoBERTa

🟣 FacebookAI/roberta-base#12
🟣 openai-community/roberta-base-openai-detector#17
🟣 FacebookAI/roberta-large#6
🟣 FacebookAI/roberta-large-mnli#8
🟣 openai-community/roberta-large-openai-detector#5
🟣 FacebookAI/xlm-roberta-base#27
🟣 FacebookAI/xlm-roberta-large#17
🟣 FacebookAI/xlm-roberta-large-finetuned-conll02-dutch#3
🟣 FacebookAI/xlm-roberta-large-finetuned-conll02-spanish#3
🟣 FacebookAI/xlm-roberta-large-finetuned-conll03-english#11
🟣 FacebookAI/xlm-roberta-large-finetuned-conll03-german#4

Non-canonical models

🟣 facebook/m2m100_418M#16
🟣 openai/clip-vit-base-patch32#13
🟣 google/bigbird-roberta-large#2
🟣 google/bigbird-base-trivia-itc#2
🟢 google/rembert#2
🟢 YituTech/conv-bert-base#4
🟢 YituTech/conv-bert-medium-small#2
🟢 YituTech/conv-bert-small#2
🔴 facebook/wav2vec2-lv-60-espeak-cv-ft#5
🟣 facebook/blenderbot_small-90M#5
🟣 funnel-transformer/small#2
🟣 funnel-transformer/small-base#2
🟣 funnel-transformer/medium#2
🟣 funnel-transformer/medium-base#1
🟣 funnel-transformer/intermediate#2
🟣 funnel-transformer/intermediate-base#2
🟣 funnel-transformer/large#3
🟣 funnel-transformer/large-base#1
🟣 funnel-transformer/xlarge#2
🟣 funnel-transformer/xlarge-base#2
🟣 flaubert/flaubert_small_cased#2
🟣 flaubert/flaubert_base_uncased#2
🟣 flaubert/flaubert_base_cased#2
🟣 flaubert/flaubert_large_cased#2
🟣 google/realm-cc-news-pretrained-embedder#1
🟣 google/realm-cc-news-pretrained-encoder#1
🟣 google/realm-cc-news-pretrained-scorer#1
🟣 google/realm-cc-news-pretrained-openqa#1
🟣 google/realm-orqa-nq-openqa#1
🟣 google/realm-orqa-nq-reader#1
🟣 google/realm-orqa-wq-openqa#1
🟣 google/realm-orqa-wq-reader#1
🟣 google/fnet-base#1
🟣 google/fnet-large#1
🟣 microsoft/mpnet-base#4
🟣 google/reformer-crime-and-punishment#2
🟢 facebook/s2t-wav2vec2-large-en-de#3
🟢 allenai/longformer-base-4096#6
🟢 allenai/longformer-large-4096#3
🟢 allenai/longformer-large-4096-finetuned-triviaqa#4
🟢 allenai/longformer-base-4096-extra.pos.embd.only#2
🟢 allenai/longformer-large-4096-extra.pos.embd.only#2
🟣 cl-tohoku/bert-base-japanese#2
🟣 cl-tohoku/bert-base-japanese-whole-word-masking#3
🟣 cl-tohoku/bert-base-japanese-char#2
🟣 cl-tohoku/bert-base-japanese-char-whole-word-masking#2
🟣 google/electra-small-generator#4
🟣 google/electra-base-generator#2
🟣 google/electra-large-generator#2
🟣 google/electra-small-discriminator#1
🟣 google/electra-base-discriminator#3
🟣 google/electra-large-discriminator#1
🟢 microsoft/layoutlmv2-base-uncased#5
🟢 microsoft/layoutlmv2-large-uncased#3
🟢 microsoft/deberta-v2-xlarge#3
🟢 microsoft/deberta-v2-xxlarge#4
🟣 microsoft/deberta-v2-xlarge-mnli#2
🟢 microsoft/deberta-v2-xxlarge-mnli#2
🟢 vinai/bartpho-syllable#3
🟢 Helsinki-NLP/opus-mt-en-de#6
🟢 facebook/bart-base#4
🟢 facebook/bart-large-cnn#71
🟢 yjernite/bart_eli5#2
🟢 microsoft/layoutlm-base-uncased#3
🟢 microsoft/layoutlm-large-uncased#1
🟢 junnyu/roformer_chinese_char_small#1
🟢 junnyu/roformer_chinese_char_base#1
🟢 junnyu/roformer_small_discriminator#2
🟢 junnyu/roformer_small_generator#2
🟢 uclanlp/plbart-base#6
🟢 uclanlp/plbart-c-cpp-defect-detection#4
🟢 uclanlp/plbart-cs-java#1
🟢 uclanlp/plbart-en_XX-java#2
🟢 uclanlp/plbart-go-en_XX#1
🟢 uclanlp/plbart-java-clone-detection#2
🟢 uclanlp/plbart-java-cs#3
🟢 uclanlp/plbart-java-en_XX#2
🟢 uclanlp/plbart-javascript-en_XX#2
🟢 uclanlp/plbart-php-en_XX#2
🟢 uclanlp/plbart-python-en_XX#2
🟢 uclanlp/plbart-refine-java-medium#1
🟢 uclanlp/plbart-refine-java-small#2
🟢 uclanlp/plbart-ruby-en_XX#1
🟢 openbmb/cpm-ant-10b#2
🟢 microsoft/deberta-base#5
🟢 microsoft/deberta-large#4
🟢 microsoft/deberta-xlarge#2
🟢 microsoft/deberta-base-mnli#1
🟢 microsoft/deberta-large-mnli#1
🟢 microsoft/deberta-xlarge-mnli#5
🟢 vinai/bertweet-base#5
🟢 AI-Sweden-Models/gpt-sw3-126m#5
🟢 AI-Sweden-Models/gpt-sw3-356m#4
🟢 AI-Sweden-Models/gpt-sw3-1.3b#4
🟢 AI-Sweden-Models/gpt-sw3-6.7b#5
🟢 AI-Sweden-Models/gpt-sw3-6.7b-v2#4
🟢 AI-Sweden-Models/gpt-sw3-20b#6
🟣 AI-Sweden-Models/gpt-sw3-40b#4
🟢 facebook/mbart-large-50-one-to-many-mmt#7
🟢 facebook/xglm-564M#7
🟢 facebook/blenderbot-3B#7
🟢 microsoft/prophetnet-large-uncased#2
🟢 microsoft/xprophetnet-large-wiki100-cased#1
🟢 squeezebert/squeezebert-uncased#3
🟢 squeezebert/squeezebert-mnli#2
🟢 squeezebert/squeezebert-mnli-headless#2
🟢 abeja/gpt-neox-japanese-2.7b#5
🟢 microsoft/biogpt#26
🔴 facebook/wav2vec2-base-960h#11
🟣 moussaKam/mbarthez#2
🟣 moussaKam/barthez#3
🟣 moussaKam/barthez-orangesum-title#2
🟢 vinai/phobert-base#5
🟢 vinai/phobert-large#2
🟢 facebook/mbart-large-en-ro#3
🟢 facebook/mbart-large-cc25#5

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@@ -52,8 +43,6 @@ class {{cookiecutter.camelcase_modelname}}Tokenizer(BertTokenizer):

vocab_files_names = VOCAB_FILES_NAMES
pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
pretrained_init_configuration = PRETRAINED_INIT_CONFIGURATION

{%- elif cookiecutter.tokenizer_type == "Based on BART" %}
from ...utils import logging
Collaborator

The line with

PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
    "{{cookiecutter.checkpoint_identifier}}": 1024,
}

should also be removed!

Member Author

Correct! This needs to be adapted, as does the "new model addition" script that relies on these. Also, there are 5k failing tests that I need to fix before I un-draft it :)

Collaborator

Good luck!

@@ -53,8 +44,6 @@ class {{cookiecutter.camelcase_modelname}}TokenizerFast(BertTokenizerFast):

vocab_files_names = VOCAB_FILES_NAMES
pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
pretrained_init_configuration = PRETRAINED_INIT_CONFIGURATION
slow_tokenizer_class = {{cookiecutter.camelcase_modelname}}Tokenizer

{%- elif cookiecutter.tokenizer_type == "Based on BART" %}
Collaborator

let's also clean up the BART-based template

Collaborator

and the Standalone one

pcuenca added a commit to huggingface/swift-transformers that referenced this pull request Feb 29, 2024
Fixes distilgpt2 tokenization.

Previously, we only used the fallback configuration if there was no
`tokenizer_config.json` in the model repo. These files are now being
added to some repos in the context of removing dependencies on
transformers' internals, like this PR:
huggingface/transformers#29112. But only keys
removed from the hardcoded rules are being added to minimize potential
breaking changes.

We now use the fallback config if tokenizer_config.json exists, no
tokenizer class is specified, and we do have a fallback config for this
architecture.
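
A minimal Python sketch of the selection logic described in that commit message (swift-transformers itself is written in Swift; the function and key names below are assumptions for illustration, not the library's API):

from typing import Optional

def resolve_tokenizer_config(
    repo_config: Optional[dict], fallback_config: Optional[dict]
) -> Optional[dict]:
    # No tokenizer_config.json in the model repo: use the fallback, if any.
    if repo_config is None:
        return fallback_config
    # tokenizer_config.json exists but names no tokenizer class:
    # prefer the known-good fallback config for this architecture, if available.
    if repo_config.get("tokenizer_class") is None and fallback_config is not None:
        return fallback_config
    # Otherwise trust the repo's own configuration.
    return repo_config

Under this rule, distilgpt2 (whose newly added tokenizer_config.json carries only the keys removed from the hardcoded maps, and therefore no tokenizer class) presumably picks up the fallback GPT-2 configuration again, which is the fix the commit describes.
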
@julien-c (Member) left a comment

casual G.O.A.T. = @LysandreJik

@LysandreJik LysandreJik force-pushed the remove-maps branch 2 times, most recently from 2ce0712 to dbedf2b on March 8, 2024 at 09:59
@LysandreJik LysandreJik changed the base branch from main to add_pretrained_ids_in_tests March 8, 2024 10:04
@LysandreJik (Member Author)

Updated the base to move some of the refactor to that PR: #29534

Base automatically changed from add_pretrained_ids_in_tests to main March 13, 2024 13:53
@LysandreJik LysandreJik force-pushed the remove-maps branch 3 times, most recently from 5f314cf to fbfb0e9 on March 14, 2024 at 16:13
@julien-c (Member)

why deprecate vs remove?

@LysandreJik (Member Author) commented Mar 14, 2024

Deprecate for 2 months, and then removing them is simply a matter of deleting a file.
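
As an illustration of what that deprecation window could look like, a wrapper of roughly this kind could live in src/transformers/models/deprecated/_archive_maps.py and warn on access (a sketch only; the class name, warning text, and example entry are assumptions, not necessarily what this PR implements):

import warnings
from collections import UserList

class DeprecatedList(UserList):
    """List that emits a FutureWarning whenever a deprecated archive list is read."""

    def __getitem__(self, index):
        warnings.warn(
            "Archive maps and lists are deprecated and will be removed from the "
            "library's internals; load checkpoints by name with `from_pretrained` instead.",
            FutureWarning,
        )
        return super().__getitem__(index)

# Example re-export kept only for backward compatibility during the deprecation window.
BERT_PRETRAINED_MODEL_ARCHIVE_LIST = DeprecatedList(["google-bert/bert-base-uncased"])

Once the window ends, removal amounts to deleting that file along with the re-imports that still reference it.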

@LysandreJik LysandreJik marked this pull request as ready for review March 14, 2024 16:58
@LysandreJik LysandreJik force-pushed the remove-maps branch 2 times, most recently from 9e70d94 to 3be48a5 on March 15, 2024 at 08:39
@LysandreJik (Member Author)

It should be good for review for the brave ones, @amyeroberts @ArthurZucker.

Failing tests are unrelated and also failing on main.

@ArthurZucker (Collaborator) left a comment

First part

src/transformers/__init__.py (review thread resolved)
Collaborator

TODO for me: we need to make sure the tests add some pre-trained models to test!
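
As a rough sketch of what that could mean for a tokenizer test (the mixin import path and the from_pretrained_id attribute are assumptions, loosely following the add_pretrained_ids_in_tests base branch referenced above, #29534):

import unittest

from transformers import BertTokenizer

# In the transformers repo the shared mixin lives in tests/test_tokenization_common.py;
# this import assumes the test file sits alongside it.
from test_tokenization_common import TokenizerTesterMixin


class BertTokenizationTest(TokenizerTesterMixin, unittest.TestCase):
    tokenizer_class = BertTokenizer
    # With the static maps gone, the checkpoint the common tests exercise
    # is declared explicitly on the test class.
    from_pretrained_id = "google-bert/bert-base-uncased"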

src/transformers/models/deprecated/_archive_maps.py (review threads outdated and resolved)
Comment on lines +43 to +47
from ..deprecated._archive_maps import ( # noqa: F401, E402
DPR_CONTEXT_ENCODER_PRETRAINED_MODEL_ARCHIVE_LIST, # noqa: F401, E402
DPR_QUESTION_ENCODER_PRETRAINED_MODEL_ARCHIVE_LIST, # noqa: F401, E402
DPR_READER_PRETRAINED_MODEL_ARCHIVE_LIST, # noqa: F401, E402
)
Collaborator

Suggested change
from ..deprecated._archive_maps import ( # noqa: F401, E402
DPR_CONTEXT_ENCODER_PRETRAINED_MODEL_ARCHIVE_LIST, # noqa: F401, E402
DPR_QUESTION_ENCODER_PRETRAINED_MODEL_ARCHIVE_LIST, # noqa: F401, E402
DPR_READER_PRETRAINED_MODEL_ARCHIVE_LIST, # noqa: F401, E402
)
from ..deprecated._archive_maps import DPR_CONTEXT_ENCODER_PRETRAINED_MODEL_ARCHIVE_LIST, DPR_QUESTION_ENCODER_PRETRAINED_MODEL_ARCHIVE_LIST, DPR_READER_PRETRAINED_MODEL_ARCHIVE_LIST #fmt skip # noqa: F401, E402

if that works

Member Author

I prefer to keep it as is, since I'll be looking for the suffix when removing these imports.

@LysandreJik LysandreJik merged commit 39114c0 into main Mar 25, 2024
21 checks passed
@LysandreJik LysandreJik deleted the remove-maps branch March 25, 2024 09:33
hovnatan pushed a commit to hovnatan/transformers that referenced this pull request Mar 27, 2024
Remove static pretrained maps from the library's internals (huggingface#29112)

* [test_all] Remove static pretrained maps from the library's internals

* Deprecate archive maps instead of removing them

* Revert init changes

* [test_all] Deprecate instead of removing

* [test_all] PVT v2 support

* [test_all] Tests should all pass

* [test_all] Style

* Address review comments

* Update src/transformers/models/deprecated/_archive_maps.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/deprecated/_archive_maps.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* [test_all] trigger tests

* [test_all] LLAVA

* [test_all] Bad rebase

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
@Rocketknight1 Rocketknight1 mentioned this pull request Mar 28, 2024
@ydshieh ydshieh mentioned this pull request Apr 5, 2024
itazap pushed a commit that referenced this pull request May 14, 2024