
Explicit arguments in from_pretrained #24306

Merged · 19 commits · Jun 21, 2023

Conversation

@ydshieh ydshieh (Collaborator) commented Jun 15, 2023

What does this PR do?

[still incomplete]

I still need to apply the same changes to the other files containing `from_pretrained` (other frameworks, and other classes like config, processor, auto, etc.), but @sgugger, let me know if I am already lost at this early stage.
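For illustration, here is a minimal before/after sketch of the kind of signature change this PR makes. The exact argument list is an assumption, not the final set; the thread below discusses which arguments to expose (e.g. `ignore_mismatched_sizes` and `use_safetensors`):

```python
# Before: common options are buried in **kwargs and popped internally.
@classmethod
def from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs):
    cache_dir = kwargs.pop("cache_dir", None)
    use_auth_token = kwargs.pop("use_auth_token", None)
    ...

# After: common options become explicit keyword-only parameters, visible in
# the signature and the generated docs; everything else still flows through
# **kwargs for power users.
@classmethod
def from_pretrained(
    cls,
    pretrained_model_name_or_path,
    *model_args,
    config=None,
    cache_dir=None,
    ignore_mismatched_sizes=False,
    force_download=False,
    local_files_only=False,
    token=None,  # replaces the deprecated `use_auth_token`
    revision="main",
    use_safetensors=None,
    **kwargs,
):
    ...
```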

@ydshieh ydshieh requested a review from sgugger June 15, 2023 14:57
HuggingFaceDocBuilderDev commented Jun 15, 2023

The documentation is not available anymore as the PR was closed or merged.

@sgugger sgugger (Collaborator) left a comment

Thanks for working on this. I think `ignore_mismatched_sizes` and `use_safetensors` might be worth exposing as well. We should also rework the docstring to group the exposed arguments first and the others second (with titles like main arguments and power-user arguments).

src/transformers/modeling_utils.py (outdated, resolved)
src/transformers/modeling_utils.py (outdated, resolved)
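A rough sketch of the docstring grouping being suggested above; the section titles and the particular arguments shown are illustrative, not the final wording:

```python
def from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs):
    r"""
    Instantiate a pretrained model from a model id on the Hub or a local path.

    Main arguments:
        pretrained_model_name_or_path (`str` or `os.PathLike`):
            Model id or path to a local checkpoint.
        token (`str` or `bool`, *optional*):
            The token to use as HTTP bearer authorization for remote files.

    Power user arguments:
        output_loading_info (`bool`, *optional*, defaults to `False`):
            Whether to also return a dict with missing and unexpected keys.
    """
```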
@ydshieh ydshieh (Collaborator, Author) commented Jun 16, 2023

TODO:

  • for TF/Flax model from_pretrained
  • for tokenizer/processors
  • for auto

@ydshieh ydshieh requested a review from sgugger June 16, 2023 14:56
@ydshieh ydshieh (Collaborator, Author) commented Jun 16, 2023

@sgugger It would be nice if you could take a quick look 🙏. And do you want me to deal with all frameworks (TF/Flax), tokenizer/processor, and also auto in this PR, or may I separate them?

@sgugger sgugger (Collaborator) left a comment

I think you need to add tests for the new `token` argument, as it's not passed along properly (unless I missed something).

src/transformers/configuration_utils.py (outdated, resolved)
src/transformers/models/clap/configuration_clap.py (outdated, resolved)
src/transformers/models/clip/configuration_clip.py (outdated, resolved)
src/transformers/models/clip/configuration_clip.py (outdated, resolved)
src/transformers/models/owlvit/configuration_owlvit.py (outdated, resolved)
src/transformers/models/owlvit/configuration_owlvit.py (outdated, resolved)
src/transformers/models/owlvit/configuration_owlvit.py (outdated, resolved)
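A minimal sketch of one test this review could be asking for, based on the helper shown further down the thread: passing both the new `token` and the deprecated `use_auth_token` must raise. The checkpoint name is only an example; the error is raised by the normalization helper before any download is attempted.

```python
import unittest

from transformers import BertConfig


class TokenArgumentTest(unittest.TestCase):
    def test_token_and_use_auth_token_conflict(self):
        # The helper rejects conflicting values up front, so no network
        # access happens before the ValueError is raised.
        with self.assertRaises(ValueError):
            BertConfig.from_pretrained(
                "bert-base-uncased", token="hf_new", use_auth_token="hf_old"
            )
```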
@ydshieh ydshieh changed the title from "[WIP] explicit arguments in from_pretrained" to "Explicit arguments in from_pretrained" on Jun 21, 2023
@ydshieh ydshieh marked this pull request as ready for review June 21, 2023 14:40
@sgugger sgugger (Collaborator) commented Jun 21, 2023

You have a lot of failing tests to fix 😅. Are you sure you want a review yet?

@ydshieh ydshieh (Collaborator, Author) commented Jun 21, 2023

@sgugger No, I didn't request a new review since the last time you had a look. But the changes I pushed triggered you 😆

Comment on lines +470 to +494
```python
def _set_token_in_kwargs(self, kwargs, token=None):
    """Temporary method to deal with `token` and `use_auth_token`.

    This method is to avoid applying the same changes in all model config classes that overwrite
    `from_pretrained`.

    Need to clean up `use_auth_token` in a follow-up PR.
    """
    # Some model config classes like CLIP define their own `from_pretrained` without the new argument `token` yet.
    if token is None:
        token = kwargs.pop("token", None)
    use_auth_token = kwargs.pop("use_auth_token", None)

    if use_auth_token is not None:
        warnings.warn(
            "The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers.", FutureWarning
        )
        if token is not None:
            raise ValueError(
                "`token` and `use_auth_token` are both specified. Please set only the argument `token`."
            )
        token = use_auth_token

    if token is not None:
        # change to `token` in a follow-up PR
        kwargs["use_auth_token"] = token
```
ydshieh (Collaborator, Author) commented:
We have quite a few model classes (e.g. the CLIP-like family) whose config classes have their own `from_pretrained`.

This private `_set_token_in_kwargs` method makes life easier when dealing with `token` and `use_auth_token`.

We will see what the best way is in a follow-up PR, when we want to give those customized `from_pretrained` methods explicit arguments.
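For illustration, a CLIP-like config override could funnel both spellings through the helper like this. The class name is hypothetical, and this assumes the helper is callable at the class level (the exact binding is something the follow-up can settle); `get_config_dict` and `from_dict` are the existing `PretrainedConfig` classmethods:

```python
from transformers import PretrainedConfig


class CLIPLikeConfig(PretrainedConfig):
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, **kwargs):
        # One call normalizes `token`/`use_auth_token` inside `kwargs`, so
        # the deprecation warning and conflict check live in one place.
        cls._set_token_in_kwargs(kwargs)

        # ... the class-specific loading logic continues unchanged,
        # reading the normalized value from `kwargs` ...
        config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
        return cls.from_dict(config_dict, **kwargs)
```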

ydshieh (Collaborator, Author) commented:

@sgugger Now is a good time, if you are motivated after 🦷.

A collaborator commented:

Ugh... Annoying! This fix works in the meantime.

@ydshieh ydshieh requested a review from sgugger June 21, 2023 17:20
@sgugger sgugger (Collaborator) left a comment

Thanks a lot!

