
Explicit arguments in from_pretrained #24306

Merged · 19 commits · Jun 21, 2023

Conversation

@ydshieh ydshieh (Collaborator) commented Jun 15, 2023

What does this PR do?

[still incomplete]

I still need to apply the same changes to the other files containing `from_pretrained` (other frameworks, and other classes like config, processor, auto, etc.), but @sgugger, let me know if I am already lost at this early stage.
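For illustration, here is a minimal before/after sketch of the kind of signature change this PR makes. The exact argument list is an assumption, not the final set; the thread below discusses which arguments to expose (e.g. `ignore_mismatched_sizes` and `use_safetensors`):

```python
# Before: common options are buried in **kwargs and popped internally.
@classmethod
def from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs):
    cache_dir = kwargs.pop("cache_dir", None)
    use_auth_token = kwargs.pop("use_auth_token", None)
    ...

# After: common options become explicit keyword-only parameters, visible in
# the signature and the generated docs; everything else still flows through
# **kwargs for power users.
@classmethod
def from_pretrained(
    cls,
    pretrained_model_name_or_path,
    *model_args,
    config=None,
    cache_dir=None,
    ignore_mismatched_sizes=False,
    force_download=False,
    local_files_only=False,
    token=None,  # replaces the deprecated `use_auth_token`
    revision="main",
    use_safetensors=None,
    **kwargs,
):
    ...
```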

@ydshieh ydshieh requested a review from sgugger June 15, 2023 14:57
HuggingFaceDocBuilderDev commented Jun 15, 2023

The documentation is not available anymore as the PR was closed or merged.

@sgugger sgugger (Collaborator) left a comment

Thanks for working on this. I think `ignore_mismatched_sizes` and `use_safetensors` might be worth exposing as well. We should also rework the docstring to group the exposed arguments first and the others second (with titles like main arguments and power-user arguments).

src/transformers/modeling_utils.py (outdated, resolved)
src/transformers/modeling_utils.py (outdated, resolved)
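A rough sketch of the docstring grouping being suggested above; the section titles and the particular arguments shown are illustrative, not the final wording:

```python
def from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs):
    r"""
    Instantiate a pretrained model from a model id on the Hub or a local path.

    Main arguments:
        pretrained_model_name_or_path (`str` or `os.PathLike`):
            Model id or path to a local checkpoint.
        token (`str` or `bool`, *optional*):
            The token to use as HTTP bearer authorization for remote files.

    Power user arguments:
        output_loading_info (`bool`, *optional*, defaults to `False`):
            Whether to also return a dict with missing and unexpected keys.
    """
```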
@ydshieh ydshieh (Collaborator, Author) commented Jun 16, 2023

TODO:

  • for TF/Flax model from_pretrained
  • for tokenizer/processors
  • for auto

@ydshieh ydshieh requested a review from sgugger June 16, 2023 14:56
@ydshieh ydshieh (Collaborator, Author) commented Jun 16, 2023

@sgugger It would be nice if you could take a quick look 🙏. And do you want me to deal with all frameworks (TF/Flax), tokenizer/processor, and also auto in this PR, or may I separate them?

@sgugger sgugger (Collaborator) left a comment

I think you need to add tests for the new `token` argument, as it's not passed along properly (unless I missed something).

src/transformers/configuration_utils.py (outdated, resolved)
src/transformers/models/clap/configuration_clap.py (outdated, resolved)
src/transformers/models/clip/configuration_clip.py (outdated, resolved)
src/transformers/models/clip/configuration_clip.py (outdated, resolved)
src/transformers/models/owlvit/configuration_owlvit.py (outdated, resolved)
src/transformers/models/owlvit/configuration_owlvit.py (outdated, resolved)
src/transformers/models/owlvit/configuration_owlvit.py (outdated, resolved)
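A minimal sketch of one test this review could be asking for, based on the helper shown further down the thread: passing both the new `token` and the deprecated `use_auth_token` must raise. The checkpoint name is only an example; the error is raised by the normalization helper before any download is attempted.

```python
import unittest

from transformers import BertConfig


class TokenArgumentTest(unittest.TestCase):
    def test_token_and_use_auth_token_conflict(self):
        # The helper rejects conflicting values up front, so no network
        # access happens before the ValueError is raised.
        with self.assertRaises(ValueError):
            BertConfig.from_pretrained(
                "bert-base-uncased", token="hf_new", use_auth_token="hf_old"
            )
```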
@ydshieh ydshieh changed the title from "[WIP] explicit arguments in from_pretrained" to "Explicit arguments in from_pretrained" on Jun 21, 2023
@ydshieh ydshieh marked this pull request as ready for review June 21, 2023 14:40
@sgugger sgugger (Collaborator) commented Jun 21, 2023

You have a lot of failing tests to fix 😅. Are you sure you want a review yet?

@ydshieh ydshieh (Collaborator, Author) commented Jun 21, 2023

@sgugger No, I didn't request a new review since the last time you had a look. But the changes I pushed triggered you 😆

Comment on lines +470 to +494
```python
def _set_token_in_kwargs(self, kwargs, token=None):
    """Temporary method to deal with `token` and `use_auth_token`.

    This method is to avoid applying the same changes in all model config classes that overwrite
    `from_pretrained`.

    Need to clean up `use_auth_token` in a follow-up PR.
    """
    # Some model config classes like CLIP define their own `from_pretrained` without the new argument `token` yet.
    if token is None:
        token = kwargs.pop("token", None)
    use_auth_token = kwargs.pop("use_auth_token", None)

    if use_auth_token is not None:
        warnings.warn(
            "The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers.", FutureWarning
        )
        if token is not None:
            raise ValueError(
                "`token` and `use_auth_token` are both specified. Please set only the argument `token`."
            )
        token = use_auth_token

    if token is not None:
        # change to `token` in a follow-up PR
        kwargs["use_auth_token"] = token
```
ydshieh (Collaborator, Author) commented:
We have quite a few model classes (e.g. the CLIP-like family) whose config classes have their own `from_pretrained`.

This private `_set_token_in_kwargs` method makes life easier when dealing with `token` and `use_auth_token`.

We will see what the best way is in a follow-up PR, when we want to give those customized `from_pretrained` methods explicit arguments.
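For illustration, a CLIP-like config override could funnel both spellings through the helper like this. The class name is hypothetical, and this assumes the helper is callable at the class level (the exact binding is something the follow-up can settle); `get_config_dict` and `from_dict` are the existing `PretrainedConfig` classmethods:

```python
from transformers import PretrainedConfig


class CLIPLikeConfig(PretrainedConfig):
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, **kwargs):
        # One call normalizes `token`/`use_auth_token` inside `kwargs`, so
        # the deprecation warning and conflict check live in one place.
        cls._set_token_in_kwargs(kwargs)

        # ... the class-specific loading logic continues unchanged,
        # reading the normalized value from `kwargs` ...
        config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
        return cls.from_dict(config_dict, **kwargs)
```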

ydshieh (Collaborator, Author) commented:

@sgugger Now is a good time, if you are motivated after 🦷.

A collaborator commented:

Ugh... Annoying! This fix works in the meantime.

@ydshieh ydshieh requested a review from sgugger June 21, 2023 17:20
@sgugger sgugger (Collaborator) left a comment

Thanks a lot!

