-
Notifications
You must be signed in to change notification settings - Fork 571
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix ModelHubMixin: save config only if doesn't exist #2105
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
if config is None: | ||
config = self.config | ||
if config is not None: | ||
if is_dataclass(config): | ||
config = asdict(config) # type: ignore[arg-type] | ||
(save_directory / CONFIG_NAME).write_text(json.dumps(config, sort_keys=True, indent=2)) | ||
config_path = save_directory / CONFIG_NAME |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might not work if the user wants to save a file with a name other than the value of the constant CONFIG_NAME
, e.g. tokenizer.json
. A library could potentially use a different filename by default, and implement a filename
kwarg passed in push_to_hub_kwargs
.
In such case, this can be bypassed by reassigning CONFIG_NAME
in save_pretrained
, but there might be a better way to handle this to make sure it doesn't interfere with anything else.
Maybe ModelHubMixin
could implement a protected _config_filename
attribute whose value would default to CONFIG_NAME
, that libraries could freely reassign?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the concern and suggestion @Natooz!
Actually I would prefer not to allow different config files, at least until we witness a real case in which it would be beneficial. config.json
is a special filename on the Hub as it is automatically used to count model downloads. If the library author wants 2 configs files -1 handled by the library, 1 handled by huggingface_hub-, then it'd be best if the library-specific config file is named differently (keras_config.json
, tokenizer_config.json
, etc.).
Thank you for the prompt adjustments! :) |
Note: failing test is related to Keras update. CI should be fixed in #2107. |
Fix a recent breaking change in
ModelHubMixing
that overwrites the config.json file generated by_save_pretrained
if the class implements it. With this update,config.json
is saved only if it doesn't exist already.cc @Natooz, this is a fix for #2102. Thanks again for reporting :)