v0.10.0: Modelcards, cache management and more
Modelcards
Contribution from @nateraw to integrate the work done on Modelcards and DatasetCards (from nateraw/modelcards) directly in huggingface_hub.
>>> from huggingface_hub import ModelCard
>>> card = ModelCard.load('nateraw/vit-base-beans')
>>> card.data.to_dict()
{'language': 'en', 'license': 'apache-2.0', 'tags': ['generated_from_trainer', 'image-classification'],...}
Related commits
- Add additional repo card utils from modelcards repo by @nateraw in #940
- Add regression test for empty modelcard update by @Wauplin in #1060
- Add template variables to dataset card template by @nateraw in #1068
- Further clarifying Model Card sections by @meg-huggingface in #1052
- Create modelcard if doesn't exist on update_metadata by @Wauplin in #1061
Related documentation
Cache management (huggingface-cli scan-cache and huggingface-cli delete-cache)
New commands in huggingface-cli to scan and delete parts of the cache. The goal is to manage the cache system the same way for any dependent library that uses huggingface_hub. Only the new cache-system format is supported.
➜ huggingface-cli scan-cache
REPO ID                     REPO TYPE SIZE ON DISK NB FILES LAST_ACCESSED LAST_MODIFIED REFS                LOCAL PATH
--------------------------- --------- ------------ -------- ------------- ------------- ------------------- -------------------------------------------------------------------------
glue                        dataset         116.3K       15 4 days ago    4 days ago    2.4.0, main, 1.17.0 /home/wauplin/.cache/huggingface/hub/datasets--glue
google/fleurs               dataset          64.9M        6 1 week ago    1 week ago    refs/pr/1, main     /home/wauplin/.cache/
(...)
Done in 0.0s. Scanned 6 repo(s) for a total of 3.4G.
Got 1 warning(s) while scanning. Use -vvv to print details.
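The LOCAL PATH column in the sample output suggests that each cached repo lives in a folder such as datasets--glue. As a rough illustration only, the sketch below splits such a folder name into a repo type and a repo id. The helper name parse_cache_folder is hypothetical, and the idea that "--" separates the type prefix, namespace and name is inferred from the sample output, not taken from the library. The CLI commands remain the authoritative view of the cache.

```python
from pathlib import Path
from typing import Tuple


def parse_cache_folder(path: str) -> Tuple[str, str]:
    """Hypothetical helper: split a cache folder name into (repo_type, repo_id).

    Assumes folder names look like "<type>s--<namespace>--<name>", as the
    sample scan-cache output suggests. Not part of huggingface_hub.
    """
    folder = Path(path).name
    type_plural, _, rest = folder.partition("--")
    if not rest:
        raise ValueError(f"Unexpected cache folder name: {folder!r}")
    # "datasets" -> "dataset"; leave the prefix untouched if it is not plural.
    repo_type = type_plural[:-1] if type_plural.endswith("s") else type_plural
    # Remaining "--" separators are assumed to stand for "/" in the repo id.
    repo_id = rest.replace("--", "/")
    return repo_type, repo_id


print(parse_cache_folder("/home/wauplin/.cache/huggingface/hub/datasets--glue"))
```

With the path from the sample output, this prints ('dataset', 'glue').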
Related commits
- Feature: add an utility to scan cache by @Wauplin in #990
- Utility to delete revisions by @Wauplin in #1035
- 1025 add time details to scan cache by @Wauplin in #1045
- Fix scan cache failure when cached snapshot is empty by @Wauplin in #1054
- 1025 huggingface-cli delete-cache command by @Wauplin in #1046
- Sort repos/revisions by age in delete-cache by @Wauplin in #1063
Related documentation
Better error handling (and http-related stuff)
HTTP calls to the Hub have been harmonized to behave the same across the library.
Major differences are:
- Unified way to handle HTTP errors using hf_raise_for_status (more informative error message)
- Auth token is always sent by default when a user is logged in (see documentation).
- package versions are sent as user-agent header for telemetry (python, huggingface_hub, tensorflow, torch,...). It was already the case for hf_hub_download.
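To make the user-agent point concrete, here is a minimal sketch of how package versions could be folded into a single header value. The helper build_user_agent, the "name/version; name/version" layout and the version numbers passed in are all assumptions for illustration. The exact header that huggingface_hub sends may differ.

```python
import platform
from typing import Dict, Optional


def build_user_agent(extra: Optional[Dict[str, str]] = None) -> str:
    """Hypothetical helper: format package versions as a user-agent string.

    The "name/version; name/version" layout is an assumption for illustration,
    not necessarily the exact header huggingface_hub sends.
    """
    # Always report the running Python version, then any extra packages.
    parts = {"python": platform.python_version()}
    parts.update(extra or {})
    return "; ".join(f"{name}/{version}" for name, version in parts.items())


# Illustrative version numbers only.
headers = {"user-agent": build_user_agent({"huggingface_hub": "0.10.0", "torch": "1.12.1"})}
print(headers["user-agent"])
```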
Related commits
- Always send the cached token when user is logged in by @Wauplin in #1064
- Add user agent to all requests with huggingface_hub version (and other) by @Wauplin in #1075
- [Repository] Add better error message by @patrickvonplaten in #993
- Clearer HTTP error messages in huggingface_hub by @Wauplin in #1019
- Handle backoff on HTTP 503 error when pushing repeatedly by @Wauplin in #1038
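The 503 backoff mentioned in the last commit can be pictured with a generic retry loop. This is a sketch of exponential backoff in general, not the code from #1038: the helper call_with_backoff, its parameters and its defaults are hypothetical, and the "server" here is simulated with a list of status codes.

```python
import time
from typing import Callable


def call_with_backoff(
    send: Callable[[], int],
    max_retries: int = 5,
    base_wait: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Hypothetical helper: retry `send` while it returns HTTP 503.

    `send` performs one request and returns its status code. The wait time
    doubles after each 503 (exponential backoff). Names and defaults are
    illustrative, not the library's.
    """
    wait = base_wait
    retries = 0
    status = send()
    while status == 503 and retries < max_retries:
        sleep(wait)      # pause before trying again
        wait *= 2        # double the pause for the next attempt
        retries += 1
        status = send()
    return status


# Simulated server: unavailable twice, then OK. Waits are recorded, not slept.
responses = iter([503, 503, 200])
waits = []
status = call_with_backoff(lambda: next(responses), sleep=waits.append)
print(status, waits)  # 200 [1.0, 2.0]
```

Passing sleep as a parameter keeps the example fast and lets the recorded waits show the doubling.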
Breaking changes
- For consistency, the return type of create_commit has been modified. This is a breaking change, but we hope the return type of this method was never used (quite recent and niche output type).
- Since repo_id is now validated using @validate_hf_hub_args (see below), a breaking change can be caused if repo_id was previously misused. A HFValidationError is now raised if repo_id is not valid.
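To show the kind of misuse that now fails early, here is a simplified, standard-library sketch of repo id validation. The pattern, the check_repo_id function and the InvalidRepoId exception are stand-ins written for this example. The real rules live behind @validate_hf_hub_args and raise HFValidationError, and they may be stricter than this approximation.

```python
import re

# Simplified pattern, an assumption for illustration: an optional "namespace/"
# prefix followed by a name, each made of word characters, "-" or ".".
REPO_ID_PATTERN = re.compile(r"([\w.-]+/)?[\w.-]{1,96}")


class InvalidRepoId(ValueError):
    """Local stand-in for the library's HFValidationError."""


def check_repo_id(repo_id: object) -> str:
    """Hypothetical validator: return repo_id if it looks valid, else raise."""
    if not isinstance(repo_id, str):
        raise InvalidRepoId(f"Repo id must be a string, got {type(repo_id).__name__}.")
    if not REPO_ID_PATTERN.fullmatch(repo_id):
        raise InvalidRepoId(f"Repo id {repo_id!r} is not 'name' or 'namespace/name'.")
    return repo_id


for candidate in ["nateraw/vit-base-beans", "glue", "a/b/c", ""]:
    try:
        check_repo_id(candidate)
        print(f"{candidate!r}: ok")
    except InvalidRepoId as err:
        print(f"{candidate!r}: rejected ({err})")
```

In this sketch the first two candidates pass, while "a/b/c" and the empty string are rejected.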
Miscellaneous improvements
Add support for autocomplete
http-based push_to_hub_fastai
- Add changes for push_to_hub_fastai to use the new http-based approach. by @nandwalritik in #1040
Check if a file is cached
Get file metadata (commit hash, etag, location) without downloading
Validate arguments using @validate_hf_hub_args
- Add validator for repo id + decorator to validate arguments in huggingface_hub by @Wauplin in #1029
- Remove repo_id validation in hf_hub_url and hf_hub_download by @Wauplin in #1031
repo_id
was previously misused
Related documentation:
Documentation updates
- Fix raise syntax: remove markdown bullet point by @mishig25 in #1034
- docs render tree correctly by @mishig25 in #1070
Deprecations
- ENH Deprecate clone_from behavior by @merveenoyan in #952
- 🗑 Deprecate token in read-only methods of HfApi in favor of use_auth_token by @SBrandeis in #928
- Remove legacy helper 'install_lfs_in_userspace' by @Wauplin in #1059
- 1055 deprecate private and repo type in repository class by @Wauplin in #1057
Bugfixes & small improvements
- Consider empty subfolder as None in hf_hub_url and hf_hub_download by @Wauplin in #1021
- enable http request retry under proxy by @MrZhengXin in #1022
- Add securityStatus to ModelInfo object with default value None. by @Wauplin in #1026
- 👽️ Add size parameter for lfsFiles when committing on the hub by @coyotte508 in #1048
- Use /models/ path for api call to update settings by @Wauplin in #1049
- Globally set git credential.helper to store in google colab by @Wauplin in #1053
- FIX notebook login by @Wauplin in #1073