v0.10.0: Modelcards, cache management and more

@Wauplin Wauplin released this 28 Sep 07:26

Modelcards

Contribution from @nateraw to integrate the work done on Modelcards and DatasetCards (from nateraw/modelcards) directly in huggingface_hub.

>>> from huggingface_hub import ModelCard

>>> card = ModelCard.load('nateraw/vit-base-beans')
>>> card.data.to_dict()
{'language': 'en', 'license': 'apache-2.0', 'tags': ['generated_from_trainer', 'image-classification'],...}

Cache management (huggingface-cli scan-cache and huggingface-cli delete-cache)

New huggingface-cli commands scan the cache and delete parts of it. The goal is to manage the cache system the same way for every library that depends on huggingface_hub. Only the new cache-system format is supported.

➜ huggingface-cli scan-cache
REPO ID                     REPO TYPE SIZE ON DISK NB FILES LAST_ACCESSED LAST_MODIFIED REFS                LOCAL PATH
--------------------------- --------- ------------ -------- ------------- ------------- ------------------- -------------------------------------------------------------------------
glue                        dataset         116.3K       15 4 days ago    4 days ago    2.4.0, main, 1.17.0 /home/wauplin/.cache/huggingface/hub/datasets--glue
google/fleurs               dataset          64.9M        6 1 week ago    1 week ago    refs/pr/1, main     /home/wauplin/.cache/
(...)

Done in 0.0s. Scanned 6 repo(s) for a total of 3.4G.
Got 1 warning(s) while scanning. Use -vvv to print details.
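
The LOCAL PATH column shows that each cached repo lives in its own folder, named after the repo type and id (for example datasets--glue). As a rough illustration of what a scan has to do, here is a minimal standard-library sketch. The helpers parse_repo_folder and summarize_cache are hypothetical names, not part of huggingface_hub, and the sketch assumes folders follow a '<type>s--<namespace>--<name>' naming pattern.

```python
from pathlib import Path


def parse_repo_folder(folder_name):
    """Split a folder name such as 'datasets--glue' into (repo_type, repo_id).

    Assumption: folders are named '<type>s--<namespace>--<name>', as suggested
    by the LOCAL PATH column of the scan-cache output.
    """
    repo_type, _, rest = folder_name.partition("--")
    if not rest or repo_type not in {"models", "datasets", "spaces"}:
        raise ValueError(f"Not a cached repo folder: {folder_name!r}")
    # Drop the plural 's' and turn the remaining '--' separator back into '/'.
    return repo_type[:-1], rest.replace("--", "/")


def summarize_cache(cache_dir):
    """Return {repo_id: (repo_type, size_in_bytes)} for each repo folder found."""
    summary = {}
    for entry in Path(cache_dir).iterdir():
        if not entry.is_dir():
            continue
        try:
            repo_type, repo_id = parse_repo_folder(entry.name)
        except ValueError:
            continue  # ignore folders that do not look like cached repos
        # Count regular files only, so symlinked files are not counted twice.
        size = sum(
            f.stat().st_size
            for f in entry.rglob("*")
            if f.is_file() and not f.is_symlink()
        )
        summary[repo_id] = (repo_type, size)
    return summary
```

Calling summarize_cache on a cache directory returns one entry per repo folder. It does not reproduce the other columns of the CLI output (refs, access times, warnings).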

Better error handling (and http-related stuff)

HTTP calls to the Hub have been harmonized to behave the same across the library.

Major differences are:

  • HTTP errors are handled in a unified way using hf_raise_for_status, which gives more informative error messages.
  • The auth token is always sent by default when a user is logged in (see documentation).
  • Package versions (python, huggingface_hub, tensorflow, torch,...) are sent in the user-agent header for telemetry. This was already the case for hf_hub_download.
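
To make the user-agent point concrete, the sketch below assembles a header value from the versions of installed packages using only the standard library. build_user_agent and the "name/version; ..." layout are assumptions made for illustration; the exact format sent by huggingface_hub is not specified in these notes.

```python
import sys
from importlib import metadata


def build_user_agent(packages=("huggingface_hub", "torch", "tensorflow")):
    """Build a 'name/version; name/version' string (hypothetical format)."""
    info = sys.version_info
    parts = [f"python/{info.major}.{info.minor}.{info.micro}"]
    for name in packages:
        try:
            parts.append(f"{name}/{metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Packages that are not installed are simply left out.
            continue
    return "; ".join(parts)


# Would be passed as the "user-agent" header of an HTTP request.
headers = {"user-agent": build_user_agent()}
```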

Related commits

  • Always send the cached token when user is logged in by @Wauplin in #1064
  • Add user agent to all requests with huggingface_hub version (and other) by @Wauplin in #1075
  • [Repository] Add better error message by @patrickvonplaten in #993
  • Clearer HTTP error messages in huggingface_hub by @Wauplin in #1019
  • Handle backoff on HTTP 503 error when pushing repeatedly by @Wauplin in #1038
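
For readers unfamiliar with the backoff mentioned in #1038, the general idea is to wait longer after each HTTP 503 before retrying. The sketch below shows the pattern with a generic callable; call_with_backoff, its parameters, and the doubling schedule are illustrative assumptions, not the library's implementation.

```python
import time


def call_with_backoff(send, max_retries=5, base_wait=1.0, max_wait=8.0, sleep=time.sleep):
    """Call send() until it returns a status other than 503 or retries run out.

    `send` is any zero-argument callable returning an HTTP status code.
    """
    wait = base_wait
    for attempt in range(max_retries + 1):
        status = send()
        if status != 503 or attempt == max_retries:
            return status
        sleep(wait)
        wait = min(wait * 2, max_wait)  # double the wait, up to a ceiling


# Usage with a fake server that is unavailable twice, then succeeds.
responses = iter([503, 503, 200])
final_status = call_with_backoff(lambda: next(responses), sleep=lambda seconds: None)
```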

Breaking changes

  1. For consistency, the return type of create_commit has been modified. This is a breaking change, but we hope the return type of this method was never used (it is a quite recent and niche output type).
     • Return more information in create_commit output by @Wauplin in #1066
  2. Since repo_id is now validated using @validate_hf_hub_args (see below), code that previously misused repo_id may break. An HFValidationError is now raised if repo_id is not valid.

Miscellaneous improvements

Add support for autocomplete

http-based push_to_hub_fastai

  • Add changes for push_to_hub_fastai to use the new http-based approach. by @nandwalritik in #1040

Check if a file is cached

  • try_to_load_from_cache returns cached non-existence by @sgugger in #1039

Get file metadata (commit hash, etag, location) without downloading

  • Add get_hf_file_metadata to fetch metadata from the Hub by @Wauplin in #1058

Validate arguments using @validate_hf_hub_args

  • Add validator for repo id + decorator to validate arguments in huggingface_hub by @Wauplin in #1029
  • Remove repo_id validation in hf_hub_url and hf_hub_download by @Wauplin in #1031

⚠️ This is a breaking change if repo_id was previously misused ⚠️
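
The decorator approach can be pictured with the following simplified sketch: a wrapper inspects the repo_id argument before the wrapped function runs and raises if it looks invalid. Everything here is a stand-in chosen for illustration: validate_repo_id_arg and RepoIdError are hypothetical names, and the regular expression is an assumed, simplified rule rather than the check performed by @validate_hf_hub_args.

```python
import functools
import inspect
import re

# Assumed, simplified rule: optional "namespace/" prefix followed by a name
# made of letters, digits, "-", "_" or ".". The library's real rules may differ.
_REPO_ID_PATTERN = re.compile(r"([\w.-]+/)?[\w.-]+")


class RepoIdError(ValueError):
    """Hypothetical stand-in for HFValidationError."""


def validate_repo_id_arg(fn):
    """Decorator that checks the `repo_id` argument before calling `fn`."""
    signature = inspect.signature(fn)

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        # Bind the call so repo_id is found whether passed by position or keyword.
        bound = signature.bind(*args, **kwargs)
        repo_id = bound.arguments.get("repo_id")
        if repo_id is not None:
            if not isinstance(repo_id, str) or not _REPO_ID_PATTERN.fullmatch(repo_id):
                raise RepoIdError(f"Invalid repo_id: {repo_id!r}")
        return fn(*args, **kwargs)

    return wrapper


@validate_repo_id_arg
def describe(repo_id, revision="main"):
    return f"{repo_id}@{revision}"
```

With this sketch, describe("nateraw/vit-base-beans") passes the check, while describe("a/b/c") raises RepoIdError before the function body runs.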

Documentation updates

Deprecations

  • ENH Deprecate clone_from behavior by @merveenoyan in #952
  • 🗑 Deprecate token in read-only methods of HfApi in favor of use_auth_token by @SBrandeis in #928
  • Remove legacy helper 'install_lfs_in_userspace' by @Wauplin in #1059
  • 1055 deprecate private and repo type in repository class by @Wauplin in #1057

Bugfixes & small improvements

  • Consider empty subfolder as None in hf_hub_url and hf_hub_download by @Wauplin in #1021
  • enable http request retry under proxy by @MrZhengXin in #1022
  • Add securityStatus to ModelInfo object with default value None. by @Wauplin in #1026
  • 👽️ Add size parameter for lfsFiles when committing on the hub by @coyotte508 in #1048
  • Use /models/ path for api call to update settings by @Wauplin in #1049
  • Globally set git credential.helper to store in google colab by @Wauplin in #1053
  • FIX notebook login by @Wauplin in #1073

Windows-specific bug fixes

  • Fix default cache on windows by @thomwolf in #1069
  • Degraded but fully working cache-system when symlinks are not supported by @Wauplin in #1067
  • Check symlinks support per directory instead of globally by @Wauplin in #1077