New git-aware cache file layout #801

Merged
merged 47 commits into from
May 25, 2022
Commits
6fa66c4
light typing
julien-c Mar 23, 2022
29a8833
remove this seminal comment
julien-c Mar 25, 2022
da92022
I don't understand why we don't early return here
julien-c Mar 25, 2022
e4d3686
following last commit, unnest this
julien-c Mar 25, 2022
8e4ccf4
[BIG] This should work for all repo_types not just models!
julien-c Mar 25, 2022
5ab8b74
one more
julien-c Mar 25, 2022
b8376f8
forgot a repo_type and reorder code
julien-c Mar 25, 2022
4cb1d63
also rename this cache folder
julien-c Mar 25, 2022
f7cbe18
Use `hf_hub_download`, will be simpler later
julien-c Mar 25, 2022
ea94d43
in this new version, `force_filename` does not make sense anymore
julien-c Mar 25, 2022
7b92719
Just inline everything inside `hf_hub_download` for now
julien-c Mar 25, 2022
d29c90e
Big prototype! it works! :tada:
julien-c Mar 25, 2022
02480fd
wip wip
julien-c May 4, 2022
f81885d
do not touch `cached_download`
julien-c May 4, 2022
7fbd7a4
Prompt user to upgrade to `hf_hub_download`
julien-c May 4, 2022
d1e47a1
Add a `legacy_cache_layout=True` to preserve old behavior, just in case
julien-c May 4, 2022
fc930b6
Create `relative symlinks` + add some doc
julien-c May 5, 2022
f5420ed
Fix behavior when no network
julien-c May 5, 2022
9f1e0f6
This test now is legacy
julien-c May 5, 2022
973913e
Fix-ish conflict-ish
julien-c May 5, 2022
e9fb4d4
minimize diff
julien-c May 5, 2022
adaca46
refactor `repo_folder_name`
julien-c May 5, 2022
fff7e9f
windows support + shortcut if user passes a commit hash
julien-c May 5, 2022
06c0ca1
Rewrite `snapshot_download` and make it more robust
julien-c May 5, 2022
e3e2485
OOops
julien-c May 5, 2022
7051683
Create example-transformers-tf.py
julien-c May 5, 2022
84ff68d
Fix + add a way more complete example (running on Ubuntu)
julien-c May 5, 2022
11974d4
Apply suggestions from code review
julien-c May 6, 2022
73cdb46
Update src/huggingface_hub/file_download.py
julien-c May 9, 2022
9f8b876
Update src/huggingface_hub/file_download.py
julien-c May 9, 2022
2c5c94b
Only allow full revision hashes otherwise the `revision != commit_has…
julien-c May 9, 2022
35fff44
add a little bit more doc + consistency
julien-c May 9, 2022
383022f
Update src/huggingface_hub/snapshot_download.py
LysandreJik May 13, 2022
7112146
Update snapshot download
LysandreJik May 18, 2022
8f99d22
First pass on tests
LysandreJik May 18, 2022
01e3a61
Wrap up tests
LysandreJik May 20, 2022
2d54dd1
:wolf: Fix for bug reported by @thomwolf
julien-c May 24, 2022
8f71ad6
Special case for Windows
LysandreJik May 24, 2022
442b5b1
Address comments and docs
LysandreJik May 24, 2022
e1ff4eb
Clean up with ternary cc @julien-c
LysandreJik May 25, 2022
e69ef4f
Add argument to `cached_download`
LysandreJik May 25, 2022
0d06b55
Opt-in for filename_to-url
LysandreJik May 25, 2022
d6bb92e
Opt-in for filename_to-url
LysandreJik May 25, 2022
4259642
Pass the flag
LysandreJik May 25, 2022
a8348ab
Update docs/source/package_reference/file_download.mdx
LysandreJik May 25, 2022
8c599ee
Update src/huggingface_hub/file_download.py
LysandreJik May 25, 2022
3dfc00c
Address review comments
LysandreJik May 25, 2022
22 changes: 8 additions & 14 deletions src/huggingface_hub/_snapshot_download.py
@@ -158,26 +158,20 @@ def snapshot_download(
     # If the specified revision is a commit hash, look inside "snapshots".
     # If the specified revision is a branch or tag, look inside "refs".
     if local_files_only:
-        commit_hash = revision
-        if not REGEX_COMMIT_HASH.match(commit_hash):
-            # rertieve commit_hash from file
+
+        def resolve_ref(revision) -> str:
+            # retrieve commit_hash from file
             ref_path = os.path.join(storage_folder, "refs", revision)
             with open(ref_path) as f:
-                commit_hash = f.read()
+                return f.read()
+
+        commit_hash = (
+            revision if REGEX_COMMIT_HASH.match(revision) else resolve_ref(revision)
+        )
         snapshot_folder = os.path.join(storage_folder, "snapshots", commit_hash)
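
The two comments at the top of this hunk describe a dispatch on the shape of `revision`. A minimal sketch of that check follows. The definition of `REGEX_COMMIT_HASH` is not shown in this diff, so the pattern below (a full 40-character lowercase hex hash) is an assumption, and `cache_subfolder` is a hypothetical helper name used only for illustration.

```python
import re

# Assumption: only full 40-character hex hashes count as commit hashes.
# The real definition of REGEX_COMMIT_HASH is not part of this hunk.
REGEX_COMMIT_HASH = re.compile(r"^[0-9a-f]{40}$")


def cache_subfolder(revision: str) -> str:
    """Hypothetical helper: name the cache subfolder a revision is looked up in."""
    # Commit hashes map directly to a snapshot folder; anything else
    # (branch or tag name) is treated as a ref that must be resolved first.
    return "snapshots" if REGEX_COMMIT_HASH.match(revision) else "refs"


print(cache_subfolder("main"))      # branch name
print(cache_subfolder("a" * 40))    # full-length hash
print(cache_subfolder("abc1234"))   # abbreviated hash
```

Under this assumed pattern an abbreviated hash such as `abc1234` does not match and is treated as a ref name, which is in line with the commit above that restricts the check to full revision hashes.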

Member

See the side-by-side comparison of before/after, @julien-c:

After:

if local_files_only:

    def resolve_ref(revision) -> str:
        # retrieve commit_hash from file
        ref_path = os.path.join(storage_folder, "refs", revision)
        with open(ref_path) as f:
            return f.read()

    commit_hash = (
        revision if REGEX_COMMIT_HASH.match(revision) else resolve_ref(revision)
    )
    snapshot_folder = os.path.join(storage_folder, "snapshots", commit_hash)

    if os.path.exists(snapshot_folder):
        return snapshot_folder

    raise ValueError(
        "Cannot find an appropriate cached snapshot folder for the specified"
        " revision on the local disk and outgoing traffic has been disabled. To"
        " enable repo look-ups and downloads online, set 'local_files_only' to"
        " False."
    )

Before:

if local_files_only:
    if REGEX_COMMIT_HASH.match(revision):
        snapshot_folder = os.path.join(storage_folder, "snapshots", revision)
        if os.path.exists(snapshot_folder):
            return snapshot_folder
    else:
        ref_path = os.path.join(storage_folder, "refs", revision)
        with open(ref_path) as f:
            commit_hash = f.read()
        snapshot_folder = os.path.join(storage_folder, "snapshots", commit_hash)
        if os.path.exists(snapshot_folder):
            return snapshot_folder
    raise ValueError(
        "Cannot find an appropriate cached folder for the specified revision on the"
        " local disk and outgoing traffic has been disabled. To enable repo"
        " look-ups and downloads online, set 'local_files_only' to False."
    )
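
To try the refactored lookup outside the library, it can be lifted into a standalone function. This is only a sketch under stated assumptions: `find_cached_snapshot` and `demo` are hypothetical names, the regex for `REGEX_COMMIT_HASH` is assumed (it is not shown in this diff), the error message is shortened, and the cache is faked in a temporary directory with a `refs` and a `snapshots` folder.

```python
import os
import re
import tempfile

# Assumption: a commit hash is a full 40-character lowercase hex string.
REGEX_COMMIT_HASH = re.compile(r"^[0-9a-f]{40}$")


def find_cached_snapshot(storage_folder: str, revision: str) -> str:
    """Hypothetical standalone version of the local_files_only branch."""

    def resolve_ref(ref: str) -> str:
        # A ref file (branch or tag) contains the commit hash it points to.
        ref_path = os.path.join(storage_folder, "refs", ref)
        with open(ref_path) as f:
            return f.read()

    commit_hash = (
        revision if REGEX_COMMIT_HASH.match(revision) else resolve_ref(revision)
    )
    snapshot_folder = os.path.join(storage_folder, "snapshots", commit_hash)

    if os.path.exists(snapshot_folder):
        return snapshot_folder

    raise ValueError(f"No cached snapshot folder found for revision {revision!r}.")


def demo() -> bool:
    """Build a fake cache and check that a ref and its hash resolve identically."""
    with tempfile.TemporaryDirectory() as storage_folder:
        commit = "0123456789abcdef0123456789abcdef01234567"
        os.makedirs(os.path.join(storage_folder, "refs"))
        os.makedirs(os.path.join(storage_folder, "snapshots", commit))
        with open(os.path.join(storage_folder, "refs", "main"), "w") as f:
            f.write(commit)

        by_ref = find_cached_snapshot(storage_folder, "main")
        by_hash = find_cached_snapshot(storage_folder, commit)
        return by_ref == by_hash


print(demo())
```

As in both variants above, a branch or tag with no file under `refs` surfaces as the `FileNotFoundError` raised by `open`, not as the `ValueError`; the `ValueError` is only reached when the snapshot folder itself is missing.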

Member Author

Yep, looks great to me!

Member Author

The same logic exists in `hf_hub_download`, no? (Not 100% sure anymore.)

         if os.path.exists(snapshot_folder):
             return snapshot_folder
-            snapshot_folder = os.path.join(storage_folder, "snapshots", revision)
-            if os.path.exists(snapshot_folder):
-                return snapshot_folder
-        else:
-            ref_path = os.path.join(storage_folder, "refs", revision)
-            with open(ref_path) as f:
-                commit_hash = f.read()
-            snapshot_folder = os.path.join(storage_folder, "snapshots", commit_hash)
-            if os.path.exists(snapshot_folder):
-                return snapshot_folder
 
         raise ValueError(
             "Cannot find an appropriate cached snapshot folder for the specified"