Rename .huggingface/ folder to .cache/huggingface/ #2262

Merged
merged 3 commits into from
May 2, 2024
Changes from 1 commit
4 changes: 2 additions & 2 deletions docs/source/en/guides/download.md
@@ -134,11 +134,11 @@ However, if you need to download files to a specific folder, you can pass a `loc

A `./huggingface/` folder is created at the root of your local directory containing metadata about the downloaded files. This prevents re-downloading files if they're already up-to-date. If the metadata has changed, then the new file version is downloaded. This makes the `local_dir` optimized for pulling only the latest changes.

After completing the download, you can safely remove the `.huggingface/` folder if you no longer need it. However, be aware that re-running your script without this folder may result in longer recovery times, as metadata will be lost. Rest assured that your local data will remain intact and unaffected.
After completing the download, you can safely remove the `.cache/huggingface/` folder if you no longer need it. However, be aware that re-running your script without this folder may result in longer recovery times, as metadata will be lost. Rest assured that your local data will remain intact and unaffected.

<Tip>

Don't worry about the `.huggingface/` folder when committing changes to the Hub! This folder is automatically ignored by both `git` and [`upload_folder`].
Don't worry about the `.cache/huggingface/` folder when committing changes to the Hub! This folder is automatically ignored by both `git` and [`upload_folder`].

</Tip>
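
As a rough illustration of the cleanup described in this hunk, the sketch below removes the metadata folder from a local directory using only the standard library. The helper name `remove_download_metadata` is hypothetical and not part of `huggingface_hub`; it assumes the metadata lives under `.cache/huggingface/`, as in the updated wording.

```python
import shutil
from pathlib import Path


def remove_download_metadata(local_dir: Path) -> bool:
    """Delete `<local_dir>/.cache/huggingface` if present.

    Hypothetical helper for illustration only, not part of huggingface_hub.
    Returns True if a metadata folder was removed, False otherwise.
    """
    metadata_dir = Path(local_dir) / ".cache" / "huggingface"
    if not metadata_dir.is_dir():
        return False
    # Only the metadata folder is removed; downloaded files are left alone.
    shutil.rmtree(metadata_dir)
    return True
```

The sketch deliberately leaves the parent `.cache/` folder in place, since other tools may store data there.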

23 changes: 12 additions & 11 deletions src/huggingface_hub/_local_folder.py
@@ -12,20 +12,21 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Contains utilities to handle the `../.huggingface` folder in local directories.
"""Contains utilities to handle the `../.cache/huggingface` folder in local directories.

First discussed in https://github.com/huggingface/huggingface_hub/issues/1738 to store
download metadata when downloading files from the hub to a local directory (without
using the cache).

./.huggingface folder structure:
./.cache/huggingface folder structure:
[4.0K]  data
├── [4.0K]  .huggingface
│   └── [4.0K]  download
│       ├── [  16]  file.parquet.metadata
│       ├── [  16]  file.txt.metadata
│       └── [4.0K]  folder
│           └── [  16]  file.parquet.metadata
├── [4.0K]  .cache
│   └── [4.0K]  huggingface
│       └── [4.0K]  download
│           ├── [  16]  file.parquet.metadata
│           ├── [  16]  file.txt.metadata
│           └── [4.0K]  folder
│               └── [  16]  file.parquet.metadata
├── [6.5G]  file.parquet
├── [1.5K]  file.txt
@@ -210,12 +211,12 @@ def write_download_metadata(local_dir: Path, filename: str, commit_hash: str, et

@lru_cache()
def _huggingface_dir(local_dir: Path) -> Path:
    """Return the path to the `.huggingface` directory in a local directory."""
    """Return the path to the `.cache/huggingface` directory in a local directory."""
    # Wrap in lru_cache to avoid overwriting the .gitignore file if called multiple times
    path = local_dir / ".huggingface"
    path = local_dir / ".cache" / "huggingface"
    path.mkdir(exist_ok=True, parents=True)

    # Create a .gitignore file in the .huggingface directory if it doesn't exist
    # Create a .gitignore file in the .cache/huggingface directory if it doesn't exist
    # Should be thread-safe enough like this.
    gitignore = path / ".gitignore"
    gitignore_lock = path / ".gitignore.lock"
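
The hunk is cut off before the `.gitignore` file is written. As a minimal stand-alone sketch of the same idea (not the library's implementation), the helper below creates the nested folder and drops a `.gitignore` inside it only if none exists yet. The name `ensure_metadata_dir` is hypothetical, the `*` ignore rule is an assumption, and the lock-file handling hinted at by `gitignore_lock` is omitted.

```python
from pathlib import Path


def ensure_metadata_dir(local_dir: Path) -> Path:
    """Create `<local_dir>/.cache/huggingface` and make git ignore its contents.

    Hypothetical helper; the `*` ignore rule is an assumption for illustration.
    """
    path = Path(local_dir) / ".cache" / "huggingface"
    # parents=True is needed now that the folder is nested one level deeper.
    path.mkdir(parents=True, exist_ok=True)

    gitignore = path / ".gitignore"
    if not gitignore.exists():
        # A single `*` tells git to ignore everything inside this directory.
        gitignore.write_text("*\n")
    return path
```

Because the write is guarded by an existence check, calling the helper repeatedly does not overwrite a `.gitignore` that a user has edited by hand.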
2 changes: 1 addition & 1 deletion src/huggingface_hub/_snapshot_download.py
@@ -65,7 +65,7 @@ def snapshot_download(
`allow_patterns` and `ignore_patterns`.

If `local_dir` is provided, the file structure from the repo will be replicated in this location. When using this
option, the `cache_dir` will not be used and a `.huggingface/` folder will be created at the root of `local_dir`
option, the `cache_dir` will not be used and a `.cache/huggingface/` folder will be created at the root of `local_dir`
to store some metadata related to the downloaded files. While this mechanism is not as robust as the main
cache-system, it's optimized for regularly pulling the latest version of a repository.

2 changes: 1 addition & 1 deletion src/huggingface_hub/file_download.py
@@ -1046,7 +1046,7 @@ def hf_hub_download(
```

If `local_dir` is provided, the file structure from the repo will be replicated in this location. When using this
option, the `cache_dir` will not be used and a `.huggingface/` folder will be created at the root of `local_dir`
option, the `cache_dir` will not be used and a `.cache/huggingface/` folder will be created at the root of `local_dir`
to store some metadata related to the downloaded files. While this mechanism is not as robust as the main
cache-system, it's optimized for regularly pulling the latest version of a repository.

4 changes: 2 additions & 2 deletions src/huggingface_hub/hf_api.py
@@ -5003,7 +5003,7 @@ def hf_hub_download(
```

If `local_dir` is provided, the file structure from the repo will be replicated in this location. When using this
option, the `cache_dir` will not be used and a `.huggingface/` folder will be created at the root of `local_dir`
option, the `cache_dir` will not be used and a `.cache/huggingface/` folder will be created at the root of `local_dir`
to store some metadata related to the downloaded files. While this mechanism is not as robust as the main
cache-system, it's optimized for regularly pulling the latest version of a repository.

@@ -5122,7 +5122,7 @@ def snapshot_download(
`allow_patterns` and `ignore_patterns`.

If `local_dir` is provided, the file structure from the repo will be replicated in this location. When using this
option, the `cache_dir` will not be used and a `.huggingface/` folder will be created at the root of `local_dir`
option, the `cache_dir` will not be used and a `.cache/huggingface/` folder will be created at the root of `local_dir`
to store some metadata related to the downloaded files. While this mechanism is not as robust as the main
cache-system, it's optimized for regularly pulling the latest version of a repository.

12 changes: 6 additions & 6 deletions src/huggingface_hub/utils/_paths.py
@@ -21,19 +21,19 @@

T = TypeVar("T")

# Always ignore `.git` and `.huggingface` folders in commits
# Always ignore `.git` and `.cache/huggingface` folders in commits
DEFAULT_IGNORE_PATTERNS = [
    ".git",
    ".git/*",
    "*/.git",
    "**/.git/**",
    ".huggingface",
    ".huggingface/*",
    "*/.huggingface",
    "**/.huggingface/**",
    ".cache/huggingface",
    ".cache/huggingface/*",
    "*/.cache/huggingface",
    "**/.cache/huggingface/**",
]
# Forbidden to commit these folders
FORBIDDEN_FOLDERS = [".git", ".huggingface"]
FORBIDDEN_FOLDERS = [".git", ".cache"]


def filter_repo_objects(
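
To see what the new ignore patterns cover, here is a small sketch that applies them with shell-style matching from the standard library. It assumes `fnmatch` semantics, where `*` also matches `/`; how `filter_repo_objects` actually applies the patterns is not shown in this hunk. The helper `is_ignored` is a hypothetical name.

```python
from fnmatch import fnmatchcase

# Patterns copied from the new side of the diff above.
IGNORE_PATTERNS = [
    ".cache/huggingface",
    ".cache/huggingface/*",
    "*/.cache/huggingface",
    "**/.cache/huggingface/**",
]


def is_ignored(path: str, patterns=IGNORE_PATTERNS) -> bool:
    """Return True if `path` matches at least one shell-style pattern."""
    # fnmatchcase avoids OS-dependent case normalization.
    return any(fnmatchcase(path, pattern) for pattern in patterns)
```

Under these assumptions, `.cache/huggingface/folder/file.txt` and `path/to/.cache/huggingface` are ignored, while a sibling such as `.cache/huggingface_folder/file.txt` is not.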
13 changes: 9 additions & 4 deletions tests/test_commit_api.py
@@ -57,7 +57,7 @@ def test_path_in_repo_invalid(self) -> None:


class TestCommitOperationForbiddenPathInRepo(unittest.TestCase):
    """Commit operations must throw an error on files in the .git/ or .huggingface/ folders.
    """Commit operations must throw an error on files in the .git/ or .cache/huggingface/ folders.

    Server would error anyway so it's best to prevent early.
    """
@@ -68,9 +68,9 @@ class TestCommitOperationForbiddenPathInRepo(unittest.TestCase):
        "./.git/path/to/file",
        "subfolder/path/.git/to/file",
        "./subfolder/path/.git/to/file",
        ".huggingface",
        "./.huggingface/path/to/file",
        "./subfolder/path/.huggingface/to/file",
        ".cache/huggingface",
        "./.cache/huggingface/path/to/file",
        "./subfolder/path/.cache/huggingface/to/file",
    }

    VALID_PATHS_IN_REPO = {
@@ -79,6 +79,11 @@ class TestCommitOperationForbiddenPathInRepo(unittest.TestCase):
        "path/to/something.git",
        "path/to/something.git/more",
        "path/to/something.huggingface/more",
        "huggingface",
        ".huggingface",
        "./.huggingface/path/to/file",
        "./subfolder/path/huggingface/to/file",
        "./subfolder/path/.huggingface/to/file",
    }

    def test_cannot_update_file_in_git_folder(self):
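
The valid and invalid paths listed in this test can be reproduced with a component-wise check. The sketch below is an assumption about how such a check could work, since the actual validation code is not part of this diff; `touches_forbidden_folder` is a hypothetical name.

```python
# New value of FORBIDDEN_FOLDERS from utils/_paths.py in this diff.
FORBIDDEN_FOLDERS = (".git", ".cache")


def touches_forbidden_folder(path_in_repo: str, forbidden=FORBIDDEN_FOLDERS) -> bool:
    """Return True if any path component is exactly a forbidden folder name."""
    parts = path_in_repo.split("/")
    return any(part in forbidden for part in parts)
```

Note that under this sketch, forbidding `.cache` as a whole rejects any path containing a `.cache` component, not only `.cache/huggingface`, while `.huggingface` becomes an ordinary folder name again.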
4 changes: 2 additions & 2 deletions tests/test_file_download.py
@@ -949,15 +949,15 @@ def test_file_exists_and_overwrites(self):

    def test_resume_from_incomplete(self):
        # An incomplete file already exists => use it
        incomplete_path = self.local_dir / ".huggingface" / "download" / (self.file_name + ".incomplete")
        incomplete_path = self.local_dir / ".cache" / "huggingface" / "download" / (self.file_name + ".incomplete")
        incomplete_path.parent.mkdir(parents=True, exist_ok=True)
        incomplete_path.write_text("XXXX")  # Here we put fake data to test the resume
        self.api.hf_hub_download(self.repo_id, filename=self.file_name, local_dir=self.local_dir)
        self.file_path.read_text() == "XXXXent"

    def test_do_not_resume_on_force_download(self):
        # An incomplete file already exists but force_download=True
        incomplete_path = self.local_dir / ".huggingface" / "download" / (self.file_name + ".incomplete")
        incomplete_path = self.local_dir / ".cache" / "huggingface" / "download" / (self.file_name + ".incomplete")
        incomplete_path.parent.mkdir(parents=True, exist_ok=True)
        incomplete_path.write_text("XXXX")
        self.api.hf_hub_download(self.repo_id, filename=self.file_name, local_dir=self.local_dir, force_download=True)
4 changes: 2 additions & 2 deletions tests/test_hf_api.py
@@ -530,10 +530,10 @@ def _create_file(*parts) -> None:
            path.write_text("content")

        _create_file(".git", "file.txt")
        _create_file(".huggingface", "file.txt")
        _create_file(".cache", "huggingface", "file.txt")
        _create_file(".git", "folder", "file.txt")
        _create_file("folder", ".git", "file.txt")
        _create_file("folder", ".huggingface", "file.txt")
        _create_file("folder", ".cache", "huggingface", "file.txt")
        _create_file("folder", ".git", "folder", "file.txt")
        _create_file(".git_something", "file.txt")
        _create_file("file.git")
16 changes: 8 additions & 8 deletions tests/test_local_folder.py
@@ -12,7 +12,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Contains tests for the `.huggingface` folder in local directories.
"""Contains tests for the `.cache/huggingface` folder in local directories.

See `huggingface_hub/src/_local_folder.py` for the implementation.
"""
@@ -34,10 +34,10 @@


def test_creates_huggingface_dir_with_gitignore(tmp_path: Path):
    """Test `.huggingface/` dir is ignored by git."""
    """Test `.cache/huggingface/` dir is ignored by git."""
    local_dir = tmp_path / "path" / "to" / "local"
    huggingface_dir = _huggingface_dir(local_dir)
    assert huggingface_dir == local_dir / ".huggingface"
    assert huggingface_dir == local_dir / ".cache" / "huggingface"
    assert huggingface_dir.exists()  # all subdirectories have been created
    assert huggingface_dir.is_dir()

@@ -53,8 +53,8 @@ def test_local_download_paths(tmp_path: Path):
    # Correct paths (also sanitized on windows)
    assert isinstance(paths, LocalDownloadFilePaths)
    assert paths.file_path == tmp_path / "path" / "in" / "repo.txt"
    assert paths.metadata_path == tmp_path / ".huggingface" / "download" / "path" / "in" / "repo.txt.metadata"
    assert paths.lock_path == tmp_path / ".huggingface" / "download" / "path" / "in" / "repo.txt.lock"
    assert paths.metadata_path == tmp_path / ".cache" / "huggingface" / "download" / "path" / "in" / "repo.txt.metadata"
    assert paths.lock_path == tmp_path / ".cache" / "huggingface" / "download" / "path" / "in" / "repo.txt.lock"

    # Paths are usable (parent directories have been created)
    assert paths.file_path.parent.is_dir()
@@ -64,7 +64,7 @@ def test_local_download_paths(tmp_path: Path):
    # Incomplete paths are etag-based
    assert (
        paths.incomplete_path("etag123")
        == tmp_path / ".huggingface" / "download" / "path" / "in" / "repo.txt.etag123.incomplete"
        == tmp_path / ".cache" / "huggingface" / "download" / "path" / "in" / "repo.txt.etag123.incomplete"
    )
    assert paths.incomplete_path("etag123").parent.is_dir()
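
The layout these assertions check can be sketched in a few lines, following the `.cache/huggingface` location returned by `_huggingface_dir`. This is an illustration only: `SketchDownloadPaths` and `sketch_download_paths` are hypothetical names, and the sketch skips the directory creation and Windows path sanitizing that the test comments mention.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SketchDownloadPaths:
    """Hypothetical stand-in for the paths checked in the test above."""

    file_path: Path
    lock_path: Path
    metadata_path: Path

    def incomplete_path(self, etag: str) -> Path:
        # Partial downloads are keyed by etag and sit next to the metadata file.
        return self.metadata_path.parent / f"{self.file_path.name}.{etag}.incomplete"


def sketch_download_paths(local_dir: Path, filename: str) -> SketchDownloadPaths:
    """Map a repo-relative filename (with `/` separators) to its local paths."""
    parts = filename.split("/")
    file_path = Path(local_dir).joinpath(*parts)
    # Metadata mirrors the repo structure under .cache/huggingface/download.
    meta_dir = Path(local_dir, ".cache", "huggingface", "download").joinpath(*parts[:-1])
    return SketchDownloadPaths(
        file_path=file_path,
        lock_path=meta_dir / f"{parts[-1]}.lock",
        metadata_path=meta_dir / f"{parts[-1]}.metadata",
    )
```

Keeping the metadata tree parallel to the file tree means a nested file such as `path/in/repo.txt` never collides with a top-level file of the same name.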

@@ -83,7 +83,7 @@ def test_write_download_metadata(tmp_path: Path):
    """Test download metadata content is valid."""
    # Write metadata
    write_download_metadata(tmp_path, filename="file.txt", commit_hash="commit_hash", etag="123456789")
    metadata_path = tmp_path / ".huggingface" / "download" / "file.txt.metadata"
    metadata_path = tmp_path / ".cache" / "huggingface" / "download" / "file.txt.metadata"
    assert metadata_path.exists()

    # Metadata is valid
@@ -129,7 +129,7 @@ def test_read_download_metadata_no_metadata(tmp_path: Path):
def test_read_download_metadata_corrupted_metadata(tmp_path: Path, caplog: pytest.LogCaptureFixture):
    """Test reading download metadata when metadata is corrupted."""
    # Write corrupted metadata
    metadata_path = tmp_path / ".huggingface" / "download" / "file.txt.metadata"
    metadata_path = tmp_path / ".cache" / "huggingface" / "download" / "file.txt.metadata"
    metadata_path.parent.mkdir(parents=True, exist_ok=True)
    metadata_path.write_text("invalid content")

14 changes: 7 additions & 7 deletions tests/test_utils_paths.py
@@ -111,11 +111,11 @@ class TestDefaultIgnorePatterns(unittest.TestCase):
        "path/to/folder/.git",
        "path/to/folder/.git/file.txt",
        "path/to/.git/folder/file.txt",
        ".huggingface",
        ".huggingface/file.txt",
        ".huggingface/folder/file.txt",
        "path/to/.huggingface",
        "path/to/.huggingface/file.txt",
        ".cache/huggingface",
        ".cache/huggingface/file.txt",
        ".cache/huggingface/folder/file.txt",
        "path/to/.cache/huggingface",
        "path/to/.cache/huggingface/file.txt",
    ]

    VALID_PATHS = [
@@ -125,8 +125,8 @@ class TestDefaultIgnorePatterns(unittest.TestCase):
        "path/to/file.git",
        "file.huggingface",
        "path/file.huggingface",
        ".huggingface_folder",
        ".huggingface_folder/file.txt",
        ".cache/huggingface_folder",
        ".cache/huggingface_folder/file.txt",
    ]

    def test_exclude_git_folder(self):