Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address failing _check_disk_space() when path doesn't exist yet #1692

Merged

Conversation

martinbrose
Copy link
Contributor

@martinbrose martinbrose commented Sep 23, 2023

Address issue #1690.

_check_disk_space fails when only provided with incomplete paths like checkpoints/stabilityai/stable-diffusion-xl-base-1.0 and then tries to asses exactly this "path".
It seems to be a regular case when snapshot_download and hence hf_hub_download is being called without the parameter local_dir.

snapshot_download.py (248:250) only performs os.path.realpath() when local_dir is not None:

    if local_dir is not None:
        return str(os.path.realpath(local_dir))
    return snapshot_folder

My change in this PR now addresses this case and ensures that the real complete path is being used in shutil.disk_usage().free within _check_disk_space().

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Sep 23, 2023

The documentation is not available anymore as the PR was closed or merged.

@martinbrose martinbrose changed the title Add realpath to disk_usage to allow path-like obj Add realpath to disk_usage to allow incomplete path Sep 24, 2023
@Wauplin
Copy link
Contributor

Wauplin commented Sep 25, 2023

Hi @martinbrose thanks for looking into this! I think the problem is not with the confusion between absolute or relative paths. In theory shutil.disk_usage() is supposed to handle that correctly for us (I tried locally and shutil.disk_usage("path/to/folder") has the same output as shutil.disk_usage(os.path.realpath("path/to/folder")) for me). It seems that the issue is more coming from the fact that the folder does not exist yet.

Here is what I think we should do:

  • If target_dir does not exist, we check its parent directories recursively. If no parent folder exists, let's not raise a warning or exception. Not sure if this is possible but in any case the core "download logic" will take care of either creating the path or raise a proper exception
  • Make sure that _check_disk_space never fails, no matter what. This method is supposed to help users but we don't want it to be a blocking in case something happens.

This would give an implementation looking like this:

    target_dir = Path(target_dir)                         # format as `Path`
    for path in [target_dir] + list(target_dir.parents):  # first check target_dir, then each parents one by one
        try:
            target_dir_free = shutil.disk_usage(path).free
            if target_dir_free < expected_size:
                warnings.warn(
                    "Not enough free disk space to download the file. "
                    f"The expected file size is: {expected_size / 1e6:.2f} MB. "
                    f"The target location {target_dir} only has {target_dir_free / 1e6:.2f} MB free disk space."
                )
            return
        except OSError:                                   # raise on anything: file does not exist or space disk cannot be checked
            pass                                          # never raise. It's best to check parent folder instead

Corresponding tests could be extended (1 for missing folder, 1 for relative path). Do you want to take care of it @martinbrose ? Thanks in advance!

@martinbrose
Copy link
Contributor Author

HI @Wauplin,

thanks for your reply. You seem to be right... just double checked myself.
I was convinced that this is the root problem. Weird.

I'll have a look tonight and update the PR tonight.

@martinbrose
Copy link
Contributor Author

martinbrose commented Sep 25, 2023

Hi @Wauplin,

your code recommendation makes sense... implemented and tested it.

I also added the two additional test cases. While I'm kind of confident with the test case for the non-existent path, I'm not sure I understand exactly the ask for a test case for relative paths. Maybe I addressed it, maybe not. Please let me know.

@martinbrose martinbrose changed the title Add realpath to disk_usage to allow incomplete path Address failing _check_disk_space when path doesn't exist yet Sep 25, 2023
@martinbrose martinbrose changed the title Address failing _check_disk_space when path doesn't exist yet Address failing _check_disk_space() when path doesn't exist yet Sep 25, 2023
Comment on lines 115 to 128
def test_disk_usage_warning_with_non_existent_path(self) -> None:
# Test for not existent path
with warnings.catch_warnings(record=True) as w:
# Cause all warnings to always be triggered.
warnings.simplefilter("always")
_check_disk_space(expected_size=self.expected_size, target_dir="not_existent_path")
assert len(w) == 0

# Test for relative path
with warnings.catch_warnings(record=True) as w:
# Cause all warnings to always be triggered.
warnings.simplefilter("always")
_check_disk_space(expected_size=self.expected_size, target_dir="./not_existent_path")
assert len(w) == 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's exactly how I was seeing it except that 1 path must be relative (let's say "path/to/not_existent_path") and the other one absolute ("/path/to/not_existent_path") to check that both of them doesn't raise an exception.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def test_disk_usage_warning_with_non_existent_path(self) -> None:
# Test for not existent path
with warnings.catch_warnings(record=True) as w:
# Cause all warnings to always be triggered.
warnings.simplefilter("always")
_check_disk_space(expected_size=self.expected_size, target_dir="not_existent_path")
assert len(w) == 0
# Test for relative path
with warnings.catch_warnings(record=True) as w:
# Cause all warnings to always be triggered.
warnings.simplefilter("always")
_check_disk_space(expected_size=self.expected_size, target_dir="./not_existent_path")
assert len(w) == 0
def test_disk_usage_warning_with_non_existent_path(self) -> None:
# Test for not existent (absolute) path
with warnings.catch_warnings(record=True) as w:
# Cause all warnings to always be triggered.
warnings.simplefilter("always")
_check_disk_space(expected_size=self.expected_size, target_dir="path/to/not_existent_path")
assert len(w) == 0
# Test for not existent (relative) path
with warnings.catch_warnings(record=True) as w:
# Cause all warnings to always be triggered.
warnings.simplefilter("always")
_check_disk_space(expected_size=self.expected_size, target_dir="/path/to/not_existent_path")
assert len(w) == 0

Copy link
Contributor

@Wauplin Wauplin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the changes @martinbrose! Test looks good. I've updated it to have both an absolute and a relative paths but that's it. We are good to merge now :)

@codecov
Copy link

codecov bot commented Sep 26, 2023

Codecov Report

All modified lines are covered by tests ✅

Comparison is base (b171e02) 82.41% compared to head (6513981) 81.96%.
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1692      +/-   ##
==========================================
- Coverage   82.41%   81.96%   -0.45%     
==========================================
  Files          62       62              
  Lines        7194     7198       +4     
==========================================
- Hits         5929     5900      -29     
- Misses       1265     1298      +33     
Files Coverage Δ
src/huggingface_hub/file_download.py 85.36% <100.00%> (+0.11%) ⬆️

... and 4 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Wauplin Wauplin merged commit 89cc691 into huggingface:main Sep 26, 2023
Wauplin added a commit that referenced this pull request Sep 26, 2023
* Add realpath to disk_usage to allow path-like obj

* Add logic to address non-existent paths

* Add test cased for non-existent paths

* Update tests/test_file_download.py

---------

Co-authored-by: Lucain <lucainp@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants