Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(huggingface): wrap HfFileSystem to an async filesystem #957

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

skshetry
Copy link
Member

This commit wraps the HfFileSystem class to convert to an async filesystem. This is done by using the AsyncFileSystemWrapper class from the fsspec library, which requires fsspec>=2024.12.0.

With the wrapper, all the functions will run in a separate thread in an executor, and won't block the main event-loop.

This will now support all the async APIs that the Client expects and, hence, will support file caching.

The alternative here is to have two different implementations, one async and another sync, but that is too much of an effort at this time for huggingface.

Fixes #661.

This commit wraps the HfFileSystem class to convert to an async filesystem.
This is done by using the `AsyncFileSystemWrapper` class from the `fsspec` library, which requires `fsspec>=2024.12.0`.

With the wrapper, all the functions will run in a separate thread in an
executor, and won't block the main event-loop.

This will now support all the async APIs that the `Client` expects, and
will hence support caching of the files.

The alternative here is to have two different implementations, one async
and another sync, but that is too much of an effort at this time.

Fixes #661.
@skshetry skshetry requested a review from shcheklein March 10, 2025 10:45
Copy link

cloudflare-workers-and-pages bot commented Mar 10, 2025

Deploying datachain-documentation with  Cloudflare Pages  Cloudflare Pages

Latest commit: 0c98175
Status: ✅  Deploy successful!
Preview URL: https://09d709ff.datachain-documentation.pages.dev
Branch Preview URL: https://cache-hfs.datachain-documentation.pages.dev

View logs

@skshetry skshetry marked this pull request as draft March 10, 2025 10:51
Copy link

codecov bot commented Mar 10, 2025

Codecov Report

Attention: Patch coverage is 70.00000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 87.65%. Comparing base (1795389) to head (0c98175).

Files with missing lines Patch % Lines
src/datachain/client/hf.py 77.77% 2 Missing ⚠️
src/datachain/client/fsspec.py 0.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #957      +/-   ##
==========================================
- Coverage   87.66%   87.65%   -0.01%     
==========================================
  Files         131      131              
  Lines       11903    11904       +1     
  Branches     1623     1622       -1     
==========================================
  Hits        10435    10435              
- Misses       1058     1059       +1     
  Partials      410      410              
Flag Coverage Δ
datachain 87.58% <70.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@skshetry skshetry removed the request for review from shcheklein March 10, 2025 11:02
@skshetry skshetry marked this pull request as ready for review March 10, 2025 14:40
@skshetry skshetry requested review from shcheklein and a team March 10, 2025 14:40
@skshetry
Copy link
Member Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cache for HF file system is broken
1 participant