Fix HfFileSystemFile when init fails + improve error message #1805

Merged (4 commits) on Nov 6, 2023

Contributor

@Wauplin Wauplin commented Nov 6, 2023

Fix #1800 (?)

This PR does 2 things:

  • If HfFileSystemFile.__init__ fails because the repo or revision does not exist, self.resolved_path is never set. As a consequence, the magic __del__ method (called right after, since init failed) also fails because the file cannot be closed. This PR fixes that by not closing the file if resolved_path is not set.
  • The FileNotFoundError message is improved to tell the user why the file was not found (repo not found, repo_id not valid, or revision not found). It also tells the user that the repo must exist before writing a file to it. @lhoestq please let me know what you think of the updated message.
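
The first point can be illustrated with a minimal, self-contained sketch. SketchFile and its fail flag are hypothetical stand-ins, not the huggingface_hub classes; the sketch only assumes the general pattern of skipping cleanup in __del__ when __init__ raised before resolved_path was assigned.

```python
class SketchFile:
    """Hypothetical stand-in for a buffered file; not the huggingface_hub class."""

    def __init__(self, path: str, fail: bool = False):
        if fail:
            # Simulates path resolution failing before resolved_path is assigned.
            raise FileNotFoundError(path)
        self.resolved_path = path
        self.closed = False

    def close(self) -> None:
        # Closing relies on state that only exists after a successful __init__.
        _ = self.resolved_path
        self.closed = True

    def __del__(self):
        if not hasattr(self, "resolved_path"):
            # The constructor failed, so there is nothing to close.
            return
        self.close()
```

Without the hasattr guard, the AttributeError raised inside __del__ would be reported on top of the original FileNotFoundError, which hides the real cause from the user.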

I am not a big fan of auto-creating the repo/revision if it is not found. The biggest problem is that we would need additional information from the user (typically: should it be created as private? If it is a Space repo, which SDK should be specified?). I think that, for now, improving the error message is already a good enough solution.

Traceback (most recent call last):
  File "/home/wauplin/projects/huggingface_hub/duc.py", line 7, in <module>
    duckdb.sql("COPY data TO 'hf://datasets/Wauplin/tmp-duckdb-export@branch/data.parquet' (FORMAT PARQUET);")
duckdb.duckdb.Error: Invalid Error: FileNotFoundError: Wauplin/tmp-duckdb-export@branch/data.parquet (repository not found).
Make sure the repository and revision exist before writing data.

At:
  /home/wauplin/projects/huggingface_hub/src/huggingface_hub/hf_file_system.py(422): __init__
  /home/wauplin/projects/huggingface_hub/src/huggingface_hub/hf_file_system.py(234): _open
  /home/wauplin/projects/huggingface_hub/.venv310/lib/python3.10/site-packages/fsspec/spec.py(1307): open
  /home/wauplin/projects/huggingface_hub/duc.py(7): <module>
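
A message of the shape shown in the traceback above could be produced along the following lines. This is a sketch under assumptions, not the code from this PR: resolve, open_for_mode, and known_repos are hypothetical names introduced for illustration, and only the message text is taken from the traceback.

```python
def resolve(path: str, known_repos: set) -> str:
    """Hypothetical resolver: raises if the repo part of the path is unknown."""
    repo = path.split("@")[0]
    if repo not in known_repos:
        raise FileNotFoundError(f"{path} (repository not found)")
    return path


def open_for_mode(path: str, mode: str, known_repos: set) -> str:
    """Resolve a path, adding a hint to the error only when opening for writing."""
    try:
        return resolve(path, known_repos)
    except FileNotFoundError as err:
        if "w" in mode:
            # Writers get an extra hint, mirroring the message in the traceback.
            raise FileNotFoundError(
                f"{err}.\nMake sure the repository and revision exist before writing data."
            ) from err
        # Readers see the original error unchanged.
        raise
```

Chaining with "from err" keeps the original failure reason available to anyone inspecting the exception.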

Member

@lhoestq lhoestq left a comment

Nice, thank you!

Contributor Author

Wauplin commented Nov 6, 2023

Thanks for the quick approval 😉 Should be released soon btw :)

@Wauplin Wauplin merged commit aa4d3fa into main Nov 6, 2023
10 of 15 checks passed
@Wauplin Wauplin deleted the 1800-better-error-message-in-hffs-if-repo-not-found branch November 6, 2023 13:53

@@ -413,7 +413,21 @@ class HfFileSystemFile(fsspec.spec.AbstractBufferedFile):
    def __init__(self, fs: HfFileSystem, path: str, revision: Optional[str] = None, **kwargs):
        super().__init__(fs, path, **kwargs)
        self.fs: HfFileSystem
        self.resolved_path = fs.resolve_path(path, revision=revision)

        mode = kwargs.get("mode", "r")
Contributor

fsspec's AbstractBufferedFile has "rb" as the default mode...

Contributor Author

Oh ok, thanks for letting me know. I'll open a follow-up PR to update that but in any case it doesn't change anything (this value is used only to check if it's "w" or "wb")

Contributor Author

I've made this commit (18d0ae2) to avoid confusion
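
To see why the default value discussed in this thread does not affect behavior, here is a small sketch. is_write_mode is a hypothetical helper, not a function from huggingface_hub or fsspec; it only assumes, as stated above, that the value is compared against "w" and "wb".

```python
def is_write_mode(kwargs: dict, default: str = "rb") -> bool:
    """Hypothetical helper: True only when the file is opened for writing."""
    mode = kwargs.get("mode", default)
    return mode in ("w", "wb")


# Neither default is a write mode, so the check gives the same answer for both.
same_result = is_write_mode({}, default="r") == is_write_mode({}, default="rb")
```

Using "rb" still seems preferable because it matches the default of fsspec's AbstractBufferedFile and avoids confusing readers of the code.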

Successfully merging this pull request may close these issues.

Improve error message to write to the Hub using DuckDB when repo doesn't exist
4 participants