-
Notifications
You must be signed in to change notification settings - Fork 571
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not fail on touch()
if OSError
(to cache non existence of file)
#2505
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok makes sense! Hard to add a test for this I think
I think @ydshieh is currently testing it on a live S3 mounted drive in a CI to make sure we are solving their problem. Unit testing is not worth it IMO |
Yes, let me check and give update here later. |
try: | ||
no_exist_file_path.touch() | ||
except OSError as e: | ||
logger.error( | ||
f"Could not cache non-existence of file. Will ignore error and continue. Error: {e}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current issue we have is that we can't touch (with S3) if this file exists.
This change here solves that issue (I confirmed with CI), but it's more general (i.e. whenever there is an OSSError no matter what cuases it).
Should we check if the file exists anyway just after this block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No I don't think so. As I mentioned in the PR description, this file exists only to save a HEAD call so it's a very optional optimization. If it fails because of a filesystem-specific issue then it's not a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks!
Checked with CI and the previous failing test like
(on new cluster/cache system) now passes. |
Thanks for confirming! I'll merge it then :) |
Related to this thread on slack (private link).
When trying to download a file, if the file does not exist on the Hub, we cache this information by creating an empty file under the
./no-exists
folder. This is used intransformers
to avoid requesting a given file each time the pipeline is reloaded. The only purpose of this file is to save a HEAD call on initialization. @ydshieh and @glegendre01 have found out thatPath.touch()
raises aPermissionError
when used on a S3-mounted drive if the file already exists. This PR adds a try/catch block to ignore such failures.EDIT: I've checked and it's the only occurrence where we use
.touch()
on a potentially existing file.