Probably avoidable fatal S3 race condition #693
Comments
Makes sense. Are you able to make a PR?
@mpenkov We're currently not using smart_open, so I wouldn't be able to justify spending the time on it (this is a work situation for me). That may change in the future, but right now, unfortunately that's what it is for me.
@mpenkov Is this issue still open? If so I would like to take a crack at submitting a PR for this.
Sure, go for it.
RachitSharma2001 added commits to RachitSharma2001/smart_open that referenced this issue (six on Nov 5, 2022, one on Nov 6, 2022, one on Nov 11, 2022)
mpenkov added a commit that referenced this issue on Nov 29, 2022
Problem description
When using `s3.iter_bucket` from my Lambda, I observed the following problem. A user on my system was deleting an object (a valid use case). While that deletion was in progress, an unrelated call to `smart_open.s3.iter_bucket` kicked off. Inside that function, the key iterator is created, and then a download is started for each key. The key of the object about to be deleted showed up in the listing. By the time its download started, the object was no longer there, so the request returned a 404, which surfaced as a fatal exception.

While this may seem like a fluke,
a) it was on a system with very little user activity.
b) S3's list operation is (iirc) not strongly consistent and may for a certain duration return keys that have already been deleted.
So it may be more common than it seems. I think that it would be reasonable for `iter_bucket` to skip objects that return a 404 during their download (ie: catch the error and suppress it), and carry on iterating. I think `iter_bucket` would still fulfill its duty this way.

Steps/code to reproduce the problem
The code:
The error:
Versions
This is running on:
Checklist
Before you create the issue, please make sure you have:
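For illustration, the skip-on-404 behaviour proposed in the problem description could look roughly like the sketch below. This is not smart_open's actual code: `iter_skipping_missing`, `is_missing_object_error`, `FakeClientError` and `fake_download` are hypothetical names, and the in-memory `store` stands in for a bucket so the example runs without AWS access. The one assumption about the real stack is that botocore's `ClientError` carries the S3 error code under `exc.response["Error"]["Code"]`.

```python
from typing import Callable, Iterable, Iterator, Tuple

# Error codes treated as "object is gone". Assumption: the download raises a
# botocore-style ClientError whose .response["Error"]["Code"] holds the code.
_MISSING_CODES = {"NoSuchKey", "404"}


def is_missing_object_error(exc: Exception) -> bool:
    """Return True if exc looks like a ClientError for a missing object."""
    response = getattr(exc, "response", None)
    if not isinstance(response, dict):
        return False
    return response.get("Error", {}).get("Code") in _MISSING_CODES


def iter_skipping_missing(
    keys: Iterable[str],
    download: Callable[[str], bytes],
) -> Iterator[Tuple[str, bytes]]:
    """Yield (key, content) pairs, skipping keys deleted after being listed."""
    for key in keys:
        try:
            content = download(key)
        except Exception as exc:
            if is_missing_object_error(exc):
                continue  # object vanished between listing and download
            raise  # any other failure is still fatal
        yield key, content


# --- Stand-ins for S3 so the sketch runs offline (all hypothetical) ---
class FakeClientError(Exception):
    """Mimics the shape of botocore.exceptions.ClientError."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.response = {"Error": {"Code": code}}


store = {"a.txt": b"alpha", "c.txt": b"gamma"}  # "b.txt" was just deleted


def fake_download(key: str) -> bytes:
    if key not in store:
        raise FakeClientError("NoSuchKey")
    return store[key]


listed = ["a.txt", "b.txt", "c.txt"]  # stale listing still contains "b.txt"
result = list(iter_skipping_missing(listed, fake_download))
```

With boto3, `download` could be something like `lambda key: client.get_object(Bucket=bucket, Key=key)["Body"].read()`. Only the not-found case is suppressed here, so permission or network errors would still propagate to the caller.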