Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Probably avoidable fatal S3 race condition #693

Closed
3 tasks done
ronkorving opened this issue Apr 19, 2022 · 4 comments · Fixed by #735
Closed
3 tasks done

Probably avoidable fatal S3 race condition #693

ronkorving opened this issue Apr 19, 2022 · 4 comments · Fixed by #735

Comments

@ronkorving
Copy link
Contributor

Problem description

When using s3.iter_bucket from my Lambda, I observed the following problem.

A user on my system was deleting an object (valid use case). While that user was deleting it, unrelated to it, smart_open.s3.iter_bucket kicked off. Inside that function, the key iterator is created, and then a download is started for each key. The key (of the object about to be deleted) showed up in the result. Once it was time to download the object, it was no longer there, and a 404 was thrown, which proved fatal and an exception was thrown.

While this may seem like a fluke,

a) it was on a system with very little user activity.
b) S3's list operation is (iirc) not strongly consistent and may for a certain duration return keys that have already been deleted.

So it may be more common than it seems. I think that it would be reasonable for iter_bucket to skip objects that return a 404 during their download (ie: catch the error and suppress it), and carry on iterating. I think iter_bucket would still fulfill its duty this way.

Steps/code to reproduce the problem

The code:

def get_all_metadata() -> list[dict]:
    bucket = get_s3_bucket()
    prefix = f"{get_s3_path()}/"
    accept_key = lambda key: key.endswith(get_metadata_ext())

    # Not explicitly setting this property to False will cause an error in the Lambda runtime when
    # running iter_bucket. This is a known issue, but it doesn't look like it will be resolved:
    # https://github.com/RaRe-Technologies/smart_open/issues/605

    smart_open.concurrency._MULTIPROCESSING = False

    return [
        json.loads(content)
        for key, content in smart_open.s3.iter_bucket(
            bucket, prefix=prefix, accept_key=accept_key
        )
    ]

The error:

[ERROR] ClientError: An error occurred (404) when calling the HeadObject operation: Not Found
Traceback (most recent call last):
  File "/var/task/lambdas/filemanager/list.py", line 20, in handler
    metadata_list = filemanager.get_all_metadata()
  File "/var/task/lambdas/lib/filemanager.py", line 78, in get_all_metadata
    return [
  File "/var/task/lambdas/lib/filemanager.py", line 78, in <listcomp>
    return [
  File "/var/task/smart_open/s3.py", line 1192, in iter_bucket
    for key_no, (key, content) in enumerate(result_iterator):
  File "/var/task/smart_open/concurrency.py", line 58, in imap_unordered
    yield future.result()
  File "/var/lang/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/var/lang/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/var/lang/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/var/task/smart_open/s3.py", line 1254, in _download_key
    content_bytes = _download_fileobj(bucket, key_name)
  File "/var/task/smart_open/s3.py", line 1271, in _download_fileobj
    bucket.download_fileobj(key_name, buf)
  File "/var/runtime/boto3/s3/inject.py", line 719, in bucket_download_fileobj
    return self.meta.client.download_fileobj(
  File "/var/runtime/boto3/s3/inject.py", line 679, in download_fileobj
    return future.result()
  File "/var/runtime/s3transfer/futures.py", line 103, in result
    return self._coordinator.result()
  File "/var/runtime/s3transfer/futures.py", line 266, in result
    raise self._exception
  File "/var/runtime/s3transfer/tasks.py", line 269, in _main
    self._submit(transfer_future=transfer_future, **kwargs)
  File "/var/runtime/s3transfer/download.py", line 354, in _submit
    response = client.head_object(
  File "/var/runtime/botocore/client.py", line 391, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/var/runtime/botocore/client.py", line 719, in _make_api_call
    raise error_class(parsed_response, operation_name)

Versions

This is running on:

  • AWS Lambda
  • Intel
  • Python 3.9
  • smart_open 5.2.1

Checklist

Before you create the issue, please make sure you have:

  • Described the problem clearly
  • Provided a minimal reproducible example, including any required data
  • Provided the version numbers of the relevant software
@mpenkov
Copy link
Collaborator

mpenkov commented Aug 21, 2022

Makes sense. Are you able to make a PR?

@ronkorving
Copy link
Contributor Author

@mpenkov We're currently not using smart_open, so I wouldn't be able to justify spending the time on it (this is a work situation for me). That may change in the future, but right now, unfortunately that's what it is for me.

@RachitSharma2001
Copy link
Contributor

@mpenkov Is this issue still open? If so I would like to take a crack at submitting a PR for this.

@mpenkov
Copy link
Collaborator

mpenkov commented Nov 3, 2022

Sure, go for it.

RachitSharma2001 added a commit to RachitSharma2001/smart_open that referenced this issue Nov 5, 2022
RachitSharma2001 added a commit to RachitSharma2001/smart_open that referenced this issue Nov 5, 2022
RachitSharma2001 added a commit to RachitSharma2001/smart_open that referenced this issue Nov 5, 2022
RachitSharma2001 added a commit to RachitSharma2001/smart_open that referenced this issue Nov 5, 2022
RachitSharma2001 added a commit to RachitSharma2001/smart_open that referenced this issue Nov 5, 2022
RachitSharma2001 added a commit to RachitSharma2001/smart_open that referenced this issue Nov 5, 2022
RachitSharma2001 added a commit to RachitSharma2001/smart_open that referenced this issue Nov 6, 2022
RachitSharma2001 added a commit to RachitSharma2001/smart_open that referenced this issue Nov 11, 2022
mpenkov added a commit that referenced this issue Nov 29, 2022
* Fix avoidable S3 race condition (#693)

* add clarifying comment

Co-authored-by: Michael Penkov <m@penkov.dev>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants