load_cdf() issue : Generic S3 error: request or response body error: operation timed out #2549

Closed
jiaw314 opened this issue May 29, 2024 · 1 comment
Labels
bug Something isn't working

Comments

jiaw314 commented May 29, 2024

Environment

Delta-rs version: 0.17.4

Binding: Python 3.11.6

Environment: MacBook Pro M1

  • Cloud provider: AWS S3
  • OS: macOS Sonoma Version 14.4.1
  • Memory: 64 GB
  • Other:
    aiobotocore==2.13.0
    aiohttp==3.9.5
    aioitertools==0.11.0
    aiosignal==1.3.1
    asn1crypto==1.5.1
    attrs==23.2.0
    boto3==1.34.104
    botocore==1.34.104
    certifi==2024.2.2
    cffi==1.16.0
    charset-normalizer==3.3.2
    cryptography==42.0.7
    deltalake==0.17.4
    filelock==3.14.0
    frozenlist==1.4.1
    fsspec==2024.5.0
    idna==3.7
    jmespath==1.0.1
    multidict==6.0.5
    numpy==1.26.4
    packaging==24.0
    pandas==2.2.2
    platformdirs==4.2.2
    polars==0.20.27
    pyarrow==16.0.0
    pyarrow-hotfix==0.6
    pycparser==2.22
    PyJWT==2.8.0
    pyOpenSSL==24.1.0
    pyspark==2.4.3
    python-dateutil==2.9.0.post0
    pytz==2024.1
    requests==2.31.0
    s3fs==2024.5.0
    s3transfer==0.10.1
    six==1.16.0
    snowflake-connector-python==3.10.0
    sortedcontainers==2.4.0
    tomlkit==0.12.5
    typing_extensions==4.11.0
    tzdata==2024.1
    urllib3==2.2.1
    wrapt==1.16.0
    yarl==1.9.4

Bug

What happened:
The load_cdf() method works for nearly all of our delta tables on AWS S3 but it seems to be running into an error on a few:

thread '' panicked at python/src/lib.rs:611:18:
called Result::unwrap() on an Err value: ArrowError(ExternalError(General("ParquetObjectReader::get_byte_ranges error: Generic S3 error: request or response body error: operation timed out")), None)
stack backtrace:
0: 0x3028a50e4 - _BrotliDecoderVersion
1: 0x3028c8e50 - _BrotliDecoderVersion
2: 0x3028a1ee0 - _BrotliDecoderVersion
3: 0x3028a4f18 - _BrotliDecoderVersion
4: 0x3028a66bc - _BrotliDecoderVersion
5: 0x3028a6404 - _BrotliDecoderVersion
6: 0x3028a6af8 - _BrotliDecoderVersion
7: 0x3028a69ec - _BrotliDecoderVersion
8: 0x3028a5568 - _BrotliDecoderVersion
9: 0x3028a6774 - _BrotliDecoderVersion
10: 0x30299fb60 - _BrotliDecoderVersion
11: 0x30299ff14 - _BrotliDecoderVersion
12: 0x3001f9998 - _PyInit__internal
13: 0x30012bc1c -
14: 0x3001341f4 -
15: 0x300113ce0 -
16: 0x30012e7d4 -
17: 0x101237f1c - _method_vectorcall_VARARGS_KEYWORDS
18: 0x101303d5c - __PyEval_EvalFrameDefault
19: 0x1012f9444 - _PyEval_EvalCode
20: 0x10134ea18 - _run_eval_code_obj
21: 0x10134e97c - _run_mod
22: 0x10134e7bc - _pyrun_file
23: 0x10134e20c - __PyRun_SimpleFileObject
24: 0x10134db9c - __PyRun_AnyFileObject
25: 0x101369f70 - _pymain_run_file_obj
26: 0x1013698b0 - _pymain_run_file
27: 0x101369190 - _Py_RunMain
28: 0x10136a2c8 - _Py_BytesMain
Traceback (most recent call last):
File "/Users/jiawang/Desktop/Environments/deltars_test/backfill&continuous_batch_pandas_catalog_v2.py", line 127, in
dt.load_cdf(starting_version=delta_max_version).read_all()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/jiawang/Desktop/Environments/deltars_test/lib/python3.11/site-packages/deltalake/table.py", line 694, in load_cdf
return self._table.load_cdf(
^^^^^^^^^^^^^^^^^^^^^
pyo3_runtime.PanicException: called Result::unwrap() on an Err value: ArrowError(ExternalError(General("ParquetObjectReader::get_byte_ranges error: Generic S3 error: request or response body error: operation timed out")), None)

What you expected to happen:
I expect to get the change data feed for the latest version of the delta table when I call load_cdf().

How to reproduce it:
Call load_cdf() on a very large Delta table?

More details:

@jiaw314 jiaw314 added the bug Something isn't working label May 29, 2024
@jiaw314 jiaw314 changed the title load_cdf() issue load_cdf() issue : Generic S3 error: request or response body error: operation timed out May 29, 2024
@ion-elgreco
Collaborator

You can increase the timeout, #2537 (comment)
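For readers landing here with the same panic, a minimal sketch of what "increase the timeout" can look like from Python. It assumes the `timeout` key in `storage_options` is forwarded to the underlying object store HTTP client and accepts a duration string such as `"120s"`; the helper names, the table URI, and the 120-second value are illustrative and not taken from this thread.

```python
def build_storage_options(timeout_seconds: int = 120) -> dict[str, str]:
    """Build storage options with a longer request timeout.

    Assumption: the "timeout" key is passed through to the S3 client
    and takes a duration string such as "120s".
    """
    return {"timeout": f"{timeout_seconds}s"}


def load_cdf_with_timeout(table_uri: str, starting_version: int, timeout_seconds: int = 120):
    """Open the table with a longer timeout and read its change data feed.

    Hypothetical helper: table_uri is a placeholder such as
    "s3://my-bucket/my-table".
    """
    # Imported lazily so the options helper can be used without deltalake installed.
    from deltalake import DeltaTable

    dt = DeltaTable(table_uri, storage_options=build_storage_options(timeout_seconds))
    return dt.load_cdf(starting_version=starting_version).read_all()
```

If the larger tables still time out, raising the value further is the obvious next step; AWS credentials and region can go in the same `storage_options` dict or come from the environment as usual.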

@ion-elgreco ion-elgreco closed this as not planned May 29, 2024