Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Very slow s3 connection after 0.16.1 #2377

Closed
echai58 opened this issue Apr 3, 2024 · 14 comments · Fixed by #2385 or #2442
Closed

Very slow s3 connection after 0.16.1 #2377

echai58 opened this issue Apr 3, 2024 · 14 comments · Fixed by #2385 or #2442
Labels
bug Something isn't working

Comments

@echai58
Copy link

echai58 commented Apr 3, 2024

Environment

Delta-rs version: 0.16.1

Binding: python


Bug

What happened:
I am using s3 compatible storage, and after trying to upgrade to 0.16.1, I noticed very slow calls to DeltaTable, with the following warning:
[2024-04-03T13:37:12Z WARN aws_config::imds::region] failed to load region from IMDS err=failed to load IMDS session token: dispatch failure: timeout: error trying to connect: HTTP connect timeout occurred after 1s: HTTP connect timeout occurred after 1s: timed out (FailedToLoadToken(FailedToLoadToken { source: DispatchFailure(DispatchFailure { source: ConnectorError { kind: Timeout, source: hyper::Error(Connect, HttpTimeoutError { kind: "HTTP connect", duration: 1s }), connection: Unknown } }) }))

This warning gets logged 3 times before the call to DeltaTable finishes, and it takes ~3 seconds, whereas on delta-rs 0.16.0 it takes 0.1 seconds.

I assume it may be happening due to this PR: #2243, it's the most relevant change I could find.

These are the storage options I'm passing in:
storage_options={ "AWS_ALLOW_HTTP": "true", "AWS_ENDPOINT_URL": AWS_ENDPOINT_URL, "AWS_REGION": "custom", "AWS_ACCESS_KEY_ID": AWS_ACCESS_KEY_ID, "AWS_SECRET_ACCESS_KEY": AWS_SECRET_ACCESS_KEY }

@echai58 echai58 added the bug Something isn't working label Apr 3, 2024
@ion-elgreco
Copy link
Collaborator

@mightyshazam or @rtyler should be able to provide more details on this one

@mightyshazam
Copy link
Contributor

I'll take a look. I suspect it is because the AWS SDK attempts all authentication methods. Prior to that, we did not implement them all, so it would've never tried the metadata endpoint.

@mightyshazam
Copy link
Contributor

@echai58 can you run your code with the environment variable AWS_EC2_METADATA_DISABLED set to true.
@rtyler Assuming this is an IMDS issue, do you think we should treat this as a breaking change with the above remediation, or should we introduce a setting to turn it on and off for the s3 logstore?

@echai58
Copy link
Author

echai58 commented Apr 3, 2024

@mightyshazam Yes, running with that env var set seems to work. It would be nice if we could pass that in as part of storage_options, that would make this a lot easier to deal with.

@mightyshazam
Copy link
Contributor

@mightyshazam Yes, running with that env var set seems to work. It would be nice if we could pass that in as part of storage_options, that would make this a lot easier to deal with.

Thanks for confirming. I'll put together an update.

@echai58
Copy link
Author

echai58 commented Apr 4, 2024

@mightyshazam Yes, running with that env var set seems to work. It would be nice if we could pass that in as part of storage_options, that would make this a lot easier to deal with.

Thanks for confirming. I'll put together an update.

thank you!

@echai58
Copy link
Author

echai58 commented Apr 22, 2024

Hi @mightyshazam, i left a comment (#2385 (comment)) on the PR - seems like passing in the variable as a config is not working

@ion-elgreco ion-elgreco reopened this Apr 22, 2024
@mightyshazam
Copy link
Contributor

Hi @mightyshazam, i left a comment (#2385 (comment)) on the PR - seems like passing in the variable as a config is not working

I just tried this with the latest main. I modified an additional integration test test_roundtrip_s3_direct. I updated the config as such

    with pytest.raises(IOError):
        anon_storage_options = {
            "AWS_ENDPOINT_URL": s3_localstack_creds["AWS_ENDPOINT_URL"],
            # Grants anonymous access. If we don't do this, will timeout trying
            # to reading from EC2 instance provider.
            "AWS_ACCESS_KEY_ID": "",
            "AWS_SECRET_ACCESS_KEY": "",
+            "AWS_EC2_METADATA_DISABLED": "true",
        }
        write_deltalake(
            table_path,
            sample_data,
            storage_options=anon_storage_options,
        )

Then I ran the integration tests with python -m pytest -m '(s3 and integration)' --durations=0
Running the test with the setting is an order of magnitude faster than without. Are you experiencing something different or setting it another way? If you're setting it as an environment variable, it should also work with the Rust AWS SDK

@echai58
Copy link
Author

echai58 commented Apr 22, 2024

@mightyshazam I'm doing the following:

DeltaTable(
    f"s3a://{AWS_BUCKET_NAME}/{path}",
    storage_options={
        "AWS_ALLOW_HTTP": "true", 
        "AWS_ENDPOINT_URL": AWS_ENDPOINT_URL,
        "AWS_ACCESS_KEY_ID": AWS_ACCESS_KEY_ID,
        "AWS_SECRET_ACCESS_KEY": AWS_SECRET_ACCESS_KEY,
        "AWS_EC2_METADATA_DISABLED": "true"
    },
)

It seems like it's only the first time it connects it runs, it takes ~3 seconds, even with the AWS_EC2_METADATA_DISABLED flag passed in via storage_options.

When I run it a subsequent time, I don't see the warning anymore and it takes ~0.05 seconds, which is expected (and the runtime of the first run when running with the env var set)

@mightyshazam
Copy link
Contributor

@echai58 I found a few more places in the AWS SDK where we need to override the behavior. I prefer that AWS allow us to pass through this setting, but it only works as an environment variable unless we setup everything manually.

@mightyshazam
Copy link
Contributor

@echai58 #2442 changes the default behavior for DynamoDb locking using the AWS SDK. The default value for AWS_EC2_METADATA_DISABLED is true, so you shouldn't have to set it anymore.

ion-elgreco pushed a commit that referenced this issue Apr 23, 2024
)

# Description
The AWS SDK uses EC2 instance metadata in the default provider chain,
the profile chain and the region provider

# Related Issue(s)
<!---
For example:

- closes #106
--->
- closes #2377 
# Documentation

<!---
Share links to useful documentation
--->
@echai58
Copy link
Author

echai58 commented Apr 23, 2024

@mightyshazam thank you! will test it out later, appreciate the help

@herlma
Copy link

herlma commented May 29, 2024

Hi @mightyshazam, are there any docs on how to enable this for use with IMDS? I've tried passing the 'AWS_EC2_METADATA_DISABLED':'false' argument into the python wrapper and continue to get a Generic S3 error: Error after 10 retries in 4.740948202s, max_retries:10, retry_timeout:180s, source:error sending request for url type error despite being able to access the target s3 url using aws cli

@mightyshazam
Copy link
Contributor

@herlma Do you have any other settings that you are passing? Given that you are getting that error, I suspect it is using IMDS since it should fail much faster. Do you have any other log messages?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
4 participants