Very slow s3 connection after 0.16.1 #2377
Comments
@mightyshazam or @rtyler should be able to provide more details on this one |
I'll take a look. I suspect it is because the AWS SDK attempts all authentication methods. Prior to that, we did not implement them all, so it would've never tried the metadata endpoint. |
@mightyshazam Yes, running with that env var set seems to work. It would be nice if we could pass that in as part of |
Thanks for confirming. I'll put together an update. |
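For readers who need the workaround before a fix lands, here is a minimal sketch of setting the variable in the process environment before the table is opened. It assumes "that env var" is `AWS_EC2_METADATA_DISABLED` (the name that appears later in this thread); `disable_imds_lookup` and `open_table` are hypothetical helper names, not part of deltalake.

```python
import os

# Assumption: the env var discussed above is AWS_EC2_METADATA_DISABLED,
# which is the name used later in this thread.
IMDS_ENV_VAR = "AWS_EC2_METADATA_DISABLED"


def disable_imds_lookup() -> None:
    """Ask the AWS SDK not to query the EC2 instance metadata service (IMDS)."""
    os.environ[IMDS_ENV_VAR] = "true"


def open_table(table_uri: str, storage_options: dict):
    """Hypothetical helper: set the variable first, then open the table."""
    disable_imds_lookup()
    # Imported lazily so disable_imds_lookup() is usable without deltalake installed.
    from deltalake import DeltaTable

    return DeltaTable(table_uri, storage_options=storage_options)
```

The variable has to be set before the first S3 call in the process, since the SDK reads it when it builds its credential and region providers.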
thank you! |
Hi @mightyshazam, I left a comment (#2385 (comment)) on the PR. It seems that passing the variable in as a config option is not working. |
I just tried this with the latest main. I modified an additional integration test:
     with pytest.raises(IOError):
         anon_storage_options = {
             "AWS_ENDPOINT_URL": s3_localstack_creds["AWS_ENDPOINT_URL"],
             # Grants anonymous access. If we don't do this, will timeout trying
             # to reading from EC2 instance provider.
             "AWS_ACCESS_KEY_ID": "",
             "AWS_SECRET_ACCESS_KEY": "",
+            "AWS_EC2_METADATA_DISABLED": "true",
         }
         write_deltalake(
             table_path,
             sample_data,
             storage_options=anon_storage_options,
         )
Then I ran the integration tests with |
@mightyshazam I'm doing the following:
It seems to be slow only the first time it connects: that run takes ~3 seconds, even with the [...] When I run it a subsequent time, I don't see the warning anymore and it takes ~0.05 seconds, which is expected (and matches the runtime of the first run when the env var is set). |
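A small timing harness makes the first-call versus subsequent-call difference easy to reproduce. This is only a sketch: `time_calls` is a hypothetical helper, and the `DeltaTable` call in the trailing comment assumes `table_uri` and `storage_options` are defined as in the issue description.

```python
import time
from typing import Callable, List


def time_calls(fn: Callable[[], object], repeats: int = 2) -> List[float]:
    """Call fn `repeats` times in a row and return each call's duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


# Example usage (assumes deltalake is installed and the names below are defined):
# from deltalake import DeltaTable
# first, second = time_calls(
#     lambda: DeltaTable(table_uri, storage_options=storage_options)
# )
# print(f"first: {first:.2f}s, second: {second:.2f}s")
```

Comparing the two numbers in a fresh process, with and without the env var set, should show whether the delay is a one-time cost at first connection.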
@echai58 I found a few more places in the AWS SDK where we need to override the behavior. I would prefer that AWS let us pass this setting through, but it only works as an environment variable unless we set up everything manually. |
@mightyshazam thank you! will test it out later, appreciate the help |
Hi @mightyshazam, are there any docs on how to enable this for use with IMDS? I've tried passing the 'AWS_EC2_METADATA_DISABLED':'false' argument into the python wrapper and continue to get a Generic S3 error: Error after 10 retries in 4.740948202s, max_retries:10, retry_timeout:180s, source:error sending request for url type error despite being able to access the target s3 url using aws cli |
@herlma Do you have any other settings that you are passing? Given that you are getting that error, I suspect it is using IMDS since it should fail much faster. Do you have any other log messages? |
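One way to narrow this down is to check whether the metadata endpoint is reachable at all from the machine running the code. The sketch below is a plain TCP probe, not something from delta-rs or the AWS SDK; `can_connect` is a hypothetical helper, and `169.254.169.254` port 80 is the standard link-local address of the EC2 instance metadata service.

```python
import socket

IMDS_HOST = "169.254.169.254"  # link-local address of the EC2 metadata service
IMDS_PORT = 80


def can_connect(host: str = IMDS_HOST, port: int = IMDS_PORT, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable networks.
        return False


# Example usage:
# print("IMDS reachable:", can_connect())
```

If the probe fails, credential lookups through IMDS would be expected to time out in the same way as the warning in the issue description; if it succeeds, the problem more likely lies elsewhere in the request path.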
Environment
Delta-rs version: 0.16.1
Binding: python
Bug
What happened:
I am using s3 compatible storage, and after trying to upgrade to 0.16.1, I noticed very slow calls to DeltaTable, with the following warning:
[2024-04-03T13:37:12Z WARN aws_config::imds::region] failed to load region from IMDS err=failed to load IMDS session token: dispatch failure: timeout: error trying to connect: HTTP connect timeout occurred after 1s: HTTP connect timeout occurred after 1s: timed out (FailedToLoadToken(FailedToLoadToken { source: DispatchFailure(DispatchFailure { source: ConnectorError { kind: Timeout, source: hyper::Error(Connect, HttpTimeoutError { kind: "HTTP connect", duration: 1s }), connection: Unknown } }) }))
This warning gets logged 3 times before the call to DeltaTable finishes, and it takes ~3 seconds, whereas on delta-rs 0.16.0 it takes 0.1 seconds.
I assume it may be happening due to this PR: #2243, it's the most relevant change I could find.
These are the storage options I'm passing in:
storage_options={
    "AWS_ALLOW_HTTP": "true",
    "AWS_ENDPOINT_URL": AWS_ENDPOINT_URL,
    "AWS_REGION": "custom",
    "AWS_ACCESS_KEY_ID": AWS_ACCESS_KEY_ID,
    "AWS_SECRET_ACCESS_KEY": AWS_SECRET_ACCESS_KEY,
}