
Upgrade code to use aws sdk go v2 #6486

Merged: 59 commits, merged Sep 12, 2023

nopcoder (Contributor) commented Aug 25, 2023

  • Upgrade the AWS Go SDK to v2: S3 adapter implementation, DynamoDB client, and additional STS and direct client code.
  • The S3 configuration parameters streaming_chunk_timeout and streaming_chunk_size were removed.
  • The configuration parameter access_secret_key was removed from the S3 credentials.
  • The configuration parameters client_log_request and client_log_retries were added to the S3 block adapter to control its logging level.
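A minimal sketch of how the two new flags could map onto a client log-mode bitmask. In aws-sdk-go-v2 the real type is aws.ClientLogMode, with flags such as aws.LogRequest and aws.LogRetries; here ClientLogMode, its constants, and the S3LogConfig struct are hypothetical local stand-ins (with made-up bit values) so the example runs on the standard library alone. It is not the adapter's actual code.

```go
package main

import "fmt"

// ClientLogMode is a local stand-in for aws.ClientLogMode.
// The bit values are illustrative, not the SDK's.
type ClientLogMode uint64

const (
	LogRetries ClientLogMode = 1 << iota // stand-in for aws.LogRetries
	LogRequest                           // stand-in for aws.LogRequest
)

// S3LogConfig is a hypothetical struct for the two new settings.
type S3LogConfig struct {
	ClientLogRequest bool // client_log_request
	ClientLogRetries bool // client_log_retries
}

// LogMode ORs together the flag of every enabled setting.
func (c S3LogConfig) LogMode() ClientLogMode {
	var mode ClientLogMode
	if c.ClientLogRequest {
		mode |= LogRequest
	}
	if c.ClientLogRetries {
		mode |= LogRetries
	}
	return mode
}

func main() {
	cfg := S3LogConfig{ClientLogRequest: true}
	mode := cfg.LogMode()
	fmt.Println(mode&LogRequest != 0, mode&LogRetries != 0) // true false
}
```

Keeping each setting as an independent bit means enabling one kind of logging never implies the other.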

TODO:

  • Pass: Test metastore client commands using Trino and dbt
  • Pass: Test lakeFS with Spark 2.x and Spark 3.x
  • Pre-signed URL expiry window
  • Server side encrypted bucket
  • Upload using client SDKs

Close #2684
Close #6553

nopcoder added the exclude-changelog label (PR description should not be included in next release changelog) Aug 25, 2023
nopcoder self-assigned this Aug 25, 2023
nopcoder force-pushed the chore/aws-sdk-go-v2 branch from a737512 to bad1f51 August 26, 2023 22:02
github-actions bot commented Aug 26, 2023

♻️ PR Preview 204c1f4 has been successfully destroyed since this PR has been closed.

🤖 By surge-preview

pkg/block/s3/adapter.go (fixed)
nopcoder force-pushed the chore/aws-sdk-go-v2 branch from e971157 to eea0f60 August 28, 2023 16:02
nopcoder added the area/block-adapter, include-changelog (PR description should be included in next release changelog), and AWS labels and removed the exclude-changelog label (PR description should not be included in next release changelog) Aug 28, 2023
nopcoder force-pushed the chore/aws-sdk-go-v2 branch from a82f8c8 to 9de5ecb August 30, 2023 07:04
nopcoder requested a review from johnnyaug September 1, 2023 10:34
nopcoder marked this pull request as ready for review September 1, 2023 10:34
cmd/lakectl/cmd/metastore.go (outdated, resolved)
pkg/api/controller.go (outdated, resolved)
pkg/block/s3/adapter.go (outdated, resolved)
pkg/config/config_test.go (outdated, resolved)
nopcoder requested a review from johnnyaug September 5, 2023 15:19
nopcoder (Contributor Author) commented Sep 5, 2023

@guy-har, can you review only the adapter code that previously had to stream data to S3? The S3 client I'm using should provide the same functionality.

pkg/block/s3/adapter_test.go (outdated, resolved)
pkg/block/s3/inventory.go (outdated, resolved)
pkg/cloud/aws/metadata.go (outdated, resolved)
pkg/block/s3/adapter.go (outdated, resolved)
pkg/block/s3/adapter.go (outdated, resolved)
pkg/block/s3/adapter.go (outdated, resolved)
johnnyaug (Contributor) left a comment

Wow! Amazing work.

nopcoder requested a review from johnnyaug September 6, 2023 14:28
nopcoder force-pushed the chore/aws-sdk-go-v2 branch from 27c8a32 to a51cd7e September 7, 2023 10:55
nopcoder force-pushed the chore/aws-sdk-go-v2 branch 2 times, most recently from ddbf018 to 62cea17 September 10, 2023 08:58
arielshaqed (Contributor) left a comment

Only nits. It's a neat way of getting at the expiry time.

// TODO(barak): handle expiry window of the client credentials when pre-signed
// support enabled
- expiry := time.Now().Add(a.preSignedExpiry)
+ expiry := time.Now().UTC().Add(a.preSignedExpiry)
Contributor

UTC is a timezone. The best kind of timezone, but still a timezone. Unless Golang defaults to something like SQL TIMESTAMP (which is WITHOUT TIME ZONE), I don't think UTC is needed here.
Timezones belong when formatting time, not when processing it.

Contributor Author

True, I will remove it.
It made debugging easier, since the AWS credentials expiry time was in this timezone.


func (c *CaptureExpiresPresigner) PresignHTTP(ctx context.Context, credentials aws.Credentials, r *http.Request, payloadHash string, service string, region string, signingTime time.Time, optFns ...func(*v4.SignerOptions)) (url string, signedHeader http.Header, err error) {
// capture credentials expiry
c.CredentialsCanExpire = credentials.CanExpire
Contributor

I think this could just capture credentials and make them available later on. No need to run the processing logic here. I prefer to keep the signing path as simple and unmodified as possible.

Contributor Author

I was worried about capturing something that holds state that can change, so instead I capture the same information the signer accepted.
Let me know what you think. If the risk is low, I'll update the code.

arielshaqed (Contributor)

Also... I understand that the v2 SDK will refresh credentials, and do so on time? The v1 SDK doesn't refresh credentials if it only presigns, and we needed refreshClientIfNeeded.

nopcoder (Contributor Author)

> Also... I understand that the v2 SDK will refresh credentials, and do so on time? The v1 SDK doesn't refresh credentials if it only presigns, and we needed refreshClientIfNeeded.

Yes, in SDK v2, presign_middleware.go calls s.credentialsProvider.Retrieve(ctx) to refresh the current credentials. Whether we get enough time for our signed URL depends on the credentials provider and on the expiry window we configure.

if captureExpiresPresigner.CredentialsCanExpire && captureExpiresPresigner.CredentialsExpireAt.Before(expiry) {
- expiry = captureExpiresPresigner.CredentialsExpireAt
+ expiry = captureExpiresPresigner.CredentialsExpireAt.Add(a.sessionExpiryWindow)
Contributor

PLEASE add a comment before this. I did not understand the logic until you explained it to me!
Perhaps something like:

// AWS Go SDK v2 stores the time to renew credentials in `CredentialsExpireAt`.  This is
// a.sessionExpiryWindow before actual credentials expiry.

Contributor Author

Thanks @arielshaqed !

nopcoder merged commit c8d8859 into master Sep 12, 2023
33 checks passed
nopcoder deleted the chore/aws-sdk-go-v2 branch September 12, 2023 12:08
Labels: area/block-adapter, AWS, include-changelog (PR description should be included in next release changelog)
Development: successfully merging this pull request may close these issues:
  • API cleanup: s3 inventory remove deprecated functionality
  • Upgrade to AWS Go SDK v2
3 participants