Support ADLS Gen2 #5027

Merged
merged 28 commits into from
Jan 15, 2023

@N-o-Z N-o-Z commented Jan 12, 2023

Closes #5037

N-o-Z added the team/versioning-engine label (Team versioning engine) on Jan 12, 2023
N-o-Z marked this pull request as ready for review on January 12, 2023 04:16
N-o-Z added the include-changelog label (PR description should be included in next release changelog) on Jan 12, 2023
N-o-Z requested review from ozkatz, guy-har and a team on January 14, 2023 20:27
Member Author

N-o-Z commented Jan 14, 2023

The PR is ready for review.
I added an Esti job for Azure with HNS (hierarchical namespace) enabled to the pipeline; it runs our integration tests against ADLS Gen2.

@ozkatz adding you as a reviewer, since I had to make some changes to the pre-signed URL code.

N-o-Z changed the title from "Azure playground - test new SDK" to "Support ADLS Gen2" on Jan 14, 2023
Contributor

guy-har left a comment

Great addition; the new SDK seems much better than the previous one. Moving to it should solve other issues beyond ADLS Gen2 support. Thanks!
I added some comments.

Comment on lines -168 to -171
case azure.AuthMethodMSI:
credentials, err = azure.GetMSICredentials()
default:
err = ErrAuthMethodNotSupported
Contributor

It looks like we no longer support MSI. I'm not sure it's used, but dropping it would be a breaking change. Can we keep supporting it?
Another thing, which shouldn't be part of this PR: IIRC the new SDK can connect using configuration taken from the running environment, which was requested at some point. Maybe we should open an issue for that and fix it as well.

Member Author

You are right. I had originally added code that checks for default credentials (CLI, environment variables, etc.).
I re-added it; let me know if it solves the issue.
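
The fallback discussed in this thread, trying explicitly configured credentials first and then whatever the running environment provides, can be sketched with the standard library alone. Everything in this sketch (`credentialSource`, `firstCredential`, `errNoCredential`, and the string credentials) is hypothetical and only illustrates the ordering; it is not the SDK's API and not the code in this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// credentialSource is a hypothetical stand-in for one way of obtaining
// credentials (access key, environment variables, CLI login, and so on).
type credentialSource struct {
	name string
	get  func() (string, error)
}

// errNoCredential is returned when every source in the chain fails.
var errNoCredential = errors.New("no credential source succeeded")

// firstCredential tries each source in order and returns the first credential
// obtained, together with the name of the source that produced it.
func firstCredential(sources []credentialSource) (string, string, error) {
	for _, s := range sources {
		if cred, err := s.get(); err == nil {
			return cred, s.name, nil
		}
	}
	return "", "", fmt.Errorf("%w (tried %d sources)", errNoCredential, len(sources))
}
```

Ordering the slice decides precedence, so explicit configuration can stay first and environment-based sources act only as a fallback.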

Comment on lines +46 to +53
ctx: ctx,
cancel: cancel,
reader: from,
to: to,
id: newID(),
o: o,
errCh: make(chan error, 1),
buffers: buffers,
Contributor

A general comment on this file: previously it was a copy of the SDK's chunkwriting file with small modifications, so that applying a significant change from a new SDK version would be straightforward. I noticed that the implementation in the new Azure SDK is different. As it stands, we have a copy of the old implementation with our modifications. Is there a way to adapt the implementation from the new SDK instead?

Member Author

The new implementation uses a tracker to track the parts, which we already do in a different way in order to support multiple requests. I did use the new mmbPool struct they use, and modified the old code to work similarly to theirs. I found it difficult to align with their implementation, but if you have any suggestions I would love to hear them.
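
For context on the pattern under discussion: a chunked upload usually reads the stream into a bounded set of reusable buffers, so memory stays capped while several blocks are in flight. Below is a generic sketch of such a pool built on a buffered channel. It is not the SDK's mmbPool and not the code in this PR; `bufferPool`, `newBufferPool`, `acquire`, and `release` are hypothetical names.

```go
package main

// bufferPool hands out at most a fixed number of equally sized buffers.
type bufferPool struct {
	free chan []byte
}

// newBufferPool preallocates count buffers of size bytes each.
func newBufferPool(count, size int) *bufferPool {
	p := &bufferPool{free: make(chan []byte, count)}
	for i := 0; i < count; i++ {
		p.free <- make([]byte, size)
	}
	return p
}

// acquire blocks until a buffer is available, which bounds memory use.
func (p *bufferPool) acquire() []byte {
	return <-p.free
}

// release returns a buffer to the pool, restoring its full length so the
// next user can fill it completely.
func (p *bufferPool) release(b []byte) {
	p.free <- b[:cap(b)]
}
```

A writer would acquire a buffer, fill it from the reader, upload it as one block, and release it when the upload finishes.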

Comment on lines -31 to -37
func GetAccessKeyCredentials(accountName, accountKey string) (azblob.Credential, error) {
if len(accountName) == 0 && len(accountKey) == 0 {
// fallback to Azure environment variables
accountName, accountKey = os.Getenv("AZURE_STORAGE_ACCOUNT"), os.Getenv("AZURE_STORAGE_ACCESS_KEY")
}
return azblob.NewSharedKeyCredential(accountName, accountKey)
}
Contributor

IIUC we are losing support for taking the Azure configuration from the environment. This might be a breaking change; can we keep this support? (Here too, I think the new way of creating an Azure client might solve this.)

Member Author

Fixed
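
The behavior being preserved is the one in the removed `GetAccessKeyCredentials`: fall back to `AZURE_STORAGE_ACCOUNT` and `AZURE_STORAGE_ACCESS_KEY` only when both configured values are empty. A minimal sketch of that rule, separated from the SDK so it can be tested, might look like this. The function names and the injected `lookup` parameter are assumptions for illustration, not the code that was merged.

```go
package main

import "os"

// Environment variable names taken from the removed GetAccessKeyCredentials.
const (
	envStorageAccount   = "AZURE_STORAGE_ACCOUNT"
	envStorageAccessKey = "AZURE_STORAGE_ACCESS_KEY"
)

// resolveAccessKey returns the configured account name and key, falling back
// to the environment only when both are empty. lookup is injected so the
// rule can be exercised without touching the real environment.
func resolveAccessKey(accountName, accountKey string, lookup func(string) string) (string, string) {
	if accountName == "" && accountKey == "" {
		return lookup(envStorageAccount), lookup(envStorageAccessKey)
	}
	return accountName, accountKey
}

// resolveAccessKeyFromEnv applies the same rule using os.Getenv.
func resolveAccessKeyFromEnv(accountName, accountKey string) (string, string) {
	return resolveAccessKey(accountName, accountKey, os.Getenv)
}
```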

Comment on lines +73 to +75
func (m *MultipartBlockWriter) Upload(_ context.Context, _ io.ReadSeekCloser, _ *blockblob.UploadOptions) (blockblob.UploadResponse, error) {
panic("Should not be called")
}
Contributor

Are we sure we want to panic here? Would logging and returning an error be enough?

Contributor

Why not support it?

Member Author

We needed to implement the interface, but we don't really need to use this method.
I'm not sure what the right implementation is, as I don't know how it would be used.
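
One way to act on the first suggestion, returning an error instead of panicking from a method that exists only to satisfy an interface, is sketched below. The `uploader` interface, `multipartWriter` type, and `ErrNotImplemented` sentinel are simplified, hypothetical stand-ins; the real method has the SDK signature quoted above.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotImplemented is a hypothetical sentinel for methods that exist only
// to satisfy an interface.
var ErrNotImplemented = errors.New("not implemented")

// uploader is a simplified, hypothetical version of the interface the
// writer has to satisfy.
type uploader interface {
	Upload(data []byte) (int, error)
	StageBlock(id string, data []byte) error
}

// multipartWriter only supports staging blocks.
type multipartWriter struct {
	staged map[string]int // block ID -> block size
}

// StageBlock records the block; this is the path the writer actually uses.
func (m *multipartWriter) StageBlock(id string, data []byte) error {
	m.staged[id] = len(data)
	return nil
}

// Upload is not expected to be called. It reports that to the caller
// instead of panicking, so a stray call cannot take down the process.
func (m *multipartWriter) Upload(_ []byte) (int, error) {
	return 0, fmt.Errorf("multipartWriter.Upload: %w", ErrNotImplemented)
}

// Compile-time check that multipartWriter satisfies uploader.
var _ uploader = (*multipartWriter)(nil)
```

Callers that care can test for the sentinel with `errors.Is(err, ErrNotImplemented)`.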

Comment on lines -196 to -210

// TODO(Guys): remove this work around once azure fixes panic issue and use azblob.UploadStreamToBlockBlob
transferManager, err := azblob.NewStaticBuffer(_1MiB, MaxBuffers)
if err != nil {
return err
}
uploadOpts := a.translatePutOpts(ctx, opts)
uploadOpts.TransferManager = transferManager
defer transferManager.Close()
resp, err := copyFromReader(ctx, reader, blobURL, uploadOpts)
if err != nil {
return err
}
_ = resp == nil // this is done in order to ignore "result 0 is never used" error ( copyFromReader is copied from azure, and we want to keep it with minimum changes)
return nil
Contributor

🙏 😄

qp, err := client.GetAccountSASURL(aztables.AccountSASResourceTypes{
Container: true,
Object: true,
}, permissions, time.Now(), a.preSignedURLDurationGenerator())
Contributor

I'm not sure we should use time.Now() here. It seems that if we pass a zero time (one for which IsZero is true), the start time is left unspecified, which I believe is what we want in this case.
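
The suggestion relies on Go's zero `time.Time` acting as "not set". A small sketch of that convention follows, using a hypothetical `sasWindow` type rather than the SDK: a zero start is omitted from the output instead of being rendered as a date in year 1. Whether the SDK call in this PR treats a zero start time the same way is the reviewer's reading and is not verified here.

```go
package main

import "time"

// sasWindow is a hypothetical description of a signature's validity window.
type sasWindow struct {
	start  time.Time // zero value means "not specified"
	expiry time.Time
}

// queryTimes formats the window for use in a query string. A zero start is
// returned as an empty string so the caller can leave it out entirely.
func (w sasWindow) queryTimes() (start, expiry string) {
	const layout = "2006-01-02T15:04:05Z"
	if !w.start.IsZero() {
		start = w.start.UTC().Format(layout)
	}
	return start, w.expiry.UTC().Format(layout)
}
```

Leaving the start unset also avoids clock-skew problems that can occur when the signer's "now" is slightly ahead of the server's.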

Comment on lines 143 to 151
if err != nil {
return nil, err
}
} else {
var err error
- p, err = getAzureClient()
+ c, err = getAzureClient()
if err != nil {
return nil, err
}
Contributor

The if err != nil block can be extracted out of the branches.
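
A sketch of what that refactor could look like: declare the result and the error once, assign them in each branch, and check the error a single time after the if/else. The `client` type and both constructors are hypothetical placeholders, not the functions in this PR.

```go
package main

import "errors"

// client and both constructors below are hypothetical placeholders.
type client struct {
	source string
}

func newClientFromKey(key string) (*client, error) {
	if key == "" {
		return nil, errors.New("empty access key")
	}
	return &client{source: "access-key"}, nil
}

func newDefaultClient() (*client, error) {
	return &client{source: "default"}, nil
}

// buildClient picks a constructor in the branches and checks the error once.
func buildClient(useKey bool, key string) (*client, error) {
	var (
		c   *client
		err error
	)
	if useKey {
		c, err = newClientFromKey(key)
	} else {
		c, err = newDefaultClient()
	}
	if err != nil {
		return nil, err
	}
	return c, nil
}
```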

func BuildAzureServiceClient(params params.Azure) (*service.Client, error) {
url := fmt.Sprintf(azure.AzURLTemplate, params.StorageAccount)
options := service.ClientOptions{ClientOptions: azcore.ClientOptions{Retry: policy.RetryOptions{TryTimeout: params.TryTimeout}}}
if params.StorageAccessKey != "" {
Contributor

We should update the documentation to reflect this.

Member Author

Opened #5044

@N-o-Z N-o-Z enabled auto-merge (squash) January 15, 2023 15:50
Contributor

nopcoder left a comment

LGTM! Nice upgrade from the previous SDK.


preSignedBlobPattern = "https://%s.blob.core.windows.net/%s/%s?%s"
AzURLTemplate = "https://%s.blob.core.windows.net/"
Contributor

This is already in the azure package, so maybe just URLTemplate, or something else without the Az prefix?
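
For reference, here is how the two format strings quoted above expand, with the constant renamed along the lines of this suggestion. The helper functions `serviceURL` and `preSignedURL` are hypothetical, and the sketch assumes the blob path and SAS query string are already URL-safe.

```go
package main

import "fmt"

const (
	// urlTemplate is the service endpoint pattern quoted above, named
	// without the Az prefix as suggested in the review.
	urlTemplate = "https://%s.blob.core.windows.net/"
	// preSignedBlobPattern is account, container, blob path, SAS query.
	preSignedBlobPattern = "https://%s.blob.core.windows.net/%s/%s?%s"
)

// serviceURL returns the blob service endpoint for a storage account.
func serviceURL(storageAccount string) string {
	return fmt.Sprintf(urlTemplate, storageAccount)
}

// preSignedURL assembles a pre-signed blob URL. It assumes blobPath and
// sasQuery are already escaped; it does no escaping of its own.
func preSignedURL(storageAccount, container, blobPath, sasQuery string) string {
	return fmt.Sprintf(preSignedBlobPattern, storageAccount, container, blobPath, sasQuery)
}
```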


// TODO (niro): copy is limited to 256MB, should we handle it somehow?
_, err = destClient.CopyFromURL(ctx, sasKey, nil)
return err
Contributor

Then maybe we should document this limitation, since copy does not work for large objects.
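
Besides documenting the limit, one option is to fail early with a clear error when the source is too large for the single-call copy. The sketch below is an illustration only: `checkCopySize`, `ErrObjectTooLargeForCopy`, and `maxSyncCopySize` are hypothetical names, and the sketch assumes the "256MB" in the TODO means 256 MiB.

```go
package main

import (
	"errors"
	"fmt"
)

// maxSyncCopySize is the limit from the TODO, assumed here to be 256 MiB.
const maxSyncCopySize int64 = 256 << 20

// ErrObjectTooLargeForCopy is a hypothetical sentinel error for sources
// that exceed the single-call copy limit.
var ErrObjectTooLargeForCopy = errors.New("object exceeds synchronous copy limit")

// checkCopySize rejects sources the single-call copy cannot handle, so the
// caller gets a descriptive error before any request is sent.
func checkCopySize(sourceSize int64) error {
	if sourceSize < 0 {
		return fmt.Errorf("invalid source size %d", sourceSize)
	}
	if sourceSize > maxSyncCopySize {
		return fmt.Errorf("%w: %d bytes > %d bytes",
			ErrObjectTooLargeForCopy, sourceSize, maxSyncCopySize)
	}
	return nil
}
```

A caller could branch on `errors.Is(err, ErrObjectTooLargeForCopy)` to choose a different copy strategy for large objects.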

@N-o-Z N-o-Z merged commit 7f53849 into master Jan 15, 2023
@N-o-Z N-o-Z deleted the azure-playground branch January 15, 2023 20:22
Development

Successfully merging this pull request may close these issues.

Support Azure Data Lake Gen2
3 participants