Azure: Support multiple storage credential prefixes #13241
Conversation
Yet to implement the test cases.
azure/src/main/java/org/apache/iceberg/azure/adlsv2/PrefixedADLSFileClientBuilder.java
@nastra @amogh-jahagirdar Can you please help with the review?
amogh-jahagirdar left a comment
Thank you for the change @ChaladiMohanVamsi, I'm still going through the tests a bit more but left some comments.
```java
return client(location).getFileClient(location.path());
Preconditions.checkState(
    null != prefixedADLSClient,
    "[BUG] ADLS client-builder for storage path not available: %s",
```
The "[BUG]" callout doesn't really fit the pattern of error messages in the project; can we just say "Cannot build ADLS client for path: %s"?
btw we're using the same logging pattern (which I added) in S3FileIO and GCSFileIO
Keeping it consistent with other FileIO implementations as of now.
```java
public void multipleStorageCredentialsConfigured() {
  StorageCredential adlsCredential1 =
      StorageCredential.create(
          abfsPath("custom-container", "account1", "/table1"),
```
Can we move some of these abfsPath values into separate variables? It'll reduce duplication and make it easier to verify these tests are doing what they should be.
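A minimal sketch of that refactor: hoist the repeated abfsPath(...) values into named constants so each assertion reads against a stable name. The abfsPath helper below is a stand-in for the PR's test helper, and the class and constant names are illustrative, not from the PR.

```java
// Sketch of the suggested refactor: hoist repeated abfsPath(...) values into
// named constants. abfsPath here is a stand-in for the PR's test helper.
public class PrefixedPathsSketch {
  public static String abfsPath(String container, String account, String path) {
    return "abfs://" + container + "@" + account + ".dfs.core.windows.net" + path;
  }

  // Each test can now reference these instead of repeating the literal calls.
  public static final String TABLE1_PATH = abfsPath("custom-container", "account1", "/table1");
  public static final String TABLE2_PATH = abfsPath("custom-container", "account1", "/table2");
  public static final String OTHER_FILE_PATH = abfsPath("my-container", "account1", "/path/to/file");

  public static void main(String[] args) {
    System.out.println(TABLE1_PATH);
    System.out.println(TABLE2_PATH);
    System.out.println(OTHER_FILE_PATH);
  }
}
```

With named constants, an assertion like `isNotSameAs(clientFor(TABLE2_PATH))` makes it obvious at a glance which two credential scopes are being compared.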
```groovy
testImplementation libs.testcontainers
testImplementation libs.mockserver.netty
testImplementation libs.mockserver.client.java
testImplementation(libs.hadoop3.common) {
```
Why did this need to change? What are we pulling from Hadoop for the new tests?
In a few test cases where we initialise ResolvingFileIO, it fails with a missing Hadoop Configuration dependency, hence I had to include this in test scope.
```java
Optional.ofNullable(vendedAdlsCredentialProvider)
    .map(p -> new VendedAzureSasCredentialPolicy(location.host(), p))
    .ifPresent(clientBuilder::addPolicy);
```
Nit: I know this was just copied over from what already existed, but imo this logic would be a bit clearer with a plain old null check and assignment:

```java
if (credentialProvider != null) {
  clientBuilder.addPolicy(new VendedAzureSasCredentialPolicy(location.host(), credentialProvider));
}
```
Updated as per the suggestion.
```java
assertThat(fileIO.clientForStoragePath(abfsPath("custom-container", "account1", "/table1")))
    .isNotSameAs(
        fileIO.clientForStoragePath(abfsPath("custom-container", "account1", "/table2")))
    .isNotSameAs(
        fileIO.clientForStoragePath(abfsPath("my-container", "account1", "/path/to/file")));
```
I feel like this test isn't as strong as it could be; we're asserting that the clients for these paths are all different, which is true, but we're not asserting the actual client, right?
Unlike S3FileIO and GCSFileIO, we are not caching the clients in ADLSFileIO; we always create a new client (with a reused HTTP client) during any FileIO operation. I kept the behaviour the same even with prefix-based clients, hence we are not validating the clients, as they would be different for each file.
I am not completely sure of the overall benefit we can get out of caching the clients, since we already reuse the HTTP client. Wanted your input on whether we should go down the route of caching clients similar to other FileIO implementations.
cc// @nastra
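For reference, if caching were adopted, one minimal shape is a map keyed by configured prefix with longest-prefix resolution. This is a hedged sketch only: a String stands in for the real ADLS client type, and the class and method names are illustrative rather than taken from the PR or from S3FileIO.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of prefix-keyed client caching, similar in spirit to
// what S3FileIO does. "String" stands in for the real ADLS client type.
public class PrefixedClientCache {
  private final List<String> prefixes; // configured storage prefixes
  private final Map<String, String> clientByPrefix = new ConcurrentHashMap<>();

  public PrefixedClientCache(List<String> prefixes) {
    this.prefixes = prefixes;
  }

  public String clientForStoragePath(String location) {
    // The longest matching prefix wins, so the most specific credential applies.
    String prefix =
        prefixes.stream()
            .filter(location::startsWith)
            .max(Comparator.comparingInt(String::length))
            .orElseThrow(() -> new IllegalStateException("No prefix for: " + location));
    // Build the client once per prefix; later lookups reuse the cached instance.
    return clientByPrefix.computeIfAbsent(prefix, p -> "client-for-" + p);
  }
}
```

With this shape, `isSameAs` assertions become meaningful: two paths under the same prefix resolve to the same cached instance.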
```java
assertThat(
        fileIO.clientForStoragePath(abfsPath("custom-container", "account1", "/path/to/file")))
    .isNotSameAs(
        fileIO.clientForStoragePath(abfsPath("my-container", "account1", "/path/to/file")));
```

```java
assertThat(fileIO.clientForStoragePath(abfsPath("my-container", "account1", "/path/to/file")))
    .isSameAs(
        fileIO.clientForStoragePath(abfsPath("my-container2", "account2", "/path/to/file")));
```
Same as what I mentioned elsewhere in this test: could we move some of these abfsPath(...) values into separate variables, and can we make these tests as strong as possible, beyond just comparing whether a client is the same as the one for another path?
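One possible way to strengthen such a check, sketched with entirely hypothetical names: expose an observable property of the resolved client (for example, which configured prefix it was built from) and assert on that directly, instead of only on object identity.

```java
// Hypothetical sketch: expose which prefix a client was resolved from and
// assert on that, instead of only comparing client identity. All names here
// are illustrative, not from the PR.
public class ClientAssertionSketch {
  public record ResolvedClient(String prefix) {}

  public static ResolvedClient clientForStoragePath(String location) {
    // Toy resolution: treat everything before the last path segment as the prefix.
    int cut = location.lastIndexOf('/');
    return new ResolvedClient(location.substring(0, cut));
  }

  public static void main(String[] args) {
    ResolvedClient client = clientForStoragePath("abfs://custom-container@account1/table1");
    // Asserting the resolved prefix pins down *which* configuration was used.
    if (!client.prefix().equals("abfs://custom-container@account1")) {
      throw new AssertionError("unexpected prefix: " + client.prefix());
    }
  }
}
```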
Resolved (outdated) review threads on:
azure/src/integration/java/org/apache/iceberg/azure/adlsv2/PrefixedADLSClientTest.java
azure/src/main/java/org/apache/iceberg/azure/adlsv2/ADLSFileIO.java
azure/src/main/java/org/apache/iceberg/azure/adlsv2/ADLSLocation.java
azure/src/main/java/org/apache/iceberg/azure/adlsv2/ADLSOutputFile.java
azure/src/main/java/org/apache/iceberg/azure/adlsv2/PrefixedADLSClient.java
```java
clientByPrefix.values().forEach(PrefixedADLSClient::close);
this.clientByPrefix = null;
}
```
we should keep the newline after }
# Conflicts:
#	azure/src/main/java/org/apache/iceberg/azure/adlsv2/ADLSFileIO.java
```java
  return azureProperties;
}

public String storagePrefix() {
```
nit: maybe we can keep this package private until there's a good reason to make it public?
Agree, made it package private.
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that's incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.
Associated Issue: #13050
Changes
- SupportsStorageCredentials to pass storage credentials from LoadTableResponse to FileIO

Similar PRs for other FileIOs
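The contract behind that change can be sketched as follows. This is an illustrative mock-up, not Iceberg's actual source: the record fields, the interface methods, and the SketchFileIO class are all stand-ins for the real `SupportsStorageCredentials` / `StorageCredential` types that the PR wires into ADLSFileIO.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (not Iceberg's actual source) of the contract the PR
// implements: a FileIO opts in to receiving per-prefix storage credentials
// from the catalog's load-table response by implementing an interface like this.
public class StorageCredentialsSketch {
  // A credential scoped to a storage prefix, carrying one config entry.
  public record StorageCredential(String prefix, String configKey, String configValue) {}

  public interface SupportsStorageCredentials {
    void setCredentials(List<StorageCredential> credentials);

    List<StorageCredential> credentials();
  }

  // Minimal FileIO stand-in that just stores what the catalog hands it.
  public static class SketchFileIO implements SupportsStorageCredentials {
    private List<StorageCredential> credentials = new ArrayList<>();

    @Override
    public void setCredentials(List<StorageCredential> credentials) {
      this.credentials = credentials;
    }

    @Override
    public List<StorageCredential> credentials() {
      return credentials;
    }
  }
}
```

In the real flow, the catalog client would call `setCredentials` with the credentials parsed from LoadTableResponse before the FileIO builds any storage clients, which is what makes prefix-based client selection possible.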