Skip to content

Conversation

@ChaladiMohanVamsi
Copy link
Contributor

Associated Issue : #13050

Changes

  • Add support for multiple storage credential prefixes in ADLSFileIO
  • ADLSFIleIO implements SupportsStorageCredentials to pass storage credentials from LoadTableResponse to FileIO

Similar PRs for other FileIOs

@github-actions github-actions bot added the AZURE label Jun 4, 2025
@ChaladiMohanVamsi
Copy link
Contributor Author

Yet to implement the test cases.

@github-actions github-actions bot added the build label Jun 7, 2025
@ChaladiMohanVamsi
Copy link
Contributor Author

@nastra @amogh-jahagirdar Can you please help with the review.

Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the change @ChaladiMohanVamsi , I'm Still going through tests a bit more but left some comments

return client(location).getFileClient(location.path());
Preconditions.checkState(
null != prefixedADLSClient,
"[BUG] ADLS client-builder for storage path not available: %s",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the "[BUG]" callout doesn't really fit in the pattern of error messages in the project, can we just say "Cannot build ADLS cllient for path: %s"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw we're using the same logging pattern (which I added) in S3FileIO and GcsFileIO

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping it consistent with other FileIO implementations as of now.

public void multipleStorageCredentialsConfigured() {
StorageCredential adlsCredential1 =
StorageCredential.create(
abfsPath("custom-container", "account1", "/table1"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move some of these abfsPath values into separate variables; it'll reduce duplication and make it easier to verify these tests are doing what they should be

testImplementation libs.testcontainers
testImplementation libs.mockserver.netty
testImplementation libs.mockserver.client.java
testImplementation(libs.hadoop3.common) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this need to change? what are we pulling from hadoop for the new tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In few test cases where we initialise ResolvingFileIO, it is failing with missing HaddopConf dependency. Hence had include this in test scope.

Comment on lines 61 to 63
Optional.ofNullable(vendedAdlsCredentialProvider)
.map(p -> new VendedAzureSasCredentialPolicy(location.host(), p))
.ifPresent(clientBuilder::addPolicy);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I know this was just copied over from what already existed but imo this logic would just be a bit clearer with plain old null check and assignment:

if (credentialProvider != null) {
      clientBuilder.addPolicy(new VendedAzureSasCredentialPolicy(location.host(), credentialProvider)))
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated as per the suggestion.

Comment on lines 293 to 297
assertThat(fileIO.clientForStoragePath(abfsPath("custom-container", "account1", "/table1")))
.isNotSameAs(
fileIO.clientForStoragePath(abfsPath("custom-container", "account1", "/table2")))
.isNotSameAs(
fileIO.clientForStoragePath(abfsPath("my-container", "account1", "/path/to/file")));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this test isn't as strong as it could be; we're asserting that the clients for these paths are all different which is true but we're not asserting the actual client right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike S3FileIO and GCSFileIO we are not caching the clients in ADLSFileIO, we are always creating a new client with reused HTTP client during any FileIO operation. I kept the behaviour same even with prefix based clients. Hence we are not validating client as they would be different for each file.

I am not completely sure of overall benefit we can get out of caching the clients as we are already reusing HTTP client. Wanted your inputs if we should go in this route of caching the clients similar to other FileIO implementations.

cc// @nastra

Comment on lines 244 to 252
assertThat(
fileIO.clientForStoragePath(abfsPath("custom-container", "account1", "/path/to/file")))
.isNotSameAs(
fileIO.clientForStoragePath(abfsPath("my-container", "account1", "/path/to/file")));

assertThat(fileIO.clientForStoragePath(abfsPath("my-container", "account1", "/path/to/file")))
.isSameAs(
fileIO.clientForStoragePath(abfsPath("my-container2", "account2", "/path/to/file")));

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as what I mentioned somewhere else in this test, could we make sure some of these abfsPath(...) are moved to separate values and can we make sure these tests are strong as possible beyond just comparing if it's the same as another path.

clientByPrefix.values().forEach(PrefixedADLSClient::close);
this.clientByPrefix = null;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should keep the newline after }

# Conflicts:
#	azure/src/main/java/org/apache/iceberg/azure/adlsv2/ADLSFileIO.java
return azureProperties;
}

public String storagePrefix() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe we can keep this package private until there's a good reason to make it public?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, made it package private.

@github-actions
Copy link

github-actions bot commented Aug 4, 2025

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Aug 4, 2025
@nastra nastra added not-stale and removed stale labels Aug 4, 2025
@kevinjqliu kevinjqliu self-requested a review August 15, 2025 03:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants