Skip to content

Conversation

@nastra
Copy link
Contributor

@nastra nastra commented Oct 24, 2024

No description provided.

@nastra nastra requested a review from danielcweeks October 24, 2024 14:15
@github-actions github-actions bot added the AWS label Oct 24, 2024
@nastra nastra force-pushed the s3-refresh-vended-credentials branch from 1e200fd to 5e51c5b Compare October 24, 2024 14:28
import software.amazon.awssdk.utils.cache.CachedSupplier;
import software.amazon.awssdk.utils.cache.RefreshResult;

public class VendedCredentialsProvider implements AwsCredentialsProvider, SdkAutoCloseable {
Copy link
Contributor Author

@nastra nastra Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jackye1995 is this credential provider similar to the one you mentioned a while ago where you guys had a custom credential provider that would always call loadTable()?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this looks similar.

@nastra nastra force-pushed the s3-refresh-vended-credentials branch from 5e51c5b to 51424d1 Compare October 24, 2024 14:39
@nastra nastra requested a review from jackye1995 October 24, 2024 14:41
danielcweeks
danielcweeks previously approved these changes Oct 24, 2024
Copy link
Contributor

@danielcweeks danielcweeks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments, but +1

Comment on lines 106 to 108
Optional<Credential> credentialWithPrefix =
s3Credentials.stream().max(Comparator.comparingInt(c -> c.prefix().length()));
Copy link
Contributor

@singhpk234 singhpk234 Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[doubt] how will this work for credentials which are expanding different prefixes :
for ex :
s3//abc/123
s3//xyz/1

and the request is for s3://xyz ?

Never the less how will this work for cases with same prefixes :

  1. s3://abc/prefix-1/
  2. s3://abc/prefix-123/
    we will use prefix-123 here instead of prefix-1 ? for the calls length is not the correct way to find the credentials imho

please let me know your thoughts.

Copy link
Contributor

@danielcweeks danielcweeks Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually a good callout and I think an oversight from what we're trying to define in the spec. Here we are taking the most selective prefix, but that doesn't necessarily correspond to the prefix we're using the credential against.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to get the behavior we're expecting we would probably need to update S3FileIO::client to take the path as an argument so that the client can be configured/returned with the correct credential. Thoughts @nastra? I assume there is a similar issue on the GCP side.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@singhpk234 to answer your specific question, it should fail because there is no prefix in your example that covers the requested path.

Copy link
Contributor

@singhpk234 singhpk234 Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danielcweeks got you, This is always been a challenge when using credential provider as if you see it's resolveCredenitials() doesn't take any args so to wire in correct creds for correct paths, which we might not
know at the time of creating S3 client

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the credential to use is the credential with the longest matching prefix for a given requested path. In the case that there's no credential with a matching prefix for a given path, I think we'd want to just fail on the client side; s3 or whatever storage system would fail the request anyways but I think we may as well avoid the request if we know that no credential could cover the requested path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what we rather might want to do going forward is only allow the server to return a single credential per provider. That way it's easier for the client

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, sure that's easier for the client but I'm not sure about making the protocol unnecessarily restrictive. Though I do understand in most cases a single credential per provider is enough, I just want to make sure we're not putting the spec in an awkward spot in case there are legitimate use cases of different credentials per prefix. In general, we've always followed a pattern of avoiding putting unnecessary restrictions on protocol. Also I don't know if it's really that much more complex for clients to resolve the longest matching prefix for a given path

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it's reasonable to enforce a single prefix for provider since that was the initial ask. There's an opportunity to improve this, but we can follow up with added support. I don't think it's too difficult, but there are a few edge cases that we'd need to work through. I'm ok proceeding with this implementation.

Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm good as long as this is just a constraint in the current implementation and we're not requiring/changing any server side expectations in the spec

@danielcweeks danielcweeks dismissed their stale review October 24, 2024 18:07

Need to revisit prefix selection.

@nastra nastra force-pushed the s3-refresh-vended-credentials branch from 51424d1 to 0551ea6 Compare October 25, 2024 07:20
Comment on lines +92 to +91
LoadCredentialsResponse.class,
OAuth2Util.authHeaders(properties.get(OAuth2Properties.TOKEN)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming the Token here is same short lived token used by RestCatalog instance, are we planning to handle token refresh post its expiration with in the FileIO?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is indeed an issue that I was planning to address when I wrote the first version of this many months ago but simply forgot to get back to. thanks for raising this @ChaladiMohanVamsi

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for the initial support for refresh, we can assume that we're bound by the token lifetime for refreshes. It's not perfect, but I'd like to see the resolution of some of the AuthManger refactor before settling on a solution. We're not regressing at this point, so I'm ok with leaving this as is and addressing in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, this is on my todo list for after the AuthManager refactor.

@nastra nastra closed this Oct 25, 2024
@nastra nastra reopened this Oct 25, 2024
Comment on lines 106 to 108
Optional<Credential> credentialWithPrefix =
s3Credentials.stream().max(Comparator.comparingInt(c -> c.prefix().length()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the credential to use is the credential with the longest matching prefix for a given requested path. In the case that there's no credential with a matching prefix for a given path, I think we'd want to just fail on the client side; s3 or whatever storage system would fail the request anyways but I think we may as well avoid the request if we know that no credential could cover the requested path.

@danielcweeks danielcweeks added this to the Iceberg 1.7.0 milestone Oct 25, 2024
@nastra nastra force-pushed the s3-refresh-vended-credentials branch from 0551ea6 to 7ce08bd Compare October 28, 2024 08:27
return client;
}

private LoadCredentialsResponse fetchCredentials() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are 2 concepts that would be beneficial to be added, a staleTime and a prefetchTime. You could check the AWS SDK StsCredentialsProvider for how that is implemented. But this prevents edge cases like the credentials is loaded at almost the expiration time and cause errors downstream.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is already being done further below:

return RefreshResult.builder(
            (AwsCredentials)
                AwsSessionCredentials.builder()
                    .accessKeyId(accessKeyId)
                    .secretAccessKey(secretAccessKey)
                    .sessionToken(sessionToken)
                    .expirationTime(expiresAt)
                    .build())
        .staleTime(expiresAt)
        .prefetchTime(prefetchAt)
        .build();

* <p>When set, the {@link VendedCredentialsProvider} will be used to fetch and refresh vended
* credentials.
*/
public static final String REFRESH_CREDENTIALS_ENDPOINT = "client.refresh-credentials-endpoint";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be a s3 FileIO property?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This property needs to be set before FileIO is actually being configured and is similar to the client region. VendedCredentialsProvider is configured when AwsClientProperties#credentialsProvider(..) is called, so I don't think this property should be a FileIO property

@nastra nastra removed this from the Iceberg 1.7.0 milestone Oct 29, 2024
@nastra nastra force-pushed the s3-refresh-vended-credentials branch from a770941 to a5b58ed Compare October 29, 2024 16:16
@nastra nastra added this to the Iceberg 1.7.0 milestone Oct 29, 2024
@nastra nastra requested a review from danielcweeks October 29, 2024 16:19
@SuppressWarnings("checkstyle:HiddenField")
public AwsCredentialsProvider credentialsProvider(
String accessKeyId, String secretAccessKey, String sessionToken) {
if (refreshCredentialsEnabled && !Strings.isNullOrEmpty(refreshCredentialsEndpoint)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Should we log a warning if the endpoint is set but refreshCredentialsEnabled is false? I don't think we should fail but this is probably something a user would want to be aware of.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure adding a warning adds a lot of value. It's valid to have the server send you back an endpoint + refresh enabled flag that you then override for cases like Kafka connect. I'd say let's go without a warning for now unless this becomes a place of confusion

Copy link
Member

@RussellSpitzer RussellSpitzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine to me, I left a few nits but mainly I would like someone who is closer to this part of the code to also approve this PR before we merge.

Copy link
Contributor

@danielcweeks danielcweeks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are two future improvements that we've identified (catalog token refresh and multi-prefix support within cloud provider), but this support goes a long way to improving current use cases and even enables new ones, so I'm +1.

Comment on lines +92 to +91
LoadCredentialsResponse.class,
OAuth2Util.authHeaders(properties.get(OAuth2Properties.TOKEN)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for the initial support for refresh, we can assume that we're bound by the token lifetime for refreshes. It's not perfect, but I'd like to see the resolution of some of the AuthManger refactor before settling on a solution. We're not regressing at this point, so I'm ok with leaving this as is and addressing in the future.

Comment on lines 106 to 108
Optional<Credential> credentialWithPrefix =
s3Credentials.stream().max(Comparator.comparingInt(c -> c.prefix().length()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it's reasonable to enforce a single prefix for provider since that was the initial ask. There's an opportunity to improve this, but we can follow up with added support. I don't think it's too difficult, but there are a few edge cases that we'd need to work through. I'm ok proceeding with this implementation.

@nastra nastra force-pushed the s3-refresh-vended-credentials branch from a5b58ed to 41fe921 Compare October 30, 2024 06:05
@nastra nastra closed this Oct 30, 2024
@nastra nastra reopened this Oct 30, 2024
@nastra nastra merged commit 9e895cb into apache:main Oct 30, 2024
98 checks passed
@nastra nastra deleted the s3-refresh-vended-credentials branch October 30, 2024 09:29
Comment on lines +98 to +101
List<Credential> s3Credentials =
response.credentials().stream()
.filter(c -> c.prefix().startsWith("s3"))
.collect(Collectors.toList());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This imho still doesn't addresses the problem of wiring the credential for the right prefix ?
for ex : S3 rest server returned prefix as "s3://bucket/prefix-1"
but the call was for "s3://bucket/prefix-2", unless there is an enforcement from REST "that it will return only longest common prefix in the response"

I would recommend rather than using starting with "S3" let make it equal to "S3" for now to make sure the client doesn't mess around, what do you think ?

@danielcweeks @RussellSpitzer @amogh-jahagirdar @nastra

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@singhpk234 for how the implementation enforces that there's really only a single credential being sent back by the server. I'll be working on supporting and selecting the "right" credential when the server sents back multiple in a follow-up

Copy link
Contributor

@singhpk234 singhpk234 Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for how the implementation enforces that there's really only a single credential being sent back by the server

[doubt] how are we enforcing this ? for ex a rest server can send a credential for only one prefix but still for a diff prefix than what being asked for, are you suggesting that for now, rest server should only send back for exactly "s3" prefix ? if yes how are we enforcing this in rest for the meanwhile ?
Is strategy for now to let if fail ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The catalog has the responsibility of returning a credential with a scoped policy that provides the appropriate access for all prefixes. This isn't about the spec or client trying to enforce that behavior. Either the client will have access or not, that's up to the catalog.

Copy link
Contributor

@singhpk234 singhpk234 Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The catalog has the responsibility of returning a credential with a scoped policy that provides the appropriate access for all prefixes

I see, so that fact that only one credential is being returned from the catalog, it itself means that its best prefix that could fit in into all the request the client is allowed make. Make sense ! so we would indeed fail at client trying for an un-acessible prefix from S3 end.

zachdisc pushed a commit to zachdisc/iceberg that referenced this pull request Dec 23, 2024
dimas-b added a commit to dimas-b/polaris that referenced this pull request Oct 17, 2025
`CLIENT_REGION` is not a credential value, which is in line with
Iceberg's `VendedCredentialsProvider` code.

Cf. apache/iceberg#11389
dimas-b added a commit to dimas-b/polaris that referenced this pull request Oct 17, 2025
`CLIENT_REGION` is not a credential value, which is in line with
Iceberg's `VendedCredentialsProvider` code.

Cf. apache/iceberg#11389
HonahX pushed a commit to apache/polaris that referenced this pull request Oct 21, 2025
)

`CLIENT_REGION` is not a credential value, which is in line with
Iceberg's `VendedCredentialsProvider` code.

Cf. apache/iceberg#11389
snazy added a commit to snazy/polaris that referenced this pull request Nov 20, 2025
* Update Quarkus Platform and Group to v3.28.4 (apache#2786)

* Update dependency org.testcontainers:testcontainers-bom to v2.0.1 (apache#2830)

* Build/polaris-core: Remove outdated `constraint`s (apache#2818)

The `:polaris-core` build scripts contains (soft) version-constraints for some dependencies with a vague reason "Vulnerability detected in ..." (concrete CVE/reason not mentioned) referencing specific dependency versions. The mentioned versions are all quite outdated, some are even not transitively referenced. Hence, removing those constraings, as those seem no longer relevant.

Effective dependency versions can be inspected via `./gradlew :polaris-core:dependencies --configuration runtimeClasspath`.

* Add Community Meetings for 2025-10-02 and 2025-10-16 (apache#2832)

* Update docker.io/prom/prometheus Docker tag to v3.7.1 (apache#2834)

* testcontainers v2: tackle deprecation warnings (apache#2835)

* Add findPrincipalById helper (apache#2810)

* Add findPrincipalById helper

this simplifies frequent usage of the lower level `loadEntity` api (similar to the
existing `findPrincipalByName` helper)

* [Python] Add more tests cases for policy CLI (apache#2831)

* Update dependency software.amazon.awssdk:bom to v2.35.10 (apache#2840)

* Update dependency ch.qos.logback:logback-classic to v1.5.20 (apache#2839)

* Reproducible builds: make parent pom content reproducible (apache#2826)

The parent pom contains the `<developer>` and `<contributor>` elements. The former is populated from ASF people information including role information (champion, mentor, chair, (P)PMC member, committer). The latter is retrieved from a GitHub API endpoint, ordered by number contributions. Especially the latter list is prone to vary between builds, which makes the parent pom not reproducible as the locally built one is likely different from the one that was built by the release managed (staged artifact).

This change removes both lists, leaving a single static `<developer>` entry pointing to `https://polaris.apache.org/community/`. Related build-script code has been updated and no longer retrieves people information.

* Log root cause exceptions in mappers (apache#2837)

Fix `IcebergExceptionMapper` and `PolarisExceptionMapper` to pass exceptions as "cause" to the logger (as opposed to unreferenced log parameters).

* Remove credential flag from `StorageAccessProperty.CLIENT_REGION` (apache#2838)

`CLIENT_REGION` is not a credential value, which is in line with
Iceberg's `VendedCredentialsProvider` code.

Cf. apache/iceberg#11389

* CI: Let all workflows use GitHub's docker.io mirror (apache#2841)

* Correct template rendering for authentication options (apache#2808)

* Correct template rendering for authentication options

* Added tpl back

* Increase javadoc visibility in `:polaris-async-vertx` (apache#2745)

This is to fix javadoc error: `No public or protected classes found to document`

* Update slack invite url (apache#2846)

* Remove unused ConcurrentLinkedQueueWithApproximateSize (apache#2849)

* Merge AwsCloudWatchConfiguration and QuarkusAwsCloudWatchConfiguration (apache#2848)

For some reason, these two classes weren't properly merged when the runtime-service and service-common modules were merged. This PR fixes that.

This PR also adds some examples of AWS Cloud Watch configuration to the default application.properties file.

* Move TestPolarisEventListener to test fixtures (apache#2850)

* Update dependency com.google.cloud:google-cloud-storage-bom to v2.59.0 (apache#2857)

* Update actions/stale digest to e46bbab (apache#2856)

* Servcie: Remove a duplicated config (apache#2854)

* Update docker.io/prom/prometheus Docker tag to v3.7.2 (apache#2858)

* Update Quarkus Platform and Group to v3.28.5 (apache#2859)

* Update dependency com.google.errorprone:error_prone_core to v2.43.0 (apache#2860)

* Add --no-sts to CLI (apache#2855)

* Add --no-sts to CLI

Following up on apache#2672, add new `--no-sts` option to CLI to allow
configuring `stsUnavailable` in `AwsStorageConfigInfo`

* Use AccessConfigProvider.getAccessConfig in DefaultFileIOFactory (apache#2852)

* CLI: Remove the trailing comma (apache#2863)

* Update dependency pip-licenses-cli to v3 (apache#2842)

* Update dependency pip-licenses-cli to v3

* Update pip-licenses-cli version format

* Fix pip-licenses-cli version specification

---------

Co-authored-by: Yong Zheng <yongzheng0809@gmail.com>

* Update quay.io/keycloak/keycloak Docker tag to v26.4.2 (apache#2868)

* Bump main to 1.3.0-SNAPSHOT (apache#2870)

* Add properties from TableMetadata into Table entity internalProperties (apache#2735)

* Add properties from TableMetadata into Table entity internalProperties

* Made table properties constants and pulled out static utility method

* Update dependency io.smallrye:jandex to v3.5.1 (apache#2872)

* Fix exec flags on getting-started scripts (apache#2878)

* Add `+x` to script source files
* Remove (unnecessary) `chmod` from docs

* Update plugin jcstress to v0.9.0 (apache#2882)

* Update registry.access.redhat.com/ubi9/openjdk-21-runtime Docker tag to v1.23-6.1761164966 (apache#2874)

* Update dependency openapi-generator-cli to v7.16.0 (apache#2703)

* Update Gradle to v9 (apache#2226)

* Update Gradle to v9

* adopt gradlew

---------

Co-authored-by: Robert Stupp <snazy@snazy.de>

* Last merged commit 7892540

---------

Co-authored-by: Mend Renovate <bot@renovateapp.com>
Co-authored-by: JB Onofré <jbonofre@apache.org>
Co-authored-by: Christopher Lambert <xn137@gmx.de>
Co-authored-by: Nuoya Jiang <98131931+NuoyaJiang@users.noreply.github.com>
Co-authored-by: Dmitri Bourlatchkov <dmitri.bourlatchkov@gmail.com>
Co-authored-by: Yong Zheng <yongzheng0809@gmail.com>
Co-authored-by: Honah (Jonas) J. <honahx@apache.org>
Co-authored-by: Alexandre Dutra <adutra@apache.org>
Co-authored-by: Yufei Gu <yufei@apache.org>
Co-authored-by: Nuoya Jiang <98131931+CodingBangboo@users.noreply.github.com>
Co-authored-by: Michael Collado <40346148+collado-mike@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants