-
Notifications
You must be signed in to change notification settings - Fork 332
Support IMPLICIT authentication type for federated catalogs #1925
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is it different from NULL_TYPE?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With some connection types like HIVE, you may either have or not have an authentication mechanism in place. It's reasonable that admins might want to limit which authentication types can be used for their service (e.g. disallow auth-less federation), but they can't disallow having a null authentication type because that is also used by "classic" EXTERNAL catalogs that don't federate at all.
In other words, if you have a non-null connection, you should have a non-null authentication type configured. We should still leave room however for users to be explicit about certain federation modes like HIVE that may not have authentication in use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that we need to distinguish defined auth from undefined auth (i.e. null). However, a null value is not the same as a NULL_TYPE value. This is probably a concern about the existing codebase rather than this particular PR 🤔 but I did not find any usages of NULL_TYPE in the current codebase.
So, I wonder whether introducing the NO_AUTH enum value is really necessary (as opposed to renaming NULL_TYPE). WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my understanding AuthenticationType (and ConnectionType) should keep the NULL_TYPE to indicate an unsupported/missing value. Specifically to handle the case pointed out in the discussion that follows. If a newer version of AuthenticationType is used to create a federated catalog and the type is persisted in the ConnectionConfigInfoDPO. Later, an older client tries to read this DPO, it will return NULL_TYPE; which is an appropriate response considering it read an unknown/missing type from the DPO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I question the usefulness of NULL_TYPE to represent an "unsupported" value. ATM, I do not see any existing usage of NULL_TYPE in Polaris code.
As for REST API clients, they do not use this enum at all, as far as I understand. This enum appears to used by Polaris server code only internally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While NULL_TYPE is not explicitly used in the codebase, it is implied (and therefore used) in the default case of switch statements. I can go over a more explicit case for ConnectionType here:
- Assume that sometime in the future, Polaris adds support for
HIVEconnections. - The user then creates an external catalog called
hive-catalogof typeHIVE. At which point, Polaris will store the corresponding ConnectionConfigInfoDpo into the underlying persistence layer. While creating the persistence object, the constructor similar toIcebergRestConnectionConfigInfoDpo, calledHiveConnectionConfigInfoDpowill create this object with enums defined internally by the Polaris server. It'd make a call of the formConnectionType.HIVE.getCode(). - Further, let's assume that one of the two happens (1) we rollback the hive change the newer versions of Polaris don't support Hive connections, or (2) the Polaris instance is rebooted with an older version. In both cases, the Polaris instance won't recognize the
HIVEconnection type. - Lastly, let's say the user wants to perform some operation on the
hive-catalog. TheIcebergCatalogHandler::initializeCatalog()would now read the persisted object and parse it's information, specifically, let's look at this snippet:
ConnectionType.fromCode(connectionConfigInfoDpo.getConnectionTypeCode());
switch (connectionType) {
case ICEBERG_REST:
SessionCatalog.SessionContext context = SessionCatalog.SessionContext.createEmpty();
federatedCatalog =
new RESTCatalog(
context,
(config) ->
HTTPClient.builder(config)
.uri(config.get(org.apache.iceberg.CatalogProperties.URI))
.build());
federatedCatalog.initialize(
((IcebergRestConnectionConfigInfoDpo) connectionConfigInfoDpo).getRemoteCatalogName(),
connectionConfigInfoDpo.asIcebergCatalogProperties(getUserSecretsManager()));
break;
case HADOOP:
federatedCatalog = new HadoopCatalog();
federatedCatalog.initialize(
((HadoopConnectionConfigInfoDpo) connectionConfigInfoDpo).getWarehouse(),
connectionConfigInfoDpo.asIcebergCatalogProperties(getUserSecretsManager()));
break;
default:
throw new UnsupportedOperationException(
"Connection type not supported: " + connectionType);
}
Since HIVE is not a valid connection type in this Polaris instance, the ConnectionType::fromCode() function will return NULL_TYPE and hence activate the default case.
This was my understanding of why we need a NULL_TYPE. Essentially, since we always want to support not just backward compatibility but also forward compatibility wherein the data store might contain some objects/types that the current Polaris instance does not recognize. The NULL_TYPE provide a nice catch-all type to allow appropriate error handling.
spec/polaris-management-service.yml
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beside that this is a breaking change, any user facing API change requires a discussion on the dev mailing list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to dev list discussion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a breaking change btw
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is an enum in older clients - and Java enum valueOf throws and does not gracefully handle unknown values. Hence it is a breaking change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the semver sense, I'm not sure that needs to qualify as a breaking change -- at least it hasn't been called out as such when we've made similar changes in the past. The REST API is not breaking, even if some clients implemented in a fragile way might. Our Python client seems to fail quite gracefully. See #1506 and the associated doc here for a similar change made recently.
Separately, if you think it's a good idea to harden our Java client against this kind of enum change, I think that would be a great contribution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separately, if you think it's a good idea to harden our Java client against this kind of enum change, I think that would be a great contribution.
I guess that it would be a hard fight to remove the code generator, mind proposing that change to get that implemented?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, is the Java client already set up to treat this as NULL_TYPE?
if (authTypeCode < 0 || authTypeCode >= REVERSE_MAPPING_ARRAY.length) {
return NULL_TYPE;
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And the other way around?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, what do you mean by the other way around? Does this relate to the doc linked above?
edit: Oh, are you referring to what happens if a client tries to provide this new enum value to an older version of the service? How is that different from a client calling an API that doesn't exist on the service? I don't think that typically qualifies as a breaking change under semver
Regardless of whether or not it is a breaking change however, we do need to discuss the spec change on the ML just like the linked PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
polaris-core/src/main/java/org/apache/polaris/core/config/FeatureConfiguration.java
Outdated
Show resolved
Hide resolved
polaris-core/src/main/java/org/apache/polaris/core/connection/AuthenticationParametersDpo.java
Outdated
Show resolved
Hide resolved
|
Thanks everyone for looking into this PR, I have started a devlist discussion as suggested. |
service/common/src/main/java/org/apache/polaris/service/admin/PolarisAdminService.java
Outdated
Show resolved
Hide resolved
ad8b9bb to
bb9b891
Compare
|
Hi everyone, @dennishuo suggested that instead of calling the this @eric-maynard @dimas-b What do you think? |
|
I like the idea, but I think term |
This part isn't accurate. The Polaris service does not perform any pre-arranged authentication. It relies on the underlying connection library to carry out the authentication flow based on the environment or configuration file. Really what we want to convey here is that:
To elaborate on 2: If the config file provides no authentication parameters, then it will use an unauthenticated mechanism, on the other hand, let's say the config file contains auth method Kerberos and provides a path to a key-tab file, then the connection library will connect to the kerberized hadoop instance. In both cases Polaris is oblivious to what happens under the hood. |
|
From the user's perspective the term |
|
I agree that the user does not control the |
|
@poojanilangekar : could you post the enum name options on the dev ML thread for visibility? This is an API change, so extra care is necessary. |
|
Hi everyone, just a gentle reminder: can you please respond to the mailing list with your naming preference? |
|
I've updated the PR by renaming the AuthenticationType to IMPLICIT. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this change required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Polaris assumes any catalog that is not defined as "INTERNAL" is by default "EXTERNAL", it ends up activating the code-path in PolarisAdminService.java:733
Granted that since we allow legacy EXTERNAL catalogs without ConnectionConfigInfos, the test would still pass without the change. But I think that wasn't the intention of the test.
For this test and other tests in catalog/, I think it's ideal to not activate that code path in the first place. If you want me to I can revert this and other changes. Please let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not so much worried about this test code change, but about defaulting to EXTERNAL. Existing test code appears to assume that the default is/was INTERNAL (which makes sense to me).
@dennishuo @eric-maynard : WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the fix is simple, I can create a separate PR to fix it.
polaris/polaris-core/src/main/java/org/apache/polaris/core/entity/CatalogEntity.java
Line 109 in 9cb71f8
| return catalogType == Catalog.TypeEnum.INTERNAL |
The builder checks if the catalogType is INTERNAL, if not, it assumes that the type is EXTERNAL, we can flip the condition to check if the type is set to EXTEERNAL. I also think that the change only affects tests, because in other cases, it's generated from polaris-management-service.yml which defaults to INTERNAL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opening a PR for that would be great. Thx!
Let's review the impact of that change on that specific PR (I hope it's ok).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since I opened #1998, I reverted this and other similar changes. The build still passes but I think we should try to get 1998 merged before this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx! Let's merge #1998 first and rebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
service/common/src/main/java/org/apache/polaris/service/admin/PolarisAdminService.java
Outdated
Show resolved
Hide resolved
68adaaf to
abf3a4b
Compare
abf3a4b to
455b8b3
Compare
dimas-b
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall with one more minor comment.
| - OAUTH | ||
| - BEARER | ||
| - SIGV4 | ||
| - IMPLICIT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a brief note to the New Features section in CHANGELOG.md about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
* Ignore regenerate.sh on README.md (apache#1999) * OpenAPI-generate: Omit generation timestamp (apache#2004) The jaxrs-resteasy OpenAPI generator adds the generation timestamp to the generated sources by default. This behavior leads to different code for every generation, leading to unnecessary rebuilds (and re-tests), because the generated `.class` files are different. * Update CatalogEntity::Builder to set default CatalogType as INTERNAL (apache#1998) Encountered the issue while adding additional validations to `ExternalCatalog`. The `CatalogEntity::Builder` checks if the `Catalog::type` is set to `INTERNAL`, if not it defaults to `EXTERNAL`. However this is the opposite of the behavior defined in polaris-management-service.yml where the default is set to `INTERNAL`. This change only affects tests because in other cases the catalog entity is generated from the REST request. Testing: Updated CatalogEntityTest to ensure that the default is set to `INTERNAL`. * Support IMPLICIT authentication type for federated catalogs (apache#1925) Previously, the ConnectionConfigInfo required explicit AuthenticationParameters for every federated catalog. However, certain catalogs types that Polaris federates to (either now or in the future) allow `IMPLICIT` authentication, wherein the authentication parameters are picked from the environment or configuration files. This change enables federating to such catalogs without passing dummy secrets. The `IMPLICIT` option is guarded by the `SUPPORTED_EXTERNAL_CATALOG_AUTHENTICATION_TYPES`. Hence users may create federated catalogs with `IMPLICIT` authentication only when the administrator explicitly enables this feature. * Fix helm doc (apache#2001) * Fix helm doc * Remove persistent ref * Remove persistent ref * Fixes based on feedback * Fixes based on feedback * Fixes based on feedback * Fixes based on feedback * feat(auth): Ability to override active roles provider per realm (apache#2000) * feat(auth): Ability to override active roles provider per realm * deprecate old property * add tests * Introduce an option to add object storage prefix to table locations (apache#1966) ### Problem Currently, Polaris enforces that the physical layout of entities maps to the logical layout: ``` catalog └── ns1 ├── ns2 │ └── table_b └── table_a ``` In the above example, the base locations of `table_a` and `ns2` are expected to be children of `ns1`, and the location of `table_b` is expected to be a child of `ns2`. This behavior is controlled by `ALLOW_UNSTRUCTURED_TABLE_LOCATION` and is the basis for the sibling overlap check when `OPTIMIZED_SIBLING_CHECK` is disabled or persistence cannot support the optimized check. However, some users have reported that this physical organization of data can lead to undesirable performance characteristics when hotspotting occurs across namespaces. If the underlying storage is range partitioned by key, this organization will tend to physically collocate logically-similar entities. ### Solution To solve this problem, this PR introduces a new option `DEFAULT_LOCATION_OBJECT_STORAGE_PREFIX_ENABLED` which alters the behavior of the catalog when creating a table without a user-specified location. With the feature disabled, a table such as `ns1.table_a` will have a path like this: ``` s3://catalog/base/ns1/table_a/ ``` With the feature enabled, a prefix is added before the namespace: ``` s3://catalog/base/0010/0101/0110/10010100/ns1/table_a/ ``` This serves to eliminate the physical collocation of tables in the same namespace (or with similarly-named namespaces or table names). This functionality is similar to Iceberg's `write.object-storage.enabled`, but it applies across tables and namespaces. The two features can and should be combined to achieve the best distribution of data files throughout the key space. ### Configuration & Sibling Overlap Check If an admin doesn't care about the risk of vending credentials with the sibling overlap check disabled, they can enable the feature with these configs: ``` polaris.features.DEFAULT_LOCATION_OBJECT_STORAGE_PREFIX_ENABLED=true polaris.features.ALLOW_UNSTRUCTURED_TABLE_LOCATION=true polaris.features.ALLOW_TABLE_LOCATION_OVERLAP=true polaris.behavior-changes.VALIDATE_VIEW_LOCATION_OVERLAP=false ``` In order to use this feature and to preserve the sibling overlap check, you can configure the service with: ``` polaris.features.DEFAULT_LOCATION_OBJECT_STORAGE_PREFIX_ENABLED=true polaris.features.ALLOW_UNSTRUCTURED_TABLE_LOCATION=true polaris.features.OPTIMIZED_SIBLING_CHECK=true ``` However, note that the `OPTIMIZED_SIBLING_CHECK` comes with some caveats as outlined in its description. Namely, it currently only works with some persistence implementations and it requires all location-based entities to have a recently-introduced field set. These locations are expected to be suffixed with `/`, and locations with many `/` may not be eligible for the optimized check. Older Polaris deployments may not meet these requirements without a migration or backfill. Accordingly combining these two features should be considered experimental for the time being. * Cleanup collaborators in `.asf.yaml` (apache#2008) Some devs were added in the past to `.asf.yaml` to let CI run w/o committer approval. After [INFRA-26985](https://issues.apache.org/jira/browse/INFRA-26985) this is no longer necessary, so the file can be cleaned up. * Fix bunch of OpenAPI generation issues (apache#2005) The current way how OpenAPI Java code is generated suffers from a bunch of issues: * Changes to any of the source spec files requires a Gradle `clean`, otherwise old generated Java source will remain - i.e. "no longer" existing sources are not removed. This is addressed by adding an additional action to `GenerateTask`. * The output of `GenerateTask` was explicitly not cached, this is removed, so the output is cached. * Add explicit inputs to `GenerateTask` to the whole templates and spec folders. * Restructure the download page (apache#2011) * Add 1.0.0-incubating release to the downloads page (apache#2018) * Add 1.0.0 docs to the huge menu (apache#2020) * Improve the bundle jar license and notice remove using exclude (apache#1991) * Remove duplicate MetaStoreManagerFactory mocks (apache#2023) also rename the field for clarity and consistency * Update Makefile for python client with auto setup (apache#1995) Automate python client setup and use a virtual env instead to avoid change an end-users' OS python * Add Helm Chart repo to the downloads page (apache#2025) * Publish helm doc (apache#2014) * Make PolarisConfiguration member variables private (apache#2007) * Make PolarisConfiguration members private * Make methods final * Use the 0.9.0 doc from the versioned-docs branch (apache#2026) * Helm key grouping and test cases (apache#2002) * Helm key grouping and test cases * Update README.md * Added backwards compatible * Fix conflict * Use coalesce instead of if else * Remove kind (apache#2028) * Remove kind * Remove k8 dir from check-md-link.yml * Sync helm doc (apache#2034) * Update release-guide.md for publishing docs (apache#2035) * [Site] Simplify the doc directory structure (apache#2033) * [Site] Update release-guide.md for release dir name (apache#2037) * Fix gralde command for helm image and remove simple-values.yaml (apache#2036) * Using the closer.lua download script (apache#2038) * Fix the LICENSE and NOTICE with the latest dependency updates (apache#1939) * Fix invalid redirect from public page (apache#2041) * Make StorageCredentialCache safe for mutli-realm usage (apache#2021) Injecting the request-scoped `RealmContext` into the application-scoped `StorageCredentialCache` makes things unnecessarily complicated. Similarly `StorageCredentialCacheKey` having a `@Nullable callContext` makes it more difficult to reason about. Instead we can determine all realm-specific values at the time of insertion (from the `PolarisCallContext` param of `getOrGenerateSubScopeCreds`). * feat(ci): Improve Gradle cache in CI (apache#1928) * Introduce RealmConfig (apache#2015) Getting a config value currently requires quite a ceremony: ``` ctx.getPolarisCallContext() .getConfigurationStore() .getConfiguration(ctx.getRealmContext(), "ALLOW_WILDCARD_LOCATION", false)) ``` since a `PolarisConfigurationStore` cant be used without a `RealmContext` it makes sense to add a dedicated interface. this allows removal of verbose code and also moves towards injecting that interface via CDI at a request/realm scope in the future. * Fix CI (apache#2043) The `store-gradle-cache` job in the `gradle.yaml` GitHub workflow is missing a "checkout", this change adds it to fix CI. * Fix CI (no 2) (apache#2044) The newly added `store-gradle-cache` CI job has run some Gradle task to trigger Gradle's automatic cache cleanup. In the source project Nessie we used a simple task `showVersion` to do this. As having this task in Polaris might be useful, adding this task as there's no other suitable task (cheap and not generating much output) seems legit. * Bump Quarkus version to unblock IntelliJ build (apache#1958) Use Quarkus 3.24.3 to fix build issues with `:polaris-server:classes` * Use application-scoped StorageCredentialCache (apache#2022) Since `StorageCredentialCache` is application scoped and after 6ddd148 its constructor no longer uses the `RealmContext` passed into `getOrCreateStorageCredentialCache` we can now let all `PolarisEntityManager` instances share the same `StorageCredentialCache`. * Attempt to make Renovate work again (apache#2052) Looks that I accidentally broke Renovate with apache#1891. This was made under the impression of the [Renovate change to support `baseBranches` in forking-renovate] (renovatebot/renovate#36054). However, a [later Renovate change](renovatebot/renovate#35579) seems to break that. The plan here is to: 1. remove the regex from our `baseBranches` option - if that doesn't work then 2. just use the default branch * main: Update actions/stale digest to 128b2c8 (apache#2053) * main: Update dependency com.azure:azure-sdk-bom to v1.2.36 (apache#2054) * main: Update dependency com.fasterxml.jackson:jackson-bom to v2.19.1 (apache#2055) * main: Update dependency com.google.cloud:google-cloud-storage-bom to v2.53.3 (apache#2057) * main: Update registry.access.redhat.com/ubi9/openjdk-21-runtime Docker tag to v1.22-1.1752066187 (apache#2059) * main: Update dependency com.github.ben-manes.caffeine:caffeine to v3.2.2 (apache#2056) * main: Update dependency gradle to v8.14.3 (main) (apache#2058) * main: Update dependency gradle to v8.14.3 * Adjust Gradle update --------- Co-authored-by: Robert Stupp <snazy@snazy.de> * main: Update dependency io.micrometer:micrometer-bom to v1.15.2 (apache#2063) * main: Update dependency com.diffplug.spotless:spotless-plugin-gradle to v7.1.0 (apache#2067) * main: Update dependency com.nimbusds:nimbus-jose-jwt to v10.3.1 (apache#2062) * main: Update docker.io/prom/prometheus Docker tag to v3.5.0 (apache#2071) * main: Update dependency org.junit:junit-bom to v5.13.3 (apache#2064) * main: Update docker.io/jaegertracing/all-in-one Docker tag to v1.71.0 (apache#2070) * main: Update medyagh/setup-minikube action to v0.0.20 (apache#2066) * main: Update dependency org.apache.commons:commons-lang3 to v3.18.0 (apache#2069) * main: Update log4j2 monorepo to v2.25.1 (apache#2073) * main: Update immutables to v2.11.0 (apache#2072) * main: Update dependency org.testcontainers:testcontainers-bom to v1.21.3 (apache#2065) * main: Update dependency com.google.errorprone:error_prone_core to v2.40.0 (apache#2068) * main: Update dependency io.netty:netty-codec-http2 to v4.2.3.Final (apache#2074) * main: Update dependency io.prometheus:prometheus-metrics-exporter-servlet-jakarta to v1.3.10 (apache#2076) * main: Update dependency net.ltgt.gradle:gradle-errorprone-plugin to v4.3.0 (apache#2079) * main: Update dependency io.projectreactor.netty:reactor-netty-http to v1.2.8 (apache#2075) * main: Update dependency com.gradleup.shadow:shadow-gradle-plugin to v8.3.8 (apache#2061) * main: Update dependency org.eclipse.persistence:eclipselink to v4.0.7 (apache#2078) * Add External Identity Providers page to unreleased documentation (apache#2013) --------- Co-authored-by: Alexandre Dutra <adutra@apache.org> Co-authored-by: Eric Maynard <emaynard@apache.org> * main: Update dependency io.opentelemetry:opentelemetry-bom to v1.52.0 (apache#2082) * main: Update dependency software.amazon.awssdk:bom to v2.31.78 (apache#2080) * main: Update dependency com.adobe.testing:s3mock-testcontainers to v4.6.0 (apache#2081) * main: Update dependency io.smallrye.common:smallrye-common-annotation to v2.13.7 (apache#2083) * Revert PR 2033 (apache#2087) The PR apache#2033 was merged within less than 3 hours, late on a Friday. Since that change does not address an issue that seriously deserves a quick reaction nor is it a "nit", I'm proposing to revert the change. We do have [community best practices](https://polaris.apache.org/community/contributing-guidelines/) stating to give the whole community enough time to review, which did not happen. There are concerns that the PR apache#2033 will interfere with the whole effort to automate releases. Since there was no change to review and raise the concerns, I'd like to revert it to not cause any friction with that bigger effort. Revert "Fix invalid redirect from public page (apache#2041)", commit 493bc8e. Revert "[Site] Simplify the doc directory structure (apache#2033)", commit 2db2f10. * Renovate PRs, branch name + PR subject (apache#2060) Until June, Renovate PRs behaved a little bit different than today. The difference is the branch name. Before it was something like `renovate-bot/renovate/main/org.openapi.generator-7.x`, now it's like `renovate-bot/renovate/main-main/actions-stale-digest` (branch name is duplicated). I also noticed that the branch name is repeated in the PR subject, which started to be that way some longer ago. This change removes both duplications. * Simplify RealmEntityManagerFactory usage in tests (apache#2050) since all ctor params are created in `IcebergCatalogTest.before` we can do the same for `RealmEntityManagerFactory` `PolarisAuthzTestBase.entityManager` is already getting derived from `realmEntityManagerFactory`: https://github.com/apache/polaris/blob/2c2052c28f899aaa85e5f11a9131d9812ec62679/runtime/service/src/test/java/org/apache/polaris/service/quarkus/admin/PolarisAuthzTestBase.java#L247 * Use PolarisImmutable for StorageCredentialCacheKey (apache#2029) * remove unused entityId from StorageCredentialCacheKey * convert StorageCredentialCacheKey to immutables * Disable renovatebot on release branches (apache#2085) Per the mailing list thread "[DISCUSS] Disable renovatebot on release branches", we should not do automatic dependency upgrades for release branches. Since it seems `release/1.0.x` is a release branch, we can remove this regex from renovate's list. * Site: Remove non-OSS query engines from front page (apache#2031) * update query engines list * Add Dremio OSS * fix(deps): update immutables to v2.11.1 (apache#2113) * fix(deps): update dependency boto3 to v1.39.4 (apache#2116) * chore: Avoid deprecated `DefaultCredentialsProvider.create()` (apache#2119) Use `DefaultCredentialsProvider.builder().build()` as suggested by AWS SDK javadoc. * fix(deps): update dependency boto3 to v1.39.6 (apache#2120) * Extensible pagination token implementation (apache#1938) Based on apache#1838, following up on apache#1555 * Allows multiple implementations of `Token` referencing the "next page", encapsulated in `PageToken`. No changes to `polaris-core` needed to add custom `Token` implementations. * Extensible to (later) support (cryptographic) signatures to prevent tampered page-token * Refactor pagination code to delineate API-level page tokens and internal "pointers to data" * Requests deal with the "previous" token, user-provided page size (optional) and the previous request's page size. * Concentrate the logic of combining page size requests and previous tokens in `PageTokenUtil` * `PageToken` subclasses are no longer necessary. * Serialzation of `PageToken` uses Jackson serialization (smile format) Since no (metastore level) implementation handling pagination existed before, no backwards compatibility is needed. Co-authored-by: Dmitri Bourlatchkov <dmitri.bourlatchkov@gmail.com> Co-authored-by: Eric Maynard <eric.maynard+oss@snowflake.com> * Site/dev: allow overriding the podman/docker binaries detection (apache#2051) The scripts in the `bin/` directory are built to work with both Docker and podman. There are nuances in how both behave, especially wrt docker/podman-compose. Some local environment specifics require the use of `podman-compose`, others the use of `docker-compose`. The default behavior is to prefer the `podman` and `podman-compose` binaries, if those exist and fall back to `docker` and `docker-compose`. Some setups using podman however require the use of `docker-compose` even if `podman-compose` is installed. This may manifest in an error message stating that `--userns` and `--pod` cannot be used together. In that case create a file `.user-settings` in the `site/` folder and add these two lines: ```bash DOCKER=docker COMPOSE=docker-compose ``` * NoSQL: build descriptions * NoSQL: README nits * NoSQL: Misc ports * Pagination * Policy fixes * Adoptions to "conflicting" changes * runtime-service test abstractions * Last merged commit d2667e5 --------- Co-authored-by: Yong Zheng <yongzheng0809@gmail.com> Co-authored-by: Pooja Nilangekar <poojan@umd.edu> Co-authored-by: Alexandre Dutra <adutra@users.noreply.github.com> Co-authored-by: Eric Maynard <eric.maynard+oss@snowflake.com> Co-authored-by: Yufei Gu <yufei@apache.org> Co-authored-by: Yun Zou <yunzou.colostate@gmail.com> Co-authored-by: Christopher Lambert <xn137@gmx.de> Co-authored-by: Dongjoon Hyun <dongjoon@apache.org> Co-authored-by: JB Onofré <jbonofre@apache.org> Co-authored-by: Alexandre Dutra <adutra@apache.org> Co-authored-by: Adnan Hemani <adnan.h@berkeley.edu> Co-authored-by: Mend Renovate <bot@renovateapp.com> Co-authored-by: Mark Hoerth <47870294+markhoerth@users.noreply.github.com> Co-authored-by: Eric Maynard <emaynard@apache.org> Co-authored-by: Danica Fine <danica.fine@gmail.com> Co-authored-by: Dmitri Bourlatchkov <dmitri.bourlatchkov@gmail.com> Co-authored-by: Honah (Jonas) J. <honahx@apache.org>
* chore(deps): update dependency mypy to >=1.17, <=1.17.0 (apache#2114) * Spark 3.5.6 and Iceberg 1.9.1 (apache#1960) * Spark 3.5.6 and Iceberg 1.9.1 * Cleanup * Add `pathStyleAccess` to AwsStorageConfigInfo (apache#2012) * Add `pathStyleAccess` to AwsStorageConfigInfo This change allows configuring the "path-style" access mode in S3 clients (both in Polaris Servers and Iceberg REST Catalog API clients). This change is applicable both to AWS storage and to non-AWS S3-compatible storage (apache#1530). * Add TestFileIOFactory helper (apache#2105) * Add FileIOFactory.wrapExisting helper * fix(deps): update dependency gradle.plugin.org.jetbrains.gradle.plugin.idea-ext:gradle-idea-ext to v1.2 (apache#2125) * fix(deps): update dependency boto3 to v1.39.7 (apache#2124) * Abstract polaris-runtime-service tests for all persistence implementations (apache#2106) The NoSQL persistence implementation has to run the Iceberg table & view catalog plus the Polaris specific tests as well. Reusing existing tests is beneficial to avoid a lot of code duplcation. This change moves the actual tests to `Abstract*` classes and refactors the existing tests to extend those. The NoSQL persistence work extends the same `Abstract*` classes but runs with different Quarkus test profiles. * Add IMPLICIT authentication support to the CLI (apache#2121) PRs apache#1925 and apache#1912 were merged around the same time. This PR connects the two changes and enables the CLI to accept IMPLICIT authentication type. Since Hadoop federated catalogs rely purely on IMPLICIT authentication, the CLI parsing test has been updated to reflect the same. * feat(helm): Add support for external authentication (apache#2104) * fix(deps): update dependency org.apache.iceberg:iceberg-bom to v1.9.2 (apache#2126) * fix(deps): update quarkus platform and group to v3.24.4 (apache#2128) * fix(deps): update dependency boto3 to v1.39.8 (apache#2129) * fix(deps): update dependency io.smallrye.config:smallrye-config-core to v3.13.3 (apache#2130) * Add newIcebergCatalog helper (apache#2134) creation of `IcebergCatalog` instances was quite redundant as tests mostly use the same parameters most of the time. also remove an unused field in 2 other tests. * Add server and client support for the new generic table `baseLocation` field (apache#2122) * Use Makefile to simplify setup and commands (apache#2027) * Use Makefile to simplify setup and commands * Add targets for minikube state management * Add podman support and spark plugin build * Add version target * Update README.md for Makefile usage and relation to the project * Fix nit * Package polaris client as python package (apache#2049) * Package polaris client as python package * Package polaris client as python package * Change owner to spark when copying files from local into Dockerfile * CI: Address failure from accessing GH API (apache#2132) CI sometimes fails with this failure: ``` * What went wrong: Execution failed for task ':generatePomFileForMavenPublication'. > Unable to process url: https://api.github.com/repos/apache/polaris/contributors?per_page=1000 ``` The sometimes failing request fetches the list of contributors to be published in the "root" POM. Unauthorized GH API requests have an hourly(?) limit of 60 requests per source IP. Authorized requests have a much higher rate limit. We do have a GitHub token available in every CI run, which can be used in GH API requests. This change adds the `Authorization` header for the failing GH API request to leverage the higher rate limit and let CI not fail (that often). * fix(deps): update dependency com.nimbusds:nimbus-jose-jwt to v10.4 (apache#2139) * fix(deps): update dependency com.diffplug.spotless:spotless-plugin-gradle to v7.2.0 (apache#2142) * fix(deps): update dependency software.amazon.awssdk:bom to v2.32.4 (apache#2146) * fix(deps): update dependency org.xerial.snappy:snappy-java to v1.1.10.8 (apache#2138) * fix(deps): update dependency org.junit:junit-bom to v5.13.4 (apache#2147) * fix(deps): update dependency boto3 to v1.39.9 (apache#2137) * fix(deps): update dependency com.fasterxml.jackson:jackson-bom to v2.19.2 (apache#2136) * Python client: add support for endpoint, sts-endpoint, path-style-access (apache#2127) This change adds support for endpoint, sts-endpoint, path-style-access to the Polaris Python client. Amends apache#1913 and apache#2012 * Remove PolarisEntityManager.getCredentialCache (apache#2133) `PolarisEntityManager` itself is not using the `StorageCredentialCache` but just hands it out via `getCredentialCache`. the only caller of `getCredentialCache` is `FileIOUtil.refreshAccessConfig`, which in in turn is only called by `DefaultFileIOFactory` and `IcebergCatalog`. note that in a follow-up we will likely be able to remove `PolarisEntityManager` usage completely from `IcebergCatalog`. additional cleanups: - use `StorageCredentialCache` injection in tests (but we need to invalidate all entries on test start) - remove unused `UserSecretsManagerFactory` from `PolarisCallContextCatalogFactory` * chore(deps): update registry.access.redhat.com/ubi9/openjdk-21-runtime docker tag to v1.22-1.1752676419 (apache#2150) * fix(deps): update dependency com.diffplug.spotless:spotless-plugin-gradle to v7.2.1 (apache#2152) * fix(deps): update dependency boto3 to v1.39.10 (apache#2151) * chore: fix class reference in the javadoc of TableLikeEntity (apache#2157) * fix(deps): update dependency commons-codec:commons-codec to v1.19.0 (apache#2160) * fix(deps): update dependency boto3 to v1.39.11 (apache#2159) * Last merged commit 395459f --------- Co-authored-by: Mend Renovate <bot@renovateapp.com> Co-authored-by: Yong Zheng <yongzheng0809@gmail.com> Co-authored-by: Dmitri Bourlatchkov <dmitri.bourlatchkov@gmail.com> Co-authored-by: Christopher Lambert <xn137@gmx.de> Co-authored-by: Pooja Nilangekar <poojan@umd.edu> Co-authored-by: Alexandre Dutra <adutra@apache.org> Co-authored-by: Yun Zou <yunzou.colostate@gmail.com>
Previously, the ConnectionConfigInfo required explicit AuthenticationParameters for every federated catalog. However, certain catalogs types that Polaris federates to (either now or in the future) allow
IMPLICITauthentication, wherein the authentication parameters are picked from the environment or configuration files. This change enables federating to such catalogs without passing dummy secrets.The
IMPLICIToption is guarded by theSUPPORTED_EXTERNAL_CATALOG_AUTHENTICATION_TYPES. Hence users may create federated catalogs withIMPLICITauthentication only when the administrator explicitly enables this feature.