Contributor

@gh-yzou gh-yzou commented Jul 2, 2025

Previously we added the addLicenseFilesToJar job to remove the LICENSE and NOTICE files from dependency jars and add our own to the jar.
The addLicenseFilesToJar job does this by brute force; this PR improves the process by using exclude, which also lets us remove the extra addLicenseFilesToJar job.

Member

snazy commented Jul 2, 2025

> We were not able to reuse the current shadowJar plugin for maven publish due to the shadowJar plugin replaces the regular source Jar with the shadowJar during maven publish.

I doubt this statement is true. The shadow plugin does not change anything with respect to the sourceJar.

Contributor

@dimas-b dimas-b left a comment

Thanks for this refactoring @gh-yzou ! The new build script looks much better to me... I still have a couple of comments :)

Contributor

nit: why not use polaris as the example catalog name?

Contributor Author

This is the shell command template without actual values filled in; I added an example with specific values in the README.

Contributor

If CUSTOM is only to avoid falling under the exclude rules above, how about naming it BUNDLE-LICENSE instead? I think that name is clearer.

Contributor

@dimas-b dimas-b Jul 2, 2025

Alternatively (because the simple LICENSE file name feels more appropriate to me), how about the following?

  1. Copy ${projectDir}/LICENSE to build/license/BUNDLE-LICENSE
  2. Include build/license/BUNDLE-LICENSE with rename inside the shadow task.

Contributor Author

The alternative seems unnecessarily complicated to me. I think BUNDLE-* is a perfect name: it indicates that the license is specifically for the bundle jar, which is different from the regular LICENSE and NOTICE we use.

Contributor

@jbonofre : WDYT about BUNDLE-LICENSE?

Contributor

@gh-yzou : I edited my comment about copying at build time. I hope the intent is clearer now. Sorry about the confusion.

Contributor Author

@gh-yzou gh-yzou Jul 2, 2025

Yeah, I think I understood your original comment. I don't think we need those extra copies, which seem unnecessary, and I think BUNDLE-LICENSE and BUNDLE-NOTICE are good names here. We can see what @jbonofre thinks about the original file name. Just note that in the final jar the names will still be LICENSE and NOTICE; BUNDLE-* is only the file name in the source project.

Contributor

These LICENSE files are intended only for binary distributions. In the source code, using the name BUNDLE-LICENSE is perfectly fine, as long as we ensure the correct file name is used in the final published binary artifacts. Let's make sure the naming is adjusted appropriately during the packaging step.

Contributor Author

@dimas-b I reverted the extra module change in this PR; I think we can investigate how to reuse the shadowJar plugin in a follow-up PR. I think @jbonofre is OK with the current license name, and since it is not a user-facing name, we can always improve on it later. Could you please take one more look at this PR?

Contributor Author

gh-yzou commented Jul 2, 2025

@snazy it doesn't change the sourceJar; however, it updates the generated .pom file to depend only on the bundle jar, and it stops publishing the original sourceJar, so we only got the following jars

repo .m2/repository/org/apache/polaris/polaris-spark-3.5_2.12/1.1.0-incubating-SNAPSHOT

maven-metadata-local.xml
polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-bundle.jar
polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-javadoc.jar
polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT.module
polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT.pom
polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-sources.jar

Therefore, we weren't able to directly reuse the shadow jar. However, we are able to reuse it now, and with a separate bundle project, it also makes things much clearer.

Member

I'm a bit surprised that this works, based on the documentation for exclude, which I thought was the last thing to run, after steps like this had executed. But I'm glad that it does.

Contributor Author

@gh-yzou gh-yzou Jul 2, 2025

Yeah, I need to make sure the original file name doesn't match any of the exclude rules so that it isn't excluded; the rename actually happens after the exclusion. I have verified manually that LICENSE and NOTICE are the only such files I see at the top level, and they are the right ones.
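
The ordering described in this reply can be illustrated with a small simulation. This is only a sketch and not the Shadow plugin's API or implementation: it assumes that exclude patterns are matched against the source file name first and that the rename is applied afterwards. The class name `ExcludeThenRenameSketch`, the two patterns, and the rename rule are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical simulation of "exclude first, rename second"; NOT the Shadow plugin API.
final class ExcludeThenRenameSketch {

  // Assumed exclude rules: drop LICENSE* and NOTICE* files coming from dependency jars.
  private static final List<Pattern> EXCLUDES =
      List.of(Pattern.compile("^LICENSE.*"), Pattern.compile("^NOTICE.*"));

  // A file is excluded when its ORIGINAL name matches any exclude pattern.
  static boolean isExcluded(String sourceName) {
    return EXCLUDES.stream().anyMatch(p -> p.matcher(sourceName).matches());
  }

  // Assumed rename rule: BUNDLE-LICENSE -> LICENSE, BUNDLE-NOTICE -> NOTICE.
  static String rename(String sourceName) {
    String prefix = "BUNDLE-";
    return sourceName.startsWith(prefix) ? sourceName.substring(prefix.length()) : sourceName;
  }

  // Applies the exclude check on the original name, then renames the survivors.
  static List<String> pack(List<String> sourceNames) {
    List<String> packed = new ArrayList<>();
    for (String name : sourceNames) {
      if (isExcluded(name)) {
        continue; // dependency LICENSE/NOTICE files never reach the rename step
      }
      packed.add(rename(name));
    }
    return packed;
  }

  public static void main(String[] args) {
    System.out.println(
        pack(List.of("LICENSE", "NOTICE.txt", "BUNDLE-LICENSE", "BUNDLE-NOTICE")));
  }
}
```

Because `BUNDLE-LICENSE` does not match the exclude patterns under its original name, it survives the filter and only then becomes `LICENSE` in the packed output.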

@gh-yzou gh-yzou requested a review from jbonofre July 3, 2025 17:05
Member

snazy commented Jul 4, 2025

> @snazy it doesn't change the sourceJar; however, it updates the generated .pom file to depend only on the bundle jar, and it stops publishing the original sourceJar, so we only got the following jars
> ...

I still don't understand what you're trying to say here. In the PR description you say "the shadowJar plugin replaces the regular source Jar with the shadowJar", in the comment you say it "stops publishing the original sourceJar", but the listing below shows the source jar.

There's nothing that stops users of the shadow plugin from publishing both the "base" jar (e.g. using a different classifier) and the shaded jar from a single project.

Contributor

@dimas-b dimas-b Jul 4, 2025

After this PR the bundled artifact maven coordinates are going to be different from what is currently in 1.0.0 RC6 (that is polaris-spark-3.5_2.12-1.0.0-incubating-bundle.jar).

Since the polaris-spark artifact name is already occupied by the non-bundled Spark Client jar (without a qualifier), I believe it is not possible to offer Maven redirects, so this is going to be a breaking change in terms of the Maven coordinates of Polaris artifacts. Regardless of whether we expect users to download this jar via Maven or not, I think it deserves to be mentioned in the CHANGELOG (unless we include this PR in 1.0.0).

Cf. https://lists.apache.org/thread/hrdchvljflcknyq9c7o7by9jhpv204op

Contributor Author

Sure, I can add the jar name format change to the CHANGELOG.

Contributor

@dimas-b dimas-b Jul 7, 2025

Could we do the same kind of filter in the existing spark client module? If not, what prevents us from doing it?

Contributor Author

This is not needed for the other Spark client module. It is needed for the bundle jar because the bundle jar packs all the dependencies into one jar and therefore needs a customized license. The regular Spark client, however, does not pack any dependencies into the jar; it is purely source code from Polaris, and the regular Polaris license is good enough.

Contributor

I mean, why not use the same build code for the bundled jar in the old Spark module? Why do we need a new module?

Contributor Author

There are a couple of reasons:

  1. With this approach, we will be able to reuse the current shadowJar plugin for Maven publishing, instead of the hack added at https://github.com/apache/polaris/pull/1991/files#diff-d380e458e986c6183b1e23fcea1169811acfc68b2662ff9c88bb26b821238607L136, which I think makes things more consistent.
  2. This seems to be a more commonly used approach for publishing a bundle jar or uber jar, as Iceberg does, for example.
  3. This makes the module responsibilities clearer. For example, the bundle-license and bundle-notice are only needed for the bundle project, not the regular project.

Contributor Author

gh-yzou commented Jul 7, 2025

@snazy sorry for the confusion. By source jar, I don't mean the *-sources.jar; I am referring to the jar without any classifier. Here is what we expect to see once we publish to Maven

maven-metadata-local.xml
polaris-spark-3.5_2.13-1.1.0-incubating-SNAPSHOT-bundle.jar
polaris-spark-3.5_2.13-1.1.0-incubating-SNAPSHOT-javadoc.jar
polaris-spark-3.5_2.13-1.1.0-incubating-SNAPSHOT-sources.jar
polaris-spark-3.5_2.13-1.1.0-incubating-SNAPSHOT-test-fixtures.jar
polaris-spark-3.5_2.13-1.1.0-incubating-SNAPSHOT.jar
polaris-spark-3.5_2.13-1.1.0-incubating-SNAPSHOT.module
polaris-spark-3.5_2.13-1.1.0-incubating-SNAPSHOT.pom

where you can see we have a jar named polaris-spark-3.5_2.13-1.1.0-incubating-SNAPSHOT.jar. However, after reusing the shadowJar plugin, what we see is the following

maven-metadata-local.xml
polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-bundle.jar
polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-javadoc.jar
polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT.module
polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT.pom
polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-sources.jar

and when a customer uses --packages org.apache.polaris:polaris-spark:1.1.0-incubating-SNAPSHOT, it actually installs the bundle jar. However, when a user uses the --packages option, we expect polaris-spark-3.5_2.13-1.1.0-incubating-SNAPSHOT.jar to be installed, with the other dependencies downloaded accordingly. The polaris-spark-3.5_2.13-1.1.0-incubating-SNAPSHOT-bundle.jar is only intended for the --jars use case.
In this PR, we added a separate bundle project that reuses the shadowJar plugin to publish the bundle jar, so the following files get published

maven-metadata-local.xml
polaris-spark-bundle-3.5_2.12-1.1.0-incubating-SNAPSHOT-javadoc.jar
polaris-spark-bundle-3.5_2.12-1.1.0-incubating-SNAPSHOT-sources.jar
polaris-spark-bundle-3.5_2.12-1.1.0-incubating-SNAPSHOT.jar
polaris-spark-bundle-3.5_2.12-1.1.0-incubating-SNAPSHOT.module
polaris-spark-bundle-3.5_2.12-1.1.0-incubating-SNAPSHOT.pom

and polaris-spark-bundle-3.5_2.12-1.1.0-incubating-SNAPSHOT.jar is used with the --jars option. The original polaris-spark project remains for the --packages use case, with the following jars

polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-javadoc.jar
polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-sources.jar
polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-test-fixtures.jar
polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT.jar
polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT.module
polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT.pom
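
The file names in these listings follow the conventional Maven naming scheme `<artifactId>-<version>[-<classifier>].<extension>`. As a rough sketch of why the two layouts differ, the hypothetical helper below (the class and method names are illustrative and not part of any build tool) derives the bundle jar's file name for both setups: one module publishing the shadow jar under a `bundle` classifier, and a separate bundle module publishing it as the main artifact.

```java
// Hypothetical helper illustrating the conventional Maven artifact file name:
//   <artifactId>-<version>[-<classifier>].<extension>
final class ArtifactFileNameSketch {

  static String fileName(String artifactId, String version, String classifier, String extension) {
    // An absent (null or empty) classifier denotes the main artifact of the module.
    String suffix = (classifier == null || classifier.isEmpty()) ? "" : "-" + classifier;
    return artifactId + "-" + version + suffix + "." + extension;
  }

  public static void main(String[] args) {
    String version = "1.1.0-incubating-SNAPSHOT";
    // Single module: the shadow jar is published under the "bundle" classifier.
    System.out.println(fileName("polaris-spark-3.5_2.12", version, "bundle", "jar"));
    // Separate bundle module: the shadow jar is the module's main artifact.
    System.out.println(fileName("polaris-spark-bundle-3.5_2.12", version, null, "jar"));
  }
}
```

With a classifier, the bundle shares its coordinates with the regular client jar; with a separate module, it gets an artifactId of its own.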

@gh-yzou gh-yzou force-pushed the yzou-spark-client-bundle branch from a506adf to 672cd1f Compare July 8, 2025 00:57
Member

snazy commented Jul 8, 2025

@gh-yzou I still do not understand why you need to add a new Gradle project just to have one other jar. It is possible to build and publish both the "raw" and the "shadow" jar from a single project, we do this in Nessie's Spark extensions.

BTW: It is concerning that the term "customer" is being used. Polaris is an Apache project, not a commercial project!

Contributor Author

gh-yzou commented Jul 8, 2025

@snazy I don't need a separate project to publish both jars (polaris-spark-3.5_2.13-1.1.0-incubating-SNAPSHOT.jar and polaris-spark-3.5_2.12-1.1.0-incubating-SNAPSHOT-bundle.jar). That is what happens today; however, we are not able to reuse the shadowJar plugin.

Regarding Nessie, I assume you are referring to https://github.com/projectnessie/nessie/blob/main/integrations/spark-extensions/build.gradle.kts#L84. However, it seems that this shadowJar packs the dependencies and doesn't have a classifier, which means the shadow jar becomes the project jar when published. I built the Nessie project, and here is what I see

nessie-spark-extensions-3.5_2.12-0.104.3-SNAPSHOT-javadoc.jar
nessie-spark-extensions-3.5_2.12-0.104.3-SNAPSHOT-license-report.zip
nessie-spark-extensions-3.5_2.12-0.104.3-SNAPSHOT-scaladoc.jar
nessie-spark-extensions-3.5_2.12-0.104.3-SNAPSHOT-sources.jar
nessie-spark-extensions-3.5_2.12-0.104.3-SNAPSHOT.jar
nessie-spark-extensions-3.5_2.12-0.104.3-SNAPSHOT.module
nessie-spark-extensions-3.5_2.12-0.104.3-SNAPSHOT.pom

Of these, I believe nessie-spark-extensions-3.5_2.12-0.104.3-SNAPSHOT.jar is the shadowJar. As mentioned, the shadowJar is packed for the --jars use case, where we pack everything. For the --packages use case, the dependencies are supposed to be downloaded while installing the project, and we just need the original project jar (not the shadowJar).

Regarding "the term "customer" is being used", are you referring to the jar name changes, such as from project-spark-xxx-bundle.jar to project-spark-bundle-xxx.jar? Since this is for the --jars use case, and the jar name changes along with the version for every release anyway, I think the impact should be relatively low, especially as this is still an experimental project. We are also making sure to document this change clearly in the CHANGELOG.

Member

snazy commented Jul 9, 2025

@gh-yzou It is possible to publish both the "non-shadow" and the "shadow" jar from one project, even with the classifiers reversed. It is rather a matter of configuring the Gradle artifacts and (Maven) publications. So I suggest considering this approach.

> Regarding "the term "customer" is being used", are you referring to the jar name changes

No, I'm referring to the usage of the commercial term "customers", which is inappropriate in Apache projects. Apache projects have users and are by definition not commercial.

Contributor Author

gh-yzou commented Jul 9, 2025

@snazy sure, we can look into how to improve our use of the shadowJar plugin. What I can do is limit this PR to the license improvement, and we can do the other improvement in a separate PR.

jbonofre previously approved these changes Jul 9, 2025
Member

@jbonofre jbonofre left a comment

LGTM.

I suggest adding a note about the artifact coordinates change, as users may be surprised by it.

@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Jul 9, 2025
@gh-yzou gh-yzou force-pushed the yzou-spark-client-bundle branch from 672cd1f to 1e53608 Compare July 9, 2025 22:46
dimas-b previously approved these changes Jul 9, 2025
Contributor

@dimas-b dimas-b left a comment

Thanks, @gh-yzou ! The current state of this PR LGTM!

Contributor

dimas-b commented Jul 9, 2025

nit: The PR title seems a bit outdated?

Contributor Author

gh-yzou commented Jul 9, 2025

@dimas-b I just rebased about an hour ago, but I can rebase again.

Contributor

oops, looks like this import should be below the copyright comment

Contributor Author

good catch, let me move it

Contributor Author

@dimas-b I fixed the order and also updated the PR title and description.

Contributor

dimas-b commented Jul 9, 2025

no need to rebase, but the PR title still mentions a new "project", which is no longer the case, right?

@gh-yzou gh-yzou changed the title from "Add polaris-spark-bundle project to help packing and publish the Spark Client shadow Jar" to "Improve the bundle jar license and notice remove using excldue" Jul 9, 2025
@gh-yzou gh-yzou force-pushed the yzou-spark-client-bundle branch from c42dad3 to 1239a9e Compare July 9, 2025 23:18
@gh-yzou gh-yzou requested a review from dimas-b July 9, 2025 23:19
@dimas-b dimas-b changed the title from "Improve the bundle jar license and notice remove using excldue" to "Improve the bundle jar license and notice remove using exclude" Jul 9, 2025
@flyrain flyrain merged commit 33d9940 into apache:main Jul 10, 2025
11 checks passed
@github-project-automation github-project-automation bot moved this from Ready to merge to Done in Basic Kanban Board Jul 10, 2025
snazy added a commit to snazy/polaris that referenced this pull request Nov 20, 2025
* Ignore regenerate.sh on README.md (apache#1999)

* OpenAPI-generate: Omit generation timestamp (apache#2004)

The jaxrs-resteasy OpenAPI generator adds the generation timestamp to the generated sources by default. This behavior leads to different code for every generation, leading to unnecessary rebuilds (and re-tests), because the generated `.class` files are different.

* Update CatalogEntity::Builder to set default CatalogType as INTERNAL (apache#1998)

Encountered the issue while adding additional validations to `ExternalCatalog`. The `CatalogEntity::Builder` checks whether `Catalog::type` is set to `INTERNAL`; if not, it defaults to `EXTERNAL`. However, this is the opposite of the behavior defined in polaris-management-service.yml, where the default is `INTERNAL`. This change only affects tests because in other cases the catalog entity is generated from the REST request.

Testing:
Updated CatalogEntityTest to ensure that the default is set to `INTERNAL`.

* Support IMPLICIT authentication type for federated catalogs (apache#1925)

Previously, the ConnectionConfigInfo required explicit AuthenticationParameters for every federated catalog. However, certain catalog types that Polaris federates to (either now or in the future) allow `IMPLICIT` authentication, wherein the authentication parameters are picked from the environment or configuration files. This change enables federating to such catalogs without passing dummy secrets.

The `IMPLICIT` option is guarded by the `SUPPORTED_EXTERNAL_CATALOG_AUTHENTICATION_TYPES`. Hence users may create federated catalogs with `IMPLICIT` authentication only when the administrator explicitly enables this feature.

* Fix helm doc (apache#2001)

* Fix helm doc

* Remove persistent ref

* Remove persistent ref

* Fixes based on feedback

* Fixes based on feedback

* Fixes based on feedback

* Fixes based on feedback

* feat(auth): Ability to override active roles provider per realm (apache#2000)

* feat(auth): Ability to override active roles provider per realm

* deprecate old property

* add tests

* Introduce an option to add object storage prefix to table locations (apache#1966)

### Problem

Currently, Polaris enforces that the physical layout of entities maps to the logical layout:
```
catalog
└── ns1
    ├── ns2
    │   └── table_b
    └── table_a
```

In the above example, the base locations of `table_a` and `ns2` are expected to be children of `ns1`, and the location of `table_b` is expected to be a child of `ns2`.

This behavior is controlled by `ALLOW_UNSTRUCTURED_TABLE_LOCATION` and is the basis for the sibling overlap check when `OPTIMIZED_SIBLING_CHECK` is disabled or persistence cannot support the optimized check.

However, some users have reported that this physical organization of data can lead to undesirable performance characteristics when hotspotting occurs across namespaces. If the underlying storage is range partitioned by key, this organization will tend to physically collocate logically-similar entities.

### Solution

To solve this problem, this PR introduces a new option `DEFAULT_LOCATION_OBJECT_STORAGE_PREFIX_ENABLED` which alters the behavior of the catalog when creating a table without a user-specified location. With the feature disabled, a table such as `ns1.table_a` will have a path like this:

```
s3://catalog/base/ns1/table_a/
```

With the feature enabled, a prefix is added before the namespace:
```
s3://catalog/base/0010/0101/0110/10010100/ns1/table_a/
```

This serves to eliminate the physical collocation of tables in the same namespace (or with similarly-named namespaces or table names).
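
To make the path layout concrete, here is a minimal sketch of how such a prefix could be derived. It is an illustration under stated assumptions, not the Polaris implementation: the source does not say which hash function is used, so `String.hashCode()` stands in for it, and the 4/4/4/8 bit grouping is inferred only from the example path above. The class and method names are hypothetical.

```java
// Hypothetical sketch: derives a binary, hash-based prefix and places it before the namespace.
final class ObjectStoragePrefixSketch {

  // Returns a prefix shaped like "0010/0101/0110/10010100" (20 bits in 4/4/4/8 groups).
  static String prefixFor(String logicalPath) {
    // Assumption: any stable hash would do; String.hashCode() is used only for illustration.
    int bits = logicalPath.hashCode() & 0xFFFFF; // keep the low 20 bits
    // Left-pad the binary representation with zeros to exactly 20 characters.
    String binary = String.format("%20s", Integer.toBinaryString(bits)).replace(' ', '0');
    return binary.substring(0, 4)
        + "/" + binary.substring(4, 8)
        + "/" + binary.substring(8, 12)
        + "/" + binary.substring(12);
  }

  // Builds the default table location with the prefix inserted before the namespace.
  static String location(String baseLocation, String logicalPath) {
    return baseLocation + "/" + prefixFor(logicalPath) + "/" + logicalPath + "/";
  }

  public static void main(String[] args) {
    System.out.println(location("s3://catalog/base", "ns1/table_a"));
    System.out.println(location("s3://catalog/base", "ns1/table_b"));
  }
}
```

Since the prefix depends on a hash of the whole logical path, two tables in the same namespace will usually land under different key ranges, which is the effect the feature is after.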

This functionality is similar to Iceberg's `write.object-storage.enabled`, but it applies across tables and namespaces. The two features can and should be combined to achieve the best distribution of data files throughout the key space. 

### Configuration & Sibling Overlap Check

If an admin doesn't care about the risk of vending credentials with the sibling overlap check disabled, they can enable the feature with these configs:
```
polaris.features.DEFAULT_LOCATION_OBJECT_STORAGE_PREFIX_ENABLED=true
polaris.features.ALLOW_UNSTRUCTURED_TABLE_LOCATION=true
polaris.features.ALLOW_TABLE_LOCATION_OVERLAP=true
polaris.behavior-changes.VALIDATE_VIEW_LOCATION_OVERLAP=false
```

In order to use this feature and to preserve the sibling overlap check, you can configure the service with:
```
polaris.features.DEFAULT_LOCATION_OBJECT_STORAGE_PREFIX_ENABLED=true
polaris.features.ALLOW_UNSTRUCTURED_TABLE_LOCATION=true
polaris.features.OPTIMIZED_SIBLING_CHECK=true
```

However, note that the `OPTIMIZED_SIBLING_CHECK` comes with some caveats as outlined in its description. Namely, it currently only works with some persistence implementations and it requires all location-based entities to have a recently-introduced field set. These locations are expected to be suffixed with `/`, and locations with many `/` may not be eligible for the optimized check.

Older Polaris deployments may not meet these requirements without a migration or backfill. Accordingly, combining these two features should be considered experimental for the time being.

* Cleanup collaborators in `.asf.yaml` (apache#2008)

Some devs were added in the past to `.asf.yaml` to let CI run w/o committer approval. After [INFRA-26985](https://issues.apache.org/jira/browse/INFRA-26985) this is no longer necessary, so the file can be cleaned up.

* Fix bunch of OpenAPI generation issues (apache#2005)

The way OpenAPI Java code is currently generated suffers from several issues:
* Changes to any of the source spec files require a Gradle `clean`, otherwise old generated Java sources remain, i.e. sources that no longer exist in the spec are not removed. This is addressed by adding an additional action to `GenerateTask`.
* The output of `GenerateTask` was explicitly not cached. This restriction is removed, so the output is now cached.
* Add the whole templates and spec folders as explicit inputs to `GenerateTask`.

* Restructure the download page (apache#2011)

* Add 1.0.0-incubating release to the downloads page (apache#2018)

* Add 1.0.0 docs to the huge menu (apache#2020)

* Improve the bundle jar license and notice remove using exclude (apache#1991)

* Remove duplicate MetaStoreManagerFactory mocks (apache#2023)

also rename the field for clarity and consistency

* Update Makefile for python client with auto setup (apache#1995)

Automate Python client setup and use a virtual env to avoid changing an end user's OS Python.

* Add Helm Chart repo to the downloads page (apache#2025)

* Publish helm doc (apache#2014)

* Make PolarisConfiguration member variables private (apache#2007)

* Make PolarisConfiguration members private

* Make methods final

* Use the 0.9.0 doc from the versioned-docs branch (apache#2026)

* Helm key grouping and test cases (apache#2002)

* Helm key grouping and test cases

* Update README.md

* Added backwards compatible

* Fix conflict

* Use coalesce instead of if else

* Remove kind (apache#2028)

* Remove kind

* Remove k8 dir from check-md-link.yml

* Sync helm doc (apache#2034)

* Update release-guide.md for publishing docs (apache#2035)

* [Site] Simplify the doc directory structure (apache#2033)

* [Site] Update release-guide.md for release dir name (apache#2037)

* Fix gralde command for helm image and remove simple-values.yaml (apache#2036)

* Using the closer.lua download script (apache#2038)

* Fix the LICENSE and NOTICE with the latest dependency updates (apache#1939)

* Fix invalid redirect from public page (apache#2041)

* Make StorageCredentialCache safe for mutli-realm usage (apache#2021)

Injecting the request-scoped `RealmContext` into the application-scoped `StorageCredentialCache` makes things unnecessarily complicated.
Similarly `StorageCredentialCacheKey` having a `@Nullable callContext` makes it more difficult to reason about.

Instead we can determine all realm-specific values at the time of insertion (from the `PolarisCallContext` param of `getOrGenerateSubScopeCreds`).
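
A minimal sketch of the idea, assuming a cache key that carries the realm identifier resolved at insertion time. The record fields and class name below are hypothetical and do not mirror the real `StorageCredentialCacheKey`; the sketch only shows that an application-scoped map can serve several realms once the realm is part of the key.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical application-scoped cache; the realm is part of the key, not of the cache.
final class RealmScopedCacheSketch {

  // Illustrative key: all realm-specific values are fixed when the key is created.
  record CacheKey(String realmId, long catalogId, String location) {}

  private final Map<CacheKey, String> cache = new ConcurrentHashMap<>();

  // The caller resolves the realm from its call context and passes it in explicitly.
  String getOrGenerate(
      String realmId, long catalogId, String location, Function<CacheKey, String> generator) {
    CacheKey key = new CacheKey(realmId, catalogId, location);
    return cache.computeIfAbsent(key, generator);
  }

  int size() {
    return cache.size();
  }

  public static void main(String[] args) {
    RealmScopedCacheSketch cache = new RealmScopedCacheSketch();
    cache.getOrGenerate("realm-a", 1L, "s3://bucket/t", k -> "creds-for-" + k.realmId());
    cache.getOrGenerate("realm-b", 1L, "s3://bucket/t", k -> "creds-for-" + k.realmId());
    System.out.println(cache.size());
  }
}
```

The same catalog id and location in two realms produce two distinct entries, so no request-scoped object needs to be injected into the cache itself.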

* feat(ci): Improve Gradle cache in CI (apache#1928)

* Introduce RealmConfig (apache#2015)

Getting a config value currently requires quite a ceremony:
```
ctx.getPolarisCallContext()
   .getConfigurationStore()
   .getConfiguration(ctx.getRealmContext(), "ALLOW_WILDCARD_LOCATION", false)
```
Since a `PolarisConfigurationStore` can't be used without a `RealmContext`, it makes sense to add a dedicated interface. This allows removal of verbose code and also moves towards injecting that interface via CDI at request/realm scope in the future.
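
The shape of such a realm-bound interface might look roughly like the sketch below. All names here (`ConfigStore`, `RealmConfig`, `forRealm`, `inMemory`) are hypothetical stand-ins and not the actual Polaris types or signatures; the point is only that binding the realm once removes it from every call site.

```java
import java.util.Map;

// Hypothetical sketch of a realm-bound configuration view; not the real Polaris API.
final class RealmConfigSketch {

  /** Stand-in for a configuration store that needs a realm on every lookup. */
  interface ConfigStore {
    <T> T getConfiguration(String realmId, String key, T defaultValue);
  }

  /** Realm-bound view: callers no longer pass the realm. */
  interface RealmConfig {
    <T> T getConfig(String key, T defaultValue);
  }

  // Binds the realm once so that later lookups only need the key and a default.
  static RealmConfig forRealm(ConfigStore store, String realmId) {
    return new RealmConfig() {
      @Override
      public <T> T getConfig(String key, T defaultValue) {
        return store.getConfiguration(realmId, key, defaultValue);
      }
    };
  }

  /** Simple in-memory store keyed by "realm/key", for demonstration only. */
  static ConfigStore inMemory(Map<String, Object> values) {
    return new ConfigStore() {
      @Override
      @SuppressWarnings("unchecked")
      public <T> T getConfiguration(String realmId, String key, T defaultValue) {
        Object value = values.get(realmId + "/" + key);
        return value == null ? defaultValue : (T) value;
      }
    };
  }

  public static void main(String[] args) {
    Map<String, Object> values = Map.of("realm-a/ALLOW_WILDCARD_LOCATION", true);
    RealmConfig config = forRealm(inMemory(values), "realm-a");
    System.out.println(config.getConfig("ALLOW_WILDCARD_LOCATION", false));
  }
}
```

A view like this could later be produced per request by a dependency-injection container, which is the direction the commit message mentions.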

* Fix CI (apache#2043)

The `store-gradle-cache` job in the `gradle.yaml` GitHub workflow is missing a "checkout", this change adds it to fix CI.

* Fix CI (no 2) (apache#2044)

The newly added `store-gradle-cache` CI job has to run some Gradle task to trigger Gradle's automatic cache cleanup. In the source project, Nessie, we used a simple task `showVersion` to do this. As having this task in Polaris might be useful, and there's no other suitable task (cheap and not generating much output), adding it seems legitimate.

* Bump Quarkus version to unblock IntelliJ build (apache#1958)

Use Quarkus 3.24.3 to fix build issues with `:polaris-server:classes`

* Use application-scoped StorageCredentialCache (apache#2022)

Since `StorageCredentialCache` is application scoped and after 6ddd148 its constructor no longer uses the `RealmContext` passed into `getOrCreateStorageCredentialCache` we can now let all `PolarisEntityManager` instances share the same `StorageCredentialCache`.

* Attempt to make Renovate work again (apache#2052)

It looks like I accidentally broke Renovate with apache#1891. That change was made under the impression of the [Renovate change to support `baseBranches` in forking-renovate](renovatebot/renovate#36054). However, a [later Renovate change](renovatebot/renovate#35579) seems to break that.

The plan here is to:
1. remove the regex from our `baseBranches` option - if that doesn't work then
2. just use the default branch

* main: Update actions/stale digest to 128b2c8 (apache#2053)

* main: Update dependency com.azure:azure-sdk-bom to v1.2.36 (apache#2054)

* main: Update dependency com.fasterxml.jackson:jackson-bom to v2.19.1 (apache#2055)

* main: Update dependency com.google.cloud:google-cloud-storage-bom to v2.53.3 (apache#2057)

* main: Update registry.access.redhat.com/ubi9/openjdk-21-runtime Docker tag to v1.22-1.1752066187 (apache#2059)

* main: Update dependency com.github.ben-manes.caffeine:caffeine to v3.2.2 (apache#2056)

* main: Update dependency gradle to v8.14.3 (main) (apache#2058)

* main: Update dependency gradle to v8.14.3

* Adjust Gradle update

---------

Co-authored-by: Robert Stupp <snazy@snazy.de>

* main: Update dependency io.micrometer:micrometer-bom to v1.15.2 (apache#2063)

* main: Update dependency com.diffplug.spotless:spotless-plugin-gradle to v7.1.0 (apache#2067)

* main: Update dependency com.nimbusds:nimbus-jose-jwt to v10.3.1 (apache#2062)

* main: Update docker.io/prom/prometheus Docker tag to v3.5.0 (apache#2071)

* main: Update dependency org.junit:junit-bom to v5.13.3 (apache#2064)

* main: Update docker.io/jaegertracing/all-in-one Docker tag to v1.71.0 (apache#2070)

* main: Update medyagh/setup-minikube action to v0.0.20 (apache#2066)

* main: Update dependency org.apache.commons:commons-lang3 to v3.18.0 (apache#2069)

* main: Update log4j2 monorepo to v2.25.1 (apache#2073)

* main: Update immutables to v2.11.0 (apache#2072)

* main: Update dependency org.testcontainers:testcontainers-bom to v1.21.3 (apache#2065)

* main: Update dependency com.google.errorprone:error_prone_core to v2.40.0 (apache#2068)

* main: Update dependency io.netty:netty-codec-http2 to v4.2.3.Final (apache#2074)

* main: Update dependency io.prometheus:prometheus-metrics-exporter-servlet-jakarta to v1.3.10 (apache#2076)

* main: Update dependency net.ltgt.gradle:gradle-errorprone-plugin to v4.3.0 (apache#2079)

* main: Update dependency io.projectreactor.netty:reactor-netty-http to v1.2.8 (apache#2075)

* main: Update dependency com.gradleup.shadow:shadow-gradle-plugin to v8.3.8 (apache#2061)

* main: Update dependency org.eclipse.persistence:eclipselink to v4.0.7 (apache#2078)

* Add External Identity Providers page to unreleased documentation  (apache#2013)

---------

Co-authored-by: Alexandre Dutra <adutra@apache.org>
Co-authored-by: Eric Maynard <emaynard@apache.org>

* main: Update dependency io.opentelemetry:opentelemetry-bom to v1.52.0 (apache#2082)

* main: Update dependency software.amazon.awssdk:bom to v2.31.78 (apache#2080)

* main: Update dependency com.adobe.testing:s3mock-testcontainers to v4.6.0 (apache#2081)

* main: Update dependency io.smallrye.common:smallrye-common-annotation to v2.13.7 (apache#2083)

* Revert PR 2033 (apache#2087)

The PR apache#2033 was merged in less than 3 hours, late on a Friday. Since that change does not address an issue that seriously deserves a quick reaction nor is it a "nit", I'm proposing to revert the change. We do have [community best practices](https://polaris.apache.org/community/contributing-guidelines/) stating to give the whole community enough time to review, which did not happen.

There are concerns that the PR apache#2033 will interfere with the whole effort to automate releases. Since there was no chance to review and raise the concerns, I'd like to revert it to not cause any friction with that bigger effort.

Revert "Fix invalid redirect from public page (apache#2041)", commit 493bc8e.
Revert "[Site] Simplify the doc directory structure (apache#2033)", commit 2db2f10.

* Renovate PRs, branch name + PR subject (apache#2060)

Until June, Renovate PRs behaved a little differently than they do today. The difference is the branch name. Before, it was something like `renovate-bot/renovate/main/org.openapi.generator-7.x`; now it's like `renovate-bot/renovate/main-main/actions-stale-digest` (the branch name is duplicated).

I also noticed that the branch name is repeated in the PR subject, which has been the case for somewhat longer.

This change removes both duplications.

* Simplify RealmEntityManagerFactory usage in tests (apache#2050)

Since all constructor parameters are created in `IcebergCatalogTest.before`, we
can do the same for `RealmEntityManagerFactory`.

`PolarisAuthzTestBase.entityManager` is already derived from
`realmEntityManagerFactory`:
https://github.com/apache/polaris/blob/2c2052c28f899aaa85e5f11a9131d9812ec62679/runtime/service/src/test/java/org/apache/polaris/service/quarkus/admin/PolarisAuthzTestBase.java#L247

* Use PolarisImmutable for StorageCredentialCacheKey (apache#2029)

* remove unused entityId from StorageCredentialCacheKey

* convert StorageCredentialCacheKey to immutables

* Disable renovatebot on release branches (apache#2085)

Per the mailing list thread "[DISCUSS] Disable renovatebot on release branches", we should not do automatic dependency upgrades for release branches. Since it seems `release/1.0.x` is a release branch, we can remove this regex from renovate's list.

* Site: Remove non-OSS query engines from front page (apache#2031)

* update query engines list

* Add Dremio OSS

* fix(deps): update immutables to v2.11.1 (apache#2113)

* fix(deps): update dependency boto3 to v1.39.4 (apache#2116)

* chore: Avoid deprecated `DefaultCredentialsProvider.create()` (apache#2119)

Use `DefaultCredentialsProvider.builder().build()` as suggested by AWS SDK javadoc.

* fix(deps): update dependency boto3 to v1.39.6 (apache#2120)

* Extensible pagination token implementation (apache#1938)

Based on apache#1838, following up on apache#1555

* Allows multiple implementations of `Token` referencing the "next page", encapsulated in `PageToken`. No changes to `polaris-core` needed to add custom `Token` implementations.
* Extensible to (later) support (cryptographic) signatures to prevent tampering with page tokens
* Refactor pagination code to delineate API-level page tokens and internal "pointers to data"
* Requests deal with the "previous" token, user-provided page size (optional) and the previous request's page size.
* Concentrate the logic of combining page size requests and previous tokens in `PageTokenUtil`
* `PageToken` subclasses are no longer necessary.
* Serialization of `PageToken` uses Jackson serialization (smile format)

Since no (metastore level) implementation handling pagination existed before, no backwards compatibility is needed.

Co-authored-by: Dmitri Bourlatchkov <dmitri.bourlatchkov@gmail.com>
Co-authored-by: Eric Maynard <eric.maynard+oss@snowflake.com>

* Site/dev: allow overriding the podman/docker binaries detection (apache#2051)

The scripts in the `bin/` directory are built to work with both Docker and podman. The two behave differently in places, especially with respect to docker-compose and podman-compose: some local environments require `podman-compose`, others require `docker-compose`. By default, the scripts prefer the `podman` and `podman-compose` binaries if they exist and fall back to `docker` and `docker-compose` otherwise. Some podman setups, however, require `docker-compose` even when `podman-compose` is installed. This may manifest as an error message stating that `--userns` and `--pod` cannot be used together. In that case, create a file named `.user-settings` in the `site/` folder and add these two lines:
```bash
DOCKER=docker
COMPOSE=docker-compose
```
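
The detection order described above (explicit overrides first, then podman, then docker) can be sketched roughly as follows. This is a minimal illustration, not the code of the actual `bin/` scripts: the function names `first_available` and `resolve_container_tools` are hypothetical, and reading `.user-settings` by sourcing it is an assumption about how such a file might be consumed.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the binary detection; not the project's real script.

# Print the first argument that resolves to a command on PATH.
# Returns non-zero if none of the candidates is available.
first_available() {
  local candidate
  for candidate in "$@"; do
    if command -v "$candidate" >/dev/null 2>&1; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Set DOCKER and COMPOSE. Values from the settings file (assumed to contain
# plain VAR=value lines) take precedence over auto-detection.
resolve_container_tools() {
  local settings_file="${1:-./.user-settings}"
  if [ -f "$settings_file" ]; then
    # shellcheck disable=SC1090
    . "$settings_file"
  fi
  # Auto-detect only what was not overridden, preferring podman.
  if [ -z "${DOCKER:-}" ]; then
    DOCKER="$(first_available podman docker)" || return 1
  fi
  if [ -z "${COMPOSE:-}" ]; then
    COMPOSE="$(first_available podman-compose docker-compose)" || return 1
  fi
}
```

A caller would run `resolve_container_tools site/.user-settings` and then invoke `"$DOCKER"` and `"$COMPOSE"` instead of hard-coding either binary. With the two-line file shown above in place, auto-detection is skipped entirely.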

* NoSQL: build descriptions

* NoSQL: README nits

* NoSQL: Misc ports

* Pagination
* Policy fixes
* Adaptations to "conflicting" changes
* runtime-service test abstractions

* Last merged commit d2667e5

---------

Co-authored-by: Yong Zheng <yongzheng0809@gmail.com>
Co-authored-by: Pooja Nilangekar <poojan@umd.edu>
Co-authored-by: Alexandre Dutra <adutra@users.noreply.github.com>
Co-authored-by: Eric Maynard <eric.maynard+oss@snowflake.com>
Co-authored-by: Yufei Gu <yufei@apache.org>
Co-authored-by: Yun Zou <yunzou.colostate@gmail.com>
Co-authored-by: Christopher Lambert <xn137@gmx.de>
Co-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Co-authored-by: JB Onofré <jbonofre@apache.org>
Co-authored-by: Alexandre Dutra <adutra@apache.org>
Co-authored-by: Adnan Hemani <adnan.h@berkeley.edu>
Co-authored-by: Mend Renovate <bot@renovateapp.com>
Co-authored-by: Mark Hoerth <47870294+markhoerth@users.noreply.github.com>
Co-authored-by: Eric Maynard <emaynard@apache.org>
Co-authored-by: Danica Fine <danica.fine@gmail.com>
Co-authored-by: Dmitri Bourlatchkov <dmitri.bourlatchkov@gmail.com>
Co-authored-by: Honah (Jonas) J. <honahx@apache.org>