Reuse shadowJar for spark client bundle jar maven publish #1857
Conversation
testJar??
Sorry for the confusion, that was a bad name. I am just referring to the original default jar task; I updated the classifier to "defaultJar" to make that clearer.
What is the value when we do not override it to null?
The original name was something like polaris-spark-3.5_2.12-0.11.0-beta-incubating-SNAPSHOT-bundle.jar. We did this because the classifier-less name is taken by the default jar task, which produces polaris-spark-3.5_2.12-0.11.0-beta-incubating-SNAPSHOT.jar. However, Spark does not support classifiers in the --packages config, so we make the bundle jar the main jar of this project. Since it is the jar Spark actually needs, I think it should be the project's jar without any classifier.
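As a rough sketch of the arrangement described above, in Gradle Kotlin DSL (the task and classifier names here are illustrative assumptions, not the project's exact build script):

```kotlin
// build.gradle.kts fragment (illustrative). The plain jar gets an explicit
// classifier so the shaded bundle jar can take the classifier-less file name
// that `spark-submit --packages` is able to resolve.
tasks.named<Jar>("jar") {
    archiveClassifier.set("defaultJar")
    // -> polaris-spark-3.5_2.12-<version>-defaultJar.jar
}
```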
Yes, I understand the intent :) My question is about the need to set archiveClassifier to null... Do we have to use null here?
Oh, sorry, we don't have to; the default is null. I put it there to be explicit, and I can remove it if preferred, but I think being explicit in the code might be better.
From my POV, removing the assignment is preferable, since the value is the same as the default.
I'd prefer to have a comment about adding a classifier to the jar task instead.
Sounds good! I removed the explicit classifier and added a comment at the place where the classifier is added for the jar task.
@dimas-b After switching to the shadow jar plugin, I need to specify the classifier here; otherwise it is configured to generate a jar with the classifier "all". I was also able to get rid of the other jar change.
Actually, sorry, it is still needed; my previous gradlew build apparently came from the cache. Added it back!
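For context: the Shadow plugin's built-in shadowJar task defaults archiveClassifier to "all", so without an explicit override the output would be polaris-spark-...-all.jar. A minimal sketch of the override being discussed:

```kotlin
import com.github.jengelman.gradle.plugins.shadow.tasks.ShadowJar

tasks.named<ShadowJar>("shadowJar") {
    // Clear the plugin's default "all" classifier so the shaded jar keeps
    // the plain, classifier-less coordinates Spark expects.
    archiveClassifier.set("")
}
```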
sgtm
Why not remove the plain jar artifact from this module completely?
I tried that before; however, the test task depends on the jar task in the default configuration. I tried switching the test task to depend on createPolarisSparkJar, but because that task relocates the com.fasterxml module, one of our deserialization tests fails: it now looks for the shaded classes instead of the original ones.
So far I haven't found a good solution, so I kept the original jar. Do you have a better solution for this problem?
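For reference, this is the kind of relocation being discussed; the shaded package prefix below is a made-up example, not the project's actual one:

```kotlin
import com.github.jengelman.gradle.plugins.shadow.tasks.ShadowJar

tasks.named<ShadowJar>("createPolarisSparkJar") {
    // Rewrites com.fasterxml.* class references inside the shaded jar.
    // Tests that deserialize against the original com.fasterxml.* classes
    // then fail, because the shaded jar only contains the relocated ones.
    relocate("com.fasterxml", "org.apache.polaris.shaded.com.fasterxml")
}
```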
Thanks for the detailed analysis, @gh-yzou ! Unfortunately, I do not have a better solution off the top of my head.
How about using the "internal" classifier for this jar? I suppose it is not meant for reuse.
Yes, it is not intended for reuse. The name "internal" makes sense to me; updated.
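A sketch of the resulting configuration (assuming the standard jar task):

```kotlin
tasks.named<Jar>("jar") {
    // Kept only because the default test wiring depends on the plain jar;
    // the "internal" classifier signals it is not meant for downstream use.
    archiveClassifier.set("internal")
}
```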
dimas-b left a comment:
LGTM 👍 Thanks, @gh-yzou !
@dimas-b I think you asked a question somewhere, but it doesn't show up in the PR for some reason. For the artifact, I don't think we have "client" in the artifact name: the Iceberg one is called iceberg-spark-runtime-xxxx.jar, and our Polaris one is called polaris-spark-xxx.jar. For Iceberg, I guess the reason is that iceberg-spark is already taken by another project, but I don't think we need to match Iceberg exactly.
Re: I value short jar names, but at the same time it might be worth clarifying whether this jar applies to the whole of the Polaris integration with Spark or just to Generic Tables. In other words, do we foresee putting any other Polaris jars on the Spark class path? If no, the current name is fine from my POV; if yes, let's discuss that naming convention.
dimas-b left a comment:
Changes LGTM, but I believe the PR description is a bit off WRT actual changes now 🤔 WDYT?
@dimas-b Sorry, I updated the title but forgot to update the description; the description is updated now.
)" (#1921) …857)" This reverts commit 1f7f127. The shadowJar plugin actually stops publish the original jar, which is not what spark client intend to publish for the --package usage. Revert it for now, will follow up with a better way to reuse the shadow jar plugin, likely with a separate bundle project
…ache#1857)" This reverts commit 1f7f127.
* fix spark client * fix test failure and address feedback * fix error * update regression test * update classifier name * address comment * add change * update doc * update build and readme * add back jr * udpate dependency * add change * update * update tests * remove merge service file * update readme * update readme
…ache#1857)" This reverts commit 40f4d36.
…untime to avoid spark compatibilities issue (#1908) * add change * add comment * update change * add comment * add change * add tests * add comment * clean up style check * update build * Revert "Reuse shadowJar for spark client bundle jar maven publish (#1857)" This reverts commit 1f7f127. * Reuse shadowJar for spark client bundle jar maven publish (#1857) * fix spark client * fix test failure and address feedback * fix error * update regression test * update classifier name * address comment * add change * update doc * update build and readme * add back jr * udpate dependency * add change * update * update tests * remove merge service file * update readme * update readme * update checkstyl * rebase with main * Revert "Reuse shadowJar for spark client bundle jar maven publish (#1857)" This reverts commit 40f4d36. * update checkstyle * revert change * address comments * trigger tests
)" (#1921) …857)" This reverts commit 1f7f127. The shadowJar plugin actually stops publish the original jar, which is not what spark client intend to publish for the --package usage. Revert it for now, will follow up with a better way to reuse the shadow jar plugin, likely with a separate bundle project
…untime to avoid spark compatibilities issue (#1908) * add change * add comment * update change * add comment * add change * add tests * add comment * clean up style check * update build * Revert "Reuse shadowJar for spark client bundle jar maven publish (#1857)" This reverts commit 1f7f127. * Reuse shadowJar for spark client bundle jar maven publish (#1857) * fix spark client * fix test failure and address feedback * fix error * update regression test * update classifier name * address comment * add change * update doc * update build and readme * add back jr * udpate dependency * add change * update * update tests * remove merge service file * update readme * update readme * update checkstyl * rebase with main * Revert "Reuse shadowJar for spark client bundle jar maven publish (#1857)" This reverts commit 40f4d36. * update checkstyle * revert change * address comments * trigger tests
* Cleanup unnecessary files in client/python (apache#1878) Cleanup unnecessary files in `client/python` * Bump version in version.txt With the release/1.0.0 branch being cut, we should bump this to reflect the current state of main * JDBC: Refactor DatabaseOps (apache#1843) * removes the databaseType computation from JDBCMetastoreManagerFactory to DbOperations * wraps the bootstrap in a transaction ! * refactor Production Readiness checks for Postgres * Fix two wrong links in README.md (apache#1879) * Avoid using org.testcontainers.shaded.** (apache#1876) * main: Update dependency io.smallrye.config:smallrye-config-core to v3.13.2 (apache#1888) * main: Update registry.access.redhat.com/ubi9/openjdk-21-runtime Docker tag to v1.22-1.1749462970 (apache#1887) * main: Update dependency boto3 to v1.38.36 (apache#1886) * fix(build): Fix deprecation warnings in PolarisIntegrationTestExtension (apache#1895) * Enable patch version updates for maintained Polaris version (apache#1891) Polaris 1.x will be a supported/maintained release. It is crucial to apply bug and security fixes to such release branches. Therefore, this change enables patch-version updates for Polaris 1.* * Add Polaris community meeting record for 2025-06-12 (apache#1892) * Do not use relative path inside CLI script Issue apache#1868 reported that the Polaris script can fail when it's run from an unexpected path. The recent addition of a reference to `./gradlew` looks incorrect here, and should be changed to use an absolute path. Fixes apache#1868 * feat(build): Add Checkstyle plugin and an IllegalImport rule (apache#1880) * Python CI: pin mypy version to avoid CI failure due to new release (apache#1903) Mypy did a new release 1.16.1 and it cause our CI to fail for about 20 minutes due to missing wheel (upload not completed) ``` | Unable to find installation candidates for mypy (1.16.1) | | This is likely not a Poetry issue. | | - 14 candidate(s) were identified for the package | - 14 wheel(s) were skipped as your project's environment does not support the identified abi tags | | Solutions: | Make sure the lockfile is up-to-date. You can try one of the following; | | 1. Regenerate lockfile: poetry lock --no-cache --regenerate | 2. Update package : poetry update --no-cache mypy | | If neither works, please first check to verify that the mypy has published wheels available from your configured source that are compatible with your environment- ie. operating system, architecture (x86_64, arm64 etc.), python interpreter. | ``` This PR temporarily restrict the mypy version to avoid the similar issue. We may consider bring poetry.lock back to git tracking so we won't automatically update test dependencies all the time * Remove `.github/CODEOWNERS` (apache#1902) As per this [dev-ML discussion](https://lists.apache.org/thread/jjr5w3hslk755yvxy8b3z45c7094cxdn) * Rename quarkus as runtime (apache#1695) * Rename runtime/test-commons to runtime/test-common (for consistency with module name) (apache#1906) * docs: Add `Polaris Evolution` page (apache#1890) --------- Co-authored-by: Eric Maynard <emaynard@apache.org> * feat(ci): Split Java Gradle CI in many jobs to reduce execution time (apache#1897) * Add webpage for Generic Table support (apache#1889) * add change * add comment * address feedback * update limitations * update docs * update doc * address feedback * Improve the parsing and validation of UserSecretReferenceUrns (apache#1840) This change addresses all the TODOs found the org.polaris.core.secrets package. 
Main changes: - Create a helper to parse, validate and build the URN strings. - Use Regex instead of `String.split()`. - Add Precondition checks to ensure that the URN is valid and the UserSecretManager matches the expected type. - Remove the now unused `GLOBAL_INSTANCE` of the UnsafeInMemorySecretsManager. Testing - Existing `UnsafeInMemorySecretsManagerTest` captures most of the functional changes. - Added `UserSecretReferenceUrnHelperTest` to capture the utilities exposed. * Reuse shadowJar for spark client bundle jar maven publish (apache#1857) * fix spark client * fix test failure and address feedback * fix error * update regression test * update classifier name * address comment * add change * update doc * update build and readme * add back jr * udpate dependency * add change * update * update tests * remove merge service file * update readme * update readme * fix(ci): Remove dummy "build" job from Gradle CI (apache#1911) Since apache#1897, the jobs in gradle.yaml changed and the "build" job was split into many smaller jobs. But since it was a required job, it couldn't be removed immediately. * main: Update Quarkus Platform and Group to v3.23.3 (apache#1797) * main: Update Quarkus Platform and Group to v3.23.3 * Adopt polaris-admin test invocation --------- Co-authored-by: Robert Stupp <snazy@snazy.de> * Feature: Rollback compaction on conflict (apache#1285) Intention is make the catalog smarter, to revert the compaction commits in case of crunch to let the writers who are actually adding or removing the data to the table succeed. In a sense treating compaction as always a lower priority process. Presently the rest catalog client creates the snapshot and asks the Rest Server to apply the snapshot and gives this in a combination of requirement and update. Polaris could apply some basic inference and generate some updates to metadata given a property is enabled at a table level, by saying that It will revert back the commit which was created by compaction and let the write succeed. I had this PR in OSS, which was essentially doing this at the client end, but we think its best if we do this as server end. to support more such clients. How to use this Enable a catalog level configuration : polaris.config.rollback.compaction.on-conflicts.enabled when this is enabled polaris will apply the intelligence of rollbacking those REPLACE ops snapshot which have the property of polaris.internal.rollback.compaction.on-conflict in their snapshot summary to resolve conflicts at the server end ! a sample use case is there is a deployment of a Polaris where this config is enabled and there is auto compaction (maintenance job) which is updating the table state, it adds the snapshot summary that polaris.internal.rollback.compaction.on-conflict is true now when a backfill process running for 8 hours want to commit but can't because the compaction job committed before so in this case it will reach out to Polaris and Polaris will see if the snapshot of compation aka replace snapshot has this property if yes roll it back and let the writer succeed ! Devlist: https://lists.apache.org/thread/8k8t77dgk1vc124fnb61932bdp9kf1lc * NoSQL: nits * `AutoCloseable` for `PersistenceTestExtension` * checkstyle adoptions * fix: unify bootstrap credentials and standardize POLARIS setup (apache#1905) - unified formatting across docker, gradle - reverted secret to s3cr3t - updated docker-compose, README, conftest.py use POLARIS for consistency across docker, gradle and others. 
* Add doc for rollback config (apache#1919) * Revert "Reuse shadowJar for spark client bundle jar maven publish (apache#1857)" (apache#1921) …857)" This reverts commit 1f7f127. The shadowJar plugin actually stops publish the original jar, which is not what spark client intend to publish for the --package usage. Revert it for now, will follow up with a better way to reuse the shadow jar plugin, likely with a separate bundle project * fix(build): Gradle caching effectively not working (apache#1922) Using a `custom()` spotless formatter check effectively disables caching, see `com.diffplug.gradle.spotless.FormatExtension#custom(java.lang.String, com.diffplug.spotless.FormatterFunc)` using `globalState`, which is a `NeverUpToDateBetweenRuns`. This change refactors this to be cachable. We also already have a errorprone rule, so we can get rid entirely of the spotless step. * Update spark client to use the shaded iceberg-core in iceberg-spark-runtime to avoid spark compatibilities issue (apache#1908) * add change * add comment * update change * add comment * add change * add tests * add comment * clean up style check * update build * Revert "Reuse shadowJar for spark client bundle jar maven publish (apache#1857)" This reverts commit 1f7f127. * Reuse shadowJar for spark client bundle jar maven publish (apache#1857) * fix spark client * fix test failure and address feedback * fix error * update regression test * update classifier name * address comment * add change * update doc * update build and readme * add back jr * udpate dependency * add change * update * update tests * remove merge service file * update readme * update readme * update checkstyl * rebase with main * Revert "Reuse shadowJar for spark client bundle jar maven publish (apache#1857)" This reverts commit 40f4d36. * update checkstyle * revert change * address comments * trigger tests * Last merged commit 93938fd --------- Co-authored-by: Honah (Jonas) J. <honahx@apache.org> Co-authored-by: Eric Maynard <eric.maynard+oss@snowflake.com> Co-authored-by: Prashant Singh <35593236+singhpk234@users.noreply.github.com> Co-authored-by: Yufei Gu <yufei@apache.org> Co-authored-by: Dmitri Bourlatchkov <dmitri.bourlatchkov@gmail.com> Co-authored-by: Mend Renovate <bot@renovateapp.com> Co-authored-by: Alexandre Dutra <adutra@users.noreply.github.com> Co-authored-by: JB Onofré <jbonofre@apache.org> Co-authored-by: Eric Maynard <emaynard@apache.org> Co-authored-by: Yun Zou <yunzou.colostate@gmail.com> Co-authored-by: Pooja Nilangekar <poojan@umd.edu> Co-authored-by: Seungchul Lee <scleefe01@gmail.com>
We previously added a special check in PublishingHelperPlugin.kt that looked specifically for the jar task of the polaris-spark project and published the artifact output of the ShadowJar task we had added. However, we already have shadowJar infrastructure that takes care of Maven publishing.
In this PR, we switch to reusing that shadowJar infrastructure and revert the change we added before.
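As a hedged sketch of what reusing the Shadow plugin's publishing support can look like (the publication name is illustrative, and the real wiring lives in PublishingHelperPlugin.kt and may differ):

```kotlin
// Assumes the `maven-publish` and Shadow plugins are applied.
publishing {
    publications {
        create<MavenPublication>("maven") {
            // The Shadow plugin registers a "shadow" software component
            // carrying the shaded jar, so no manual artifact wiring is needed.
            from(components["shadow"])
        }
    }
}
```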