Reuse shadowJar for spark client bundle jar maven publish #1857
Conversation
testJar??
Sorry for the confusion, that was a bad name. I am just referring to the original default jar task; I updated the classifier to "defaultJar" to make that clearer.
What is the value when we do not override it to null?
The original name was something like polaris-spark-3.5_2.12-0.11.0-beta-incubating-SNAPSHOT-bundle.jar. We did this because the classifier-less name is taken by the default jar task, which produces polaris-spark-3.5_2.12-0.11.0-beta-incubating-SNAPSHOT.jar. However, Spark does not support classifiers in the --packages config, so we make the bundle jar the main jar of this project. Since it is the jar Spark actually needs, I think it should be the project's jar without any classifier.
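As a rough sketch of the arrangement described above, in Gradle Kotlin DSL (the task and classifier names here are illustrative assumptions, not the project's exact build script):

```kotlin
// build.gradle.kts fragment (illustrative). The plain jar gets an explicit
// classifier so the shaded bundle jar can take the classifier-less file name
// that `spark-submit --packages` is able to resolve.
tasks.named<Jar>("jar") {
    archiveClassifier.set("defaultJar")
    // -> polaris-spark-3.5_2.12-<version>-defaultJar.jar
}
```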
Yes, I understand the intent :) My question is about the need to set archiveClassifier to null... Do we have to use null here?
Oh, sorry, we don't have to; the default is null. I put it there to be explicit, and I can remove it if preferred, but I think being explicit in the code might be better.
From my POV, removing the assignment is preferable, since the value is the same as the default.
I'd prefer to have a comment about adding a classifier to the jar task instead.
Sounds good! I removed the explicit classifier and added a comment at the place where the classifier is added for the jar task.
@dimas-b After switching to the shadow jar plugin, I need to specify the classifier here; otherwise it is configured to generate a jar with the classifier "all". I was also able to get rid of the other jar change.
Actually, sorry, it is still needed; my previous gradlew build apparently came from the cache. Added it back!
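For context: the Shadow plugin's built-in shadowJar task defaults archiveClassifier to "all", so without an explicit override the output would be polaris-spark-...-all.jar. A minimal sketch of the override being discussed:

```kotlin
import com.github.jengelman.gradle.plugins.shadow.tasks.ShadowJar

tasks.named<ShadowJar>("shadowJar") {
    // Clear the plugin's default "all" classifier so the shaded jar keeps
    // the plain, classifier-less coordinates Spark expects.
    archiveClassifier.set("")
}
```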
sgtm
Why not remove the plain jar artifact from this module completely?
I tried that before; however, the test task depends on the jar task in the default configuration. I tried switching the test task to depend on createPolarisSparkJar, but because that task relocates the com.fasterxml module, one of our deserialization tests fails: it now looks for the shaded classes instead of the original ones.
So far I haven't found a good solution, so I kept the original jar. Do you have a better solution for this problem?
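For reference, this is the kind of relocation being discussed; the shaded package prefix below is a made-up example, not the project's actual one:

```kotlin
import com.github.jengelman.gradle.plugins.shadow.tasks.ShadowJar

tasks.named<ShadowJar>("createPolarisSparkJar") {
    // Rewrites com.fasterxml.* class references inside the shaded jar.
    // Tests that deserialize against the original com.fasterxml.* classes
    // then fail, because the shaded jar only contains the relocated ones.
    relocate("com.fasterxml", "org.apache.polaris.shaded.com.fasterxml")
}
```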
Thanks for the detailed analysis, @gh-yzou ! Unfortunately, I do not have a better solution off the top of my head.
How about using the "internal" classifier for this jar? I suppose it is not meant for reuse.
Yes, it is not intended for reuse. The name "internal" makes sense to me; updated.
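A sketch of the resulting configuration (assuming the standard jar task):

```kotlin
tasks.named<Jar>("jar") {
    // Kept only because the default test wiring depends on the plain jar;
    // the "internal" classifier signals it is not meant for downstream use.
    archiveClassifier.set("internal")
}
```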
dimas-b left a comment:
LGTM 👍 Thanks, @gh-yzou !
@dimas-b I think you asked a question somewhere, but it doesn't show up in the PR for some reason. For the artifact, I don't think we have "client" in the artifact name: the Iceberg one is called iceberg-spark-runtime-xxxx.jar, and our Polaris one is called polaris-spark-xxx.jar. For Iceberg, I guess the reason is that iceberg-spark is already taken by another project, but I don't think we need to match Iceberg exactly.
Re: I value short jar names, but at the same time it might be worth clarifying whether this jar applies to the whole of the Polaris integration with Spark or just to Generic Tables. In other words, do we foresee putting any other Polaris jars on the Spark class path? If no, the current name is fine from my POV; if yes, let's discuss that naming convention.
dimas-b left a comment:
Changes LGTM, but I believe the PR description is a bit off WRT actual changes now 🤔 WDYT?
@dimas-b Sorry, I updated the title but forgot to update the description; the description is updated now.
)" (#1921) …857)" This reverts commit 1f7f127. The shadowJar plugin actually stops publish the original jar, which is not what spark client intend to publish for the --package usage. Revert it for now, will follow up with a better way to reuse the shadow jar plugin, likely with a separate bundle project
…ache#1857)" This reverts commit 1f7f127.
* fix spark client * fix test failure and address feedback * fix error * update regression test * update classifier name * address comment * add change * update doc * update build and readme * add back jr * udpate dependency * add change * update * update tests * remove merge service file * update readme * update readme
…ache#1857)" This reverts commit 40f4d36.
…untime to avoid spark compatibilities issue (#1908) * add change * add comment * update change * add comment * add change * add tests * add comment * clean up style check * update build * Revert "Reuse shadowJar for spark client bundle jar maven publish (#1857)" This reverts commit 1f7f127. * Reuse shadowJar for spark client bundle jar maven publish (#1857) * fix spark client * fix test failure and address feedback * fix error * update regression test * update classifier name * address comment * add change * update doc * update build and readme * add back jr * udpate dependency * add change * update * update tests * remove merge service file * update readme * update readme * update checkstyl * rebase with main * Revert "Reuse shadowJar for spark client bundle jar maven publish (#1857)" This reverts commit 40f4d36. * update checkstyle * revert change * address comments * trigger tests
)" (#1921) …857)" This reverts commit 1f7f127. The shadowJar plugin actually stops publish the original jar, which is not what spark client intend to publish for the --package usage. Revert it for now, will follow up with a better way to reuse the shadow jar plugin, likely with a separate bundle project
…untime to avoid spark compatibilities issue (#1908) * add change * add comment * update change * add comment * add change * add tests * add comment * clean up style check * update build * Revert "Reuse shadowJar for spark client bundle jar maven publish (#1857)" This reverts commit 1f7f127. * Reuse shadowJar for spark client bundle jar maven publish (#1857) * fix spark client * fix test failure and address feedback * fix error * update regression test * update classifier name * address comment * add change * update doc * update build and readme * add back jr * udpate dependency * add change * update * update tests * remove merge service file * update readme * update readme * update checkstyl * rebase with main * Revert "Reuse shadowJar for spark client bundle jar maven publish (#1857)" This reverts commit 40f4d36. * update checkstyle * revert change * address comments * trigger tests
* Cleanup unnecessary files in client/python (apache#1878) Cleanup unnecessary files in `client/python` * Bump version in version.txt With the release/1.0.0 branch being cut, we should bump this to reflect the current state of main * JDBC: Refactor DatabaseOps (apache#1843) * removes the databaseType computation from JDBCMetastoreManagerFactory to DbOperations * wraps the bootstrap in a transaction ! * refactor Production Readiness checks for Postgres * Fix two wrong links in README.md (apache#1879) * Avoid using org.testcontainers.shaded.** (apache#1876) * main: Update dependency io.smallrye.config:smallrye-config-core to v3.13.2 (apache#1888) * main: Update registry.access.redhat.com/ubi9/openjdk-21-runtime Docker tag to v1.22-1.1749462970 (apache#1887) * main: Update dependency boto3 to v1.38.36 (apache#1886) * fix(build): Fix deprecation warnings in PolarisIntegrationTestExtension (apache#1895) * Enable patch version updates for maintained Polaris version (apache#1891) Polaris 1.x will be a supported/maintained release. It is crucial to apply bug and security fixes to such release branches. Therefore, this change enables patch-version updates for Polaris 1.* * Add Polaris community meeting record for 2025-06-12 (apache#1892) * Do not use relative path inside CLI script Issue apache#1868 reported that the Polaris script can fail when it's run from an unexpected path. The recent addition of a reference to `./gradlew` looks incorrect here, and should be changed to use an absolute path. Fixes apache#1868 * feat(build): Add Checkstyle plugin and an IllegalImport rule (apache#1880) * Python CI: pin mypy version to avoid CI failure due to new release (apache#1903) Mypy did a new release 1.16.1 and it cause our CI to fail for about 20 minutes due to missing wheel (upload not completed) ``` | Unable to find installation candidates for mypy (1.16.1) | | This is likely not a Poetry issue. | | - 14 candidate(s) were identified for the package | - 14 wheel(s) were skipped as your project's environment does not support the identified abi tags | | Solutions: | Make sure the lockfile is up-to-date. You can try one of the following; | | 1. Regenerate lockfile: poetry lock --no-cache --regenerate | 2. Update package : poetry update --no-cache mypy | | If neither works, please first check to verify that the mypy has published wheels available from your configured source that are compatible with your environment- ie. operating system, architecture (x86_64, arm64 etc.), python interpreter. | ``` This PR temporarily restrict the mypy version to avoid the similar issue. We may consider bring poetry.lock back to git tracking so we won't automatically update test dependencies all the time * Remove `.github/CODEOWNERS` (apache#1902) As per this [dev-ML discussion](https://lists.apache.org/thread/jjr5w3hslk755yvxy8b3z45c7094cxdn) * Rename quarkus as runtime (apache#1695) * Rename runtime/test-commons to runtime/test-common (for consistency with module name) (apache#1906) * docs: Add `Polaris Evolution` page (apache#1890) --------- Co-authored-by: Eric Maynard <emaynard@apache.org> * feat(ci): Split Java Gradle CI in many jobs to reduce execution time (apache#1897) * Add webpage for Generic Table support (apache#1889) * add change * add comment * address feedback * update limitations * update docs * update doc * address feedback * Improve the parsing and validation of UserSecretReferenceUrns (apache#1840) This change addresses all the TODOs found the org.polaris.core.secrets package. 
Main changes: - Create a helper to parse, validate and build the URN strings. - Use Regex instead of `String.split()`. - Add Precondition checks to ensure that the URN is valid and the UserSecretManager matches the expected type. - Remove the now unused `GLOBAL_INSTANCE` of the UnsafeInMemorySecretsManager. Testing - Existing `UnsafeInMemorySecretsManagerTest` captures most of the functional changes. - Added `UserSecretReferenceUrnHelperTest` to capture the utilities exposed. * Reuse shadowJar for spark client bundle jar maven publish (apache#1857) * fix spark client * fix test failure and address feedback * fix error * update regression test * update classifier name * address comment * add change * update doc * update build and readme * add back jr * udpate dependency * add change * update * update tests * remove merge service file * update readme * update readme * fix(ci): Remove dummy "build" job from Gradle CI (apache#1911) Since apache#1897, the jobs in gradle.yaml changed and the "build" job was split into many smaller jobs. But since it was a required job, it couldn't be removed immediately. * main: Update Quarkus Platform and Group to v3.23.3 (apache#1797) * main: Update Quarkus Platform and Group to v3.23.3 * Adopt polaris-admin test invocation --------- Co-authored-by: Robert Stupp <snazy@snazy.de> * Feature: Rollback compaction on conflict (apache#1285) Intention is make the catalog smarter, to revert the compaction commits in case of crunch to let the writers who are actually adding or removing the data to the table succeed. In a sense treating compaction as always a lower priority process. Presently the rest catalog client creates the snapshot and asks the Rest Server to apply the snapshot and gives this in a combination of requirement and update. Polaris could apply some basic inference and generate some updates to metadata given a property is enabled at a table level, by saying that It will revert back the commit which was created by compaction and let the write succeed. I had this PR in OSS, which was essentially doing this at the client end, but we think its best if we do this as server end. to support more such clients. How to use this Enable a catalog level configuration : polaris.config.rollback.compaction.on-conflicts.enabled when this is enabled polaris will apply the intelligence of rollbacking those REPLACE ops snapshot which have the property of polaris.internal.rollback.compaction.on-conflict in their snapshot summary to resolve conflicts at the server end ! a sample use case is there is a deployment of a Polaris where this config is enabled and there is auto compaction (maintenance job) which is updating the table state, it adds the snapshot summary that polaris.internal.rollback.compaction.on-conflict is true now when a backfill process running for 8 hours want to commit but can't because the compaction job committed before so in this case it will reach out to Polaris and Polaris will see if the snapshot of compation aka replace snapshot has this property if yes roll it back and let the writer succeed ! Devlist: https://lists.apache.org/thread/8k8t77dgk1vc124fnb61932bdp9kf1lc * NoSQL: nits * `AutoCloseable` for `PersistenceTestExtension` * checkstyle adoptions * fix: unify bootstrap credentials and standardize POLARIS setup (apache#1905) - unified formatting across docker, gradle - reverted secret to s3cr3t - updated docker-compose, README, conftest.py use POLARIS for consistency across docker, gradle and others. 
* Add doc for rollback config (apache#1919) * Revert "Reuse shadowJar for spark client bundle jar maven publish (apache#1857)" (apache#1921) …857)" This reverts commit 1f7f127. The shadowJar plugin actually stops publish the original jar, which is not what spark client intend to publish for the --package usage. Revert it for now, will follow up with a better way to reuse the shadow jar plugin, likely with a separate bundle project * fix(build): Gradle caching effectively not working (apache#1922) Using a `custom()` spotless formatter check effectively disables caching, see `com.diffplug.gradle.spotless.FormatExtension#custom(java.lang.String, com.diffplug.spotless.FormatterFunc)` using `globalState`, which is a `NeverUpToDateBetweenRuns`. This change refactors this to be cachable. We also already have a errorprone rule, so we can get rid entirely of the spotless step. * Update spark client to use the shaded iceberg-core in iceberg-spark-runtime to avoid spark compatibilities issue (apache#1908) * add change * add comment * update change * add comment * add change * add tests * add comment * clean up style check * update build * Revert "Reuse shadowJar for spark client bundle jar maven publish (apache#1857)" This reverts commit 1f7f127. * Reuse shadowJar for spark client bundle jar maven publish (apache#1857) * fix spark client * fix test failure and address feedback * fix error * update regression test * update classifier name * address comment * add change * update doc * update build and readme * add back jr * udpate dependency * add change * update * update tests * remove merge service file * update readme * update readme * update checkstyl * rebase with main * Revert "Reuse shadowJar for spark client bundle jar maven publish (apache#1857)" This reverts commit 40f4d36. * update checkstyle * revert change * address comments * trigger tests * Last merged commit 93938fd --------- Co-authored-by: Honah (Jonas) J. <honahx@apache.org> Co-authored-by: Eric Maynard <eric.maynard+oss@snowflake.com> Co-authored-by: Prashant Singh <35593236+singhpk234@users.noreply.github.com> Co-authored-by: Yufei Gu <yufei@apache.org> Co-authored-by: Dmitri Bourlatchkov <dmitri.bourlatchkov@gmail.com> Co-authored-by: Mend Renovate <bot@renovateapp.com> Co-authored-by: Alexandre Dutra <adutra@users.noreply.github.com> Co-authored-by: JB Onofré <jbonofre@apache.org> Co-authored-by: Eric Maynard <emaynard@apache.org> Co-authored-by: Yun Zou <yunzou.colostate@gmail.com> Co-authored-by: Pooja Nilangekar <poojan@umd.edu> Co-authored-by: Seungchul Lee <scleefe01@gmail.com>
We previously added a special check in PublishingHelperPlugin.kt that looked specifically for the jar task of the polaris-spark project and published the artifact output of the ShadowJar task we had added. However, we already have shadowJar infrastructure that takes care of Maven publishing.
In this PR, we switch to reusing that shadowJar infrastructure and revert the change we added before.
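As a hedged sketch of what reusing the Shadow plugin's publishing support can look like (the publication name is illustrative, and the real wiring lives in PublishingHelperPlugin.kt and may differ):

```kotlin
// Assumes the `maven-publish` and Shadow plugins are applied.
publishing {
    publications {
        create<MavenPublication>("maven") {
            // The Shadow plugin registers a "shadow" software component
            // carrying the shaded jar, so no manual artifact wiring is needed.
            from(components["shadow"])
        }
    }
}
```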