[SPARK-40665][CONNECT] Avoid embedding Spark Connect in the Apache Spark binary release #38109
The following 6 jars were removed from the deps file but are not in the shaded jar either. Are they not required at runtime?
python/pyspark/sql/connect/README.md
Lines 154 to 158 in 1e30564:
<dependency> <!-- necessary for Java 9+ -->
  <groupId>org.apache.tomcat</groupId>
  <artifactId>annotations-api</artifactId>
  <version>${tomcat.annotations.api.version}</version>
  <scope>provided</scope>
According to the comment above, annotations-api-6.0.53.jar is required when using Java 9+. Are the corresponding startup commands different? Currently annotations-api-6.0.53.jar is neither in Spark's jars directory nor shaded into connect-assembly.
It should probably be put into connect-assembly, but let's set up the tests first and fix it together.
ok
Thanks for reviewing, @LuciferYang. For #38109 (comment), let's figure it out together while adding the tests incrementally (since this is a new module).
LuciferYang left a comment:
I compared the Spark-client/jars directory built without the connect module against the one built with this PR, and their contents are the same. LGTM if tests pass.
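A check like the one above can be repeated with a small shell helper that prints the jar file names found in one directory but not in the other. Running it in both directions and getting empty output means both `jars` directories hold the same set of jar names. This is a hypothetical sketch: `jars_only_in` and the directory names are illustrative and not part of Spark's build tooling, and it compares file names only, not jar contents.

```bash
# Hypothetical helper: print the names of *.jar files that exist in
# directory $1 but have no file with the same name in directory $2.
# Only file names are compared; jar contents are not inspected.
jars_only_in() {
  local src="$1" other="$2" f name
  for f in "$src"/*.jar; do
    # If the glob matched nothing it stays literal, so skip that case.
    [ -e "$f" ] || continue
    name="$(basename "$f")"
    [ -e "$other/$name" ] || echo "$name"
  done
}
```

For example, `jars_only_in before/jars after/jars` followed by `jars_only_in after/jars before/jars`, where `before/` and `after/` are placeholder paths for the two distributions being compared.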
Merged to master.
…run tests for CONNECT

### What changes were proposed in this pull request?
Correct the example in README for how to run CONNECT tests.

### Why are the changes needed?
This is a minor follow-up update after #38109.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
N/A

Closes #38126 from amaliujia/update_read_me.

Authored-by: Rui Wang <rui.wang@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
It seems we should shade a couple more dependencies, as pointed out in #38109 (comment). It works with SBT because the SBT assembly includes all dependencies, whereas the Maven one doesn't. I am working on this.
… separately

### What changes were proposed in this pull request?
This PR makes the following dependencies shaded:
```
com.google.android:annotations
com.google.api.grpc:proto-google-common-proto
io.perfmark:perfmark-api
org.codehaus.mojo:animal-sniffer-annotations
com.google.errorprone:error_prone_annotations
com.google.j2objc:j2objc-annotations
```
Before #38109, it worked because the related dependencies were pulled in together. Now they are not, because Spark Connect is packaged as a single jar. This issue has existed from the very beginning.

### Why are the changes needed?
Otherwise, the tests fail if you build Spark Connect with Maven. SBT does not have this issue because it builds the assembly with all dependencies.

### Does this PR introduce _any_ user-facing change?
No, the code has not been released yet.

### How was this patch tested?
Manually tested via Maven:
```bash
./build/mvn clean package
./python/run-tests --testnames 'pyspark.sql.tests.test_connect_basic'
```

Closes #38132 from HyukjinKwon/SPARK-40677.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
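One way to double-check a shading change like this is to list the assembly jar's entries and look for each dependency's package path. The sketch below is a hypothetical helper (`missing_shaded_deps` is not part of Spark's build): it reads a jar listing on stdin, such as the output of the JDK's `jar tf`, and prints every expected path fragment that matches no entry. The package paths in the usage line are assumptions about where these artifacts place their classes; if a relocation rule renames a package rather than prefixing it, pass the relocated path instead.

```bash
# Hypothetical helper: read a jar listing (for example the output of
# `jar tf some.jar`) on stdin and print every path fragment given as an
# argument that does not occur in any entry of the listing.
missing_shaded_deps() {
  local listing frag
  listing="$(cat)"
  for frag in "$@"; do
    # -F treats the fragment as a fixed string rather than a regex.
    printf '%s\n' "$listing" | grep -qF -- "$frag" || echo "$frag"
  done
}
```

For example, `jar tf path/to/connect-assembly.jar | missing_shaded_deps io/perfmark/ com/google/j2objc/annotations/ com/google/errorprone/annotations/`, where the jar path is a placeholder. Empty output means every fragment was found; a substring match also covers relocations that only add a package prefix.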
What changes were proposed in this pull request?
This PR proposes to:
1. Move `connect` to `connector/connect` to be consistent with Kafka and Avro.
2. Update `modules.py` accordingly.
3. Update `README.md`, with some cleanup.

Why are the changes needed?
To make it consistent with Avro and Kafka. See also https://github.com/apache/spark/pull/37710/files#r978291019
Does this PR introduce any user-facing change?
No, this isn't released yet.
The usage of this project would be changed from:
to
How was this patch tested?
CI in the PR should verify this.