Skip to content

Conversation

@HyukjinKwon
Copy link
Member

@HyukjinKwon HyukjinKwon commented Oct 5, 2022

What changes were proposed in this pull request?

This PR proposes

  1. Move connect to connector/connect to be consistent with Kafka and Avro.
  2. Do not include this in the default Apache Spark release binary.
  3. Fix the module dependency in modules.py.
  4. Fix the usages in README.md with cleaning up.
  5. Cleanup PySpark test structure to be consistent with other PySpark tests.

Why are the changes needed?

To make it consistent with Avro or Kafka, see also https://github.com/apache/spark/pull/37710/files#r978291019

Does this PR introduce any user-facing change?

No, this isn't released yet.

The usage of this project would be changed from:

./bin/spark-shell \
  --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin

to

./bin/spark-shell \
  --packages org.apache.spark:spark-connect_2.12:3.4.0 \
  --conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin

How was this patch tested?

CI in the PR should verify this.

@LuciferYang
Copy link
Contributor

The following 6 jars remvoed from deps file but not in shade jar, are these not required at runtime?

animal-sniffer-annotations/1.19//animal-sniffer-annotations-1.19.jar
annotations/4.1.1.4//annotations-4.1.1.4.jar
error_prone_annotations/2.10.0//error_prone_annotations-2.10.0.jar
j2objc-annotations/1.3//j2objc-annotations-1.3.jar
perfmark-api/0.25.0//perfmark-api-0.25.0.jar
proto-google-common-protos/2.0.1//proto-google-common-protos-2.0.1.jar
[INFO] --- maven-shade-plugin:3.2.4:shade (default) @ spark-connect_2.12 ---
[INFO] Including com.google.guava:guava:jar:31.0.1-jre in the shaded jar.
[INFO] Including com.google.guava:listenablefuture:jar:9999.0-empty-to-avoid-conflict-with-guava in the shaded jar.
[INFO] Including com.google.guava:failureaccess:jar:1.0.1 in the shaded jar.
[INFO] Including com.google.protobuf:protobuf-java:jar:3.21.1 in the shaded jar.
[INFO] Including io.grpc:grpc-netty-shaded:jar:1.47.0 in the shaded jar.
[INFO] Including io.grpc:grpc-core:jar:1.47.0 in the shaded jar.
[INFO] Including io.grpc:grpc-protobuf:jar:1.47.0 in the shaded jar.
[INFO] Including io.grpc:grpc-api:jar:1.47.0 in the shaded jar.
[INFO] Including io.grpc:grpc-context:jar:1.47.0 in the shaded jar.
[INFO] Including io.grpc:grpc-protobuf-lite:jar:1.47.0 in the shaded jar.
[INFO] Including io.grpc:grpc-services:jar:1.47.0 in the shaded jar.
[INFO] Including com.google.protobuf:protobuf-java-util:jar:3.19.2 in the shaded jar.
[INFO] Including io.grpc:grpc-stub:jar:1.47.0 in the shaded jar.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spark/connect/pom.xml

Lines 154 to 158 in 1e30564

<dependency> <!-- necessary for Java 9+ -->
<groupId>org.apache.tomcat</groupId>
<artifactId>annotations-api</artifactId>
<version>${tomcat.annotations.api.version}</version>
<scope>provided</scope>

From the above comments, when using Java 9+, annotations-api-6.0.53.jar is required. Are the corresponding startup commands different? Now annotations-api-6.0.53.jar is neither in Spark's jars directory nor shaded to connect-assembly

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably put into connect-assembly but let's set up the tests first, and fix it together.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@HyukjinKwon
Copy link
Member Author

Thanks for reviewing @LuciferYang. For #38109 (comment), let's figure it out together with adding the tests incrementally (since this is a new module)

Copy link
Contributor

@LuciferYang LuciferYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the Spark-client/jars directory between without connect module and after this pr, their content is same. LGTM if test pass.

@HyukjinKwon HyukjinKwon force-pushed the SPARK-40665 branch 2 times, most recently from c14dc59 to 4e48252 Compare October 6, 2022 00:18
@HyukjinKwon
Copy link
Member Author

Merged to master.

HyukjinKwon pushed a commit that referenced this pull request Oct 6, 2022
…run tests for CONNECT

### What changes were proposed in this pull request?

Correct the example in README for how to run CONNECT tests.

### Why are the changes needed?

This is a follow up minor update after #38109

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

N/A

Closes #38126 from amaliujia/update_read_me.

Authored-by: Rui Wang <rui.wang@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
@HyukjinKwon
Copy link
Member Author

Seems like we should shade a couple of more dependenices pointed out in #38109 (comment). It works with SBT because SBT assemblies and includes all dependencies whereas Maven one doesn't.

I am working on this.

HyukjinKwon added a commit that referenced this pull request Oct 7, 2022
… separately

### What changes were proposed in this pull request?

This PR makes the following dependencies shaded:

```
com.google.android:annotations
com.google.api.grpc:proto-google-common-proto
io.perfmark:perfmark-api
org.codehaus.mojo:animal-sniffer-annotations
com.google.errorprone:error_prone_annotations
com.google.j2objc:j2objc-annotations
```

Before #38109, it worked because related dependences pulled together but now we don't as Spark Connect would be a single jar. This issue has existed from the very first place.

### Why are the changes needed?

Otherwise, the tests fails if you build Spark Connect with Maven. SBT does not have the issue because it does the assemply with all dependencies.

### Does this PR introduce _any_ user-facing change?

No, the codes are not released out yet.

### How was this patch tested?

Manually tested via Maven:

```bash
./build/mvn clean package
./python/run-tests --testnames 'pyspark.sql.tests.test_connect_basic'
```

Closes #38132 from HyukjinKwon/SPARK-40677.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
@HyukjinKwon HyukjinKwon deleted the SPARK-40665 branch January 15, 2024 00:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants