
@LuciferYang (Contributor) commented on Oct 8, 2022

What changes were proposed in this pull request?

The main change in this PR is to refactor the shade relocation/rename rules based on the output of `mvn dependency:tree -pl connector/connect`, so that Maven and sbt produce the assembly jar according to the same rules.

The main parts of the `mvn dependency:tree -pl connector/connect` output are as follows:

```
[INFO] +- com.google.guava:guava:jar:31.0.1-jre:compile
[INFO] |  +- com.google.guava:listenablefuture:jar:9999.0-empty-to-avoid-conflict-with-guava:compile
[INFO] |  +- org.checkerframework:checker-qual:jar:3.12.0:compile
[INFO] |  +- com.google.errorprone:error_prone_annotations:jar:2.7.1:compile
[INFO] |  \- com.google.j2objc:j2objc-annotations:jar:1.3:compile
[INFO] +- com.google.guava:failureaccess:jar:1.0.1:compile
[INFO] +- com.google.protobuf:protobuf-java:jar:3.21.1:compile
[INFO] +- io.grpc:grpc-netty:jar:1.47.0:compile
[INFO] |  +- io.grpc:grpc-core:jar:1.47.0:compile
[INFO] |  |  +- com.google.code.gson:gson:jar:2.9.0:runtime
[INFO] |  |  +- com.google.android:annotations:jar:4.1.1.4:runtime
[INFO] |  |  \- org.codehaus.mojo:animal-sniffer-annotations:jar:1.19:runtime
[INFO] |  +- io.netty:netty-codec-http2:jar:4.1.72.Final:compile
[INFO] |  |  \- io.netty:netty-codec-http:jar:4.1.72.Final:compile
[INFO] |  +- io.netty:netty-handler-proxy:jar:4.1.72.Final:runtime
[INFO] |  |  \- io.netty:netty-codec-socks:jar:4.1.72.Final:runtime
[INFO] |  +- io.perfmark:perfmark-api:jar:0.25.0:runtime
[INFO] |  \- io.netty:netty-transport-native-unix-common:jar:4.1.72.Final:runtime
[INFO] +- io.grpc:grpc-protobuf:jar:1.47.0:compile
[INFO] |  +- io.grpc:grpc-api:jar:1.47.0:compile
[INFO] |  |  \- io.grpc:grpc-context:jar:1.47.0:compile
[INFO] |  +- com.google.api.grpc:proto-google-common-protos:jar:2.0.1:compile
[INFO] |  \- io.grpc:grpc-protobuf-lite:jar:1.47.0:compile
[INFO] +- io.grpc:grpc-services:jar:1.47.0:compile
[INFO] |  \- com.google.protobuf:protobuf-java-util:jar:3.19.2:runtime
[INFO] +- io.grpc:grpc-stub:jar:1.47.0:compile
[INFO] +- org.spark-project.spark:unused:jar:1.0.0:compile
```

The new shade rules exclude the following jars:

  • Scala-related jars
  • Netty-related jars
  • jars that were previously included only by sbt: `pmml-model-*.jar`, findbugs `jsr305-*.jar`, and the Spark `unused-1.0.0.jar`
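Read as a rule over jar file names, the exclusion list above amounts to a simple predicate. This is a minimal sketch, not the build's actual code: `JarExclusion.isExcludedJar` is a hypothetical name, matching is done by file-name prefix (as in the sbt `cp filter` excerpt in the review discussion), and "Scala-related jars" is assumed to mean `scala-library` and `scala-collection-compat`, the two Scala jars that appear in the sbt output posted in the discussion.

```scala
object JarExclusion {
  // File-name prefixes of jars kept out of the connect assembly (hypothetical model of the list).
  private val excludedPrefixes = Seq(
    "scala-library-",           // assumption: "Scala-related" covers scala-library ...
    "scala-collection-compat_", // ... and scala-collection-compat
    "netty-",                   // every io.netty artifact, e.g. netty-codec-http2-4.1.72.Final.jar
    "pmml-model-",
    "jsr305-")

  // Jars matched by exact file name rather than by prefix.
  private val excludedNames = Set("unused-1.0.0.jar")

  /** True if a jar with this file name should be left out of the assembly. */
  def isExcludedJar(fileName: String): Boolean =
    excludedNames.contains(fileName) || excludedPrefixes.exists(p => fileName.startsWith(p))
}
```

Because the check is a prefix match, `grpc-netty-1.47.0.jar` does not start with `netty-` and is therefore kept, which agrees with both the Maven and sbt inclusion lists that follow.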

After this PR, the Maven shade plugin includes the following jars:

```
[INFO] --- maven-shade-plugin:3.2.4:shade (default) @ spark-connect_2.12 ---
[INFO] Including com.google.guava:guava:jar:31.0.1-jre in the shaded jar.
[INFO] Including com.google.guava:listenablefuture:jar:9999.0-empty-to-avoid-conflict-with-guava in the shaded jar.
[INFO] Including org.checkerframework:checker-qual:jar:3.12.0 in the shaded jar.
[INFO] Including com.google.errorprone:error_prone_annotations:jar:2.7.1 in the shaded jar.
[INFO] Including com.google.j2objc:j2objc-annotations:jar:1.3 in the shaded jar.
[INFO] Including com.google.guava:failureaccess:jar:1.0.1 in the shaded jar.
[INFO] Including com.google.protobuf:protobuf-java:jar:3.21.1 in the shaded jar.
[INFO] Including io.grpc:grpc-netty:jar:1.47.0 in the shaded jar.
[INFO] Including io.grpc:grpc-core:jar:1.47.0 in the shaded jar.
[INFO] Including com.google.code.gson:gson:jar:2.9.0 in the shaded jar.
[INFO] Including com.google.android:annotations:jar:4.1.1.4 in the shaded jar.
[INFO] Including org.codehaus.mojo:animal-sniffer-annotations:jar:1.19 in the shaded jar.
[INFO] Including io.perfmark:perfmark-api:jar:0.25.0 in the shaded jar.
[INFO] Including io.grpc:grpc-protobuf:jar:1.47.0 in the shaded jar.
[INFO] Including io.grpc:grpc-api:jar:1.47.0 in the shaded jar.
[INFO] Including io.grpc:grpc-context:jar:1.47.0 in the shaded jar.
[INFO] Including com.google.api.grpc:proto-google-common-protos:jar:2.0.1 in the shaded jar.
[INFO] Including io.grpc:grpc-protobuf-lite:jar:1.47.0 in the shaded jar.
[INFO] Including io.grpc:grpc-services:jar:1.47.0 in the shaded jar.
[INFO] Including com.google.protobuf:protobuf-java-util:jar:3.19.2 in the shaded jar.
[INFO] Including io.grpc:grpc-stub:jar:1.47.0 in the shaded jar.
```

sbt assembly includes the following jars:

```
[debug] Including from cache: j2objc-annotations-1.3.jar
[debug] Including from cache: guava-31.0.1-jre.jar
[debug] Including from cache: protobuf-java-3.21.1.jar
[debug] Including from cache: grpc-services-1.47.0.jar
[debug] Including from cache: failureaccess-1.0.1.jar
[debug] Including from cache: grpc-stub-1.47.0.jar
[debug] Including from cache: perfmark-api-0.25.0.jar
[debug] Including from cache: annotations-4.1.1.4.jar
[debug] Including from cache: listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar
[debug] Including from cache: animal-sniffer-annotations-1.19.jar
[debug] Including from cache: checker-qual-3.12.0.jar
[debug] Including from cache: grpc-netty-1.47.0.jar
[debug] Including from cache: grpc-api-1.47.0.jar
[debug] Including from cache: grpc-protobuf-lite-1.47.0.jar
[debug] Including from cache: grpc-protobuf-1.47.0.jar
[debug] Including from cache: grpc-context-1.47.0.jar
[debug] Including from cache: grpc-core-1.47.0.jar
[debug] Including from cache: protobuf-java-util-3.19.2.jar
[debug] Including from cache: error_prone_annotations-2.10.0.jar
[debug] Including from cache: gson-2.9.0.jar
[debug] Including from cache: proto-google-common-protos-2.0.1.jar
```
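Since the goal is for both builds to pack the same jars, the Maven and sbt lists above can be compared mechanically. This is a minimal sketch, assuming the log lines look exactly like those shown and that each sbt file name is `<artifactId>-<version>.jar` with no classifier; `ShadedJarComparison` is a hypothetical helper, not part of the Spark build.

```scala
object ShadedJarComparison {
  // e.g. "[INFO] Including com.google.guava:guava:jar:31.0.1-jre in the shaded jar."
  private val MavenIncluding = """.*Including [^:\s]+:([^:\s]+):jar:(\S+) in the shaded jar\.""".r
  // e.g. "[debug] Including from cache: guava-31.0.1-jre.jar" (also accepts "[debug] Including: ...")
  private val SbtIncluding = """.*Including(?: from cache)?: (\S+\.jar)""".r

  /** Jar file names implied by maven-shade-plugin output, assuming "<artifactId>-<version>.jar". */
  def mavenJars(log: Seq[String]): Set[String] =
    log.collect { case MavenIncluding(artifact, version) => s"$artifact-$version.jar" }.toSet

  /** Jar file names reported by sbt-assembly debug output. */
  def sbtJars(log: Seq[String]): Set[String] =
    log.collect { case SbtIncluding(file) => file }.toSet

  /** (jars only Maven includes, jars only sbt includes); two empty sets mean the builds agree. */
  def diff(maven: Seq[String], sbt: Seq[String]): (Set[String], Set[String]) = {
    val m = mavenJars(maven)
    val s = sbtJars(sbt)
    (m -- s, s -- m)
  }
}
```

Applied to the two full lists above, the comparison should report only `error_prone_annotations`: both builds include it, but the Maven output shows version 2.7.1 and the sbt output shows 2.10.0. Every other artifact and version matches.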

All of the dependencies above are relocated to the `org.sparkproject.connect` package according to the new rules, to avoid conflicts with other third-party dependencies.

Why are the changes needed?

To refactor the shade relocation/rename rules so that Maven and sbt produce the assembly jar according to the same rules.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Pass GitHub Actions

@LuciferYang marked this pull request as draft on October 8, 2022 11:04
@LuciferYang (Contributor Author)

Test first

@LuciferYang (Contributor Author)

sbt assembly includes the following:

```
[debug] Including: listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar
[debug] Including: protobuf-java-util-3.19.2.jar
[debug] Including: scala-collection-compat_2.12-2.2.0.jar
[debug] Including: perfmark-api-0.25.0.jar
[debug] Including: error_prone_annotations-2.10.0.jar
[debug] Including: annotations-4.1.1.4.jar
[debug] Including: jsr305-3.0.2.jar
[debug] Including: grpc-api-1.47.0.jar
[debug] Including: j2objc-annotations-1.3.jar
[debug] Including: proto-google-common-protos-2.0.1.jar
[debug] Including: checker-qual-3.12.0.jar
[debug] Including: animal-sniffer-annotations-1.19.jar
[debug] Including: protobuf-java-3.21.1.jar
[debug] Including: failureaccess-1.0.1.jar
[debug] Including: pmml-model-1.4.8.jar
[debug] Including: guava-31.0.1-jre.jar
[debug] Including: grpc-context-1.47.0.jar
[debug] Including: scala-library-2.12.17.jar
[debug] Including: spark-connect_2.12-3.4.0-SNAPSHOT.jar
[debug] Including: grpc-netty-shaded-1.47.0.jar
[debug] Including: unused-1.0.0.jar
[debug] Including: grpc-core-1.47.0.jar
[debug] Including: grpc-protobuf-lite-1.47.0.jar
[debug] Including: gson-2.9.0.jar
[debug] Including: grpc-protobuf-1.47.0.jar
[debug] Including: grpc-services-1.47.0.jar
[debug] Including: grpc-stub-1.47.0.jar
```

@grundprinzip (Contributor)

Thanks for doing this!

@LuciferYang (Contributor Author)

rebase

@HyukjinKwon marked this pull request as ready for review on October 10, 2022 04:34
```scala
val cp = (assembly / fullClasspath).value
cp filter { v =>
  val name = v.data.getName
  name.startsWith("pmml-model-") || name.startsWith("scala-collection-compat_") ||
```
Member

Is scala-collection picked up from the Scala library itself? I remember there is an option to exclude it (e.g., `includeScala = false`).

Contributor Author

It should be `(assembly / assemblyPackageScala / assembleArtifact) := false`, but I found that scala-collection was not excluded. Let me try:

```scala
assembly / assemblyOption ~= {
  _.withIncludeScala(false)
},
```

Contributor Author

@HyukjinKwon The results are the same. According to the documentation, this should be the right option, but it only excludes scala-library, not `scala-collection-compat_`.

@HyukjinKwon (Member)

LGTM otherwise. Mind filling in the PR description?

```scala
ShadeRule.rename("com.google.common.**" -> "org.sparkproject.connect.guava.@1").inAll,
ShadeRule.rename("com.google.thirdparty.**" -> "org.sparkproject.connect.guava.@1").inAll,
ShadeRule.rename("com.google.protobuf.**" -> "org.sparkproject.connect.protobuf.@1").inAll,
ShadeRule.rename("android.annotation.**" -> "org.sparkproject.connect.android_annotation.@1").inAll,
```
Contributor

just a nit: org.sparkproject -> org.apache.spark.connect.

Contributor Author

No, I think we should follow the existing convention: `spark.shade.packageName` was defined as `org.sparkproject` on May 8, 2019. If we need to change this rule, it is better to change it uniformly in a separate PR.

Contributor

Oh I see it now.
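The four rename rules quoted in this thread can be modelled as a prefix rewrite on fully qualified class names. This is a minimal sketch under simplifying assumptions, not how sbt-assembly implements `ShadeRule.rename`: it only supports patterns ending in `.**` with a single `@1` in the target, rewrites class names only (not resource paths), and applies the first matching rule. `RelocationSketch` and `Rename` are hypothetical names, and the sketch knows only the four rules shown, so any other package is left unchanged here.

```scala
object RelocationSketch {
  /** A rename rule of the form "a.b.**" -> "x.y.@1" (simplified model). */
  final case class Rename(pattern: String, target: String) {
    require(pattern.endsWith(".**") && target.contains("@1"), s"unsupported rule: $pattern -> $target")

    // Keeps the trailing dot, e.g. "com.google.common.", so "com.google.commonx.Foo" does not match.
    private val prefix = pattern.stripSuffix("**")

    /** Some(relocated name) if this rule matches the class name, None otherwise. */
    def rename(className: String): Option[String] =
      if (className.startsWith(prefix)) Some(target.replace("@1", className.substring(prefix.length)))
      else None
  }

  // Only the four rules quoted in the review excerpt.
  val rules: Seq[Rename] = Seq(
    Rename("com.google.common.**", "org.sparkproject.connect.guava.@1"),
    Rename("com.google.thirdparty.**", "org.sparkproject.connect.guava.@1"),
    Rename("com.google.protobuf.**", "org.sparkproject.connect.protobuf.@1"),
    Rename("android.annotation.**", "org.sparkproject.connect.android_annotation.@1"))

  /** Applies the first matching rule; names matched by no rule are returned unchanged. */
  def relocate(className: String): String =
    rules.iterator.map(_.rename(className)).collectFirst { case Some(n) => n }.getOrElse(className)
}
```

Because `com.google.common` and `com.google.thirdparty` both map to `org.sparkproject.connect.guava`, Guava's two top-level package trees end up side by side under a single relocated package.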

```scala
cp filter { v =>
  val name = v.data.getName
  name.startsWith("pmml-model-") || name.startsWith("scala-collection-compat_") ||
    name.startsWith("jsr305-") || name.startsWith("netty-") || name == "unused-1.0.0.jar"
}
```
@LuciferYang (Contributor Author), Oct 10, 2022

@HyukjinKwon should we share Netty with Spark?

Contributor Author

Maybe we need to merge #38185 first; otherwise, netty cannot be filtered out here.

Contributor Author

Need to rebase after #38185 is merged.

@HyukjinKwon (Member) left a comment

Okay, LGTM

@LuciferYang (Contributor Author)

rebased

@LuciferYang (Contributor Author)

@HyukjinKwon (Member)

Merged to master.

@LuciferYang (Contributor Author)

thanks @HyukjinKwon @amaliujia @grundprinzip

DeZepTup pushed a commit to DeZepTup/spark-custom that referenced this pull request Oct 31, 2022
…ules

Closes apache#38162 from LuciferYang/SPARK-40677-FOLLOWUP.

Lead-authored-by: yangjie01 <yangjie01@baidu.com>
Co-authored-by: YangJie <yangjie01@baidu.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>