connector/connect/pom.xml (53 additions, 13 deletions)
@@ -268,11 +268,13 @@
     as assembly build.
   -->
   <include>com.google.android:annotations</include>
-  <include>com.google.api.grpc:proto-google-common-proto</include>
+  <include>com.google.api.grpc:proto-google-common-protos</include>
   <include>io.perfmark:perfmark-api</include>
   <include>org.codehaus.mojo:animal-sniffer-annotations</include>
   <include>com.google.errorprone:error_prone_annotations</include>
   <include>com.google.j2objc:j2objc-annotations</include>
+  <include>org.checkerframework:checker-qual</include>
+  <include>com.google.code.gson:gson</include>
 </includes>
 </artifactSet>
 <relocations>
@@ -303,28 +305,66 @@
   </relocation>
 
   <relocation>
-    <pattern>com.google.android</pattern>
-    <shadedPattern>${spark.shade.packageName}.connect.android</shadedPattern>
+    <pattern>android.annotation</pattern>
+    <shadedPattern>${spark.shade.packageName}.connect.android_annotation</shadedPattern>
   </relocation>
   <relocation>
-    <pattern>com.google.api.grpc</pattern>
-    <shadedPattern>${spark.shade.packageName}.connect.api</shadedPattern>
+    <pattern>io.perfmark</pattern>
+    <shadedPattern>${spark.shade.packageName}.connect.io_perfmark</shadedPattern>
   </relocation>
   <relocation>
-    <pattern>io.perfmark</pattern>
-    <shadedPattern>${spark.shade.packageName}.connect.perfmark</shadedPattern>
+    <pattern>org.codehaus.mojo.animal_sniffer</pattern>
+    <shadedPattern>${spark.shade.packageName}.connect.animal_sniffer</shadedPattern>
   </relocation>
+  <relocation>
+    <pattern>com.google.j2objc.annotations</pattern>
+    <shadedPattern>${spark.shade.packageName}.connect.j2objc_annotations</shadedPattern>
+  </relocation>
+  <relocation>
+    <pattern>com.google.errorprone.annotations</pattern>
+    <shadedPattern>${spark.shade.packageName}.connect.errorprone_annotations</shadedPattern>
+  </relocation>
+  <relocation>
+    <pattern>org.checkerframework</pattern>
+    <shadedPattern>${spark.shade.packageName}.connect.checkerframework</shadedPattern>
+  </relocation>
+  <relocation>
+    <pattern>com.google.gson</pattern>
+    <shadedPattern>${spark.shade.packageName}.connect.gson</shadedPattern>
+  </relocation>
+
+  <!--
+    For `com.google.api.grpc:proto-google-common-protos`, do not directly define the
+    pattern as `com.google`; otherwise the relocation result may be uncertain due to
+    the order in which the rules are applied.
+  -->
+  <relocation>
+    <pattern>com.google.api</pattern>
+    <shadedPattern>${spark.shade.packageName}.connect.google_protos.api</shadedPattern>
+  </relocation>
+  <relocation>
+    <pattern>com.google.cloud</pattern>
+    <shadedPattern>${spark.shade.packageName}.connect.google_protos.cloud</shadedPattern>
+  </relocation>
+  <relocation>
+    <pattern>com.google.geo</pattern>
+    <shadedPattern>${spark.shade.packageName}.connect.google_protos.geo</shadedPattern>
+  </relocation>
+  <relocation>
+    <pattern>com.google.logging</pattern>
+    <shadedPattern>${spark.shade.packageName}.connect.google_protos.logging</shadedPattern>
+  </relocation>
   <relocation>
-    <pattern>org.codehaus.mojo</pattern>
-    <shadedPattern>${spark.shade.packageName}.connect.mojo</shadedPattern>
+    <pattern>com.google.longrunning</pattern>
+    <shadedPattern>${spark.shade.packageName}.connect.google_protos.longrunning</shadedPattern>
   </relocation>
   <relocation>
-    <pattern>com.google.errorprone</pattern>
-    <shadedPattern>${spark.shade.packageName}.connect.errorprone</shadedPattern>
+    <pattern>com.google.rpc</pattern>
+    <shadedPattern>${spark.shade.packageName}.connect.google_protos.rpc</shadedPattern>
   </relocation>
   <relocation>
-    <pattern>com.com.google.j2objc</pattern>
-    <shadedPattern>${spark.shade.packageName}.connect.j2objc</shadedPattern>
+    <pattern>com.google.type</pattern>
+    <shadedPattern>${spark.shade.packageName}.connect.google_protos.type</shadedPattern>
   </relocation>
 </relocations>
 </configuration>
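
A note on why the google_protos relocations above are split into per-package rules (com.google.api, com.google.cloud, ...) instead of one broad com.google rule: as the comment in the diff says, a blanket pattern would also match packages handled by other rules, such as Guava's com.google.common and protobuf's com.google.protobuf (both relocated separately in SparkBuild.scala below), and the outcome would then depend on the order in which the rules are applied. A toy Scala model of that ordering hazard, for illustration only; first-match semantics are assumed here, and this is not the shade plugin's actual algorithm:

    // Toy model: relocation as first-match package-prefix rewriting.
    object RuleOrder {
      val rules = Seq(
        "com.google.protobuf" -> "org.sparkproject.connect.protobuf",
        "com.google"          -> "org.sparkproject.connect.google_protos" // too broad
      )

      def relocate(cls: String): String =
        rules.collectFirst {
          case (pattern, replacement) if cls.startsWith(pattern + ".") =>
            replacement + cls.stripPrefix(pattern)
        }.getOrElse(cls)

      def main(args: Array[String]): Unit = {
        // With this order the protobuf rule wins, as intended:
        println(relocate("com.google.protobuf.Message")) // ...connect.protobuf.Message
        println(relocate("com.google.rpc.Status"))       // ...connect.google_protos.rpc.Status
        // If the broad rule were listed first, it would also capture the protobuf
        // classes; hence the explicit per-package rules in the pom above.
      }
    }
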
project/SparkBuild.scala (29 additions, 1 deletion)
@@ -655,19 +655,47 @@ object SparkConnect {

(assembly / logLevel) := Level.Info,

// Exclude `scala-library` from assembly.
(assembly / assemblyPackageScala / assembleArtifact) := false,

// Exclude `pmml-model-*.jar`, `scala-collection-compat_*.jar`, `jsr305-*.jar`,
// `netty-*.jar` and `unused-1.0.0.jar` from assembly.
(assembly / assemblyExcludedJars) := {
val cp = (assembly / fullClasspath).value
cp filter { v =>
val name = v.data.getName
name.startsWith("pmml-model-") || name.startsWith("scala-collection-compat_") ||
[Review thread on the line above]
Member: Is scala-collection picked up from the Scala library itself? I remember there's an option to exclude this (e.g., includeScala = false).
LuciferYang (Author): It should be `(assembly / assemblyPackageScala / assembleArtifact) := false`, but I found scala-collection was not excluded. Let me try:

    assembly / assemblyOption ~= {
      _.withIncludeScala(false)
    },

LuciferYang (Author): (screenshot omitted) @HyukjinKwon The results are the same. According to the docs this should be the right option, but it only excludes scala-library, not scala-collection-compat_.
[End of thread]

name.startsWith("jsr305-") || name.startsWith("netty-") || name == "unused-1.0.0.jar"
[Review thread on the line above]
LuciferYang (Author, Oct 10, 2022): @HyukjinKwon should we share Netty with Spark?
LuciferYang (Author): Maybe we need to merge #38185 first; otherwise netty cannot be filtered out here.
LuciferYang (Author): Needs a rebase after #38185 is merged.
[End of thread]

}
},
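
For anyone tracing which jars actually get dropped, one possible (untested) variant logs the exclusions from the same task. The predicate is copied verbatim from the block above; only the logging lines are new:

    (assembly / assemblyExcludedJars) := {
      val log = streams.value.log
      val cp = (assembly / fullClasspath).value
      val excluded = cp filter { v =>
        val name = v.data.getName
        name.startsWith("pmml-model-") || name.startsWith("scala-collection-compat_") ||
          name.startsWith("jsr305-") || name.startsWith("netty-") || name == "unused-1.0.0.jar"
      }
      // Surface the dropped jars in the build log to catch exclusion regressions.
      excluded.foreach(jar => log.info(s"Excluding from connect assembly: ${jar.data.getName}"))
      excluded
    },
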

(assembly / assemblyShadeRules) := Seq(
ShadeRule.rename("io.grpc.**" -> "org.sparkproject.connect.grpc.@0").inAll,
ShadeRule.rename("com.google.common.**" -> "org.sparkproject.connect.guava.@1").inAll,
ShadeRule.rename("com.google.thirdparty.**" -> "org.sparkproject.connect.guava.@1").inAll,
ShadeRule.rename("com.google.protobuf.**" -> "org.sparkproject.connect.protobuf.@1").inAll,
ShadeRule.rename("android.annotation.**" -> "org.sparkproject.connect.android_annotation.@1").inAll,
[Review thread on the line above]
Contributor: Just a nit: org.sparkproject -> org.apache.spark.connect.
LuciferYang (Author): No, I think we should follow the existing rules: spark.shade.packageName was defined as org.sparkproject on May 8, 2019. If we need to change this rule, it is better to change it uniformly in an independent PR.
Contributor: Oh, I see it now.
[End of thread]

ShadeRule.rename("io.perfmark.**" -> "org.sparkproject.connect.io_perfmark.@1").inAll,
ShadeRule.rename("org.codehaus.mojo.animal_sniffer.**" -> "org.sparkproject.connect.animal_sniffer.@1").inAll,
ShadeRule.rename("com.google.j2objc.annotations.**" -> "org.sparkproject.connect.j2objc_annotations.@1").inAll,
ShadeRule.rename("com.google.errorprone.annotations.**" -> "org.sparkproject.connect.errorprone_annotations.@1").inAll,
ShadeRule.rename("org.checkerframework.**" -> "org.sparkproject.connect.checkerframework.@1").inAll,
ShadeRule.rename("com.google.gson.**" -> "org.sparkproject.connect.gson.@1").inAll,
ShadeRule.rename("com.google.api.**" -> "org.sparkproject.connect.google_protos.api.@1").inAll,
ShadeRule.rename("com.google.cloud.**" -> "org.sparkproject.connect.google_protos.cloud.@1").inAll,
ShadeRule.rename("com.google.geo.**" -> "org.sparkproject.connect.google_protos.geo.@1").inAll,
ShadeRule.rename("com.google.logging.**" -> "org.sparkproject.connect.google_protos.logging.@1").inAll,
ShadeRule.rename("com.google.longrunning.**" -> "org.sparkproject.connect.google_protos.longrunning.@1").inAll,
ShadeRule.rename("com.google.rpc.**" -> "org.sparkproject.connect.google_protos.rpc.@1").inAll,
ShadeRule.rename("com.google.type.**" -> "org.sparkproject.connect.google_protos.type.@1").inAll
),
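
Since the rules above mix the two placeholders: in sbt-assembly's ShadeRule syntax (inherited from Jar Jar Links), @0 expands to the entire matched name, while @1 expands to only the part matched by the first ** wildcard. So the gRPC rule nests the whole original package under the new prefix, while the Guava rule grafts just the captured tail onto it. A small standalone illustration of the resulting names (string manipulation only, not the actual bytecode rewriter):

    object PlaceholderDemo {
      def main(args: Array[String]): Unit = {
        // "io.grpc.**" -> "org.sparkproject.connect.grpc.@0"
        // @0 is the whole matched name, so the old package survives intact:
        println("org.sparkproject.connect.grpc." + "io.grpc.ManagedChannel")
        // org.sparkproject.connect.grpc.io.grpc.ManagedChannel

        // "com.google.common.**" -> "org.sparkproject.connect.guava.@1"
        // @1 is only the tail captured by "**":
        println("org.sparkproject.connect.guava." +
          "com.google.common.collect.Maps".stripPrefix("com.google.common."))
        // org.sparkproject.connect.guava.collect.Maps
      }
    }
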

(assembly / assemblyMergeStrategy) := {
case m if m.toLowerCase(Locale.ROOT).endsWith("manifest.mf") => MergeStrategy.discard
// Drop all proto files that are not needed as artifacts of the build.
case m if m.toLowerCase(Locale.ROOT).endsWith(".proto") => MergeStrategy.discard
case _ => MergeStrategy.first
},
}
)
}
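
One way to sanity-check the assembled jar afterwards is to resolve a class by its relocated name and confirm the unrelocated name is gone. This is a sketch, not part of the build, and the class names are assumptions derived from the shade rules above:

    // Run with only the connect assembly jar on the classpath.
    object ShadeCheck {
      def main(args: Array[String]): Unit = {
        // Relocated name assumed from the "io.grpc.**" -> "...grpc.@0" rule.
        Class.forName("org.sparkproject.connect.grpc.io.grpc.ManagedChannel")
        // The unrelocated name should no longer resolve from this jar alone.
        val leaked =
          try { Class.forName("io.grpc.ManagedChannel"); true }
          catch { case _: ClassNotFoundException => false }
        assert(!leaked, "unshaded io.grpc classes leaked into the assembly")
        println("relocation looks correct")
      }
    }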
