[SPARK-42172][CONNECT] Scala Client Mima Compatibility Tests #39712
Conversation
Can one of the admins verify this patch?
Let's probably file a JIRA
https://issues.apache.org/jira/browse/SPARK-42175 This was skipped as I do not want to include too much API impl with the compatibility test PR.
Let's add this jira id to the TODO, like
TODO(SPARK-42175): Add the Dataset object definition
Seems like we're not using this Logging
The Logging is needed for binary compatibility: the class type must be exactly the same.
class Column(val expr: Expression) extends Logging {
should we delete private[sql] here?
My poor Scala knowledge indicates this only marks one constructor private. The intention is to mark the current constructor private. More constructor methods will be added in follow-up PRs.
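For illustration, a minimal self-contained sketch (hypothetical names and member types, not the PR's actual code) of what "only the current constructor is private" means in Scala: private[sql] on the primary constructor hides that constructor outside the sql package, while the class itself stays public and later PRs can add public auxiliary constructors.

package org.apache.spark.sql

class Column private[sql] (private[sql] val expr: AnyRef) {
  // A follow-up PR could add a public auxiliary constructor, e.g.
  // def this(name: String) = this(toProtoExpression(name))  // (hypothetical helper)
  override def toString: String = s"Column($expr)"
}

object Column {
  // Still constructible from code inside the sql package.
  private[sql] def apply(expr: AnyRef): Column = new Column(expr)
}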
Hmm....why is it not consistent with spark.sql.Column?
Our type is proto.Expression; it is not the same as Expression. I'll leave it to later PRs to decide how to support Expression.
I mean, why not
class Column(val expr: proto.Expression) extends Logging { ...
Because I am not certain whether we should expose the constructor this(expr: proto.Expression) and the val expr: proto.Expression.
They are not the same types as this(expr: Expression) and val expr: Expression.
Our proto.Expression is a gRPC-generated class, while Expression is in the sql package; they are different types from the binary-compatibility point of view.
Let's keep it private[sql] for now.
connector/connect/client/jvm/pom.xml
Can we use SBT to check this instead of Maven? We have one place for MiMa so far in SBT (See also project/MimaBuild.scala, and dev/mima)
The SBT MiMa check has some limitations when run as an SBT rule:
It works best for a stable API, e.g. current vs. previous release. It is not easy to configure to compare, e.g., scala-client vs. sql while we are actively working on the scala-client API.
To be more specific, the problems I hit were:
- I could not configure the MiMa rule to find the current SQL SNAPSHOT jar.
- I could not use the ClassLoader correctly in the SBT rule to load all methods in the client API.
As a result, I ended up with this test, where we have more freedom to grow the API test coverage along with the client API.
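For context, a minimal build.sbt sketch (hypothetical coordinates and version, not from this PR) of the stock sbt-mima-plugin wiring; it compares a module against a previously released artifact of the same module, which is why pointing it at the in-progress SNAPSHOT of a different module is awkward:

// sbt-mima-plugin compares the compiled classes against a released artifact
// resolved from a repository; issues are reported via `sbt mimaReportBinaryIssues`.
mimaPreviousArtifacts := Set("org.apache.spark" %% "spark-sql" % "3.4.0")

// Known, accepted breakages are filtered here; Spark keeps such filters in
// project/MimaBuild.scala.
mimaBinaryIssueFilters ++= Seq()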
Gotcha. Let's probably add a couple of comments here and there to make it clear... I am sure this is confusing to other developers.
cc @dongjoon-hyun , also cc @pan3793 Do you have any suggestions for this?
You can check out the MiMa SBT impl I did here: zhenlineo#6
I marked the two problems in the PR code. Unless we can fix those two problems, I do not feel it is a better solution than this PR's approach of calling MiMa directly in a test.
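For reference, a rough sketch of what "calling MiMa directly in a test" could look like; the MiMaLib constructor and the collectProblems signature shown here are assumptions based on mima-core 1.1.x and may differ, and the jar paths are placeholders rather than the PR's actual lookup logic:

import java.io.File
import com.typesafe.tools.mima.lib.MiMaLib

object MimaCheckSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder locations; a real test would discover the jars under each module's target dir.
    val sqlJar = new File("sql/core/target/spark-sql.jar")
    val clientJar = new File("connector/connect/client/jvm/target/spark-connect-client-jvm.jar")

    // Compare the client jar (new) against the sql jar (old) and print any binary incompatibilities.
    val mima = new MiMaLib(Seq(clientJar, sqlJar))
    val problems = mima.collectProblems(sqlJar, clientJar, List.empty)
    problems.foreach(println)
  }
}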
Both Column and Column$ are in private[sql] access scope with this PR, so this is not an API for users?
It seems users cannot create a Column in their own package with this PR, for example:

package org.apache.spark.test

import org.scalatest.funsuite.AnyFunSuite // scalastyle:ignore funsuite
import org.apache.spark.sql.Column

class MyTestSuite
  extends AnyFunSuite // scalastyle:ignore funsuite
{
  test("new column") {
    val a = Column("a") // Symbol apply is inaccessible from this place
    val b = new Column(null) // No constructor accessible from here
  }
}
I think org.apache.spark.sql.Column#apply was a public API before. If private[sql] is added to object Column, it may require more refactoring work.
Thanks for your inputs.
Looking at the current Column class, the SQL API gives two public APIs to construct a Column:
class Column(val expr: Expression) extends Logging {
  def this(name: String) = this(name match {
    case "*" => UnresolvedStar(None)
    case _ if name.endsWith(".*") =>
      val parts = UnresolvedAttribute.parseAttributeName(name.substring(0, name.length - 2))
      UnresolvedStar(Some(parts))
    case _ => UnresolvedAttribute.quotedString(name)
  })
  ...
Right now the client API is very far from complete. We will add new methods in coming PRs, and I am sure there will be a Column(name: String) for users to use, but it is out of the scope of this PR to include all public constructors needed for the client.
The compatibility check added with this PR will grow its coverage as more methods are added to the client. The current check ensures that when a new method is added, it is binary compatible with the existing SQL API. When the client API coverage is up (~80%), we can switch to a more aggressive check to ensure we did not miss any methods by mistake.
OK, I see what you mean. It seems this is just an intermediate state, so we don't really need to consider user-facing usage now.
connector/connect/client/jvm/pom.xml
The latest version is 1.1.1
Yes, there is a bug in 1.1.1 where MiMa cannot check the class methods if the object is marked private. Thus I used the same version that our SBT build uses, which is 1.1.0.
Like the TODO above, we need to create a JIRA and add the corresponding JIRA ID to the TODO.
@zhenlineo I ran a manual test as follows, and the test failed.
@LuciferYang Thanks so much for looking into this PR. I added instructions for running this test at the top of the test class. In short, we first need to run ...
The Maven test has some problems when run; GA didn't fail because GA didn't test this.
@zhenlineo So we need to package the sql module and connect-client-jvm first, then test?
@LuciferYang Yes, as the test compares the binary jars. Let me know if there is a good way to enforce building the jars first.
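(For example, one possible flow, assumed rather than taken verbatim from this thread: run `build/sbt package` or `build/mvn -DskipTests package` first so the spark-sql and spark-connect-client-jvm jars exist under their target directories, then run the CompatibilitySuite.)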
Local test: the test still failed. Did I execute the commands incorrectly? Can you give me the correct ones?
"It worked on my computer." 😂 Did you get the same error? Do you have any jars under the folder ...?
Oh, you have to do a global ...
@LuciferYang I verified the path for SBT and Maven again; all are able to run the test successfully. For Maven, we can build the two packages separately and then run the Maven test correctly. The reason your sbt command did not work is that the command ... I am all for improving the test flow if you can advise.
Thanks @zhenlineo, when I change ... But I think this is not friendly to developers; for Maven users, the full module build and test can be done through ... I don't have any good suggestions now and need more time to think about it. cc @srowen @dongjoon-hyun @JoshRosen FYI
@LuciferYang I have another ... Note to all reviewers: all Scala client changes need to be merged into ...
Let's see what others think :)
I am fine w/ doing it separately. @LuciferYang's suggestion makes a lot of sense to me.
HyukjinKwon
left a comment
Approving it, but let's make sure to address @LuciferYang's suggestions. I do have the same concern that a different way of testing makes it hard for other developers to test and validate the change.
If we are sure that there will be further work to solve the problem, I have no objection to merging it now, but I must stress again that this merge will make ...
Hi @LuciferYang, I added the ability to skip the client integration tests with Maven.
@LuciferYang Let me know if there are any other blockers to merging this PR. Thanks a lot for the review.
connector/connect/client/jvm/pom.xml
No need to add this profile; we can use -Dtest.exclude.tags=org.apache.spark.sql.connect.client.util.ClientIntegrationTest directly.
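(For example, a hypothetical local invocation, with any required build profiles omitted: `build/mvn test -pl connector/connect/client/jvm -Dtest.exclude.tags=org.apache.spark.sql.connect.client.util.ClientIntegrationTest`.)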
I did similar work a few days ago:
#39768
The test case CompatibilitySuite was not added at that time, so @dongjoon-hyun suggested not to add this Tag
We need @HyukjinKwon or @dongjoon-hyun to double-check this.
The test Tag should be uniformly added to the tag module (the current naming rule is ExtendedXXX), and please update the PR description to explain why this Tag is added @zhenlineo
@LuciferYang I've reverted the last commit that skipped the e2e tests, as it will not make the build worse (it is doing the same as the E2E suite). Let's fix the build issue in another PR; we might have better solutions.
f.getName.startsWith(sbtName) && f.getName.endsWith(".jar")) ||
  // Maven Jar
  (f.getParent.endsWith("target") &&
    f.getName.startsWith(mvnName) && f.getName.endsWith(".jar"))
I fixed a bad case in the Maven jar lookup in #39810
Thanks for the fix. Added back.
LuciferYang
left a comment
LGTM (pending CI)
hvanhovell
left a comment
LGTM
val clientClassLoader: URLClassLoader = new URLClassLoader(Seq(clientJar.toURI.toURL).toArray)
val sqlClassLoader: URLClassLoader = new URLClassLoader(Seq(sqlJar.toURI.toURL).toArray)

val clientClass = clientClassLoader.loadClass("org.apache.spark.sql.Dataset")
Hi @zhenlineo @HyukjinKwon, there may be some problems with this test case. I added some logs as follows:
https://github.com/apache/spark/compare/master...LuciferYang:spark:CompatibilitySuite?expand=1
From the log, I found that both clientClass and sqlClass are loaded from file:/home/runner/work/spark/spark/connector/connect/client/jvm/target/scala-2.12/spark-connect-client-jvm_2.12-3.5.0-SNAPSHOT.jar, and the contents of newMethods and oldMethods are the same ...
At present, using this way to check, at least 4 APIs should be flagged as incompatible:
private[sql] def withResult[E](f: SparkResult => E): E
def collectResult(): SparkResult
private[sql] def analyze: proto.AnalyzePlanResponse
private[sql] val plan: proto.Plan
Because when using Java reflection, the above 4 methods are identified as public APIs, even though three of them are private[sql], and these four APIs do not exist in the Dataset of the sql module:
public java.lang.Object org.apache.spark.sql.Dataset.withResult(scala.Function1)$
public org.apache.spark.sql.connect.client.SparkResult org.apache.spark.sql.Dataset.collectResult()$
public org.apache.spark.connect.proto.AnalyzePlanResponse org.apache.spark.sql.Dataset.analyze()$
public org.apache.spark.connect.proto.Plan org.apache.spark.sql.Dataset.plan()$
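For illustration, a small self-contained sketch (hypothetical class, not the PR's code) of why this happens: Scala's package-qualified private compiles to a public JVM method, so java.lang.Class#getMethods lists it just like a genuinely public API.

package org.apache.spark.sql {
  class Demo {
    private[sql] def secret(): Int = 42 // public at the JVM bytecode level
    def visible(): Int = 1
  }
}

package example {
  object ReflectionDemo {
    def main(args: Array[String]): Unit = {
      val names = classOf[org.apache.spark.sql.Demo].getMethods.map(_.getName).toSet
      // Both methods show up, so a plain reflection diff flags `secret` as a new
      // public API even though Scala callers outside `sql` cannot use it.
      println(names.contains("secret") && names.contains("visible")) // prints: true
    }
  }
}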
also cc @hvanhovell
Thanks so much for looking into this. The Dataset test is not as important as the MiMa test. I will check whether we can fix the issue you found; otherwise it should be safe to delete, as the check is already covered by MiMa.
Thanks @zhenlineo. If it has been covered, we can delete it :)
Thank you for the investigation, @LuciferYang .
What changes were proposed in this pull request?
The Spark Connect Scala Client should provide the same API as the existing SQL API. This PR adds tests to ensure the generated binaries of the two modules are compatible, using MiMa.
The covered APIs are:
- Dataset
- SparkSession with all implemented methods
- Column with all implemented methods
- DataFrame

Why are the changes needed?
Ensures the binary compatibility of the two APIs.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Integration tests.
Note: This PR needs to be merged into 3.4 too.