
[SPARK-25956] Make Scala 2.12 as default Scala version in Spark 3.0 #22967

Closed
dbtsai wants to merge 3 commits into apache:master from dbtsai:scala2.12

Conversation

dbtsai (Member) commented Nov 7, 2018

What changes were proposed in this pull request?

This PR makes Scala 2.12 the default Scala version in Spark, with Scala 2.11 as the alternative version. This implies that Scala 2.12 will be used by our CI builds, including pull request builds.

We'll update Jenkins to add a new compile-only job for Scala 2.11 to ensure the code can still be compiled with Scala 2.11.

How was this patch tested?

existing tests

SparkQA commented Nov 7, 2018

Test build #98571 has finished for PR 22967 at commit eb10e5a.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

dongjoon-hyun (Member) commented Nov 8, 2018

Ur, it seems due to 63ca4bb. I'll fix the master branch.

dbtsai (Member Author) commented Nov 8, 2018

@dongjoon-hyun Yeah, seems 63ca4bb breaks the Scala 2.12 build.

I'll re-trigger the build once Scala 2.12 build is fixed.

Thanks.

dbtsai (Member Author) commented Nov 8, 2018

retest this please

SparkQA commented Nov 8, 2018

Test build #98579 has finished for PR 22967 at commit eb10e5a.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Nov 8, 2018

Test build #98580 has finished for PR 22967 at commit 2eea387.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

docs/sparkr.md Outdated
@@ -133,7 +133,7 @@ specifying `--packages` with `spark-submit` or `sparkR` commands, or if initiali

<div data-lang="r" markdown="1">
{% highlight r %}
sparkR.session(sparkPackages = "com.databricks:spark-avro_2.11:3.0.0")
sparkR.session()
Member

Eh, @dbtsai, I think you can just switch this to another data source like spark-redshift or spark-xml, and fix the description above ("you can find data source connectors for popular file formats like Avro").

Member Author

I am not familiar with R. Can you elaborate? Thanks.

Member

Oh, but the problem is that other packages probably wouldn't have a _2.12 distribution. Hm, I think this can be left as it was for now.

At least I am going to release spark-xml before Spark 3.0.0 anyway. I can try to include a 2.12 distribution as well and fix it here later.

Member

I mean, it's related to using an external package. It looks that way, but Avro is kind of an internal source now, so this example is out of date.

Member Author

I get you.

BTW, are you familiar with MiMa? I still can't figure out why it's failing.

Member Author

I thought com.databricks:spark-avro_2.12 was deprecated and no longer exists.

Member

It can be anything. Rather than make this an example that requires maintenance, why not just change the surrounding text to not necessarily refer to an Avro connector, and make this a dummy package like com.acme:spark-foo_2.12?

Member

yes!
(although, let's not use spark here - don't want to encourage naming packages with spark in the name)

Member Author

I just did a text search-and-replace, and I didn't read the context of having sparkPackages = "com.databricks:spark-avro_2.11:3.0.0" here. My bad.

Although Avro is now part of the Spark codebase, it's an external package that is not on the classpath by default. How about I change it to sparkPackages = "org.apache.spark:spark-avro_2.12:3.0.0" here?

Member

That seems fine. A dummy package is also fine.

@@ -154,8 +154,8 @@
<commons.math3.version>3.4.1</commons.math3.version>
<!-- managed up from 3.2.1 for SPARK-11652 -->
<commons.collections.version>3.2.2</commons.collections.version>
<scala.version>2.11.12</scala.version>
<scala.binary.version>2.11</scala.binary.version>
<scala.version>2.12.7</scala.version>
Member

The definition of the profile scala-2.11 needs to change to set these back to 2.11 versions. The scala-2.12 profile doesn't need to set these now, then. You can keep the enforcer rule in the 2.12 profile to ban 2.11 dependencies. Actually, the line banning 2.10 dependencies can be removed as that's already set in the main build's config.
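For reference, a minimal sketch (not the exact patch) of how a scala-2.11 profile could pin these properties back, assuming the same property names shown in the hunk above:

```
<!-- Sketch only: a scala-2.11 profile that restores the 2.11 property values
     now that the main build defaults to Scala 2.12. -->
<profile>
  <id>scala-2.11</id>
  <properties>
    <scala.version>2.11.12</scala.version>
    <scala.binary.version>2.11</scala.binary.version>
  </properties>
</profile>
```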

@@ -1998,7 +1998,7 @@
-->
<exclude>org.jboss.netty</exclude>
<exclude>org.codehaus.groovy</exclude>
<exclude>*:*_2.10</exclude>
<exclude>*:*_2.11</exclude>
</excludes>
Member Author

@srowen Can you take a look and see if this looks right now? Thanks!

SparkQA commented Nov 8, 2018

Test build #98620 has finished for PR 22967 at commit c4b6bed.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -2717,7 +2717,6 @@
<rules>
<bannedDependencies>
<excludes combine.children="append">
<exclude>*:*_2.11</exclude>
<exclude>*:*_2.10</exclude>
Member

Not quite -- the rule has to stay on the 2.12 profile, because it's the one that needs to exclude _2.11 dependencies. The exclusion for _2.10 is already in the parent rule and can be removed. But that's the only change here.
Yes, the version changes look right.

Member Author

Makes sense. I made the parent rule exclude _2.10 and moved the exclusion of _2.11 to the scala-2.12 profile. Thanks.
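To make that concrete, here is a rough sketch of the arrangement described above (assumed structure only, not the exact patch): the parent enforcer rule bans only _2.10 artifacts, while the scala-2.12 profile appends the _2.11 ban.

```
<!-- Parent build (sketch): ban Scala 2.10 artifacts everywhere. -->
<bannedDependencies>
  <excludes>
    <exclude>*:*_2.10</exclude>
  </excludes>
</bannedDependencies>

<!-- scala-2.12 profile (sketch): append a ban on Scala 2.11 artifacts. -->
<bannedDependencies>
  <excludes combine.children="append">
    <exclude>*:*_2.11</exclude>
  </excludes>
</bannedDependencies>
```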

Member

Yeah that looks right to me.

SparkQA commented Nov 11, 2018

Test build #98709 has finished for PR 22967 at commit b4d2d4f.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

dbtsai (Member Author) commented Nov 11, 2018

Waiting for #22977 to be merged; then I'll rebase on it and fix the remaining binary incompatibilities.

pom.xml Outdated
@@ -2718,7 +2710,6 @@
<bannedDependencies>
<excludes combine.children="append">
<exclude>*:*_2.11</exclude>
<exclude>*:*_2.10</exclude>
Member

@dbtsai sorry for the late idea here -- this isn't essential for the change, and you don't have to make it here -- but I thought of a better way. Really we want the default maven-enforcer-plugin config above to exclude _2.10 and _2.11 dependencies, and to remove everything from the scala-2.12 profile (or else one still has to enable the profile to get all of the Scala 2.12 config). Then, move this maven-enforcer-plugin config to the scala-2.11 profile; that copy should only exclude _2.10 dependencies. However, to make sure Maven doesn't also merge the parent's _2.11 exclusion into that rule, the combine.children="append" attribute here can become combine.self="override". That should get the desired effects.
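A rough sketch of that idea (assumed placement, not the exact patch): the default maven-enforcer-plugin config bans both old binary versions, and the copy in the scala-2.11 profile replaces, rather than appends to, the inherited excludes so that only _2.10 stays banned there.

```
<!-- Default maven-enforcer-plugin config (sketch): ban both _2.10 and _2.11. -->
<bannedDependencies>
  <excludes combine.children="append">
    <exclude>*:*_2.10</exclude>
    <exclude>*:*_2.11</exclude>
  </excludes>
</bannedDependencies>

<!-- scala-2.11 profile (sketch): override the inherited excludes so that
     only _2.10 is banned when building for Scala 2.11. -->
<bannedDependencies>
  <excludes combine.self="override">
    <exclude>*:*_2.10</exclude>
  </excludes>
</bannedDependencies>
```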

Member Author

Thanks for the suggestion; I agree this will make the default Scala 2.12 configuration cleaner.

SparkQA commented Nov 13, 2018

Test build #98756 has finished for PR 22967 at commit 52dc4a1.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

dongjoon-hyun (Member)

Retest this please.

SparkQA commented Nov 13, 2018

Test build #98786 has finished for PR 22967 at commit 52dc4a1.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

dongjoon-hyun (Member)

Retest this please.

SparkQA commented Nov 14, 2018

Test build #98807 has finished for PR 22967 at commit 52dc4a1.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Verified: these commits were signed with the committer's verified signature (dbtsai, DB Tsai).
dbtsai (Member Author) commented Nov 14, 2018

@dongjoon-hyun thanks for triggering the build. The Python test script was only looking for Scala 2.11 jars, resulting in the PySpark test failures. I just fixed it in the latest push. Let's see how it goes.

SparkQA commented Nov 14, 2018

Test build #98813 has finished for PR 22967 at commit 8b6b474.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Nov 14, 2018

Test build #98812 has finished for PR 22967 at commit a5cc336.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

dbtsai (Member Author) commented Nov 14, 2018

Retest this please.

SparkQA commented Nov 14, 2018

Test build #98815 has finished for PR 22967 at commit 8b6b474.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


dongjoon-hyun (Member) left a comment

Thank you, @dbtsai and all!

dongjoon-hyun (Member)

Could you take a look at this once more, @HyukjinKwon, @felixcheung, @gatorsmile?

dongjoon-hyun (Member)

Hi, @shaneknapp.
Since we are making Scala 2.12 the default, can we reuse the following job for Scala 2.11 instead?

dongjoon-hyun (Member)

Merged to master. Thank you all!

asfgit closed this in ad853c5 on Nov 15, 2018
shaneknapp (Contributor)

@dongjoon-hyun not a problem, I'll need to update the build config(s). What branches will need 2.12 vs 2.11?

dongjoon-hyun (Member)

Thank you, @shaneknapp. Like the current job spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.12, the master branch will need it. So, rename
from spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.12
to spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.11?

xerial commented Nov 15, 2018

Thank you for all the efforts to make this happen!

Spark has been the last holdout before deprecating Scala 2.11.

After Spark 3.0, as OSS contributors, we can stop maintaining cross-builds for Scala 2.11 and migrate completely to Scala 2.12 or later.

dbtsai deleted the scala2.12 branch on November 27, 2018 00:26
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
Closes apache#22967 from dbtsai/scala2.12.

Authored-by: DB Tsai <d_tsai@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…spark-env.sh

## What changes were proposed in this pull request?

When I tried to run scripts (e.g. `start-master.sh`/`start-history-server.sh`) on the latest master, I got this error:
```
Presence of build for multiple Scala versions detected.
Either clean one of them or, export SPARK_SCALA_VERSION in spark-env.sh.
```

The error message is quite confusing. Without reading `load-spark-env.sh`, I didn't know which directory to remove, or where to find and edit `spark-env.sh`.

This PR makes the error message clearer. It also changes the script so it requires less maintenance when we add or drop Scala versions in the future.
Now, with apache#22967, the error message can be revised as follows (in my local setup):

```
Presence of build for multiple Scala versions detected (/Users/gengliangwang/IdeaProjects/spark/assembly/target/scala-2.12 and /Users/gengliangwang/IdeaProjects/spark/assembly/target/scala-2.11).
Remove one of them or, export SPARK_SCALA_VERSION=2.12 in /Users/gengliangwang/IdeaProjects/spark/conf/spark-env.sh.
Visit https://spark.apache.org/docs/latest/configuration.html#environment-variables for more details about setting environment variables in spark-env.sh.
```

## How was this patch tested?

Manual test

Closes apache#23049 from gengliangwang/reviseEnvScript.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
rshkv pushed a commit to palantir/spark that referenced this pull request Jun 18, 2020

Closes apache#22967 from dbtsai/scala2.12.

Authored-by: DB Tsai <d_tsai@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
rshkv pushed a commit to palantir/spark that referenced this pull request Jul 16, 2020

Closes apache#22967 from dbtsai/scala2.12.

Authored-by: DB Tsai <d_tsai@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>