[SPARK-40331][DOCS] Recommend use Java 11/17 as the runtime environment of Spark #37799

LuciferYang · 2022-09-05T06:12:48Z

What changes were proposed in this pull request?

This PR suggests using Java 11/17 as runtime of Spark in the index.md.

Why are the changes needed?

The JIT optimizer in Java 11/17 is better than Java 8, running Spark using Java 11/17 will bring some performance benefits.

Spark uses Whole Stage Code Gen to improve performance, but the generated code may not be friendly to the JIT optimizer. For example, the case mentioned in SPARK-40303 by @wangyum :

  runBenchmark("Benchmark count distinct") {
    withTempPath { dir =>
      val N = 2000000
      val columns = Range(0, 100).map(i => s"id % $i AS id$i")

      spark.range(N).selectExpr(columns: _*).write.mode("Overwrite").parquet(dir.getCanonicalPath)

      Seq(1, 2, 5, 10, 15, 25, 30, 35, 40, 50, 60, 70, 80, 90, 100).foreach { cnt =>
        val selectExps = columns.take(cnt).map(_.split(" ").last).map(c => s"count(distinct $c)")

        val benchmark = new Benchmark("Benchmark count distinct", N, minNumIters = 1)
        benchmark.addCase(s"$cnt count distinct with codegen") { _ =>
          withSQLConf(
            "spark.sql.codegen.wholeStage" -> "true",
            "spark.sql.codegen.factoryMode" -> "FALLBACK") {
            spark.read.parquet(dir.getCanonicalPath).selectExpr(selectExps: _*)
              .write.format("noop").mode("Overwrite").save()
          }
        }

        benchmark.addCase(s"$cnt count distinct without codegen") { _ =>
          withSQLConf(
            "spark.sql.codegen.wholeStage" -> "false",
            "spark.sql.codegen.factoryMode" -> "NO_CODEGEN") {
            spark.read.parquet(dir.getCanonicalPath).selectExpr(selectExps: _*)
              .write.format("noop").mode("Overwrite").save()
          }
        }
        benchmark.run()
      }
    }
  }

When use Java 8 to run the above case in local[2] mode, there will be obvious negative effects when cnt is 35, 40, 50, 60 or 70 with codegen enabled. (the performance after cnt > 70 is meet expectations due to fallback with InternalCompilerException: Code grows beyond 64 KB):

OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1017-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Benchmark count distinct:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
35 count distinct with codegen                   418321         418321           0          0.0      209160.3       1.0X
35 count distinct without codegen                 69975          69975           0          0.0       34987.7       6.0X

OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1017-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Benchmark count distinct:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
40 count distinct with codegen                   626627         626627           0          0.0      313313.7       1.0X
40 count distinct without codegen                 90564          90564           0          0.0       45281.9       6.9X

OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1017-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Benchmark count distinct:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
50 count distinct with codegen                   906749         906749           0          0.0      453374.7       1.0X
50 count distinct without codegen                140278         140278           0          0.0       70138.8       6.5X

OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1017-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Benchmark count distinct:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
60 count distinct with codegen                   995616         995616           0          0.0      497808.2       1.0X
60 count distinct without codegen                215088         215088           0          0.0      107544.2       4.6X

OpenJDK 64-Bit Server VM 1.8.0_345-b01 on Linux 5.15.0-1017-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Benchmark count distinct:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
70 count distinct with codegen                  1060273        1060273           0          0.0      530136.4       1.0X
70 count distinct without codegen                290576         290576           0          0.0      145287.9       3.6X

But run the above cases using Java 11 or Java 17, there will be no obvious negative effects:

Java 11

OpenJDK 64-Bit Server VM 11.0.16+8-LTS on Linux 5.15.0-1017-azure
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Benchmark count distinct:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
35 count distinct with codegen                    54733          54733           0          0.0       27366.6       1.0X
35 count distinct without codegen                 97613          97613           0          0.0       48806.6       0.6X

OpenJDK 64-Bit Server VM 11.0.16+8-LTS on Linux 5.15.0-1017-azure
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Benchmark count distinct:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
40 count distinct with codegen                    70454          70454           0          0.0       35226.8       1.0X
40 count distinct without codegen                127975         127975           0          0.0       63987.3       0.6X


OpenJDK 64-Bit Server VM 11.0.16+8-LTS on Linux 5.15.0-1017-azure
ntel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Benchmark count distinct:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
50 count distinct with codegen                   104559         104559           0          0.0       52279.7       1.0X
50 count distinct without codegen                182331         182331           0          0.0       91165.5       0.6X

OpenJDK 64-Bit Server VM 11.0.16+8-LTS on Linux 5.15.0-1017-azure
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Benchmark count distinct:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
60 count distinct with codegen                   146774         146774           0          0.0       73386.8       1.0X
60 count distinct without codegen                257184         257184           0          0.0      128592.1       0.6X

OpenJDK 64-Bit Server VM 11.0.16+8-LTS on Linux 5.15.0-1017-azure
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Benchmark count distinct:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
70 count distinct with codegen                   190883         190883           0          0.0       95441.4       1.0X
70 count distinct without codegen                346295         346295           0          0.0      173147.4       0.6X

Java 17

OpenJDK 64-Bit Server VM 17.0.4+8-LTS on Linux 5.15.0-1017-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Benchmark count distinct:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
35 count distinct with codegen                    45439          45439           0          0.0       22719.5       1.0X
35 count distinct without codegen                 77241          77241           0          0.0       38620.5       0.6X

OpenJDK 64-Bit Server VM 17.0.4+8-LTS on Linux 5.15.0-1017-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Benchmark count distinct:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
40 count distinct with codegen                    58347          58347           0          0.0       29173.3       1.0X
40 count distinct without codegen                 98928          98928           0          0.0       49464.1       0.6X


OpenJDK 64-Bit Server VM 17.0.4+8-LTS on Linux 5.15.0-1017-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Benchmark count distinct:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
50 count distinct with codegen                    85589          85589           0          0.0       42794.7       1.0X
50 count distinct without codegen                151642         151642           0          0.0       75820.9       0.6X

OpenJDK 64-Bit Server VM 17.0.4+8-LTS on Linux 5.15.0-1017-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Benchmark count distinct:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
60 count distinct with codegen                   118545         118545           0          0.0       59272.5       1.0X
60 count distinct without codegen                234024         234024           0          0.0      117012.2       0.5X

OpenJDK 64-Bit Server VM 17.0.4+8-LTS on Linux 5.15.0-1017-azure
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Benchmark count distinct:                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
70 count distinct with codegen                   157151         157151           0          0.0       78575.7       1.0X
70 count distinct without codegen                319582         319582           0          0.0      159791.2       0.5X

After turning on the -XX:+PrintCompilation option, we can found the logs related to the operatorName_doConsume method compilation failure of the C2 compiler:

COMPILE SKIPPED: unsupported incoming calling sequence

or 

COMPILE SKIPPED: unsupported calling sequence

When using Java 8, they are identified as not retryable, this indicates that the compiler deemed this method should not be attempted to compile again on any tier of compilation, and because this is an OSR compilation (i.e. loop compilation), this will mark the method as "never try to perform OSR compilation again on all tiers". But when using Java 11/17 they are identified as retry at different tier, it'll try tier 1, this also makes the performance of the above cases seem acceptable.

So this pr started to suggest using Java 11/17 as runtime environment of Spark in the document.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Just change the document.

LuciferYang · 2022-09-05T06:19:25Z

In addition, using Java 11(with numa bind of cpu and memory ) on the arm architecture will also have better performance than Java 8(with numa bind of cpu and memory ), with a general performance improvement of about 5% ~ 10%

docs/index.md

wangyum · 2022-09-05T09:09:18Z

This is a real use case with 50 count distinct:
Java 8 with whole stage codegen:

Java 8 with whole stage codegen and disable TieredCompilation(-XX:-TieredCompilation):

Java 8 with whole stage codegen and validParamLength = 48(set spark.sql.CodeGenerator.validParamLength=48):

Java 11:

Java 17:

wangyum · 2022-09-05T09:25:31Z

@rednaxelafx @cloud-fan @srowen @HyukjinKwon @dongjoon-hyun

HyukjinKwon · 2022-09-06T01:32:21Z

cc @rednaxelafx FYI

HyukjinKwon · 2022-09-06T01:33:20Z

docs/index.md

 Spark runs on both Windows and UNIX-like systems (e.g. Linux, Mac OS), and it should run on any platform that runs a supported version of Java. This should include JVMs on x86_64 and ARM64. It's easy to run locally on one machine --- all you need is to have `java` installed on your system `PATH`, or the `JAVA_HOME` environment variable pointing to a Java installation.

 Spark runs on Java 8/11/17, Scala 2.12/2.13, Python 3.7+ and R 3.5+.
+Java 11/17 is the recommended version to run Spark on.


I wonder if we should explicitly mention this though. Some benchmark results weren't good actually IIRC.

I meant that the performance you mentioned looks limited to whole stage codegen and JIT complier only. I remember we saw some slower performance (e.g., at https://github.com/apache/spark/tree/master/core/benchmarks)

For compatibility, is the current code more inclined to the best practice of Java 8? Just as the current use of Scala 2.13 has not reached the best practice. I think it should be improved through some refactoring work, . @rednaxelafx , WDYT?

Yes we should move forward. But we should deprecate Java 8 first which I think would mean that encourage users to use JDK 11 and 17.

What I am unsure is that whether it's better to recommend something (JDK 11/17) slower when the old stuff (JDK 8) is not even deprecated. To end users, JDK 8 is still a faster option in general if I am not mistaken.

+1 for @HyukjinKwon 's comment. Yes, it does. Our benchmark shows JDK 8 is faster in general.

HyukjinKwon · 2022-09-06T01:34:00Z

docs/index.md

 Spark runs on Java 8/11/17, Scala 2.12/2.13, Python 3.7+ and R 3.5+.
+Java 11/17 is the recommended version to run Spark on.
 Python 3.7 support is deprecated as of Spark 3.4.0.
-Java 8 prior to version 8u201 support is deprecated as of Spark 3.2.0.


Hm, why do we remove this deprecation BTW?

I am ok to keep this line

Yes, we should not remove this, @LuciferYang . Please revert this from this PR.

zhengruifeng · 2022-09-06T01:44:21Z

also cc @Yikun

Yikun · 2022-09-06T03:21:05Z

Some info FYI:

OpenJDK 8 has more longer time before EOL (2026) than OpenJDK 11 (2024), it shows the upstream community attitude of Java 8 in some level.
For some initial aarch64 port: https://openjdk.org/jeps/237 were introduced in Java 9, and Java 8 backport this.
Java 11 have more performance enhancement in aarch64, For some aarch64 improvement like: https://openjdk.org/jeps/315 only be supported in java 11 but java 8 not.
IIRC, I had some discussion the JDK experts on our internal team, OpenJDK 11 contains many experimental feature, and these features are stable on OpenJDK 17, Java 11 might be a transition version.
MacOS java support introduced in Java17. https://openjdk.org/jeps/391

Above all, personally think, unless we see that Java 11 has advantages in most scenarios in Apache Spark, otherwise users should choose the appropriate JDK among 8, 11, 17 according to their own situation (at current time).

LuciferYang · 2022-09-06T03:37:32Z

In addition, using Java 11(with numa bind of cpu and memory ) on the arm architecture will also have better performance than Java 8(with numa bind of cpu and memory ), with a general performance improvement of about 5% ~ 10%

To clarify, this is NUMA bound through the capabilities of the container, not the capabilities of Java 11

dongjoon-hyun

Apache Spark community still supports Hadoop 2.7.4 distribution officially. What do you think about that, @LuciferYang ?

LuciferYang · 2022-09-06T05:15:58Z

Apache Spark community still supports Hadoop 2.7.4 distribution officially. What do you think about that, @LuciferYang ?

Is this question related to the current PR? Or is it a separate question? Is there any bad case with Hadoop 2.7.3 + Java 11? Maybe it's something I don't known.

LuciferYang · 2022-09-06T05:20:06Z

@dongjoon-hyun 20% of SQL type jobs in our production environment use "Spark 3.1 (or Spark 3.2) + Hadoop 2.7.3 + Java 11" on the Arm architecture, and no fatal problems have been found in the last 8 months, but our production environment is not strongly dependent on Hadoop, so I may miss some fatal cases
.

dongjoon-hyun · 2022-09-06T05:42:59Z

Yes it is related because this PR is about a general recommendation for Spark, not a specific distribution.

Is this question related to the current PR?

I want to confirm that we didn't miss any thing there. So, what about Java 17 combination because this PR recommends both 11/17?

LuciferYang · 2022-09-06T06:40:38Z

@dongjoon-hyun Java 17 has not been used in our production environment, but for the scenarios mentioned in the PR description, the performance using 11 and 17 is obviously better than 8. May be @wangyum have some performance data of Java 17 in the production environment that can be shared?

cloud-fan · 2022-09-06T16:11:49Z

Just for curiosity: are we already using java 11 to run tests in Github Actions?

LuciferYang · 2022-09-06T16:47:05Z

Just for curiosity: are we already using java 11 to run tests in Github Actions?

Currently, Java 11/17 only have daily test. For each pr, only build, no test, am I right? @HyukjinKwon

srowen · 2022-09-06T17:26:39Z

If this is at all controversial, let's just not make this change

HyukjinKwon · 2022-09-07T00:04:45Z

Currently, Java 11/17 only have daily test. For each pr, only build, no test, am I right?

yup.

wangyum · 2022-09-21T23:59:03Z

The performance Issue fixed by JDK-8159720.

LuciferYang · 2022-09-22T01:06:35Z

The performance Issue fixed by JDK-8159720.

Seems to be fixed only in Java 9+?

change doc

aa33cdb

github-actions bot added the DOCS label Sep 5, 2022

LuciferYang changed the title ~~[SPARK-40331][DOCS] Recommend use Java 11+ as the runtime of Spark 3.4.0~~ [SPARK-40331][DOCS] Recommend use Java 11+ as the runtime environment of Spark Sep 5, 2022

wangyum reviewed Sep 5, 2022

View reviewed changes

docs/index.md Outdated Show resolved Hide resolved

update

6620984

LuciferYang changed the title ~~[SPARK-40331][DOCS] Recommend use Java 11+ as the runtime environment of Spark~~ [SPARK-40331][DOCS] Recommend use Java 11/17 as the runtime environment of Spark Sep 5, 2022

wangyum approved these changes Sep 5, 2022

View reviewed changes

srowen approved these changes Sep 5, 2022

View reviewed changes

HyukjinKwon reviewed Sep 6, 2022

View reviewed changes

dongjoon-hyun reviewed Sep 6, 2022

View reviewed changes

revert

60ae7ab

srowen closed this Sep 11, 2022

zhengruifeng mentioned this pull request Oct 11, 2022

[SPARK-40516] Add Apache Spark 3.3.0 Dockerfile apache/spark-docker#2

Closed

[SPARK-40331][DOCS] Recommend use Java 11/17 as the runtime environment of Spark #37799

[SPARK-40331][DOCS] Recommend use Java 11/17 as the runtime environment of Spark #37799

Uh oh!

Conversation

LuciferYang commented Sep 5, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Uh oh!

LuciferYang commented Sep 5, 2022

Uh oh!

Uh oh!

wangyum commented Sep 5, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

wangyum commented Sep 5, 2022

Uh oh!

HyukjinKwon commented Sep 6, 2022

Uh oh!

HyukjinKwon Sep 6, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

HyukjinKwon Sep 6, 2022

Choose a reason for hiding this comment

Uh oh!

LuciferYang Sep 6, 2022

Choose a reason for hiding this comment

Uh oh!

HyukjinKwon Sep 6, 2022

Choose a reason for hiding this comment

Uh oh!

dongjoon-hyun Sep 6, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

HyukjinKwon Sep 6, 2022

Choose a reason for hiding this comment

Uh oh!

LuciferYang Sep 6, 2022

Choose a reason for hiding this comment

Uh oh!

dongjoon-hyun Sep 6, 2022

Choose a reason for hiding this comment

Uh oh!

LuciferYang Sep 6, 2022

Choose a reason for hiding this comment

Uh oh!

zhengruifeng commented Sep 6, 2022

Uh oh!

Yikun commented Sep 6, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

LuciferYang commented Sep 6, 2022

Uh oh!

dongjoon-hyun left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

LuciferYang commented Sep 6, 2022

Uh oh!

LuciferYang commented Sep 6, 2022

Uh oh!

dongjoon-hyun commented Sep 6, 2022

Uh oh!

LuciferYang commented Sep 6, 2022

Uh oh!

cloud-fan commented Sep 6, 2022

Uh oh!

LuciferYang commented Sep 6, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

srowen commented Sep 6, 2022

Uh oh!

HyukjinKwon commented Sep 7, 2022

Uh oh!

wangyum commented Sep 21, 2022

Uh oh!

LuciferYang commented Sep 5, 2022 •

edited

Loading

wangyum commented Sep 5, 2022 •

edited

Loading

HyukjinKwon Sep 6, 2022 •

edited

Loading

dongjoon-hyun Sep 6, 2022 •

edited

Loading

Yikun commented Sep 6, 2022 •

edited

Loading

dongjoon-hyun left a comment •

edited

Loading

LuciferYang commented Sep 6, 2022 •

edited

Loading