@dongjoon-hyun dongjoon-hyun commented Sep 20, 2019

What changes were proposed in this pull request?

This is a backport of #24373, #24404, and #24434.

This patch modifies SparkBuild so that the largest / slowest test suites (or collections of suites) can run in their own forked JVMs, which lets them run in parallel with each other. This opt-in / whitelisting approach increases parallelism without first having to fix a long tail of flakiness / brittleness issues in tests that aren't performance bottlenecks.

See the comments in SparkBuild.scala for details, including a summary of why we sometimes opt to run entire groups of tests in a single forked JVM.
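The opt-in grouping rule described above can be sketched roughly as follows. This is only an illustrative model in Python, not the code in SparkBuild.scala: the suite names, the `SHARED_JVM_PREFIXES` mapping, and the helper names are all hypothetical, and the real build expresses this through sbt rather than a standalone script.

```python
from collections import defaultdict

# Hypothetical opt-in list; the real list lives in SparkBuild.scala.
DEDICATED_JVM_SUITES = {"org.example.SlowSuiteA", "org.example.SlowSuiteB"}
# Hypothetical package prefix whose suites share one forked JVM as a group.
SHARED_JVM_PREFIXES = {"org.example.hive.": "hive_group"}
# Every suite that has not opted in stays together in one default group.
DEFAULT_GROUP = "default_test_group"


def group_name_for(suite_name):
    """Return the name of the group (one forked JVM per group) for a suite."""
    if suite_name in DEDICATED_JVM_SUITES:
        return suite_name  # opted-in suite gets a group of its own
    for prefix, group in SHARED_JVM_PREFIXES.items():
        if suite_name.startswith(prefix):
            return group  # related suites share a single forked JVM
    return DEFAULT_GROUP


def group_suites(suite_names):
    """Partition suite names into groups that could run in parallel."""
    groups = defaultdict(list)
    for name in suite_names:
        groups[group_name_for(name)].append(name)
    return dict(groups)


suites = [
    "org.example.SlowSuiteA",
    "org.example.hive.ClientSuite",
    "org.example.hive.DdlSuite",
    "org.example.FastSuite",
]
groups = group_suites(suites)
```

Because membership is opt-in, a suite that is not listed keeps running in the shared default group, so existing tests are unaffected unless someone deliberately moves them.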

The full pull request test run in Jenkins takes around 53% less time:
before changes: 4hr 40min
after changes: 2hr 13min
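As a quick check of the quoted figure, the reduction can be recomputed from the two durations. The helper name below is made up for this sketch.

```python
def to_minutes(hours, minutes):
    """Convert an hours/minutes duration to total minutes."""
    return hours * 60 + minutes


before = to_minutes(4, 40)  # 280 minutes
after = to_minutes(2, 13)   # 133 minutes

# Fraction of the original run time that was saved.
reduction = (before - after) / before
print(f"{reduction:.1%}")  # prints 52.5%
```

The saving is 147 of 280 minutes, which is consistent with the "around 53%" stated above.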

How was this patch tested?

Unit test

…JVMs for higher parallelism

Closes #24373 from gengliangwang/parallelTest.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@dongjoon-hyun

dongjoon-hyun commented Sep 20, 2019

@dongjoon-hyun dongjoon-hyun changed the title from "[SPARK-27460][TESTS] Running slowest test suites in their own forked JVMs for higher parallelism" to "[WIP][SPARK-27460][TESTS] Running slowest test suites in their own forked JVMs for higher parallelism" on Sep 20, 2019
@dongjoon-hyun

dongjoon-hyun commented Sep 20, 2019

I'll add the follow-up PR, too. BTW, @gatorsmile, do we need cd4a284?

gengliangwang and others added 2 commits September 19, 2019 17:26
…t suite list

The `HiveClientVersions` test takes around 3.5 minutes.
This PR adds it to the parallel test suite list. To avoid colliding warehouse locations, the warehouse path is changed to a temporary directory.
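The idea of giving each suite its own temporary warehouse location can be illustrated with a small sketch. This is not the change made in the PR, which lives in the Scala test code; the function name and the suite labels here are hypothetical.

```python
import tempfile
from pathlib import Path


def make_warehouse_dir(suite_label):
    """Create a fresh, uniquely named directory to use as a warehouse path."""
    # mkdtemp creates the directory and guarantees a unique name, so two
    # suites running at the same time never share a location.
    return Path(tempfile.mkdtemp(prefix=f"warehouse-{suite_label}-"))


# Two hypothetical suites running in parallel each get their own directory.
dir_a = make_warehouse_dir("suiteA")
dir_b = make_warehouse_dir("suiteB")
```

The caller is responsible for deleting these directories once the suite finishes.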

Unit test

Closes #24404 from gengliangwang/parallelTestFollowUp.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
This patch fixes several flaky tests.

N/A

Closes #24434 from gatorsmile/fixFlakyTest.

Lead-authored-by: gatorsmile <gatorsmile@gmail.com>
Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@dongjoon-hyun dongjoon-hyun changed the title from "[WIP][SPARK-27460][TESTS] Running slowest test suites in their own forked JVMs for higher parallelism" to "[SPARK-27460][TESTS] Running slowest test suites in their own forked JVMs for higher parallelism" on Sep 20, 2019
@dongjoon-hyun dongjoon-hyun changed the title from "[SPARK-27460][TESTS] Running slowest test suites in their own forked JVMs for higher parallelism" to "[SPARK-27460][TESTS][2.4] Running slowest test suites in their own forked JVMs for higher parallelism" on Sep 20, 2019
@HyukjinKwon

Looks fine to me.

@srowen srowen left a comment

OK pending tests.

@dongjoon-hyun

Thank you for the review, @HyukjinKwon and @srowen!

@dongjoon-hyun

The last Jenkins run has already passed all Java/Scala tests and is now running the PySpark tests.

@SparkQA

SparkQA commented Sep 20, 2019

Test build #111030 has finished for PR 25861 at commit e16067e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 20, 2019

Test build #111031 has finished for PR 25861 at commit 0896cb5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 20, 2019

Test build #111032 has finished for PR 25861 at commit 8c14698.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • class FsHistoryProviderSuite extends SparkFunSuite with Matchers with Logging
      • class SparkListenerWithClusterSuite extends SparkFunSuite with LocalSparkContext
      • class StreamingQueryManagerSuite extends StreamTest

@dongjoon-hyun

Merged to branch-2.4.

dongjoon-hyun pushed a commit that referenced this pull request Sep 20, 2019
…rked JVMs for higher parallelism

Closes #25861 from dongjoon-hyun/SPARK-27460.

Lead-authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Co-authored-by: gatorsmile <gatorsmile@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
@dongjoon-hyun dongjoon-hyun deleted the SPARK-27460 branch November 23, 2019 22:53