[SPARK-29203][SQL][TESTS] Reduce shuffle partitions in SQLQueryTestSuite #25891

wangyum · 2019-09-22T10:56:34Z (edited by dongjoon-hyun)

What changes were proposed in this pull request?

This PR reduces shuffle partitions from 200 to 4 in SQLQueryTestSuite to reduce testing time.

Why are the changes needed?

Reduce testing time.
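Each shuffle in a query produces roughly one reduce task per shuffle partition (spark.sql.shuffle.partitions, which defaults to 200). With the tiny tables used by the SQL test files, most of those tasks receive no rows but still pay scheduling overhead. The sketch below is a simplified model, not Spark's partitioner: it assigns keys with Int.hashCode and a non-negative modulo, whereas Spark SQL's hash partitioning uses a Murmur3-based hash, so exact assignments differ. The 20-key input is hypothetical; only the count of empty tasks matters.

```scala
object PartitionDemo {
  // Non-negative modulo, so negative hash codes still map into [0, n).
  def nonNegativeMod(x: Int, n: Int): Int = {
    val raw = x % n
    if (raw < 0) raw + n else raw
  }

  // Hash each key into one of numPartitions buckets and count how many
  // buckets received at least one row.
  def nonEmptyPartitions(keys: Seq[Int], numPartitions: Int): Int =
    keys.map(k => nonNegativeMod(k.hashCode, numPartitions)).distinct.size

  // Every partition becomes a task, whether or not it holds any rows.
  def emptyTasks(keys: Seq[Int], numPartitions: Int): Int =
    numPartitions - nonEmptyPartitions(keys, numPartitions)
}
```

With 20 distinct keys, this model leaves 180 of 200 tasks empty, versus none of 4.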

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manually tested locally:
Before:

...
[info] - subquery/in-subquery/in-joins.sql (6 minutes, 19 seconds)
[info] - subquery/in-subquery/not-in-joins.sql (2 minutes, 17 seconds)
[info] - subquery/scalar-subquery/scalar-subquery-predicate.sql (45 seconds, 763 milliseconds)
...
Run completed in 1 hour, 22 minutes.

After:

...
[info] - subquery/in-subquery/in-joins.sql (1 minute, 12 seconds)
[info] - subquery/in-subquery/not-in-joins.sql (27 seconds, 541 milliseconds)
[info] - subquery/scalar-subquery/scalar-subquery-predicate.sql (17 seconds, 360 milliseconds)
...
Run completed in 47 minutes.
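The timings above can be turned into speedups with a little arithmetic. This sketch copies the durations from the Before/After logs; the whole-run totals are reported only to the minute, so treating them as exact makes that row approximate.

```scala
object Speedup {
  // (test, seconds before, seconds after), copied from the logs above.
  val timings: Seq[(String, Double, Double)] = Seq(
    ("in-joins.sql", 6 * 60 + 19.0, 1 * 60 + 12.0),
    ("not-in-joins.sql", 2 * 60 + 17.0, 27.541),
    ("scalar-subquery-predicate.sql", 45.763, 17.360),
    ("whole run", 82 * 60.0, 47 * 60.0) // minute precision only
  )

  // How many times faster the "after" run is.
  def speedup(before: Double, after: Double): Double = before / after

  // Share of the original time that was saved, in percent.
  def percentSaved(before: Double, after: Double): Double =
    100.0 * (before - after) / before

  // One formatted line per entry in timings.
  def report(): Seq[String] = timings.map { case (name, before, after) =>
    f"$name%-30s ${speedup(before, after)}%5.2fx  ${percentSaved(before, after)}%5.1f%% saved"
  }
}
```

Calling `Speedup.report().foreach(println)` gives roughly 5.3x for in-joins.sql and about 1.74x (about 43% less time) for the whole run.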

wangyum · 2019-09-22T11:28:36Z

cc @HyukjinKwon @dongjoon-hyun

sql/core/src/test/scala/org/apache/spark/sql/test/SharedSparkSession.scala

HyukjinKwon · 2019-09-22T12:57:08Z

cc @gatorsmile and @cloud-fan too

SparkQA · 2019-09-22T13:41:30Z

Test build #111153 has finished for PR 25891 at commit 3dc0124.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2019-09-22T13:48:35Z

retest this please

HyukjinKwon · 2019-09-22T13:49:58Z

@wangyum if tests fail and are tricky to fix, let's just change SQLQueryTestSuite for now, since it takes the longest time.

SparkQA · 2019-09-22T16:25:21Z

Test build #111158 has finished for PR 25891 at commit 3dc0124.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-09-22T17:26:36Z

Test build #111159 has finished for PR 25891 at commit 6ec9761.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2019-09-22T18:22:49Z

sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-joins.sql.out

 -- !query 4 output
-1	10	val3b	8	NULL
 1	10	val1b	8	16
+1	10	val3b	8	NULL


Ur, do we really need this?

Yes.

[info] - subquery/in-subquery/in-joins.sql *** FAILED *** (15 seconds, 359 milliseconds)
[info] subquery/in-subquery/in-joins.sql
[info] Expected "1 10 val[3b 8 NULL
[info] 1 10 val1b 8 16]
[info] 1 10 val3a 6 12
[info] 1 8...", but got "1 10 val[1b 8 16
[info] 1 10 val3b 8 NULL]
[info] 1 10 val3a 6 12
[info] 1 8..." Result did not match for query #4
[info] SELECT Count(DISTINCT(t1a)),
[info] t1b,
[info] t3a,
[info] t3b,
[info] t3c
[info] FROM t1 natural left JOIN t3
[info] WHERE t1a IN
[info] (
[info] SELECT t2a
[info] FROM t2
[info] WHERE t1d = t2d)
[info] AND t1b > t3b
[info] GROUP BY t1a,
[info] t1b,
[info] t3a,
[info] t3b,
[info] t3c
[info] ORDER BY t1a DESC, t3b DESC (SQLQueryTestSuite.scala:383)
[info] org.scalatest.exceptions.TestFailedException:

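Got it. It seems that we are hitting a corner case: the query sorts on only a subset of its columns, so rows that tie on those columns can come back in either order, yet the suite compares this output row by row because isSorted returns true for a plan with a global Sort: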

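def isSorted(plan: LogicalPlan): Boolean = plan match {
  // A Sort below these operators does not order the final output, so stop searching.
  case _: Join | _: Aggregate | _: Generate | _: Sample | _: Distinct => false
  // DESCRIBE output is treated as sorted and compared as-is.
  case _: DescribeCommandBase | _: DescribeColumnCommand |
       _: DescribeTableStatement | _: DescribeColumnStatement => true
  // A global sort, possibly under projections or filters, orders the output.
  case PhysicalOperation(_, _, Sort(_, true, _)) => true
  case _ => plan.children.iterator.exists(isSorted)
}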
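To make the failure mode concrete, here is a toy model of that decision. The Plan classes are hypothetical stand-ins, not Catalyst's LogicalPlan, and `matches` assumes the suite compares rows in order when isSorted is true and compares them after sorting otherwise. The two rows are the tied rows from query #4 above.

```scala
// Toy plan nodes (hypothetical; not Spark's Catalyst classes).
sealed trait Plan { def children: Seq[Plan] }
final case class Scan(table: String) extends Plan { def children: Seq[Plan] = Nil }
final case class Join(left: Plan, right: Plan) extends Plan { def children: Seq[Plan] = Seq(left, right) }
final case class Aggregate(child: Plan) extends Plan { def children: Seq[Plan] = Seq(child) }
final case class Project(child: Plan) extends Plan { def children: Seq[Plan] = Seq(child) }
final case class Sort(global: Boolean, child: Plan) extends Plan { def children: Seq[Plan] = Seq(child) }

object ResultCheck {
  // Same shape as isSorted: row-reordering operators stop the search, a global
  // Sort means the output order is meaningful, anything else recurses into children.
  def isSorted(plan: Plan): Boolean = plan match {
    case _: Join | _: Aggregate => false
    case Sort(true, _)          => true
    case _                      => plan.children.exists(isSorted)
  }

  // Sorted plans are compared row by row; otherwise both sides are sorted first,
  // so an arbitrary row order cannot cause a mismatch.
  def matches(plan: Plan, expected: Seq[String], actual: Seq[String]): Boolean =
    if (isSorted(plan)) expected == actual
    else expected.sorted == actual.sorted
}
```

For a plan shaped like query #4, `Sort(global = true, Aggregate(Join(Scan("t1"), Scan("t3"))))`, isSorted is true, so swapping the two tied rows is reported as a mismatch; without the top-level Sort, the same rows would match.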

dongjoon-hyun · 2019-09-22T18:23:32Z

sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala


+  override def sparkConf: SparkConf = super.sparkConf
+    // Reduce shuffle partitions to reduce testing time.
+    .set(SQLConf.SHUFFLE_PARTITIONS, 5)


For the Python UDF tests, this seems to increase the value from 4 to 5. Did I understand correctly?

Actually why don't we try 4 @wangyum?

+1 for 4.

OK, I'll try setting it to 4. I used 5 because it is already set to 5 in two places:

spark/sql/core/src/test/scala/org/apache/spark/sql/test/TestSQLContext.scala

Lines 58 to 64 in 359375e

/**
 * A map used to store all confs that need to be overridden in sql/core unit tests.
 */
val overrideConfs: Map[String, String] =
  Map(
    // Fewer shuffle partitions to speed up testing.
    SQLConf.SHUFFLE_PARTITIONS.key -> "5")

spark/sql/hive/src/test/scala/org/apache/spark/sql/hive/test/TestHive.scala

Lines 613 to 620 in 42b80ae

/**
 * A map used to store all confs that need to be overridden in sql/hive unit tests.
 */
val overrideConfs: Map[String, String] =
  Map(
    // Fewer shuffle partitions to speed up testing.
    SQLConf.SHUFFLE_PARTITIONS.key -> "5"
  )

Setting it to 4 has another issue: the test passes with --SET spark.sql.autoBroadcastJoinThreshold=10485760 but fails with --SET spark.sql.autoBroadcastJoinThreshold=-1,spark.sql.join.preferSortMergeJoin=true:

22:31:31.233 ERROR org.apache.spark.sql.SQLQueryTestSuite: Error using configs: spark.sql.autoBroadcastJoinThreshold=-1,spark.sql.join.preferSortMergeJoin=true,spark.sql.codegen.wholeStage=true,spark.sql.codegen.factoryMode=CODEGEN_ONLY
[info] - subquery/in-subquery/not-in-joins.sql *** FAILED *** (32 seconds, 609 milliseconds)
[info] subquery/in-subquery/not-in-joins.sql
[info] Expected "1 16 12 [21
[info] 1 16 12 10]
[info] 1 10 NULL 12
[info] 1 6 8 ...", but got "1 16 12 [10
[info] 1 16 12 21]
[info] 1 10 NULL 12
[info] 1 6 8 ..." Result did not match for query #6
[info] SELECT Count(DISTINCT( t1a )),
[info] t1b,
[info] t1c,
[info] t1d
[info] FROM t1
[info] WHERE t1a NOT IN (SELECT t2a
[info] FROM t2
[info] JOIN t1
[info] WHERE t2b <> t1b)
[info] GROUP BY t1b,
[info] t1c,
[info] t1d
[info] HAVING t1d NOT IN (SELECT t2d
[info] FROM t2
[info] WHERE t1d = t2d)
[info] ORDER BY t1b DESC (SQLQueryTestSuite.scala:383)
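The failure appears under only one of the configurations listed by the file's --SET lines. A minimal sketch of reading those lines, assuming each --SET line holds comma-separated key=value pairs and defines one configuration under which the whole file is run. The object and method names are hypothetical, not the suite's code; the log line above also lists codegen settings, so the configurations actually run can include more than one --SET line states, which this sketch does not model.

```scala
object ConfigSets {
  // Parse one "--SET k1=v1,k2=v2" line into (key, value) pairs.
  // Assumes no commas inside keys or values; splits each pair at its first '='.
  def parseSetLine(line: String): Seq[(String, String)] = {
    require(line.startsWith("--SET"), s"not a --SET line: $line")
    line.stripPrefix("--SET").split(",").toSeq.map { pair =>
      val (key, rest) = pair.span(_ != '=')
      require(rest.nonEmpty, s"missing '=' in: $pair")
      key.trim -> rest.drop(1).trim
    }
  }

  // One configuration per --SET comment; other comments are ignored.
  def configSets(comments: Seq[String]): Seq[Seq[(String, String)]] =
    comments.filter(_.startsWith("--SET")).map(parseSetLine)
}
```

Under this reading, a file passes only if every configuration produces the expected output, which is why a result that depends on row order can pass under one --SET line and fail under another.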

Can we add an ORDER BY to make the query output deterministic?
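The two mismatched rows from query #6 tie on the only sort column, t1b = 16, so their order depends on how rows arrive. This sketch models the output rows as (count, t1b, t1c, t1d) tuples and uses Scala's stable sortBy, so ties keep their input order; in Spark the tie order depends on partitioning instead. Extending the sort key to every output column gives a total order. Which columns the PR actually added is not shown here; the all-columns key is an illustrative choice.

```scala
object TieOrder {
  // Simplified output row of query #6: (count, t1b, t1c, t1d).
  type Row = (Long, Int, Int, Int)

  // ORDER BY t1b DESC only (negation gives descending order for these small ints).
  def orderByT1bDesc(rows: Seq[Row]): Seq[Row] = rows.sortBy(r => -r._2)

  // ORDER BY t1b DESC, then the remaining columns ascending: no two distinct
  // rows tie, so the result no longer depends on the input order.
  def orderByAllColumns(rows: Seq[Row]): Seq[Row] =
    rows.sortBy(r => (-r._2, r._3, r._4, r._1))
}
```

Feeding the rows (1, 16, 12, 21) and (1, 16, 12, 10) in either order gives two different results from `orderByT1bDesc` but the same result, with t1d = 10 first, from `orderByAllColumns`.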

dongjoon-hyun · 2019-09-22T18:25:09Z

Thank you for taking care of this. In general, I agree with the idea. However, I left two comments about the result changes and the side effect on the Python UDF tests. It seems this PR needs more revision to achieve the original goal correctly.

dongjoon-hyun · 2019-09-22T23:53:48Z

I updated the PR description too (from 5 to 4).

SparkQA · 2019-09-23T02:08:27Z

Test build #111172 has finished for PR 25891 at commit 55004b9.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2019-09-23T03:27:15Z

LGTM if tests pass

SparkQA · 2019-09-23T07:05:02Z

Test build #111185 has finished for PR 25891 at commit b4f2d19.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2019-09-23T07:42:54Z

retest this please

HyukjinKwon

LGTM too if tests pass

wangyum · 2019-09-23T10:41:17Z

It should pass after this commit:

[info] Run completed in 34 minutes, 35 seconds.
[info] Total number of tests run: 211
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 211, failed 0, canceled 0, ignored 1, pending 0
[info] All tests passed.

SparkQA · 2019-09-23T11:20:19Z

Test build #111202 has finished for PR 25891 at commit b4f2d19.

This patch fails SparkR unit tests.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2019-09-23T11:27:50Z

retest this please

SparkQA · 2019-09-23T13:06:50Z

Test build #111207 has finished for PR 25891 at commit ad6bee7.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-09-23T13:17:36Z

Test build #111214 has finished for PR 25891 at commit df51b69.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-09-23T15:31:18Z

Test build #111218 has finished for PR 25891 at commit df51b69.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

wangyum · 2019-09-23T15:39:25Z

Thank you all!

wangyum · 2019-09-23T15:39:34Z

Merged to master.

dongjoon-hyun · 2019-09-23T16:08:54Z

+1, late LGTM. Adding ORDER BY looks like the correct approach going forward. Thank you all!

dongjoon-hyun · 2019-09-25T22:49:05Z

BTW, @wangyum, can we have this in branch-2.4?

maropu · 2019-09-26T08:21:05Z

Reducing the time looks super nice...

…estSuite

### What changes were proposed in this pull request?

This PR backports #25891 to `branch-2.4`.

### Why are the changes needed?

Reduce testing time.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Manually tested locally:
Before:
```
...
[info] - subquery/in-subquery/in-joins.sql (6 minutes, 19 seconds)
[info] - subquery/in-subquery/not-in-joins.sql (2 minutes, 17 seconds)
[info] - subquery/scalar-subquery/scalar-subquery-predicate.sql (45 seconds, 763 milliseconds)
...
Run completed in 1 hour, 22 minutes.
```
After:
```
...
[info] - subquery/in-subquery/in-joins.sql (1 minute, 12 seconds)
[info] - subquery/in-subquery/not-in-joins.sql (27 seconds, 541 milliseconds)
[info] - subquery/scalar-subquery/scalar-subquery-predicate.sql (17 seconds, 360 milliseconds)
...
Run completed in 47 minutes.
```

Closes #25938 from wangyum/SPARK-29203-branch-2.4.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Yuming Wang <wgyumg@gmail.com>

Reduce shuffle partitions to reduce testing time

3dc0124

HyukjinKwon reviewed Sep 22, 2019

sql/core/src/test/scala/org/apache/spark/sql/test/SharedSparkSession.scala (outdated)

HyukjinKwon reviewed Sep 22, 2019

sql/core/src/test/scala/org/apache/spark/sql/test/SharedSparkSession.scala (outdated)

Avoid test error in StreamSuite

6ec9761

dongjoon-hyun reviewed Sep 22, 2019

dongjoon-hyun added SQL TESTS labels Sep 22, 2019

SHUFFLE_PARTITIONS to 4

55004b9

SHUFFLE_PARTITIONS to 5

b4f2d19

HyukjinKwon approved these changes Sep 23, 2019

wangyum added 2 commits September 23, 2019 16:22

SHUFFLE_PARTITIONS to 4 and add sort

ad6bee7

DESC -> ASC

df51b69

wangyum closed this in 0c40b94 Sep 23, 2019

wangyum mentioned this pull request Sep 26, 2019

[SPARK-29203][SQL][TESTS][2.4] Reduce shuffle partitions in SQLQueryTestSuite #25938

Closed

wangyum deleted the SPARK-29203 branch September 26, 2019 08:08
