This repository was archived by the owner on Oct 23, 2024. It is now read-only.

Conversation

@farhan5900

It resolves DCOS-51636.

What changes were proposed in this pull request?

Changes the name of the configuration from spark.mesos.executor.memoryOverhead to spark.executor.memoryOverhead to make it consistent with spark.driver.memoryOverhead.
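
For illustration, a minimal sketch of setting the renamed property (the values below are arbitrary examples, not recommendations):

```scala
import org.apache.spark.SparkConf

// Configure executor memory overhead with the renamed key.
// Equivalent on the command line: spark-submit --conf spark.executor.memoryOverhead=512 ...
val conf = new SparkConf()
  .setAppName("memory-overhead-example")
  .set("spark.executor.memory", "4g")
  .set("spark.executor.memoryOverhead", "512") // previously spark.mesos.executor.memoryOverhead
  .set("spark.driver.memoryOverhead", "512")   // the existing property this rename aligns with
```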

How was this patch tested?

It is tested by running unit tests.


@samvantran samvantran left a comment


LGTM - let's make sure tests pass


@akirillov akirillov left a comment


Thanks, @farhan5900, LGTM.

A few questions:

  • where is this change coming from? It's a good improvement but do we have any associated request or upstream Jira ticket?
  • do we need to update tests in mesosphere/spark-build? I believe we don't set memoryOverhead explicitly but it's worth checking

@farhan5900
Author

farhan5900 commented Sep 3, 2019

There is no upstream ticket or request for this change. It was more of an internal improvement suggested during the implementation of spark.driver.memoryOverhead (DCOS-34235).

I have checked all the tests in mesosphere/spark-build and there is no direct mention of spark.mesos.executor.memoryOverhead.

@akirillov

Thanks, @farhan5900. The main issue with this PR is that it leads to a divergence from Apache Spark: users accustomed to spark.mesos.executor.memoryOverhead will be unable to use it with the Mesosphere fork, while it will still be available in Apache Spark.

I'd suggest doing the following (given that there's no semantic difference between spark.mesos.executor.memoryOverhead and spark.executor.memoryOverhead):

  • creating a Jira ticket in ASF Spark Jira
  • creating a similar PR in the upstream and looking at the feedback

Please take into account the cadence of Apache releases: the change to this property will become available to Apache Spark users only in the next 2.4 and 3.0 releases.
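
For illustration only (not part of this PR): one way to soften such a divergence would be to keep honoring the old key as a fallback. A minimal sketch, assuming the configured value is a plain number of MiB and using a max(384 MiB, 10% of executor memory) default:

```scala
import org.apache.spark.SparkConf

// Sketch: prefer the new key, fall back to the old Mesos-specific key,
// then to a max(384 MiB, 10% of executor memory) default.
def executorMemoryOverheadMib(conf: SparkConf, executorMemoryMib: Long): Long = {
  conf.getOption("spark.executor.memoryOverhead")
    .orElse(conf.getOption("spark.mesos.executor.memoryOverhead"))
    .map(_.toLong)
    .getOrElse(math.max(384L, (0.10 * executorMemoryMib).toLong))
}
```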

@akirillov

As there exists a similar property for YARN, spark.yarn.executor.memoryOverhead, it makes sense to stay on par with it and avoid a long-running conversation with the ASF Spark community for the purpose of a single property rename.

@akirillov akirillov closed this Sep 5, 2019
farhan5900 pushed a commit that referenced this pull request Oct 2, 2020
…in optimizations


### What changes were proposed in this pull request?
This is a follow-up of apache#26434

This PR uses one special shuffle reader for skew joins, so that we only have one join after optimization. In order to do that, this PR
1. adds a very general `CustomShuffledRowRDD` which supports all kinds of partition arrangements.
2. moves the logic of coalescing shuffle partitions to a util function and calls it during skew join optimization, to fully decouple it from the `ReduceNumShufflePartitions` rule (see the sketch after this list). It's too complicated to mix skew join handling with `ReduceNumShufflePartitions`, as you would need to consider split partitions that already don't respect the target size.
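
As a rough, hypothetical illustration of the coalescing idea only (the function name and signature below are invented, not the actual Spark utility), adjacent reducer partitions can be merged until a target size is reached:

```scala
// Hypothetical sketch: group adjacent reducer partitions into ranges whose
// combined shuffle-output size stays close to `targetSize`. A skew-join rule
// could call such a util on its own, independently of ReduceNumShufflePartitions.
def coalescePartitions(bytesByPartition: Array[Long], targetSize: Long): Seq[(Int, Int)] = {
  val ranges = scala.collection.mutable.ArrayBuffer.empty[(Int, Int)]
  var start = 0
  var acc = 0L
  bytesByPartition.indices.foreach { i =>
    // Close the current range once adding partition i would exceed the target.
    if (i > start && acc + bytesByPartition(i) > targetSize) {
      ranges += ((start, i))
      start = i
      acc = 0L
    }
    acc += bytesByPartition(i)
  }
  if (start < bytesByPartition.length) ranges += ((start, bytesByPartition.length))
  ranges.toSeq
}

// Example: partition sizes of 5, 5, 60, 5, 5 MB with a 64 MB target produce
// three ranges: [0, 2), [2, 3), and [3, 5).
```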

### Why are the changes needed?
The current skew join optimization has a serious performance issue: the size of the query plan depends on the number and size of skewed partitions.

### Does this PR introduce any user-facing change?
no

### How was this patch tested?
existing tests

Tested the UI manually:
![image](https://user-images.githubusercontent.com/3182036/74357390-cfb30480-4dfa-11ea-83f6-825d1b9379ca.png)

explain output
```
AdaptiveSparkPlan(isFinalPlan=true)
+- OverwriteByExpression org.apache.spark.sql.execution.datasources.noop.NoopTable$403a2ed5, [AlwaysTrue()], org.apache.spark.sql.util.CaseInsensitiveStringMap1f
   +- *(5) SortMergeJoin(skew=true) [key1#2L], [key2#6L], Inner
      :- *(3) Sort [key1#2L ASC NULLS FIRST], false, 0
      :  +- SkewJoinShuffleReader 2 skewed partitions with size(max=5 KB, min=5 KB, avg=5 KB)
      :     +- ShuffleQueryStage 0
      :        +- Exchange hashpartitioning(key1#2L, 200), true, [id=#53]
      :           +- *(1) Project [(id#0L % 2) AS key1#2L]
      :              +- *(1) Filter isnotnull((id#0L % 2))
      :                 +- *(1) Range (0, 100000, step=1, splits=6)
      +- *(4) Sort [key2#6L ASC NULLS FIRST], false, 0
         +- SkewJoinShuffleReader 2 skewed partitions with size(max=5 KB, min=5 KB, avg=5 KB)
            +- ShuffleQueryStage 1
               +- Exchange hashpartitioning(key2#6L, 200), true, [id=#64]
                  +- *(2) Project [((id#4L % 2) + 1) AS key2#6L]
                     +- *(2) Filter isnotnull(((id#4L % 2) + 1))
                        +- *(2) Range (0, 100000, step=1, splits=6)
```

Closes apache#27493 from cloud-fan/aqe.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: herman <herman@databricks.com>
(cherry picked from commit a4ceea6)
Signed-off-by: herman <herman@databricks.com>