[SPARK-21115][Core] If the cores left are less than coresPerExecutor, the cores left will not be allocated, so it should not be checked in every schedule #18322
Conversation
Can one of the admins verify this patch?
According to my test, the current code of …
@jerryshao I have not seen any issue here, and I have tested this again using the latest master code; the problem still exists.
If you don't see any issue here, what's the problem you met? From my understanding, what you did here is only changing the code slightly to avoid an unnecessary check, is that right?
@jerryshao The problem is: if we start an app with the params --total-executor-cores=4 and spark.executor.cores=3, the check "app.coresLeft > 0" is always true in org.apache.spark.deploy.master.startExecutorsOnWorkers, so it will try to allocate executors for this app but allocate nothing. It is better to compare app.coresLeft with coresPerExecutor: if coresLeft is less than coresPerExecutor, it can return directly.
@jerryshao I have modified "app.coresLeft > 0" to "app.coresLeft >= coresPerExecutor.getOrElse(1)".
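To make the change concrete, here is a minimal, self-contained simulation of the old and new guards (App is a stand-in for the master's ApplicationInfo, not Spark's actual class):

```scala
// Stand-in for ApplicationInfo: only the fields the guard needs.
case class App(coresLeft: Int, coresPerExecutor: Option[Int])

// With --total-executor-cores=4 and spark.executor.cores=3, one 3-core executor
// is placed and 1 core remains.
val waitingApps = Seq(App(coresLeft = 1, coresPerExecutor = Some(3)))

// Old guard: the app is visited in every scheduling round, although no executor fits.
val visitedOld = waitingApps.filter(_.coresLeft > 0)                                      // 1 app
// New guard: the app is skipped once fewer cores remain than one executor needs.
val visitedNew = waitingApps.filter(a => a.coresLeft >= a.coresPerExecutor.getOrElse(1))  // 0 apps
```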
I see, I understand your changes now. IMO, because the user specifically requested 3 cores per executor (as an example), it is not so good to allocate 1 executor with only 1 core; this may break the user's purpose. Also, the original behavior of … Besides, I think it would be better to add some warning logs in SparkSubmit.
@jerryshao Ok, I will add warning logs in SparkSubmit, thanks.
@jerryshao I have added warning logs in SparkSubmit; would you like to review it again? Thanks.
```scala
SparkSubmit.printErrorAndExit("--py-files given but primary resource is not a Python script")
}
if (totalExecutorCores != null && executorCores != null) {
  val totalCores = Try(totalExecutorCores.toInt).getOrElse(-1)
```
What is the purpose of the Try block here? If it fails, the input is invalid, and proceeding with -1 can't be right.
Ok, I will remove the Try block, thanks.
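A minimal sketch of what is being suggested, assuming the string-typed fields of SparkSubmitArguments quoted above (the warning text here is illustrative, not the exact patch wording):

```scala
val totalExecutorCores: String = "4"
val executorCores: String = "3"

if (totalExecutorCores != null && executorCores != null) {
  // Without the Try wrapper, a malformed value fails fast with a
  // NumberFormatException instead of silently proceeding as -1.
  val totalCores = totalExecutorCores.toInt
  val coresPerExecutor = executorCores.toInt
  if (totalCores % coresPerExecutor != 0) {
    println(s"Warning: total executor cores $totalCores is not divisible by " +
      s"cores per executor $coresPerExecutor")
  }
}
```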
```diff
  // Right now this is a very simple FIFO scheduler. We keep trying to fit in the first app
  // in the queue, then the second app, etc.
- for (app <- waitingApps if app.coresLeft > 0) {
+ for (app <- waitingApps) {
```
Shouldn't all of this change be reverted?
Wouldn't it be better to compare app.coresLeft with coresPerExecutor? If coresLeft is less than coresPerExecutor, it can return directly.
@srowen If the total cores are not divisible by cores per executor, the check app.coresLeft > 0 will always be true, so it is better to compare app.coresLeft with coresPerExecutor than with 0.
I see, Jerry was not saying that this part results in a logic change. This is OK.
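The divisibility argument in one worked example (values from this PR's description):

```scala
val totalExecutorCores = 4
val coresPerExecutor   = 3
// After one 3-core executor is placed, 4 % 3 == 1 core remains forever.
val coresLeft = totalExecutorCores % coresPerExecutor
assert(coresLeft > 0)                 // old guard: the app is revisited every round
assert(coresLeft < coresPerExecutor)  // new guard: the app is skipped
```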
```scala
SparkSubmit.printErrorAndExit("--py-files given but primary resource is not a Python script")
}
if (totalExecutorCores != null && executorCores != null
    && (totalExecutorCores.toInt % executorCores.toInt) != 0) {
```
Minor, but I think the parentheses are in the wrong place. They should go around the expression, or probably just be removed. You might avoid duplicating the mod expression for clarity.
Ok, I've extracted the repeated mod expression into a val and reused it, thanks.
```scala
allocateWorkerResourceToExecutors(
  app, assignedCores(pos), coresPerExecutor, usableWorkers(pos))
// If the cores left is less than the coresPerExecutor, the cores left will not be allocated
if (app.coresLeft >= coresPerExecutor.getOrElse(1)) {
```
Might avoid duplicating coresPerExecutor.getOrElse(1) by referring to it as requestedCoresPerExecutor or something
Ok, I've changed "val coresPerExecutor = app.desc.coresPerExecutor" to "val coresPerExecutor = app.desc.coresPerExecutor.getOrElse(1)" and reused it, and then used "app.desc.coresPerExecutor" directly in the function allocateWorkerResourceToExecutors. Thanks.
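A runnable mini-version of that hoist (Desc and AppInfo are simplified stand-ins for Spark's ApplicationDescription and ApplicationInfo, and the allocate function below is a placeholder, not Spark's signature):

```scala
case class Desc(coresPerExecutor: Option[Int])
case class AppInfo(desc: Desc, coresLeft: Int)

def allocateWorkerResourceToExecutors(app: AppInfo, assignedCores: Int,
                                      coresPerExecutor: Option[Int]): Unit =
  println(s"allocating $assignedCores cores as " +
    s"${coresPerExecutor.getOrElse(assignedCores)}-core executors")

val app = AppInfo(Desc(Some(3)), coresLeft = 4)
val coresPerExecutor = app.desc.coresPerExecutor.getOrElse(1)  // hoisted once, reused
if (app.coresLeft >= coresPerExecutor) {
  // The raw Option is still passed through, so the callee can distinguish
  // "no explicit setting" from an explicit 1-core request.
  allocateWorkerResourceToExecutors(app, assignedCores = 3, app.desc.coresPerExecutor)
}
```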
```scala
if (pyFiles != null && !isPython) {
  SparkSubmit.printErrorAndExit("--py-files given but primary resource is not a Python script")
}
if (totalExecutorCores != null && executorCores != null) {
```
Thinking about this again, I think this part of the code logic could be moved to SparkConf, in case the user sets these configurations via a SparkConf object in a programmatic way.
Ok, I have moved it to SparkConf; would you like to review it again? Thanks.
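A sketch of how the moved check might look, exercised against a standalone SparkConf (the exact warning wording in the merged patch may differ):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf(false)
  .set("spark.cores.max", "4")
  .set("spark.executor.cores", "3")

if (conf.contains("spark.cores.max") && conf.contains("spark.executor.cores")) {
  val totalCores = conf.getInt("spark.cores.max", 1)
  val coresPerExecutor = conf.getInt("spark.executor.cores", 1)
  val leftCores = totalCores % coresPerExecutor
  if (leftCores != 0) {
    println(s"Total executor cores $totalCores is not divisible by cores per " +
      s"executor $coresPerExecutor; the left $leftCores cores will not be allocated")
  }
}
```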
@eatoncys can you please add a unit test in MasterSuite?
@jerryshao, I have added a unit test in MasterSuite; would you like to review it again? Thanks.
```scala
  }
}

if (contains("spark.cores.max")) {
```
I think these checks for negative numbers are redundant with arg checking for spark-submit?
@srowen Users may set these configurations via a SparkConf object in a programmatic way after the arg checking in spark-submit. Should I move the checks from spark-submit to here, or remove them from here directly? Which is better?
```scala
}

if (contains("spark.cores.max")) {
  val totalCores = getInt("spark.cores.max", -1)
```
Please use Option instead; I think we don't need to check them if they're not set. Also, it looks like if we don't set this configuration, it will always lead to an exception, as the default value is -1.
@jerryshao I don't understand completely; if we don't set this configuration, the branch "if (contains("spark.cores.max"))" will not be entered.
Sorry my fault, I misunderstood the code.
```scala
  master.invokePrivate(_state())
}

test("Total cores is not divisible by cores per executor") {
```
IIUC I don't think this UT really reflects the code you changed in Master; even with the original code these two UTs would also pass. I think you should test whether the logic you changed is executed or not.
@jerryshao The result is the same before and after my change; how can I test them differently? Any suggestion? Thanks.
My original thinking was that it might be possible to test whether the if branch is executed or not using mock and verify. But it looks like it is not easy to test. Can you please investigate?
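For illustration, the mock-and-verify shape suggested here might look like the following; the Allocator/Scheduler types are hypothetical stand-ins, and Master's real scheduling loop is not factored this way, which is why the approach proved hard in practice:

```scala
import org.mockito.Mockito.{mock, never, verify}

trait Allocator { def allocate(appId: String): Unit }

class Scheduler(allocator: Allocator, coresPerExecutor: Int) {
  def schedule(coresLeft: Int): Unit =
    if (coresLeft >= coresPerExecutor) allocator.allocate("app-1")
}

val allocator = mock(classOf[Allocator])
new Scheduler(allocator, coresPerExecutor = 3).schedule(coresLeft = 1)
// Verify the guarded branch was NOT taken when the remaining core cannot fit an executor.
verify(allocator, never()).allocate("app-1")
```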
@jerryshao I don't have any good way to test it like this; any good suggestion? @srowen
I see; if there's no better way to verify it, I think it is not useful to add these two UTs.
@jerryshao Ok, I have removed them, thanks.
| s"(was ${get("spark.executor.cores")}) can only be a positive number") | ||
| } | ||
| } | ||
| if (contains("spark.cores.max") && contains("spark.executor.cores")) { |
I think we could move the negative check to here to simplify the code.
@jerryshao I put the negative check here first, but I think the app should exit directly if the cores are negative, so I moved them out. And @srowen thinks these checks for negative numbers are redundant with the arg checking for spark-submit; it may be a good way to move the checks from spark-submit to here.
```scala
if (Try(JavaUtils.byteStringAsBytes(executorMemory)).getOrElse(-1L) <= 0) {
  throw new IllegalArgumentException(s"spark.executor.memory " +
    s"(was ${executorMemory}) can only be a positive number")
}
```
The above two checks seem unnecessary; let's not change unrelated code.
@jerryshao Ok, I have removed them, and moved the checks back to spark-submit. Thanks.
```scala
}
if (numExecutors != null && Try(numExecutors.toInt).getOrElse(-1) <= 0) {
  SparkSubmit.printErrorAndExit("Number of executors must be a positive number")
}
```
The above changes are valid and useful; I'd suggest not changing them.
@jerryshao Ok, I have moved them back, thanks.
@jiangxb1987 can you please help to review this PR? This is a simple code improvement to avoid some unnecessary code execution when the cores left are not enough for one executor. I don't have a strong inclination on this PR, since the previous code also behaves correctly; I'd like to hear your thoughts.
cc @srowen
```scala
  }
}
if (contains("spark.cores.max") && contains("spark.executor.cores")) {
  val totalCores = getInt("spark.cores.max", 1)
```
@jerryshao I think most of the new argument checking code is redundant. I think the 7 lines from here are all that are needed
Let's just keep this checking branch, or move the previous negative checking code here.
@srowen @jiangxb1987 Ok, I have removed the argument checking code, thanks.
jiangxb1987 left a comment
The changes look good overall, only a few comments.
| if (contains("spark.cores.max")) { | ||
| val totalCores = getInt("spark.cores.max", -1) | ||
| if (totalCores <= 0) { | ||
| throw new IllegalArgumentException(s"spark.cores.max (was ${get("spark.cores.max")})" + |
${get("spark.cores.max")} => $totalCores ?
```scala
val executorCores = getInt("spark.executor.cores", -1)
if (executorCores <= 0) {
  throw new IllegalArgumentException(s"spark.executor.cores " +
    s"(was ${get("spark.executor.cores")}) can only be a positive number")
```
ditto
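The suggestion applied, as a minimal sketch (interpolating the already-parsed Int instead of re-reading the raw setting):

```scala
val totalCores = -1  // stand-in for getInt("spark.cores.max", -1)
if (totalCores <= 0) {
  throw new IllegalArgumentException(
    s"spark.cores.max (was $totalCores) can only be a positive number")
}
```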
LGTM!
jiangxb1987 left a comment
LGTM, cc @cloud-fan
LGTM, merging to master!
[SPARK-21115][Core] If the cores left are less than coresPerExecutor, the cores left will not be allocated, so it should not be checked in every schedule

Author: 10129659 <chen.yanshan@zte.com.cn>

Closes apache#18322 from eatoncys/leftcores.
What changes were proposed in this pull request?
If we start an app with the params --total-executor-cores=4 and spark.executor.cores=3, the cores left is always 1, so the function org.apache.spark.deploy.master.startExecutorsOnWorkers will try to allocate executors in every schedule round.
Another question: would it be better to allocate another executor with 1 core for the cores left?
How was this patch tested?
unit test