[SPARK-22845] [Scheduler] Modify spark.kubernetes.allocation.batch.delay to take time instead of int #20032
Conversation
Test build #85185 has finished for PR 20032.
```diff
@@ -217,7 +217,7 @@ private[spark] class KubernetesClusterSchedulerBackend(
         .watch(new ExecutorPodsWatcher()))

     allocatorExecutor.scheduleWithFixedDelay(
-      allocatorRunnable, 0L, podAllocationInterval, TimeUnit.SECONDS)
+      allocatorRunnable, 0L, podAllocationInterval.toLong, TimeUnit.MILLISECONDS)
```
Just checking that this definitely returns ms? Looks good then.
Why not use conf.getTimeAsMs for podAllocationInterval? It takes care of format specification by users and would be consistent (it is a private var anyway).
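For illustration, a standalone sketch of what getTimeAsMs does with user-supplied time strings (the key name is taken from this PR; the printed values are what Spark's time-string parsing should produce):

```scala
import org.apache.spark.SparkConf

// Minimal demo (not the backend code) of getTimeAsMs normalizing whatever
// unit the user wrote down to milliseconds.
object GetTimeAsMsDemo extends App {
  val conf = new SparkConf(loadDefaults = false)
  for (v <- Seq("250ms", "2s", "1m")) {
    conf.set("spark.kubernetes.allocation.batch.delay", v)
    println(s"$v -> ${conf.getTimeAsMs("spark.kubernetes.allocation.batch.delay")} ms")
  }
  // Expected: 250ms -> 250 ms, 2s -> 2000 ms, 1m -> 60000 ms
}
```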
Done. That's much better, thanks!
Is there any performance consideration due to very rapid allocation requests caused by a very low batch delay? For example, if it is set to 1? (And what happens if it is set to 0 or a negative value?)
Within the allocator's control loop, all requests for executor pods against the k8s API are asynchronous, so each iteration neither takes very long nor blocks. If a user sets a very low value for the delay, it wouldn't necessarily produce more requests, because pod creation is still rate limited by the time it takes for the executors launched in the previous round to become ready (typically around 1s, if not higher). So, in the worst case, we may end up polling the state of our internal data structures quickly, adding to the driver's CPU load; but those data structures are updated asynchronously by watches, so this neither increases the load on the k8s API server nor fetches any state over the network. I can add a caveat to the documentation that setting it to a very low number of ms may increase load on the driver. If the intent of reducing the batch interval is to spin up N executors as quickly as possible when the resources are present, I'd recommend that they use the …
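To make the shape of that loop concrete, here is a minimal self-contained sketch of the pattern described above. This is not the actual KubernetesClusterSchedulerBackend code; names such as pendingExecutors and requestExecutorPodAsync are illustrative stand-ins.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import scala.collection.concurrent.TrieMap

object AllocatorLoopSketch {
  // In the real backend, state like this is updated asynchronously by k8s watches.
  val pendingExecutors = TrieMap.empty[String, String]
  val runningExecutors = TrieMap.empty[String, String]
  val targetExecutors = 10
  val podAllocationIntervalMs = 1000L

  // Stand-in for the non-blocking pod creation request against the k8s API.
  def requestExecutorPodAsync(): Unit = println("requesting one executor pod")

  def main(args: Array[String]): Unit = {
    val allocatorExecutor = Executors.newSingleThreadScheduledExecutor()
    val allocatorRunnable = new Runnable {
      override def run(): Unit = {
        // New pods are only requested once the previous batch is no longer
        // pending, so a very low delay mostly re-reads in-memory state: it
        // adds driver CPU load but no extra traffic to the k8s API server.
        if (pendingExecutors.isEmpty && runningExecutors.size < targetExecutors) {
          requestExecutorPodAsync()
        }
      }
    }
    allocatorExecutor.scheduleWithFixedDelay(
      allocatorRunnable, 0L, podAllocationIntervalMs, TimeUnit.MILLISECONDS)
  }
}
```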
Test build #85208 has finished for PR 20032.
We have a check preventing that in the option itself. The value should be strictly greater than 0 ms.
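For reference, a sketch of how such a guard can be written with Spark's ConfigBuilder API. This is paraphrased from memory rather than quoted from the source tree, and the package line is only there because the API is private[spark]:

```scala
package org.apache.spark.deploy.k8s

import java.util.concurrent.TimeUnit

import org.apache.spark.internal.config.ConfigBuilder

object AllocationConfigSketch {
  // A time-typed entry that rejects non-positive values at read time.
  val KUBERNETES_ALLOCATION_BATCH_DELAY =
    ConfigBuilder("spark.kubernetes.allocation.batch.delay")
      .doc("Time to wait between each round of executor pod allocation.")
      .timeConf(TimeUnit.MILLISECONDS)
      .checkValue(_ > 0, "Allocation batch delay must be a positive time value.")
      .createWithDefaultString("1s")
}
```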
```diff
@@ -86,7 +86,7 @@ private[spark] class KubernetesClusterSchedulerBackend(

   private val initialExecutors = SchedulerBackendUtils.getInitialTargetExecutorNumber(conf)

-  private val podAllocationInterval = conf.get(KUBERNETES_ALLOCATION_BATCH_DELAY)
+  private val podAllocationInterval = conf.getTimeAsMs(KUBERNETES_ALLOCATION_BATCH_DELAY.key)
```
This should be just conf.get(KUBERNETES_ALLOCATION_BATCH_DELAY). The config's unit is already ms.
That was the previous state, actually. Not sure what the best practice is here. @mridulm, thoughts? getTimeAsMs seemed better as it might protect us from changes in the config, if that ever happens. It enforces the contract that the value must be milliseconds, which is essential for the allocator to function correctly.
conf.get(KUBERNETES_ALLOCATION_BATCH_DELAY) returns a Long if it's a time conf. That's how time configs are expected to be used. You don't need podAllocationInterval.toLong later on like you had before.
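As a toy re-creation of that typed-entry idea (none of this is Spark's actual internals; it only mimics why the typed get makes .toLong unnecessary):

```scala
// The entry itself fixes the value type to Long milliseconds.
final case class TimeEntryMs(key: String, defaultMs: Long)

final class Conf(settings: Map[String, String]) {
  // A typed get: a TimeEntryMs always yields milliseconds as a Long.
  def get(entry: TimeEntryMs): Long =
    settings.get(entry.key).map(parseMs).getOrElse(entry.defaultMs)

  private def parseMs(v: String): Long =
    if (v.endsWith("ms")) v.stripSuffix("ms").trim.toLong
    else if (v.endsWith("s")) v.stripSuffix("s").trim.toLong * 1000L
    else v.trim.toLong // bare numbers read as milliseconds
}

object TypedTimeConfDemo extends App {
  val KUBERNETES_ALLOCATION_BATCH_DELAY =
    TimeEntryMs("spark.kubernetes.allocation.batch.delay", defaultMs = 1000L)
  val conf = new Conf(Map("spark.kubernetes.allocation.batch.delay" -> "500ms"))
  val podAllocationInterval: Long = conf.get(KUBERNETES_ALLOCATION_BATCH_DELAY)
  println(podAllocationInterval) // 500, already a Long in ms -- no .toLong needed
}
```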
Done.
I (incorrectly) assumed KUBERNETES_ALLOCATION_BATCH_DELAY was a String and not a ConfigEntry. @vanzin's suggestion is much more elegant in comparison.
Fixed, thanks for the reviews. @mridulm, the questions about performance are super helpful for us, also to keep track of how we're doing wrt other cluster managers and user expectations. Thanks! :)
Test build #85214 has finished for PR 20032.
Merging to master.
Description of #19946 (merged), which tracked this PR in its TODO list:

What changes were proposed in this pull request?
This PR contains documentation on the usage of the Kubernetes scheduler in Spark 2.3, and a shell script to make it easier to build the docker images required to use the integration. The changes detailed here are covered by #19717 and #19468, which have already merged.

How was this patch tested?
The script has been in use for releases on our fork. The rest is documentation.

cc rxin mateiz (shepherd)
k8s-big-data SIG members & contributors: foxish ash211 mccheah liyinan926 erikerlandson ssuchter varunkatta kimoonkim tnachen ifilonenko
reviewers: vanzin felixcheung jiangxb1987 mridulm

TODO:
- [x] Add dockerfiles directory to built distribution. (#20007)
- [x] Change references to docker to instead say "container" (#19995)
- [x] Update configuration table.
- [x] Modify spark.kubernetes.allocation.batch.delay to take time instead of int (#20032)

Author: foxish <ramanathana@google.com>
Closes #19946 from foxish/update-k8s-docs.
What changes were proposed in this pull request?
Fixes a configuration option that was taking an int where it should take a time value. See the discussion in #19946 (comment).
Made the granularity milliseconds as opposed to seconds, since there is a use case for sub-second reactions to scale up rapidly, especially with dynamic allocation.
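A hypothetical user-facing snippet under these assumptions (the value is invented; the point is the new millisecond granularity):

```scala
import org.apache.spark.SparkConf

object BatchDelayExample extends App {
  // Before this change the setting was a bare int interpreted as seconds
  // (e.g. "5"); after it, the config is a time value with millisecond
  // granularity, so sub-second delays like this become expressible.
  val conf = new SparkConf(loadDefaults = false)
    .set("spark.kubernetes.allocation.batch.delay", "500ms")
  println(conf.getTimeAsMs("spark.kubernetes.allocation.batch.delay")) // 500
}
```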
How was this patch tested?
TODO: manual run of integration tests against this PR.
PTAL
cc/ @mccheah @liyinan926 @kimoonkim @vanzin @mridulm @jiangxb1987 @ueshin