
Conversation

@tooptoop4 (Contributor) commented Feb 25, 2020

What changes were proposed in this pull request?

Don't allow drivers to use all the cores in the Spark standalone scheduler; reserve some cores for applications.

Why are the changes needed?

When Airflow triggers many Spark jobs in parallel, the standalone cluster fills up with drivers and can never execute the applications, because the drivers have taken all the cores. The cluster essentially becomes stuck: it never completes any driver (no cores are available to run the associated applications) and therefore never frees up cores. Manual intervention is needed to either kill some drivers (which Airflow may simply retry) or force a scale-up of the cluster by registering more workers (beyond the ASG MaxSize).

Does this PR introduce any user-facing change?

No by default; users must opt in by explicitly setting the config value to something greater than 0.
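
For illustration, spark.deploy.* settings are read by the standalone Master rather than by individual applications, so opting in would look roughly like the snippet below in the Master host's conf/spark-defaults.conf (or the equivalent -D flag in SPARK_MASTER_OPTS). The value 8 is just an example; only the config key comes from this PR's diff.

# conf/spark-defaults.conf on the host running the standalone Master:
# keep 8 worker cores free for application executors (hypothetical example value)
spark.deploy.coresReservedForApps   8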

How was this patch tested?

It has been running in production for 3 months.

@AmplabJenkins:

Can one of the admins verify this patch?

@Ngone51 (Member) left a comment:

How about limiting the number of concurrently running drivers instead? Personally, I feel it's more convenient for users to set a number of drivers rather than a number of cores. WDYT?

@tooptoop4 (Contributor, Author):

@Ngone51 but each driver can have a different number of cores.

@Ngone51 (Member) commented Feb 26, 2020

@Ngone51 but each driver can have a different number of cores.

Yeah, but in that case many drivers race for the reserved cores concurrently; how can we know which drivers win the race, and how do we decide the number of cores for them?

@tooptoop4 (Contributor, Author):

@Ngone51 As long as some cores are not used by any driver, I don't mind which drivers get in first. I just want some cores that can only be consumed by apps.

@Ngone51 (Member) commented Feb 26, 2020

I just want some cores that can only be consumed by apps

When we limit the number of running drivers, we'll definitely have reserved cores for the apps.

@tooptoop4 (Contributor, Author) commented Feb 26, 2020

@Ngone51 That is worse. One day you could have 10 drivers each taking 1 core; another day you could have 5 drivers each taking 4 cores. If you limited both days to 10 drivers, all cores would be used by drivers on the second day, so a driver limit does not work well with differently sized drivers; cores are the simple, uniform measure. Secondly, if I want a scheduled scale-up of the Spark cluster, specifying a portion of cores to leave unused still works, whereas fixing a maximum number of drivers does not, because it limits the ability to take advantage of the larger cluster.

@jiangxb1987 (Contributor):

What's blocking you from choosing the client deploy mode? Using client deploy mode would let you launch the driver program locally, so you would not need to worry about drivers being involved in cluster resource contention.

@jiangxb1987 (Contributor):

Speaking of the proposed improvement, it does somewhat mitigate the issue by allowing you to launch at least a few executors, but it would probably still be the case that not enough cluster resources go to executors, so many drivers would still need to wait for free slots to launch their pending tasks. Without other considerations, I think you should give client deploy mode a try (or do you already have other issues that prevent you from choosing client mode)?

@tooptoop4 (Contributor, Author):

@jiangxb1987 Client deploy mode is subject to the limits of a single machine: if I want to run 200+ spark-submits in parallel, that machine must have enough memory to support all of those drivers. With my change, a few executors is all I need, and I can even reserve hundreds of cores just for apps with this new config.

@jiangxb1987 (Contributor) commented Feb 26, 2020

Sorry, I don't have enough context to understand your use case, but submitting 200+ applications to a Spark cluster at the same time is not something I would expect. Basically, I would expect far fewer applications, with each application submitting a few jobs, so we don't really need to launch that many drivers.

@tooptoop4 (Contributor, Author) commented Feb 27, 2020

I am building a data lake where tens of thousands of files with different schemas are ingested daily; client mode does not handle the concurrency/scale I need. Also, I use the Spark REST API (spark.master.rest.port), which does not support client mode. What is the reluctance to merging this?

@tooptoop4 (Contributor, Author):

Can you please merge this, @dongjoon-hyun?

.createWithDefault(Int.MaxValue)

val CORES_RESERVED_FOR_APPS = ConfigBuilder("spark.deploy.coresReservedForApps")
.version("2.4.6")
Review comment (Member) on the diff excerpt above:

Hi, @tooptoop4. We don't backport features; the feasible next version is 3.1.0.
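
For context, here is a minimal sketch of how such an entry is typically declared among Spark's other spark.deploy.* definitions. Only the key and the Int type come from the diff above; the enclosing object name, doc string, validation, and the 0 default (implied by the opt-in note in the PR description) are assumptions, and 3.1.0 follows the reviewer's remark rather than the 2.4.6 shown in the diff.

package org.apache.spark.internal.config

// Hypothetical sketch, not merged code (the PR was ultimately closed). It sits in the
// internal config package so the private[spark] ConfigBuilder is in scope.
private[spark] object DeployCoreReservationSketch {
  val CORES_RESERVED_FOR_APPS = ConfigBuilder("spark.deploy.coresReservedForApps")
    .doc("Number of worker cores the standalone Master keeps free for application " +
      "executors; drivers are never scheduled onto these cores. 0 disables the reservation.")
    .version("3.1.0") // features are not backported, so 3.1.0 rather than 2.4.6
    .intConf
    .checkValue(_ >= 0, "spark.deploy.coresReservedForApps must be non-negative.")
    .createWithDefault(0)
}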

@jiangxb1987 (Contributor):

The proposed improvement is just not necessary. I don't see any need to submit 200+ applications; you could start one application and submit multiple jobs.

launched = true
val allFreeCores = shuffledAliveWorkers.map(_.coresFree).sum
val forDriversFreeCores = math.max(allFreeCores - coresReservedForApps, 0)
if (forDriversFreeCores > 0) {
Review comment (Member) on the diff excerpt above:

Could you make a test case for this?
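
Since the change shipped without a test, here is a minimal sketch of the kind of check being asked for, covering only the reservation arithmetic from the diff above rather than the full Master scheduling path; the suite name and the local helper are hypothetical.

import org.scalatest.funsuite.AnyFunSuite

class CoresReservedForAppsSuite extends AnyFunSuite {
  // Mirrors the expression in the diff: math.max(allFreeCores - coresReservedForApps, 0)
  private def coresAvailableForDrivers(allFreeCores: Int, coresReservedForApps: Int): Int =
    math.max(allFreeCores - coresReservedForApps, 0)

  test("drivers never see the reserved cores") {
    // Part of the free cores is held back for executors.
    assert(coresAvailableForDrivers(allFreeCores = 16, coresReservedForApps = 4) === 12)
    // A reservation larger than the free cores yields 0 for drivers, never a negative value.
    assert(coresAvailableForDrivers(allFreeCores = 2, coresReservedForApps = 8) === 0)
    // With the default of 0, behaviour is unchanged: every free core is eligible for drivers.
    assert(coresAvailableForDrivers(allFreeCores = 16, coresReservedForApps = 0) === 16)
  }
}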

@dongjoon-hyun changed the title from "[SPARK-27750] Standalone scheduler - ability to prioritize applications over drivers, many drivers act like Denial of Service" to "[SPARK-27750][CORE] Standalone scheduler - ability to prioritize applications over drivers, many drivers act like Denial of Service" on Feb 28, 2020.
@tooptoop4 (Contributor, Author):

@jiangxb1987 So your suggestion is: don't scale?

@jiangxb1987 (Contributor):

What do you mean by scale? You don't need 200 drivers; you can still launch one application and submit your jobs to it, and that way your workload should work.

@tooptoop4 (Contributor, Author):

1. One app is going to be slower.
2. What if 7 out of 200 jobs fail? The whole app fails, meaning poor resumability.
3. Files arrive at different times, so I can't wait for all of them to arrive and then run a single app.

This PR fixes a clear bug: Spark standalone gets itself stuck. If you don't like the number 200, think of submitting 6 apps in cluster mode on a 4-core cluster; Spark gets full of drivers, no apps can ever run, and it stays stuck forever. With this fix, I can guarantee some cores for apps so Spark never gets stuck running only drivers (see the sketch below for the 4-core scenario worked through).
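
To make that 4-core scenario concrete, here is a purely illustrative Scala sketch (not code from the PR); the 1-core-per-driver figure and the reservation of 1 core are example values.

object StandaloneDeadlockSketch {
  def main(args: Array[String]): Unit = {
    val totalCores = 4     // cores in the example cluster
    val driverCores = 1    // example core request of each submitted driver
    val queuedDrivers = 6  // cluster-mode submissions waiting at the Master

    // Without a reservation the Master keeps placing drivers while free cores remain:
    // 4 drivers start, 0 cores are left for executors, and nothing ever finishes.
    val driversNoReservation = math.min(queuedDrivers, totalCores / driverCores)
    println(s"no reservation: $driversNoReservation drivers running, " +
      s"${totalCores - driversNoReservation * driverCores} cores left for executors")

    // Reserving even 1 core caps drivers at 3, so 1 core can run an executor, one app can
    // finish, its driver's core is freed, and the queue eventually drains.
    val reserved = 1
    val driversWithReservation =
      math.min(queuedDrivers, math.max(totalCores - reserved, 0) / driverCores)
    println(s"reserved=$reserved: $driversWithReservation drivers running, " +
      s"${totalCores - driversWithReservation * driverCores} cores left for executors")
  }
}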

@jiangxb1987 (Contributor):

I'm -1 on this change, because it's trying to resolve an issue that doesn't even exist.
Please read https://spark.apache.org/docs/latest/cluster-overview.html before you ask. Thanks!

@dongjoon-hyun (Member):

Thank you for your proposal, @tooptoop4. However, given the discussion above, I'll close this PR. Thank you, @tooptoop4, @Ngone51, @jiangxb1987.
