[SNAP-2231] Limit maximum cores for a job to physical cores on a node #972
base: master
Conversation
- overrides in SnappyTaskSchedulerImpl to track the per-executor cores used by a job and cap them to the number of physical cores on a node (sketched below)
- combined some maps in TaskSchedulerImpl to recover the performance lost due to the above and to improve further over the base TaskSchedulerImpl
- the property "spark.scheduler.limitJobCores=false" can be set to revert to the previous behaviour
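A minimal sketch of the capping rule described in the first bullet (all names below are illustrative, not the actual members of SnappyTaskSchedulerImpl):

```scala
object JobCoreLimitSketch {
  // Hypothetical helper: with the default of (2 * physical cores) configured per
  // executor, a single job is restricted to the physical cores so the remaining
  // slots stay free for other concurrent jobs; None means no restriction.
  def jobCoreLimit(configuredCores: Int, physicalCores: Int,
      limitJobCores: Boolean): Option[Int] = {
    if (limitJobCores && configuredCores > physicalCores) Some(physicalCores)
    else None
  }
}
```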
Some comments and clarifications sought.
}
sc.taskScheduler.asInstanceOf[SnappyTaskSchedulerImpl].addBlockId(executorId, blockId)
The SnappyTaskSchedulerImpl.addBlockId() method has the condition blockId.numProcessors < blockId.executorCores, which will never be satisfied when called from here.
The "case None" is for a corner one where blockManager gets added before executor. For normal cases onExecutorAdded will be invoked first where number of physical cores have been properly initialized so addBlockId will work fine. Will add the handling for that case in onExecutorAdded and invoke addBlockId from the Some() match case there.
Will also add removal in onExecutorRemoved.
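A hedged sketch of what that could look like (the maps, the String block id and the helper names are placeholders, not the PR's actual fields):

```scala
import scala.collection.concurrent.TrieMap

object ExecutorBlockTrackingSketch {
  private val maxExecutorTaskCores = new TrieMap[String, Int]()
  private val pendingBlockIds = new TrieMap[String, String]() // executorId -> blockId

  def addBlockId(executorId: String, blockId: String): Unit = { /* record the mapping */ }
  def removeBlockId(executorId: String): Unit = { /* drop the mapping */ }

  def onExecutorAdded(executorId: String, physicalCores: Int): Unit = {
    maxExecutorTaskCores.put(executorId, physicalCores)
    // the block manager may have registered before the executor; if so, finish
    // the deferred registration now that the physical core count is known
    pendingBlockIds.remove(executorId).foreach(addBlockId(executorId, _))
  }

  def onExecutorRemoved(executorId: String): Unit = {
    maxExecutorTaskCores.remove(executorId)
    removeBlockId(executorId)
  }
}
```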
private val lookupExecutorCores = new ToLongFunction[String] {
  override def applyAsLong(executorId: String): Long = {
    maxExecutorTaskCores.get(executorId) match {
      case null => Int.MaxValue // no restriction
Wouldn't defaultParallelism be better than Int.MaxValue?
Null means that the cores defined for the executor are less than or equal to the physical cores on the machine, or that job core limiting has been explicitly disabled. Both cases imply the same thing, i.e. don't put any limit on tasks on a node, so this essentially falls back to Spark's TaskSchedulerImpl behaviour.
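In other words (a toy illustration, not the PR's actual code path), the Int.MaxValue sentinel just makes the min() a no-op:

```scala
object CoreCapSketch {
  // null entry => executor cores <= physical cores, or
  // spark.scheduler.limitJobCores=false: no cap is applied at all.
  def coresUsableByJob(capOrNull: java.lang.Integer, availableCpus: Int): Int = {
    val cap = if (capOrNull eq null) Int.MaxValue else capOrNull.intValue()
    math.min(cap, availableCpus) // Int.MaxValue never bites: stock Spark behaviour
  }
}
```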
val manager = createTaskSetManager(taskSet, maxTaskFailures)
val stage = taskSet.stageId
val (stageAvailableCores, stageTaskSets) = stageCoresAndAttempts.computeIfAbsent(
  stage, createNewStageMap)
Shouldn't we be setting the manager for the stage's task set attempts? I can see
stageTaskSets(taskSet.stageAttemptId) = manager in the original TaskSchedulerImpl.
Yes, done below at line 112.
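For reference, a simplified, self-contained sketch of how that looks with the combined map (String stands in for TaskSetManager and a plain HashMap for the per-stage core counters; the types are placeholders, not the PR's actual definitions). The relevant part is the last line, which mirrors stageTaskSets(taskSet.stageAttemptId) = manager from the original TaskSchedulerImpl:

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.function.{Function => JFunction}
import scala.collection.mutable

object StageAttemptsSketch {
  type StageEntry = (mutable.HashMap[String, Int], mutable.HashMap[Int, String])

  private val stageCoresAndAttempts = new ConcurrentHashMap[Int, StageEntry]()
  private val createNewStageMap = new JFunction[Int, StageEntry] {
    override def apply(stage: Int): StageEntry =
      (new mutable.HashMap[String, Int](), new mutable.HashMap[Int, String]())
  }

  def register(stage: Int, stageAttemptId: Int, manager: String): Unit = {
    val (_, stageTaskSets) = stageCoresAndAttempts.computeIfAbsent(stage, createNewStageMap)
    stageTaskSets(stageAttemptId) = manager // same registration as stock TaskSchedulerImpl
  }
}
```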
taskIdExecutorAndManager.justPut(tid, execId -> taskSet)
executorIdToRunningTaskIds(execId).add(tid)
if (availableCores ne null) {
  availableCores.addValue(execId, -CPUS_PER_TASK)
Can we add an assertion similar to assert(availableCpus(i) >= 0)?
We might catch some of the erroneous updates.
Will add.
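Something along these lines, continuing the excerpt above (the read-back accessor is a placeholder since the exact API of the availableCores map isn't shown here):

```scala
// Sketch of the proposed check, mirroring assert(availableCpus(i) >= 0) in the
// original TaskSchedulerImpl: decrement first, then verify the counter never
// goes negative, which would indicate an erroneous bookkeeping update.
if (availableCores ne null) {
  availableCores.addValue(execId, -CPUS_PER_TASK)
  assert(availableCores.get(execId) >= 0, // hypothetical accessor
    s"negative available cores for executor $execId after task $tid")
}
```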
See some details in the JIRA https://jira.snappydata.io/browse/SNAP-2231
These changes limit the maximum cores given to a job to the physical cores on a machine.
With the default of (2 * physical cores) in the cluster, this leaves the remaining cores free
for other concurrent jobs, which is especially important for short point-lookup queries.
Additionally, these changes improve performance for disk-intensive queries: for example,
a 30-50% improvement was measured in TPCH load and some queries when cores were
limited to physical cores and a lot of data had overflowed to disk.
Question: should the default cores in ExecutorInitiator be increased to (4 * physical cores)
to allow for more concurrency?
Changes proposed in this pull request
- overrides in SnappyTaskSchedulerImpl to track the per-executor cores used by a job and cap them to the number of physical cores on a node
- combined some maps in TaskSchedulerImpl to recover the performance lost due to the above and to improve further over the base TaskSchedulerImpl
- the property "spark.scheduler.limitJobCores=false" can be set to revert to the previous behaviour (see the example below)
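For example, to revert to the old behaviour when building an application's configuration (the property name is the one introduced by this PR; the rest is standard SparkConf usage, and the property can equally be set in the cluster's configuration files):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("example")
  .set("spark.scheduler.limitJobCores", "false") // disable the per-job core cap
```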
Patch testing
precheckin -Pstore -Pspark
TODO: working on porting Spark's TaskScheduler unit tests
ReleaseNotes.txt changes
document the new property and behaviour
Other PRs
TIBCOSoftware/snappy-spark#96