[SPARK-41360][CORE] Avoid BlockManager re-registration if the executor has been lost #38876
@@ -117,8 +117,10 @@ class BlockManagerMasterEndpoint(
     RpcUtils.makeDriverRef(CoarseGrainedSchedulerBackend.ENDPOINT_NAME, conf, rpcEnv)

   override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
-    case RegisterBlockManager(id, localDirs, maxOnHeapMemSize, maxOffHeapMemSize, endpoint) =>
-      context.reply(register(id, localDirs, maxOnHeapMemSize, maxOffHeapMemSize, endpoint))
+    case RegisterBlockManager(
+        id, localDirs, maxOnHeapMemSize, maxOffHeapMemSize, endpoint, isReRegister) =>
+      context.reply(
+        register(id, localDirs, maxOnHeapMemSize, maxOffHeapMemSize, endpoint, isReRegister))

     case _updateBlockInfo @
         UpdateBlockInfo(blockManagerId, blockId, storageLevel, deserializedSize, size) =>
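For orientation, the message above is produced on the executor side via the block manager master client. Below is a minimal sketch of how the new flag could be threaded through the caller; the wrapper class, the default value, and the exact wiring are assumptions for illustration (these RPC internals are `private[spark]`), not necessarily the code in this PR:

```scala
import org.apache.spark.rpc.RpcEndpointRef
import org.apache.spark.storage.BlockManagerId
import org.apache.spark.storage.BlockManagerMessages.RegisterBlockManager

// Sketch only: `driverEndpoint` stands for the BlockManagerMaster endpoint ref on the driver.
class RegistrationClientSketch(driverEndpoint: RpcEndpointRef) {
  def registerBlockManager(
      id: BlockManagerId,
      localDirs: Array[String],
      maxOnHeapMemSize: Long,
      maxOffHeapMemSize: Long,
      storageEndpoint: RpcEndpointRef,
      isReRegister: Boolean = false): BlockManagerId = {
    // The master replies with the (possibly topology-enriched) BlockManagerId,
    // or an "invalid" one when a re-registration is rejected (see the second hunk below).
    driverEndpoint.askSync[BlockManagerId](
      RegisterBlockManager(
        id, localDirs, maxOnHeapMemSize, maxOffHeapMemSize, storageEndpoint, isReRegister))
  }
}
```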
@@ -572,7 +574,8 @@ class BlockManagerMasterEndpoint(
       localDirs: Array[String],
       maxOnHeapMemSize: Long,
       maxOffHeapMemSize: Long,
-      storageEndpoint: RpcEndpointRef): BlockManagerId = {
+      storageEndpoint: RpcEndpointRef,
+      isReRegister: Boolean): BlockManagerId = {
     // the dummy id is not expected to contain the topology information.
     // we get that info here and respond back with a more fleshed out block manager id
     val id = BlockManagerId(

@@ -583,7 +586,12 @@ class BlockManagerMasterEndpoint(
     val time = System.currentTimeMillis()
     executorIdToLocalDirs.put(id.executorId, localDirs)
-    if (!blockManagerInfo.contains(id)) {
+    // SPARK-41360: For the block manager re-registration, we should only allow it when
+    // the executor is recognized as active by the scheduler backend. Otherwise, this kind
+    // of re-registration from the terminating/stopped executor is meaningless and harmful.
+    lazy val isExecutorAlive =
+      driverEndpoint.askSync[Boolean](CoarseGrainedClusterMessages.IsExecutorAlive(id.executorId))
Contributor:
We have to be careful with this change.
+    if (!blockManagerInfo.contains(id) && (!isReRegister || isExecutorAlive)) {
|
Contributor:
If this is an issue only for terminating executors, we can detect that on the executor side and propagate it in the registration request, right? Or are there other cases as well?

Member (Author):
What do you mean? The terminating executor can detect itself as being terminating? Actually, not only terminating executors: executors lost due to long GC, or executors that the driver failed to kill (where the executor could be an orphan rather than terminated), are also covered here, as long as the case is considered an executor loss by the driver.

Contributor:
For the other cases - lost due to long GC, lost due to network partitions, etc. - they are legitimate candidates for re-registration.

Member (Author):
Ok, I see.
Note that there's a prerequisite for the re-registration in this PR: the executor should already be lost in the driver's view. In that case, block manager re-registration is meaningless since the executor won't reconnect to the driver.

Contributor:
Won't the driver remove it in the case of a heartbeat expiry even though the executor did not disconnect? (for the same reasons as above - long GC pause, network partition, etc.) (Sorry for the delay, I will get back to this PR later this week - not in good health)

Member (Author):
It will. But in these cases, the driver could fail to kill the executor. (@mridulm No worries, take care :))
       blockManagerIdByExecutor.get(id.executorId) match {
         case Some(oldId) =>
           // A block manager of the same executor already exists, so remove it (assumed dead)
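A side note on the gating above: because `isExecutorAlive` is a lazy val, the liveness RPC to the scheduler backend is only issued for re-registrations; initial registrations short-circuit on `!isReRegister`. A small self-contained sketch of the same condition (the names here are illustrative, not Spark code):

```scala
// Illustrative only: mirrors the condition used in register() above.
// `probeAlive` stands in for the askSync(IsExecutorAlive(...)) call; being behind
// a lazy val, it runs at most once, and only when isReRegister is true.
def shouldAcceptRegistration(
    alreadyRegistered: Boolean,
    isReRegister: Boolean)(probeAlive: () => Boolean): Boolean = {
  lazy val isExecutorAlive = probeAlive()
  !alreadyRegistered && (!isReRegister || isExecutorAlive)
}

// Example: an initial registration never triggers the probe.
shouldAcceptRegistration(alreadyRegistered = false, isReRegister = false) { () =>
  sys.error("probe should not run for an initial registration")
}
```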
@@ -616,10 +624,29 @@ class BlockManagerMasterEndpoint(
       if (pushBasedShuffleEnabled) {
         addMergerLocation(id)
       }
+      listenerBus.post(SparkListenerBlockManagerAdded(time, id,
+        maxOnHeapMemSize + maxOffHeapMemSize, Some(maxOnHeapMemSize), Some(maxOffHeapMemSize)))
     }
-    listenerBus.post(SparkListenerBlockManagerAdded(time, id, maxOnHeapMemSize + maxOffHeapMemSize,
-      Some(maxOnHeapMemSize), Some(maxOffHeapMemSize)))
-    id
+    val updatedId = if (isReRegister && !isExecutorAlive) {
+      assert(!blockManagerInfo.contains(id),
+        "BlockManager re-registration shouldn't succeed when the executor is lost")
Contributor:
Does this assertion need to always hold?

Member (Author):
It does.
+      logInfo(s"BlockManager ($id) re-registration is rejected since " +
+        s"the executor (${id.executorId}) has been lost")
+
+      // Use "invalid" as the return executor id to indicate the block manager that
+      // re-registration failed. It's a bit hacky but fine since the returned block
+      // manager id won't be accessed in the case of re-registration. And we'll use
+      // this "invalid" executor id to print better logs and avoid blocks reporting.
+      BlockManagerId(
+        BlockManagerId.INVALID_EXECUTOR_ID,
+        id.host,
+        id.port,
+        id.topologyInfo)
+    } else {
+      id
+    }
+    updatedId
   }

   private def updateShuffleBlockInfo(blockId: BlockId, blockManagerId: BlockManagerId)
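On the receiving side, the executor only needs to inspect the returned executor id to know whether the re-registration was accepted. A rough sketch of that check follows; the surrounding method and the helper names (`master`, `reportAllBlocks`, `logWarning`) are hypothetical and not part of this diff:

```scala
// Hypothetical executor-side handling of the master's reply (names are illustrative).
def reregister(): Unit = {
  val returnedId: BlockManagerId = master.registerBlockManager(
    blockManagerId, localDirs, maxOnHeapMemSize, maxOffHeapMemSize, storageEndpoint,
    isReRegister = true)

  if (returnedId.executorId == BlockManagerId.INVALID_EXECUTOR_ID) {
    // The driver considers this executor lost: don't report blocks for a block
    // manager the master just refused to track.
    logWarning(s"BlockManager $blockManagerId re-registration was rejected by the master")
  } else {
    // Accepted: report all existing blocks so the master's view is consistent again.
    reportAllBlocks()
  }
}
```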
@@ -63,7 +63,8 @@ private[spark] object BlockManagerMessages {
       localDirs: Array[String],
       maxOnHeapMemSize: Long,
       maxOffHeapMemSize: Long,
-      sender: RpcEndpointRef)
+      sender: RpcEndpointRef,
+      isReRegister: Boolean)
Member:
Not sure why and how, but it seems like branch-3.3 complains about this in the MiMa binary compatibility test (https://github.com/apache/spark/actions/runs/3683054520/jobs/6231260546).

Member:
Let me just make a quick followup to fix this in all branches. This was detected with Scala 2.13, FWIW.

Member:
Here: #39052

Member (Author):
Thanks @HyukjinKwon
     extends ToBlockManagerMaster

   case class UpdateBlockInfo(
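Regarding the MiMa discussion above: the usual remedy for a signature change on an internal message like this is to add exclusion rules in `project/MimaExcludes.scala`. A sketch of what such entries could look like is below; the exact filters used in the followup (#39052) may differ:

```scala
// Sketch of MimaExcludes entries for the changed RegisterBlockManager signature.
// The concrete rules added by the followup may differ from these.
ProblemFilters.exclude[DirectMissingMethodProblem](
  "org.apache.spark.storage.BlockManagerMessages#RegisterBlockManager.this"),
ProblemFilters.exclude[DirectMissingMethodProblem](
  "org.apache.spark.storage.BlockManagerMessages#RegisterBlockManager.apply"),
ProblemFilters.exclude[DirectMissingMethodProblem](
  "org.apache.spark.storage.BlockManagerMessages#RegisterBlockManager.copy")
```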
Do we want to terminate in case of INVALID_EXECUTOR_ID?

Good question. Ideally, a lost executor should be terminated in the end anyway (whether killed by the driver or exiting proactively)... if the lost executor fails to terminate, there must be something wrong with it. So I think terminating here won't make any difference to the result.

If it is failing to terminate, every heartbeat will end up returning BlockManagerId.INVALID_EXECUTOR_ID from the driver, right? Wondering if there is a reason to keep it around.

Note - if we do change this - it is the introduction of a new code path. So I don't want to block the PR on this discussion - but it would be good to understand this case better :-)
The case we met is that the executor failed to be killed by both StopExecutor and ExecutorRunner.killProcess. On second thought, maybe killing (via System.exit) from inside would give it one more chance to save this case?

Added the termination logic here. cc @mridulm
Hi @Ngone51, for our Spark Structured Streaming jobs, we have seen a lot of executors getting killed by this termination logic with error code -1, which will eventually kill the driver due to the spark.max.executor.failures configuration. We've been seeing this very frequently for streaming jobs that perform aggressive scaling, as they mark a lot of executors idle at the same time. Just want to ask: why are we using the -1 error code here if the executor is already killing itself?
spark.max.executor.failuresconfigurations. We've been seeing this very frequent for a streaming job that perform aggressive scaling as they mark a lot of executors idle at the same time. Just want to ask why are we using -1 error code here if the exeuctor is already killing itself?