-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-47577][SPARK-47579] Correct misleading usage of log key TASK_ID #46951
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
Outdated
Show resolved
Hide resolved
| // re-execute it. | ||
| logError(log"Task ${MDC(TASK_ID, info.id)} in stage ${MDC(STAGE_ID, taskSet.id)} " + | ||
| log"(TID ${MDC(TID, tid)}) can not write to output file: " + | ||
| logError(log"Task ${MDC(TASK_INFO_ID, info.id)} in stage ${MDC(STAGE_ID, taskSet.id)} " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the taskSet.id is s"$stageId.$stageAttemptId"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for cases like TASK_INFO_ID and TASK_SET_ID, I am wondering if we can just expose
- TASK_INDEX
- TASK_ATTEMP_NUM
- STAGE_ID
- STAGE_ATTEMP_ID
to the MDC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, updated.
| isExecutorDecommissioningOrDecommissioned(taskScheduler, bmAddress) | ||
| if (ignoreStageFailure) { | ||
| logInfo(log"Ignoring fetch failure from ${MDC(TASK_ID, task)} of " + | ||
| logInfo(log"Ignoring fetch failure from ${MDC(TASK_NAME, task)} of " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really want to call this TASK_NAME?
From the code below, it seems inconsistent:
| log"${MDC(TASK_NAME, taskName(attemptInfo.taskId))} on " + |
spark/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
Lines 583 to 587 in 33a9c5d
| def taskName(tid: Long): String = { | |
| val info = taskInfos.get(tid) | |
| assert(info.isDefined, s"Can not find TaskInfo for task (TID $tid)") | |
| s"task ${info.get.id} in stage ${taskSet.id} (TID $tid)" | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, I can't find a better name here..
| log"${MDC(ERROR, ef.description)}; not retrying") | ||
| logError( | ||
| log"Task ${MDC(TASK_INDEX, info.index)}.${MDC(TASK_ATTEMPT_ID, info.attemptNumber)} " + | ||
| log"in stage ${MDC(STAGE_ID, taskSet.stageId)}." + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one only shows once...I was thinking about not having a new method
| override def toString: String = "TaskSet " + id | ||
|
|
||
| // Identifier used in the structured logging framework. | ||
| lazy val logId: MessageWithContext = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we intentionally not treat it as a log parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do you mean by "log parameter"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nvm, it's a MessageWithContext
|
@pan3793 @panbingkun @cloud-fan Thanks for the review. |

What changes were proposed in this pull request?
Correct misleading usage of log key TASK_ID from #45834 and #46739
Why are the changes needed?
Provide more accurate log keys in the structure logging
Does this PR introduce any user-facing change?
No
How was this patch tested?
Existing GA tests
Was this patch authored or co-authored using generative AI tooling?
No