Skip to content

Conversation

@gengliangwang
Copy link
Member

@gengliangwang gengliangwang commented Apr 5, 2024

What changes were proposed in this pull request?

Migrate logError with variables of core module to structured logging framework. This is part2 which transforms the logError entries of the following API

def logError(msg: => String, throwable: Throwable): Unit

to

def logError(entry: LogEntry, throwable: Throwable): Unit

migration Part1 was in #45834

Why are the changes needed?

To enhance Apache Spark's logging system by implementing structured logging.

Does this PR introduce any user-facing change?

Yes, Spark core logs will contain additional MDC

How was this patch tested?

Compiler and scala style checks, as well as code review.

Was this patch authored or co-authored using generative AI tooling?

No

@gengliangwang
Copy link
Member Author

cc @dtenedor @panbingkun @pan3793 @itholic as well

val SHUFFLE_ID = Value
val SHUFFLE_MERGE_ID = Value
val SIZE = Value
val SLEEP_TIME_SECONDS = Value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the suffix _SECONDS?
Will we encounter SLEEP_TIME_MILLISECONDS like this in the future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can handle this when there is SLEEP_TIME_MILLISECONDS

if (sc != null) {
logError(s"uncaught error in thread $currentThreadName, stopping SparkContext", t)
logError(
log"uncaught error in thread ${MDC(THREAD, currentThreadName)}, stopping SparkContext",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we have THREAD_ID in the future? THREAD_NAME?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I will update this.

}
if (!NonFatal(t)) {
logError(s"throw uncaught fatal error in thread $currentThreadName", t)
logError(log"throw uncaught fatal error in thread ${MDC(THREAD, currentThreadName)}", t)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dito

case NonFatal(t) =>
logError(s"Uncaught exception in thread ${Thread.currentThread().getName}", t)
logError(
log"Uncaught exception in thread ${MDC(THREAD, Thread.currentThread().getName)}", t)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dito

exception: Throwable): Unit = {
logError(s"Failed to get the meta of push-merged block for ($shuffleId, $reduceId) " +
s"from ${req.address.host}:${req.address.port}", exception)
logError(log"Failed to get the meta of push-merged block for " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not related to this PR, do we need shuffleMergeId in the log?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well let's just keep it.

s"unrecoverable fetch failures! Most likely this means user code is incorrectly " +
s"swallowing Spark's internal ${classOf[FetchFailedException]}", fetchFailure)
logError(log"${LogMDC(TASK_NAME, taskName)} completed successfully though internally " +
log"it encountered unrecoverable fetch failures! Most likely this means user code " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we correct the original appearance of the two whitespaces before Most?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

if (failures < executorStateSyncMaxAttempts) {
logError(s"Failed to send $newState to Master $masterRef, " +
s"will retry ($failures/$executorStateSyncMaxAttempts).", t)
logError(log"Failed to send ${MDC(EXECUTOR_STATE_CHANGED, newState)}" +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if EXECUTOR_STATE_CHANGE or EXECUTOR_STATE is better

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, the PR description seems to be wrong to me, @gengliangwang . logInfo is a typo, right? :)

Screenshot 2024-04-05 at 07 57 29

@gengliangwang
Copy link
Member Author

@dongjoon-hyun Yes, updated :)

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM from my side.

@dongjoon-hyun
Copy link
Member

Thank you! Please add the existing comments first, @gengliangwang .

@xinrong-meng
Copy link
Member

nit: pr description says "part1" which is inconsistent with the pr title :)

@gengliangwang
Copy link
Member Author

@xinrong-meng Thanks, I updated the PR description. Seems I made two mistakes when creating a PR at night time...

@gengliangwang
Copy link
Member Author

@panbingkun @dongjoon-hyun @xinrong-meng Thanks for the review, merging to master

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants