-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-47602][CORE] Resource managers: Migrate logError with variables to structured logging framework #45808
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…s to structured logging framework
| * part of the ThreadContext. | ||
| */ | ||
| case class MDC(key: LogKey.Value, value: String) | ||
| case class MDC(key: LogKey, value: Any) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make the type of value from String to Any, so that when using MDC, some code can be omitted, eg: String.valueOf(...) , x.toString
|
|
||
| def expectedPatternForBasicMsg(level: Level): String | ||
|
|
||
| def expectedPatternForBasicMsgWithException(level: Level): String |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a new test for basicMsg + Exception
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Let's add comments about each pattern is about
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay
| */ | ||
| object LogKey extends Enumeration { | ||
| val EXECUTOR_ID, MIN_SIZE, MAX_SIZE = Value | ||
| val APPLICATION_ID, APPLICATION_STATE, BUCKET, CONTAINER_ID, EXECUTOR_ID, POD_ID = Value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each enum can take one line.
val APPLICATION_ID = Value
val APPLICATION_STATE = Value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah
| val MIN_SIZE, MAX_SIZE, MAX_EXECUTOR_FAILURES = Value | ||
| val EXIT_CODE = Value | ||
|
|
||
| type LogKey = Value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I want to use 'xxx: LogKey' instead of 'xxx: LogKey.Value' to define a type, it feels like it's a personal preference. Should we restore it?
| object TaskLocality extends Enumeration { | |
| // Process local is expected to be used ONLY within TaskSetManager for now. | |
| val PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY = Value | |
| type TaskLocality = Value | |
| def isAllowed(constraint: TaskLocality, condition: TaskLocality): Boolean = { | |
| condition <= constraint | |
| } | |
| } |
|
|
||
| test("check MDC message") { | ||
| val log = log"This is a log, exitcode ${MDC(EXIT_CODE, 10086)}" | ||
| assert(log.message === "This is a log, exitcode 10086") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's verify the returned map as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay
| import org.apache.spark.deploy.k8s.KubernetesConf | ||
| import org.apache.spark.deploy.k8s.KubernetesUtils.addOwnerReference | ||
| import org.apache.spark.internal.Logging | ||
| import org.apache.spark.internal.{Logging, LogKey => LK, MDC} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why alias as LK? LogKey is fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay
|
@panbingkun Thanks for the work. LGTM except some minor comments. |
|
I will update |
Done. |
| } else { | ||
| logError(s"Driver terminated with exit code ${exitCode}! Shutting down. $remoteAddress") | ||
| logError(log"Driver terminated with exit code ${MDC(EXIT_CODE, exitCode)}! " + | ||
| log"Shutting down. ${MDC(REMOTE_ADDRESS, remoteAddress)}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gengliangwang
Perhaps it would be more reasonable to write this in this way
logError(log"Driver terminated with exit code ${MDC(EXIT_CODE, exitCode)}! " + log"Shutting down. $remoteAddress")
Not every variable needs to be recorded in the field content of JSON
Unfortunately, we currently do not support this syntax as above, it will fail to compile.
So I submitted a separate PR to address this issue: #45813
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@panbingkun Let's put all the variables into the context during the migration. If the context is too verbose, we can configure log4j to only show a subset of keys later.
Otherwise, there will be back-and-forth during the migration. remoteAddress can be helpful for debugging here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay
|
Thanks, merging to master |
| case e: Exception => | ||
| logError(s"Cannot get the creationTimestamp of the pod: ${state.pod}", e) | ||
| logError(log"Cannot get the creationTimestamp of the pod: " + | ||
| log"${MDC(LogKey.POD_ID, state.pod)}", e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO POD_ID is not a suitable key here. state.pod actually outputs a whole Pod Spec, and there is no such a "pod id" concept in the K8s, in practice, we use namespace and pod name to identify a pod.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pan3793 Thanks for the suggestion. Do you mind creating a PR to improve this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to have a try, it may take a few hours to learn this new log framework.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The inital PR of the framework is in #45729. Thank you in advance!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pan3793 Thank you very much for correcting this issue.
What changes were proposed in this pull request?
The pr aims to:
logErrorin moduleResource managerswith variables tostructured logging framework.*LoggingSuiteMDCSuitetypeofvaluefromStringtoAny( so that when usingMDC, some code can beomitted, eg:String.valueOf(...),x.toString)Why are the changes needed?
To enhance Apache Spark's logging system by implementing structured logging.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Was this patch authored or co-authored using generative AI tooling?
No.