Skip to content

Conversation

@panbingkun
Copy link
Contributor

@panbingkun panbingkun commented Apr 2, 2024

What changes were proposed in this pull request?

The pr aims to:

  • migrate logError in module Resource managers with variables to structured logging framework.
  • add some test case to *LoggingSuite
  • add new UT MDCSuite
  • make the type of value from String to Any( so that when using MDC, some code can be omitted, eg: String.valueOf(...) , x.toString)

Why are the changes needed?

To enhance Apache Spark's logging system by implementing structured logging.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

  • Pass GA.
  • Add & update new UT.

Was this patch authored or co-authored using generative AI tooling?

No.

* part of the ThreadContext.
*/
case class MDC(key: LogKey.Value, value: String)
case class MDC(key: LogKey, value: Any)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make the type of value from String to Any, so that when using MDC, some code can be omitted, eg: String.valueOf(...) , x.toString

@panbingkun
Copy link
Contributor Author

cc @gengliangwang

@panbingkun panbingkun marked this pull request as ready for review April 2, 2024 03:28

def expectedPatternForBasicMsg(level: Level): String

def expectedPatternForBasicMsgWithException(level: Level): String
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a new test for basicMsg + Exception

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Let's add comments about each pattern is about

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay

*/
object LogKey extends Enumeration {
val EXECUTOR_ID, MIN_SIZE, MAX_SIZE = Value
val APPLICATION_ID, APPLICATION_STATE, BUCKET, CONTAINER_ID, EXECUTOR_ID, POD_ID = Value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each enum can take one line.
val APPLICATION_ID = Value
val APPLICATION_STATE = Value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah

val MIN_SIZE, MAX_SIZE, MAX_EXECUTOR_FAILURES = Value
val EXIT_CODE = Value

type LogKey = Value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I want to use 'xxx: LogKey' instead of 'xxx: LogKey.Value' to define a type, it feels like it's a personal preference. Should we restore it?

object TaskLocality extends Enumeration {
// Process local is expected to be used ONLY within TaskSetManager for now.
val PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY = Value
type TaskLocality = Value
def isAllowed(constraint: TaskLocality, condition: TaskLocality): Boolean = {
condition <= constraint
}
}


test("check MDC message") {
val log = log"This is a log, exitcode ${MDC(EXIT_CODE, 10086)}"
assert(log.message === "This is a log, exitcode 10086")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's verify the returned map as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay

import org.apache.spark.deploy.k8s.KubernetesConf
import org.apache.spark.deploy.k8s.KubernetesUtils.addOwnerReference
import org.apache.spark.internal.Logging
import org.apache.spark.internal.{Logging, LogKey => LK, MDC}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why alias as LK? LogKey is fine

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay

@gengliangwang
Copy link
Member

@panbingkun Thanks for the work. LGTM except some minor comments.

@panbingkun
Copy link
Contributor Author

I will update soon so that we can proceed smoothly with the following work.

@panbingkun
Copy link
Contributor Author

@panbingkun Thanks for the work. LGTM except some minor comments.

Done.

} else {
logError(s"Driver terminated with exit code ${exitCode}! Shutting down. $remoteAddress")
logError(log"Driver terminated with exit code ${MDC(EXIT_CODE, exitCode)}! " +
log"Shutting down. ${MDC(REMOTE_ADDRESS, remoteAddress)}")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gengliangwang
Perhaps it would be more reasonable to write this in this way

logError(log"Driver terminated with exit code ${MDC(EXIT_CODE, exitCode)}! " + log"Shutting down. $remoteAddress")

Not every variable needs to be recorded in the field content of JSON
Unfortunately, we currently do not support this syntax as above, it will fail to compile.
So I submitted a separate PR to address this issue: #45813

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@panbingkun Let's put all the variables into the context during the migration. If the context is too verbose, we can configure log4j to only show a subset of keys later.
Otherwise, there will be back-and-forth during the migration. remoteAddress can be helpful for debugging here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay

@gengliangwang
Copy link
Member

Thanks, merging to master

case e: Exception =>
logError(s"Cannot get the creationTimestamp of the pod: ${state.pod}", e)
logError(log"Cannot get the creationTimestamp of the pod: " +
log"${MDC(LogKey.POD_ID, state.pod)}", e)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO POD_ID is not a suitable key here. state.pod actually outputs a whole Pod Spec, and there is no such a "pod id" concept in the K8s, in practice, we use namespace and pod name to identify a pod.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pan3793 Thanks for the suggestion. Do you mind creating a PR to improve this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to have a try, it may take a few hours to learn this new log framework.

Copy link
Member

@gengliangwang gengliangwang Apr 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inital PR of the framework is in #45729. Thank you in advance!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pan3793 Thank you very much for correcting this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants