-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-52065][SQL] Produce another plan tree with output columns (name, data type, nullability) in plan change logging #50852
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
cc. @cloud-fan PTAL, thanks! |
@cloud-fan Friendly reminder - I know you are busy with release, so just in case if you have time. |
cc. @cloud-fan Friendly reminder to see your availability. |
@@ -62,6 +62,9 @@ class PlanChangeLogger[TreeType <: TreeNode[_]] extends Logging { | |||
log""" | |||
|=== Applying Rule ${MDC(RULE_NAME, ruleName)} === | |||
|${MDC(QUERY_PLAN, sideBySide(oldPlan.treeString, newPlan.treeString).mkString("\n"))} | |||
| | |||
|Output Information: | |||
|${MDC(QUERY_PLAN, newPlan.treeStringWithOutputColumns)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
treeString
is also a public method, can we just call it with printOutputColumns = true
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of a new section with new plan only, shall we change sideBySide(oldPlan.treeString, newPlan.treeString)
to use the new string with output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's too long, verbose is not an optional param, so we need to specify both verbose
and printOutputColumns
. I'm OK if the length does not matter. Let me change it and revert if you see it be too long.
I'll make a change to do sideBySide here. Great suggestion!
@@ -1011,6 +1017,10 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] | |||
*/ | |||
def innerChildren: Seq[TreeNode[_]] = Seq.empty | |||
|
|||
def nodeWithOutputColumnsString(maxColumns: Int): String = { | |||
throw new UnsupportedOperationException("TreeNode does not have output columns") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be conservative shall we just call simpleString
here?
Thanks @cloud-fan for your review! I've addressed your comments. Please take another look when you have time. Thanks! |
log""" | ||
|=== Applying Rule ${MDC(RULE_NAME, ruleName)} === | ||
|${MDC(QUERY_PLAN, sideBySide(oldPlan.treeString, newPlan.treeString).mkString("\n"))} | ||
| | ||
|Output Information: | ||
|${MDC(QUERY_PLAN, sideBySide(oldPlan.treeString(verbose = false, printOutputColumns = true), newPlan.treeString(verbose = false, printOutputColumns = true)))} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
val oldPlanStringWithOutput = ...
val newPlanStringWithOutput
log"""
...
... sideBySide(oldPlanStringWithOutput, newPlanStringWithOutput)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we update the PR description to reflect this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Didn't realize the message() method itself is lazily evaluated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also updated the PR description.
@cloud-fan I've addressed your feedback. PTAL, thanks! |
printOutputColumns = true) | ||
val newPlanStringWithOutput = newPlan.treeString(verbose = false, | ||
printOutputColumns = true) | ||
// scalastyle:off line.size.limit | ||
log""" | ||
|=== Result of Batch ${MDC(BATCH_NAME, batchName)} === | ||
|${MDC(QUERY_PLAN, sideBySide(oldPlan.treeString, newPlan.treeString).mkString("\n"))} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we append the output info in the verbose treeString? Is the diff readable?
What changes were proposed in this pull request?
We propose to add another tree string which focuses to produce output columns with data type and nullability. This will be shown in plan change logger, along with existing tree string plan.
For example, here is a one of example from plan change logging:
In some cases, it's not even feasible to evaluate the output of the node. (e.g. Project with Star expression) In that case, we will simply put
<output=unresolved>
since it's mostly due to UnresolvedException.For example,
Why are the changes needed?
We recently got into very tricky issue (nullability change broke stateful operator) which required custom debug logging on plan change logging. This is because of lack of visibility for the output columns, especially their nullability, in our tree string of the plan.
Ideally, we shouldn't have two different tree strings and just make a fix to the existing tree string, but in many cases, current tree string is long enough so that we had to restrict the number of fields to show, hence we think it's better to have a separate tree plan for it.
Does this PR introduce any user-facing change?
Yes, when they change SQL config for plan change logger log level to their visible log level in log4j2 config. The application of this change is at least opt-in instead of opt-out.
(If we are changing the existing tree string, it will change many places.)
How was this patch tested?
Modified UT to cover the change.
Was this patch authored or co-authored using generative AI tooling?
No.