-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-46294][SQL] Clean up semantics of init vs zero value #44222
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-46294][SQL] Clean up semantics of init vs zero value #44222
Conversation
| // values before calculate max, min, etc. | ||
| private[this] var _value = initValue | ||
| private var _zeroValue = initValue | ||
| class SQLMetric(val metricType: String, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| class SQLMetric(val metricType: String, | |
| class SQLMetric( | |
| val metricType: String, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
4 spaces indentation for multi-line parameters
| // This is used to filter out metrics. Metrics with value equal to initValue should | ||
| // be filtered out, since they are either invalid or safe to filter without changing | ||
| // the aggregation defined in [[SQLMetrics.stringValue]]. | ||
| override def isZero: Boolean = _value == initValue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a bit tricky as isZero is not true when we actually have the zero value...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's enrich the comment to highlight that, we may want to collect the 0 value for calculating min/max/avg. We can still link to SPARK-11013.
|
thanks, merging to master! |
What changes were proposed in this pull request?
Cleaning up the semantics of init and zero value to the following. This also helps define what an "invalid" metric is.
initValue is the starting value for a SQLMetric. If a metric has value equal to its initValue, then it can/should be filtered out before aggregating with SQLMetrics.stringValue().
zeroValue defines the lowest value considered valid. If a SQLMetric is invalid, it is set to zeroValue upon receiving any updates, and it also reports zeroValue as its value to avoid exposing it to the user programatically (concern previouosly addressed in SPARK-41442).
For many SQLMetrics, we use initValue = -1 and zeroValue = 0 to indicate that the metric is by default invalid. Whenever an invalid metric is updated, it sets itself to zeroValue and becomes valid. Invalid metrics will be filtered out when calculating min, max, etc. as a workaround for SPARK-11013.
Why are the changes needed?
The semantics of initValue and _zeroValue in SQLMetrics is a little bit confusing, since they effectively mean the same thing. Changing it to the following would be clearer, especially in terms of defining what an "invalid" metric is.
Does this PR introduce any user-facing change?
No. This shouldn't change any behavior.
How was this patch tested?
Existing tests.
Was this patch authored or co-authored using generative AI tooling?
No.