Skip to content

Conversation

@davintjong-db
Copy link
Contributor

@davintjong-db davintjong-db commented Dec 7, 2023

What changes were proposed in this pull request?

Cleaning up the semantics of init and zero value to the following. This also helps define what an "invalid" metric is.

initValue is the starting value for a SQLMetric. If a metric has value equal to its initValue, then it can/should be filtered out before aggregating with SQLMetrics.stringValue().

zeroValue defines the lowest value considered valid. If a SQLMetric is invalid, it is set to zeroValue upon receiving any updates, and it also reports zeroValue as its value to avoid exposing it to the user programatically (concern previouosly addressed in SPARK-41442).

For many SQLMetrics, we use initValue = -1 and zeroValue = 0 to indicate that the metric is by default invalid. Whenever an invalid metric is updated, it sets itself to zeroValue and becomes valid. Invalid metrics will be filtered out when calculating min, max, etc. as a workaround for SPARK-11013.

Why are the changes needed?

The semantics of initValue and _zeroValue in SQLMetrics is a little bit confusing, since they effectively mean the same thing. Changing it to the following would be clearer, especially in terms of defining what an "invalid" metric is.

Does this PR introduce any user-facing change?

No. This shouldn't change any behavior.

How was this patch tested?

Existing tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Dec 7, 2023
@davintjong-db davintjong-db changed the title [WIP][SPARK-46294][SQL] Clean up semantics of init vs zero value [SPARK-46294][SQL] Clean up semantics of init vs zero value Dec 7, 2023
// values before calculate max, min, etc.
private[this] var _value = initValue
private var _zeroValue = initValue
class SQLMetric(val metricType: String,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class SQLMetric(val metricType: String,
class SQLMetric(
val metricType: String,

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 spaces indentation for multi-line parameters

// This is used to filter out metrics. Metrics with value equal to initValue should
// be filtered out, since they are either invalid or safe to filter without changing
// the aggregation defined in [[SQLMetrics.stringValue]].
override def isZero: Boolean = _value == initValue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bit tricky as isZero is not true when we actually have the zero value...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's enrich the comment to highlight that, we may want to collect the 0 value for calculating min/max/avg. We can still link to SPARK-11013.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants