chore: Refactor aggregate expression serde #1380
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #1380 +/- ##
=============================================
- Coverage 56.12% 39.18% -16.95%
- Complexity 976 2071 +1095
=============================================
Files 119 265 +146
Lines 11743 60904 +49161
Branches 2251 12935 +10684
=============================================
+ Hits 6591 23866 +17275
- Misses 4012 32553 +28541
- Partials 1140 4485 +3345
object CometMin extends CometAggregateExpressionSerde {

override def convert(
I assume this is a copy and no changes
Yes, except that the supported type check is now contained in the convert method instead of being in the match arm, as explained in #1380 (comment)
It's hard to spot the two functional changes made in this PR because of the large amount of code moved. Can you tag the places where the changes were made?
Sure, functional change #1 moved some of the pre-condition checks. For example, we previously had:
case min @ Min(child) if minMaxDataTypeSupported(min.dataType) =>
And now we have:
case _: Min => CometMin
The pre-condition check is now moved to:
object CometMin extends CometAggregateExpressionSerde {
  override def convert(...) {
    if (!AggSerde.minMaxDataTypeSupported(expr.dataType)) {
      withInfo(aggExpr, s"Unsupported data type: ${expr.dataType}")
      return None
    }
  }
}
Functional change #2 tightened up the checks for supported types. For example, for sum:
Before, we said that we support all numeric types, including fractional types:
private def sumDataTypeSupported(dt: DataType): Boolean = {
  dt match {
    case _: NumericType => true
    case _ => false
  }
}
After:
def sumDataTypeSupported(dt: DataType): Boolean = {
  dt match {
    case ByteType | ShortType | IntegerType | LongType => true
    case FloatType | DoubleType => true
    case _: DecimalType => true
    case _ => false
  }
}
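To make the effect of functional change #1 easier to see, here is a minimal, self-contained sketch. It does not use Spark or Comet classes: DataType, IntegerType, DoubleType, StringType, Agg, Min, Sum, AggSketch, convertBefore and convertAfter are all simplified, hypothetical stand-ins invented for this illustration, and the Either return type is an assumption rather than the signature used in the PR. It only shows how moving the type check out of the match guard changes which message gets reported.

```scala
// Hypothetical stand-ins for Spark's DataType hierarchy (NOT the real Spark classes).
sealed trait DataType
case object IntegerType extends DataType
case object DoubleType extends DataType
case object StringType extends DataType

// Hypothetical stand-ins for Spark aggregate expressions.
sealed trait Agg { def dataType: DataType }
final case class Min(dataType: DataType) extends Agg
final case class Sum(dataType: DataType) extends Agg

object AggSketch {
  // Explicit allow-list of supported types (illustrative subset only).
  def minMaxDataTypeSupported(dt: DataType): Boolean = dt match {
    case IntegerType | DoubleType => true
    case _ => false
  }

  // Before: the type check is a guard on the match arm, so an unsupported
  // type falls through to the catch-all arm and gets its generic message.
  def convertBefore(agg: Agg): Either[String, String] = agg match {
    case Min(dt) if minMaxDataTypeSupported(dt) => Right("min")
    case _ => Left("unsupported Spark aggregate function")
  }

  // After: the arm matches on the aggregate alone and the check runs inside
  // the conversion, so the message can name the actual cause.
  def convertAfter(agg: Agg): Either[String, String] = agg match {
    case Min(dt) =>
      if (minMaxDataTypeSupported(dt)) Right("min")
      else Left(s"Unsupported data type: $dt")
    case _ => Left("unsupported Spark aggregate function")
  }
}
```

In this sketch, AggSketch.convertBefore(Min(StringType)) yields the generic "unsupported Spark aggregate function" message even though Min itself is handled, while AggSketch.convertAfter(Min(StringType)) reports the unsupported data type. Sum has no arm in either version, so it still reaches the catch-all in both.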
Which issue does this PR close?
Part of #1345
Rationale for this change
Refactor in preparation for improving type checking and testing for aggregate expressions.
What changes are included in this PR?
Move aggregate expression serde into individual classes
This is mostly just moving code around. There are only two functional changes:
Previously, the type check was a guard on the match arm, for example
case max @ Max(child) if minMaxDataTypeSupported(max.dataType) =>
which meant that if the type was not supported then we would fall through to the case _ arm, which would report "unsupported Spark aggregate function", which is misleading. We now do the type checks within the aggregate serde logic and report "unsupported data type" for the aggregate instead.
Previously, some checks tested for NumericType rather than the specific types that we actually support, so I made this more explicit. We do not support FractionalType, for example, and this is a child of NumericType.
How are these changes tested?
Existing tests.