Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: Refactor aggregate expression serde #1380

Open
wants to merge 21 commits into
base: main
Choose a base branch
from

Conversation

andygrove
Copy link
Member

Which issue does this PR close?

Part of #1345

Rationale for this change

Refactor in preparation for improving type checking and testing for aggregate expressions.

What changes are included in this PR?

Move aggregate expression serde into individual classes

This is mostly just moving code around. There are only two functional changes:

  1. Previously, we had match arms such as case max @ Max(child) if minMaxDataTypeSupported(max.dataType) => which meant that if the type was not supported then we would fall through to the case _ arm which would report unsupported Spark aggregate function, which is misleading. We now do the type checks withing the aggregate serde logic and report unsupported data type for the aggregate instead.
  2. I made some small changes to the type checks because the existing code checked for NumericType rather than the specific types that we actually support, so I made this more explicit. We do not support FractionalType, for example, and this is a child of NumericType.

How are these changes tested?

Existing tests.

@andygrove andygrove marked this pull request as ready for review February 8, 2025 18:59
@codecov-commenter
Copy link

codecov-commenter commented Feb 8, 2025

Codecov Report

Attention: Patch coverage is 68.55670% with 122 lines in your changes missing coverage. Please review.

Project coverage is 39.18%. Comparing base (f09f8af) to head (2f71857).
Report is 22 commits behind head on main.

Files with missing lines Patch % Lines
...main/scala/org/apache/comet/serde/aggregates.scala 67.58% 83 Missing and 35 partials ⚠️
.../scala/org/apache/comet/serde/QueryPlanSerde.scala 83.33% 0 Missing and 4 partials ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##               main    #1380       +/-   ##
=============================================
- Coverage     56.12%   39.18%   -16.95%     
- Complexity      976     2071     +1095     
=============================================
  Files           119      265      +146     
  Lines         11743    60904    +49161     
  Branches       2251    12935    +10684     
=============================================
+ Hits           6591    23866    +17275     
- Misses         4012    32553    +28541     
- Partials       1140     4485     +3345     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


object CometMin extends CometAggregateExpressionSerde {

override def convert(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is a copy and no changes

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exact that the supported type check is now contained in the convert method instead of being in the match arm, as explained in #1380 (comment)

@parthchandra
Copy link
Contributor

It's hard to spot the two functional changes made in this PR because of the large amount of code moved. Can you tag the places where the changes were made?

@andygrove
Copy link
Member Author

andygrove commented Feb 12, 2025

It's hard to spot the two functional changes made in this PR because of the large amount of code moved. Can you tag the places where the changes were made?

Sure, functional change #1 moved some of the pre-condition checks. For example, for min, we originally had:

case min @ Min(child) if minMaxDataTypeSupported(min.dataType) =>

And now we have:

case _: Min => CometMin

The pre-condition check is now moved to:

object CometMin extends CometAggregateExpressionSerde {
  override def convert(...) {
    if (!AggSerde.minMaxDataTypeSupported(expr.dataType)) {
      withInfo(aggExpr, s"Unsupported data type: ${expr.dataType}")
      return None
    }
  }
}

Functional change #2 tightened up the checks for supported types:

Example for Sum:

Before, we said that we support all numeric types, including fractional types:

private def sumDataTypeSupported(dt: DataType): Boolean = {
     dt match {
       case _: NumericType => true
       case _ => false
     }
   }

After:

def sumDataTypeSupported(dt: DataType): Boolean = {
     dt match {
       case ByteType | ShortType | IntegerType | LongType => true
       case FloatType | DoubleType => true
       case _: DecimalType => true
       case _ => false
     }
   }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants