feat: Support covar_samp and covar_pop #216

huaxingao · 2024-03-19T21:36:44Z

Which issue does this PR close?

Closes #.

Rationale for this change

This PR adds support for the covar_samp and covar_pop aggregate functions.

What changes are included in this PR?

How are these changes tested?

new tests

viirya · 2024-03-20T00:53:51Z

common/src/main/scala/org/apache/spark/sql/comet/util/Utils.scala

@@ -69,6 +69,7 @@ object Utils {
    case int: ArrowType.Int if int.getIsSigned && int.getBitWidth == 8 * 2 => ShortType
    case int: ArrowType.Int if int.getIsSigned && int.getBitWidth == 8 * 4 => IntegerType
    case int: ArrowType.Int if int.getIsSigned && int.getBitWidth == 8 * 8 => LongType
+    case int: ArrowType.Int if int.getBitWidth == 8 * 8 => LongType


viirya: Hmm, is this UInt64? Representing it as LongType can overflow, I think.

huaxingao: Shall we map it to DecimalType instead?

viirya: Where do you use it?

huaxingao: This UInt64 is for the state field count.

viirya: If both the partial and final aggregation operators run in Comet, it should be okay, because Java doesn't process the intermediate results (state). But if only the partial aggregation runs in Comet, reading this UInt64 array as LongType could overflow in corner cases, I think.

huaxingao: Shall we map it to DecimalType(20, 0) instead?

viirya: Hmm, how would that work? Would you treat the UInt64 array as a decimal array?

huaxingao: Since the max of UInt64 is 18446744073709551615, I guess we can use DecimalType(20, 0) to represent the number without overflowing?

viirya: Yes, but I mean Spark will treat the actual UInt64 array as a decimal one. For example, it will call getDecimal on a UInt64 array. Does that work?

viirya: Maybe we should enable partial + final Comet aggregation as a whole, i.e., #223.
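To make the range concern in this thread concrete, here is a minimal sketch in plain Java. It uses only the standard library (no Arrow or Spark), and the class and method names are hypothetical, chosen for illustration. It shows that the largest UInt64 bit pattern reads as -1 when interpreted as a signed 64-bit value, and that the same value needs 20 decimal digits at scale 0.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical helper for illustration; not part of Comet, Spark, or Arrow.
class UInt64Overflow {
    // Interpret the 64 raw bits as an unsigned value and return it exactly.
    static BigInteger toUnsigned(long rawBits) {
        return new BigInteger(Long.toUnsignedString(rawBits));
    }

    public static void main(String[] args) {
        // All 64 bits set: the largest UInt64 value, 2^64 - 1.
        long rawBits = 0xFFFFFFFFFFFFFFFFL;

        // Read as a signed 64-bit integer, the same bits are negative.
        System.out.println(rawBits); // -1

        // Read as unsigned, the value is exact but does not fit in a signed long.
        BigInteger unsigned = toUnsigned(rawBits);
        System.out.println(unsigned); // 18446744073709551615

        // As a decimal it needs 20 digits of precision and scale 0.
        BigDecimal asDecimal = new BigDecimal(unsigned);
        System.out.println(asDecimal.precision()); // 20
        System.out.println(asDecimal.scale()); // 0
    }
}
```

This only illustrates the value range. It does not settle the question raised above of whether Spark can call getDecimal on a vector whose underlying storage is UInt64.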

viirya · 2024-03-20T00:54:05Z

common/src/main/scala/org/apache/comet/vector/NativeUtil.scala

@@ -205,7 +205,7 @@ class NativeUtil {
      case v @ (_: BitVector | _: TinyIntVector | _: SmallIntVector | _: IntVector |
          _: BigIntVector | _: Float4Vector | _: Float8Vector | _: VarCharVector |
          _: DecimalVector | _: DateDayVector | _: TimeStampMicroTZVector | _: VarBinaryVector |
-          _: FixedSizeBinaryVector | _: TimeStampMicroVector) =>
+          _: FixedSizeBinaryVector | _: TimeStampMicroVector | _: UInt8Vector) =>


viirya: Why do we need to handle UInt8Vector?

viirya: Is it used in the Covariance/CovariancePop state types?

huaxingao: I think it's for the state field count.

viirya: I think that is UInt64? But what you added is UInt8Vector?

huaxingao: It seems UInt8Vector is for UInt64; the name is confusing. https://github.com/apache/arrow/blob/main/java/vector/src/main/java/org/apache/arrow/vector/UInt8Vector.java#L37

viirya: Oh... the naming of the Java Arrow API...
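The naming confusion in this thread comes from the Java Arrow unsigned vector classes being named after their width in bytes, not bits. The sketch below is a hand-written lookup table in plain Java so that it runs without an Arrow dependency. The class `ArrowUIntNames` and its method are hypothetical, and the authoritative widths are the ones defined in the linked Arrow source.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; the table is written by hand, not read from Arrow.
class ArrowUIntNames {
    // Java Arrow unsigned vector class name -> element width in bytes.
    static final Map<String, Integer> BYTE_WIDTHS = new LinkedHashMap<>();
    static {
        BYTE_WIDTHS.put("UInt1Vector", 1);
        BYTE_WIDTHS.put("UInt2Vector", 2);
        BYTE_WIDTHS.put("UInt4Vector", 4);
        BYTE_WIDTHS.put("UInt8Vector", 8);
    }

    // Bit width of the elements stored by the named vector class.
    static int bitWidth(String vectorClassName) {
        Integer bytes = BYTE_WIDTHS.get(vectorClassName);
        if (bytes == null) {
            throw new IllegalArgumentException("Unknown vector class: " + vectorClassName);
        }
        return bytes * 8;
    }

    public static void main(String[] args) {
        for (String name : BYTE_WIDTHS.keySet()) {
            System.out.println(name + " holds UInt" + bitWidth(name) + " values");
        }
    }
}
```

Under this reading, UInt8Vector corresponds to the `int.getBitWidth == 8 * 8` case in the Utils.scala diff, not to an 8-bit integer.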

huaxingao · 2024-03-26T23:45:03Z

It seems we can't use DataFusion's covariance, because DataFusion uses UInt64 for the count in state_fields, whereas Spark uses Double for the count. I will close this PR and implement Comet's own covariance.
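As a rough illustration of what an aggregation state with a Double count could look like, here is a minimal sketch of the standard online (single-pass, mergeable) covariance algorithm in plain Java. It is not Comet's or Spark's actual implementation. The class name, the field names (`n`, `xAvg`, `yAvg`, `ck`), and the choice to return NaN for undefined results are all assumptions made for this example.

```java
// Hypothetical sketch of a covariance state whose count is a double.
class CovarianceState {
    double n;    // number of rows seen, kept as a double
    double xAvg; // running mean of x
    double yAvg; // running mean of y
    double ck;   // running co-moment: sum of (x - xAvg) * (y - yAvg)

    // Partial aggregation: fold one (x, y) row into the state.
    void update(double x, double y) {
        n += 1.0;
        double dx = x - xAvg;
        double dy = y - yAvg;
        xAvg += dx / n;
        yAvg += dy / n;
        ck += dx * (y - yAvg); // uses the already updated yAvg
    }

    // Final aggregation: combine another partial state into this one.
    void merge(CovarianceState other) {
        if (other.n == 0.0) {
            return;
        }
        double newN = n + other.n;
        double dx = other.xAvg - xAvg;
        double dy = other.yAvg - yAvg;
        ck += other.ck + dx * dy * n * other.n / newN;
        xAvg += dx * other.n / newN;
        yAvg += dy * other.n / newN;
        n = newN;
    }

    // Population covariance; NaN for an empty state (a simplification).
    double covarPop() {
        return n == 0.0 ? Double.NaN : ck / n;
    }

    // Sample covariance; NaN when fewer than two rows (a simplification).
    double covarSamp() {
        return n <= 1.0 ? Double.NaN : ck / (n - 1.0);
    }

    public static void main(String[] args) {
        CovarianceState s = new CovarianceState();
        double[] xs = {1, 2, 3, 4};
        double[] ys = {2, 4, 6, 8};
        for (int i = 0; i < xs.length; i++) {
            s.update(xs[i], ys[i]);
        }
        System.out.println(s.covarPop()); // 2.5
        System.out.println(s.covarSamp());
    }
}
```

Because every field in this state is a double, a partial result produced on one side can be handed to the other side without an unsigned-to-signed conversion of the count, which is the mismatch described in the comment above.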

feat: Support covar_samp and covar_pop

2d36f0e

viirya reviewed Mar 20, 2024

viirya mentioned this pull request Mar 26, 2024

Enable Comet aggregation (partition + final) as a whole #223

Closed

huaxingao closed this Mar 26, 2024

huaxingao deleted the covariance branch March 26, 2024 23:45
