
Allow mixed number types for Comparable/selecting statistics #1113

Open · Jolanrensen opened this issue Apr 1, 2025 · 1 comment

Labels
enhancement New feature or request

Jolanrensen (Collaborator) commented Apr 1, 2025

Related to #961, #1108.

After #1108 is merged, min/max aggregations will only support self-comparable values. This includes dates, numbers of a single specific type, strings, etc.

Other aggregations, like mean and sum, support calculating statistics with number unification, so calculating the sum of an Int column, a Double column, and a column containing both floats and integers is no problem whatsoever. In that case, the result will always be a Double.
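The idea behind number unification can be sketched in plain Kotlin. This is a simplified illustration, not the DataFrame implementation: the hypothetical helper sumUnified just widens every value to Double before summing, so Int, Double, and Float inputs can be mixed freely.

```kotlin
// Hypothetical helper, not part of DataFrame: widen every value to Double, then sum.
// This mirrors the idea of number unification, not DataFrame's actual unification rules.
fun sumUnified(values: List<Number>): Double =
    values.sumOf { it.toDouble() }

fun main() {
    val ints: List<Int> = listOf(1, 2)
    val doubles: List<Double> = listOf(0.5, 1.5)
    val mixed: List<Number> = listOf(3, 4.0f) // floats and integers together

    // Combine the three "columns" into one list of Numbers and sum them as Doubles.
    val all: List<Number> = ints + doubles + mixed
    println(sumUnified(all)) // 12.0
}
```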

min/max currently throws an exception when this is done. This is (sort of) in line with the Kotlin stdlib, where you can also only calculate the min/max of an iterable of self-comparable values. However, from a user's perspective, when dealing with data of many types, it's obvious what columnOf(1, 2.0, 3.0f).min() should return (1!), and they might be surprised when it doesn't work. describe() actually already has a workaround for this.
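A small stdlib-only sketch of the problem: a List<Number> has no min(), because Number is not Comparable<Number>. The hypothetical helper minOfMixed shows one possible workaround, assuming comparison by a Double view is acceptable: compare the unified values, but return the original element, so the result for 1, 2.0, 3.0f is still the Int 1. This is not how describe() implements its workaround.

```kotlin
// Hypothetical helper, not part of Kotlin or DataFrame.
// Compares values by their Double view but returns the ORIGINAL element,
// so an Int result stays an Int. Caveat: toDouble() can lose precision for very large Longs.
fun minOfMixed(values: List<Number>): Number? =
    values.minByOrNull { it.toDouble() }

fun main() {
    // Works in the stdlib: Int is Comparable<Int>.
    println(listOf(3, 1, 2).minOrNull()) // 1

    val mixed: List<Number> = listOf(1, 2.0, 3.0f)
    // mixed.min() would not compile: Number is not Comparable<Number>.
    val result = minOfMixed(mixed)
    println(result)        // 1
    println(result is Int) // true
}
```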

I previously thought this was impossible due to overload resolution ambiguity. However, it's possible to create three overloads for each function, like:

  1. fun <T : Comparable<T & Any>?> DataColumn<T>.min(): T & Any for normal comparables

  2. fun <T> DataColumn<T>.min(): T & Any where T : Number?, T : Comparable<T & Any> for normal numbers

  3. fun <T : Number?> DataColumn<T>.min(): T & Any for mixed number types

We might also need two new aggregator handlers:

  • an input handler that accepts either self-comparables or numbers
  • a selecting aggregation handler that, like aggregateBy, returns the item at indexOfAggregationResultSingleSequence by default, so the original type is preserved while the aggregation result is decided by the unified numbers.
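The two proposed handlers could look roughly like the sketch below. All names (compareUnified, indexOfMinUnified, selectMinUnified) are hypothetical and do not exist in DataFrame; the sketch only assumes that numbers are unified to Double for comparison, that other values are compared as self-comparables, and that the selecting step returns the original item at the winning index.

```kotlin
// Hypothetical sketch of the two proposed handlers; none of these names exist in DataFrame.

// "Input handler" idea: two Numbers are compared via a unified Double view;
// anything else is assumed to be self-comparable (e.g. two Strings).
@Suppress("UNCHECKED_CAST")
fun compareUnified(a: Any, b: Any): Int =
    if (a is Number && b is Number) {
        a.toDouble().compareTo(b.toDouble())
    } else {
        // Mixing unrelated types (e.g. Int and String) throws ClassCastException here.
        (a as Comparable<Any>).compareTo(b)
    }

// "Selecting handler" idea: find the INDEX of the minimum using the unified comparison.
// Returns -1 if there are no non-null values.
fun indexOfMinUnified(values: List<Any?>): Int {
    var best = -1
    for ((i, v) in values.withIndex()) {
        if (v == null) continue // skip nulls
        if (best == -1 || compareUnified(v, values[best]!!) < 0) best = i
    }
    return best
}

// Return the original item at that index, so its type is preserved.
fun selectMinUnified(values: List<Any?>): Any? =
    indexOfMinUnified(values).let { if (it == -1) null else values[it] }

fun main() {
    val mixed = listOf<Any?>(3.0f, null, 1, 2.0)
    println(indexOfMinUnified(mixed))           // 2
    println(selectMinUnified(mixed))            // 1
    println(selectMinUnified(listOf("b", "a"))) // a
}
```

Deciding by index rather than returning the unified Double is what keeps the original Int (or Float, or Long) in the result.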

The same will hold for median and percentile. These functions will already need to be split into overloads 1 and 2 because those have different return types. Adding an overload 3 for mixed number types will not be much more difficult.
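As a rough illustration of how the same selecting idea might carry over to median, here is a sketch of a hypothetical lowerMedianOfMixed. It assumes, purely for illustration, that a mixed-number median selects an existing element (the lower middle value for an even count) instead of interpolating; the issue does not specify this behavior.

```kotlin
// Hypothetical sketch, not DataFrame's median.
// Assumption: for an even count it picks the LOWER of the two middle values
// instead of averaging, so the result is always one of the original elements.
fun lowerMedianOfMixed(values: List<Number>): Number? {
    if (values.isEmpty()) return null
    // Sort indices by the unified Double value; the original items stay untouched.
    val sortedIndices = values.indices.sortedBy { values[it].toDouble() }
    val middle = sortedIndices[(values.size - 1) / 2]
    return values[middle]
}

fun main() {
    println(lowerMedianOfMixed(listOf(5L, 1, 2.5, 4.0f))) // 2.5
    println(lowerMedianOfMixed(listOf(3.0f, 1, 2.0)))     // 2.0
}
```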

Jolanrensen added the enhancement (New feature or request) label on Apr 1, 2025
Jolanrensen changed the title from "Allow mixed number types for min/max statistics" to "Allow mixed number types for Comparable/selecting statistics" on Apr 8, 2025
Jolanrensen (Collaborator, Author) commented:

At least, it works once https://youtrack.jetbrains.com/issue/KT-76683 is fixed.
