☂ Statistics streamlining #961

Continuation of https://github.com/Kotlin/dataframe/issues/558, which fixed the most annoying bugs related to `describe`. See that issue for more information.

Our statistics functions need some more love. We used to have many missing types (mostly fixed by https://github.com/Kotlin/dataframe/pull/937), but there are still some inconsistencies to resolve:

> As mentioned in https://github.com/Kotlin/dataframe/issues/543, some functions like `median(ints)` may return an unexpectedly rounded `Int`. It might be better to let all functions return `Double` and handle `BigInteger` / `BigDecimal` separately, as they're Java-specific [for now](https://youtrack.jetbrains.com/issue/KT-20912).
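
To illustrate the rounding problem, here is a minimal sketch on a plain `List<Int>`. `medianAsInt` and `medianAsDouble` are hypothetical helpers, not DataFrame functions, and both assume a non-empty list.

```kotlin
// Hypothetical helpers for illustration only; both assume a non-empty list
fun medianAsInt(values: List<Int>): Int {
    val sorted = values.sorted()
    val mid = sorted.size / 2
    // Integer division silently truncates the average of the two middle values
    return if (sorted.size % 2 == 0) (sorted[mid - 1] + sorted[mid]) / 2 else sorted[mid]
}

fun medianAsDouble(values: List<Int>): Double {
    val sorted = values.sorted()
    val mid = sorted.size / 2
    // Convert before adding to avoid Int overflow, then average in Double
    return if (sorted.size % 2 == 0) (sorted[mid - 1].toDouble() + sorted[mid]) / 2.0 else sorted[mid].toDouble()
}

fun main() {
    val ints = listOf(4, 1, 3, 2)
    println(medianAsInt(ints))    // 2
    println(medianAsDouble(ints)) // 2.5
}
```

Returning `Double` keeps the exact middle value for even-sized inputs, at the cost of changing the result type for integer columns.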

> ~There are plenty of public overloads on `Iterable` and `Sequence`. It's fine to have them internally, but I feel like we're clogging the public scope here. mean, for instance, is already covered in the stdlib.~

> ~We'll need to hide public functions that are not on DataColumn as @AndreiKingsley will probably make a statistics library for that anyway.~

> We need to honor a consistent conversion table (see below).
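
A minimal sketch of what "conversion to a common number type" could look like for `sum` over a `List<Number?>`. The widening rule in `commonNumberType` (all `Float` stays `Float`, any other floating-point value widens to `Double`, otherwise `Long`, otherwise `Int`) is an assumption made for illustration; the library's actual unification rules may differ. `commonNumberType` and `sumCommon` are hypothetical names. Because a plain list carries no static element type, an empty input is treated like the table's `Nothing / no values -> Double (0.0)` row.

```kotlin
import kotlin.reflect.KClass

// Hypothetical widening rule, for illustration only
fun commonNumberType(values: List<Number>): KClass<out Number> = when {
    values.isNotEmpty() && values.all { it is Float } -> Float::class
    values.any { it is Double || it is Float } -> Double::class
    values.any { it is Long } -> Long::class
    else -> Int::class // Byte, Short, and Int all sum to Int
}

// Hypothetical sum over mixed numbers: nulls are filtered out, then every
// value is converted to the common number type before summing
fun sumCommon(values: List<Number?>): Number {
    val xs = values.filterNotNull()
    if (xs.isEmpty()) return 0.0 // like "Nothing / no values -> Double (0.0)"
    return when (commonNumberType(xs)) {
        Float::class -> xs.fold(0f) { acc, n -> acc + n.toFloat() }
        Double::class -> xs.sumOf { it.toDouble() }
        Long::class -> xs.sumOf { it.toLong() }
        else -> xs.sumOf { it.toInt() }
    }
}
```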

We won't support `UByte`, `UShort`, `UInt`, and `ULong` since they don't inherit `Number`.
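
The snippet below shows why: Kotlin's unsigned types are value classes that do not extend `Number`, so a type check against `Number` fails and any conversion has to be done explicitly by the caller.

```kotlin
fun main() {
    val signed: Any = 42
    val unsigned: Any = 42u
    println(signed is Number)   // true
    println(unsigned is Number) // false: UInt is a value class, not a subtype of Number
    // A statistic over unsigned values would need an explicit conversion first
    println((unsigned as UInt).toDouble()) // 42.0
}
```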

We also drop support for `BigInteger` and `BigDecimal`, as they make generic typing and conversion very difficult and unpredictable.

Progress:
- [x] underlying fixes https://github.com/Kotlin/dataframe/pull/1078
- [x] mean https://github.com/Kotlin/dataframe/pull/1091
- [x] sum https://github.com/Kotlin/dataframe/pull/1103
- [x] min https://github.com/Kotlin/dataframe/pull/1108
- [x] max https://github.com/Kotlin/dataframe/pull/1108
- [x] std https://github.com/Kotlin/dataframe/pull/1119
- [x] median https://github.com/Kotlin/dataframe/pull/1122
- [x] percentile https://github.com/Kotlin/dataframe/pull/1149
- [x] cumSum https://github.com/Kotlin/dataframe/pull/1152

| Function                  | Conversion                                             | extra information                                                                               | nulls in input                                                                   |
|---------------------------|--------------------------------------------------------|-------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------|
| **mean**                  | Int -> Double                                          | **For all: Double.NaN if no elements**                                                          | **All nulls are filtered out**                                                   |
|                           | Short -> Double                                        |                                                                                                 |                                                                                  |
|                           | Byte -> Double                                         |                                                                                                 |                                                                                  |
|                           | Long -> Double                                         |                                                                                                 |                                                                                  |
|                           | Double -> Double                                       | skipNaN option, false by default                                                                |                                                                                  |
|                           | Float -> Double                                        | skipNaN option, false by default                                                                |                                                                                  |
|                           | Number -> Conversion(Common number type) -> Double     | skipNaN option, false by default                                                                |                                                                                  |
|                           | Nothing / no values -> Double.NaN                      |                                                                                                 |                                                                                  |
| **sum**                   | Int -> Int                                             | **All default to zero if no values**                                                            | **All nulls are filtered out**                                                   |
|                           | Short -> Int                                           |                                                                                                 |                                                                                  |
|                           | Byte -> Int                                            |                                                                                                 |                                                                                  |
|                           | Long -> Long                                           |                                                                                                 |                                                                                  |
|                           | Double -> Double                                       | skipNaN option, false by default                                                                |                                                                                  |
|                           | Float -> Float                                         | skipNaN option, false by default                                                                |                                                                                  |
|                           | Number -> Conversion(Common number type) -> Number     | skipNaN option, false by default                                                                |                                                                                  |
|                           | Nothing / no values -> Double (0.0)                    |                                                                                                 |                                                                                  |
| **cumSum**                | Int -> Int                                             | **All default to zero if no values**                                                            | **All can optionally skip nulls in input with skipNull option**, true by default |
|                           | Short -> Int                                           |                                                                                                 | **important because order matters with cumSum**                                  |
|                           | Byte -> Int                                            |                                                                                                 |                                                                                  |
|                           | Long -> Long                                           |                                                                                                 |                                                                                  |
|                           | Double -> Double                                       | skipNaN option, true by default                                                                 |                                                                                  |
|                           | Float -> Float                                         | skipNaN option, true by default                                                                 |                                                                                  |
|                           | Number -> Conversion(Common number type) -> Number     | skipNaN option, true by default                                                                 |                                                                                  |
|                           | Nothing / no values -> Double (0.0)                    |                                                                                                 |                                                                                  |
| **min/max**               | T -> T? where T : Comparable\<T\>                      | **For all: null if no elements, has -OrNull overloads**                                         | **All nulls are filtered out**                                                   |
|                           | Int -> Int?                                            |                                                                                                 |                                                                                  |
|                           | Short -> Short?                                        |                                                                                                 |                                                                                  |
|                           | Byte -> Byte?                                          |                                                                                                 |                                                                                  |
|                           | Long -> Long?                                          |                                                                                                 |                                                                                  |
|                           | Double -> Double?                                      | skipNaN option, false by default, returns NaN if the input contains NaN                        |                                                                                  |
|                           | Float -> Float?                                        | skipNaN option, false by default, returns NaN if the input contains NaN                        |                                                                                  |
|                           | ~~Number -> Number?~~                                  | Would need more overloads and more work                                                         |                                                                                  |
|                           | Nothing / no values -> Nothing? (null)                 |                                                                                                 |                                                                                  |
| **median**/**percentile** | T -> T? where T : Comparable\<T\>                      | **For all: the median of an even-sized list is converted to Double if possible, else the lower middle value is returned** | **All nulls are filtered out**                                                   |
|                           | Int -> Double?                                         | **null if no elements**                                                                         |                                                                                  |
|                           | Short -> Double?                                       |                                                                                                 |                                                                                  |
|                           | Byte -> Double?                                        |                                                                                                 |                                                                                  |
|                           | Long -> Double?                                        |                                                                                                 |                                                                                  |
|                           | Double -> Double?                                      |                                                                                                 |                                                                                  |
|                           | Float -> Double?                                       |                                                                                                 |                                                                                  |
|                           | ~~Number -> Conversion(Common number type) -> Double~~ | Would need more overloads and more work                                                         |                                                                                  |
|                           | Nothing / no values -> Nothing? (null)                 |                                                                                                 |                                                                                  |
| **std**                   | Int -> Double                                          | **All have DDoF (Delta Degrees of Freedom) argument**                                           | **All nulls are filtered out**                                                   |
|                           | Short -> Double                                        | **and Double.NaN if no elements**                                                               |                                                                                  |
|                           | Byte -> Double                                         |                                                                                                 |                                                                                  |
|                           | Long -> Double                                         |                                                                                                 |                                                                                  |
|                           | Double -> Double                                       | skipNaN option, false by default                                                                |                                                                                  |
|                           | Float -> Double                                        | skipNaN option, false by default                                                                |                                                                                  |
|                           | Number -> Conversion(Common number type) -> Double     | skipNaN option, false by default                                                                |                                                                                  |
|                           | Nothing / no values -> Double.NaN                      |                                                                                                 |                                                                                  |
| **var** (want to add?)    | same as std                                            |                                                                                                 |                                                                                  |
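
The rules in the table can be sketched on plain Kotlin lists as follows. All function names (`cleaned`, `meanOf`, `stdOf`, `medianOf`, `maxOrNullOf`, `cumSumOf`) are hypothetical and not the DataFrame API. Two details are assumptions not stated in the table: `ddof` defaults to 1 here, and the null handling in `cumSumOf` is modeled on pandas (null positions stay null; with `skipNull = false`, everything from the first null onward becomes null).

```kotlin
import kotlin.math.sqrt

// Hypothetical sketches of the table's rules on plain lists; not the DataFrame API.

// Shared helper: nulls are always filtered out, NaN only when skipNaN = true
fun cleaned(values: List<Double?>, skipNaN: Boolean): List<Double> =
    values.filterNotNull().let { xs -> if (skipNaN) xs.filterNot { it.isNaN() } else xs }

// mean: Double.NaN if no elements
fun meanOf(values: List<Double?>, skipNaN: Boolean = false): Double {
    val xs = cleaned(values, skipNaN)
    return if (xs.isEmpty()) Double.NaN else xs.sum() / xs.size
}

// std: ddof (Delta Degrees of Freedom) argument, Double.NaN if no elements
// The default ddof = 1 is an assumption for this sketch
fun stdOf(values: List<Double?>, ddof: Int = 1, skipNaN: Boolean = false): Double {
    val xs = cleaned(values, skipNaN)
    if (xs.isEmpty()) return Double.NaN
    val mean = xs.sum() / xs.size
    val sumSq = xs.sumOf { (it - mean) * (it - mean) }
    return sqrt(sumSq / (xs.size - ddof))
}

// median: Int -> Double?; an even-sized list averages the two middle values, null if no elements
fun medianOf(values: List<Int?>): Double? {
    val xs = values.filterNotNull().sorted()
    if (xs.isEmpty()) return null
    val mid = xs.size / 2
    return if (xs.size % 2 == 0) (xs[mid - 1].toDouble() + xs[mid]) / 2.0 else xs[mid].toDouble()
}

// max: null if no elements; NaN in the input is returned unless skipNaN = true
fun maxOrNullOf(values: List<Double?>, skipNaN: Boolean = false): Double? {
    val xs = values.filterNotNull()
    if (!skipNaN && xs.any { it.isNaN() }) return Double.NaN
    return xs.filterNot { it.isNaN() }.maxOrNull()
}

// cumSum: Int -> Int; the null handling below is an assumed, pandas-like behavior
fun cumSumOf(values: List<Int?>, skipNull: Boolean = true): List<Int?> {
    var acc = 0
    var sawNull = false
    return values.map { v ->
        if (v == null) {
            sawNull = true
            null // null positions stay null in the output
        } else if (!skipNull && sawNull) {
            null // without skipNull, a null "poisons" everything after it
        } else {
            acc += v
            acc
        }
    }
}
```

Keeping the null handling of `cumSum` explicit matters because, unlike the other functions, its output is positional: simply filtering nulls out would shift every later value.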


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

☂ Statistics streamlining #961

6 remaining items

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Participants

Function	Conversion	extra information	nulls in input
mean	Int -> Double	For all: Double.NaN if no elements	All nulls are filtered out
	Short -> Double
	Byte -> Double
	Long -> Double
	Double -> Double	skipNaN option, false by default
	Float -> Double	skipNaN option, false by default
	Number -> Conversion(Common number type) -> Double	skipNaN option, false by default
	Nothing / no values -> Double.NaN
sum	Int -> Int	All default to zero if no values	All nulls are filtered out
	Short -> Int
	Byte -> Int
	Long -> Long
	Double -> Double	skipNaN option, false by default
	Float -> Float	skipNaN option, false by default
	Number -> Conversion(Common number type) -> Number	skipNaN option, false by default
	Nothing / no values -> Double (0.0)
cumSum	Int -> Int	All default to zero if no values	All can optionally skip nulls in input with skipNull option, true by default
	Short -> Int		important because order matters with cumSum
	Byte -> Int
	Long -> Long
	Double -> Double	skipNaN option, true by default
	Float -> Float	skipNaN option, true by default
	Number -> Conversion(Common number type) -> Number	skipNaN option, true by default
	Nothing / no values -> Double (0.0)
min/max	T -> T? where T : Comparable<T>	For all: null if no elements, has -OrNull overloads	All nulls are filtered out
	Int -> Int?
	Short -> Short?
	Byte -> Byte?
	Long -> Long?
	Double -> Double?	skipNaN option, false by default, returns NaN when in the input
	Float -> Float?	skipNaN option, false by default, returns NaN when in the input
	~~Number -> Number?~~	Would need more overloads and more work
	Nothing / no values -> Nothing? (null)
median/percentile	T -> T? where T : Comparable<T>	For all: median of even list will cause conversion to Double if possible, else lower middle	All nulls are filtered out
	Int -> Double?	null if no elements
	Short -> Double?
	Byte -> Double?
	Long -> Double?
	Double -> Double?
	Float -> Double?
	~~Number -> Conversion(Common number type) -> Double~~	Would need more overloads and more work
	Nothing / no values -> Nothing? (null)
std	Int -> Double	All have DDoF (Delta Degrees of Freedom) argument	All nulls are filtered out
	Short -> Double	and Double.NaN if no elements
	Byte -> Double
	Long -> Double
	Double -> Double	skipNaN option, false by default
	Float -> Double	skipNaN option, false by default
	Number -> Conversion(Common number type) -> Double	skipNaN option, false by default
	Nothing / no values -> Double.NaN
var (want to add?)	same as std

☂ Statistics streamlining #961

Description

Activity:

- Jolanrensen commented on Jan 22, 2025
- Jolanrensen commented on Feb 14, 2025
- AndreiKingsley commented on Feb 14, 2025
- Jolanrensen commented on Feb 18, 2025
- Jolanrensen commented on Feb 19, 2025
- Jolanrensen commented on Apr 9, 2025
- Jolanrensen commented on Apr 22, 2025
- Jolanrensen commented on Apr 22, 2025
- Jolanrensen commented on Apr 27, 2025