☂ Statistics streamlining #961

Open
Jolanrensen opened this issue Nov 21, 2024 · 0 comments

Jolanrensen (Collaborator) commented Nov 21, 2024

Continuation of #558, which fixed the most annoying bugs related to describe.

See #558 for more information.

Our statistics functions need some more love. We used to have many missing types (mostly fixed by #937), but there are still some inconsistencies to resolve:

As mentioned in #543, some functions, such as median(ints), may return an unexpectedly rounded Int. It might be better to let all functions return Double and handle BigInteger / BigDecimal separately for now, as they're Java-specific.
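To make the rounding concern concrete, here is a minimal sketch of one way an Int-returning median can lose information. Both `medianAsInt` and `medianAsDouble` are hypothetical helpers written for this illustration; they are not the library's implementation, and the actual cause of the rounding reported in #543 may differ.

```kotlin
// Hypothetical helpers for illustration only; not the library's API.

/** Int median: averaging the two middle values with Int division truncates. */
fun medianAsInt(values: List<Int>): Int? {
    if (values.isEmpty()) return null
    val sorted = values.sorted()
    val mid = sorted.size / 2
    return if (sorted.size % 2 == 1) sorted[mid] else (sorted[mid - 1] + sorted[mid]) / 2
}

/** Double median: the even-sized case keeps the fractional part. */
fun medianAsDouble(values: List<Int>): Double? {
    if (values.isEmpty()) return null
    val sorted = values.sorted()
    val mid = sorted.size / 2
    return if (sorted.size % 2 == 1) {
        sorted[mid].toDouble()
    } else {
        (sorted[mid - 1].toDouble() + sorted[mid]) / 2.0
    }
}

fun main() {
    val ints = listOf(1, 2, 3, 4)
    println(medianAsInt(ints))    // 2
    println(medianAsDouble(ints)) // 2.5
}
```

Returning Double? for every primitive input avoids the silent truncation, at the cost of a type change for odd-sized inputs.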

There are plenty of public overloads on Iterable and Sequence. It's fine to have them internally, but I feel like we're clogging the public scope here. mean, for instance, is already covered by the stdlib (as average()).
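For reference, a short sketch of what the Kotlin stdlib already offers for the mean on Iterable and Sequence. It uses only the stdlib `average()` extension; nothing here comes from the DataFrame library.

```kotlin
fun main() {
    // average() is defined on Iterable and Sequence of the primitive number types
    // and always returns Double.
    val fromIterable = listOf(1, 2, 3, 4).average()
    val fromSequence = sequenceOf(1.5, 2.5).average()

    // An empty collection yields NaN rather than throwing.
    val fromEmpty = emptyList<Int>().average()

    println(fromIterable)      // 2.5
    println(fromSequence)      // 2.0
    println(fromEmpty.isNaN()) // true
}
```

Note that the stdlib does not filter nulls or offer a skipNaN option, so it covers only part of what a DataColumn-level mean would need.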

We'll need to hide public functions that are not on DataColumn, as @AndreiKingsley will probably make a statistics library for that anyway.

We need to honor a conversion table (see below):

| Function | Conversion | Extra information | Nulls in input |
|---|---|---|---|
| mean | Int -> Double | | All nulls are filtered out |
| | Short -> Double | | |
| | Byte -> Double | | |
| | Long -> Double | | |
| | Double -> Double | skipNaN option, false by default | |
| | Float -> Double | skipNaN option, false by default | |
| | BigInteger -> BigDecimal? | null instead of NaN in output | |
| | BigDecimal -> BigDecimal? | null instead of NaN in output | |
| | Number -> Double | skipNaN option, false by default | |
| | Nothing / no values -> Double (NaN) | | |
| sum | Int -> Int | All default to zero if no values | All nulls are filtered out |
| | Short -> Int | | |
| | Byte -> Int | | |
| | Long -> Long | | |
| | Double -> Double | skipNaN option, false by default | |
| | Float -> Float | skipNaN option, false by default | |
| | BigInteger -> BigInteger | | |
| | BigDecimal -> BigDecimal | | |
| | Number -> Double | skipNaN option, false by default | |
| | Nothing / no values -> Double (0.0) | | |
| cumSum | Int -> Int | All default to zero if no values | All can optionally skip nulls in input with skipNull option, true by default; important because order matters with cumSum |
| | Short -> Int | | |
| | Byte -> Int | | |
| | Long -> Long | | |
| | Double -> Double | skipNaN option, true by default | |
| | Float -> Float | skipNaN option, true by default | |
| | BigInteger -> BigInteger | | |
| | BigDecimal -> BigDecimal | | |
| | Number -> Double | skipNaN option, true by default | |
| | Nothing / no values -> Double (0.0) | | |
| min/max | T -> T? where T : Comparable<T> | For all: null if no elements | All nulls are filtered out |
| | Int -> Int? | | |
| | Short -> Short? | | |
| | Byte -> Byte? | | |
| | Long -> Long? | | |
| | Double -> Double? | If has NaN, result will be NaN, needs skipNaN option? | |
| | Float -> Float? | If has NaN, result will be NaN, needs skipNaN option? | |
| | BigInteger -> BigInteger? | | |
| | BigDecimal -> BigDecimal? | | |
| | Number -> Double? | If has NaN, result will be NaN, needs skipNaN option? | |
| | Nothing / no values -> Double? (null) | | |
| | (Don't convert Short/Byte to Int!) | | |
| median | T -> T? where T : Comparable<T> | For all: median of even list will cause conversion to Double, and null if no elements | All nulls are filtered out |
| | Int -> Double? | | |
| | Short -> Double? | | |
| | Byte -> Double? | | |
| | Long -> Double? | | |
| | Double -> Double? | | |
| | Float -> Double? | | |
| | BigInteger -> BigDecimal? | | |
| | BigDecimal -> BigDecimal? | | |
| | Number -> Double? | | |
| | Nothing / no values -> Double? (null) | | |
| std | Int -> Double | All have DDoF (Delta Degrees of Freedom) argument | All nulls are filtered out |
| | Short -> Double | | |
| | Byte -> Double | | |
| | Long -> Double | | |
| | Double -> Double | skipNaN option, false by default | |
| | Float -> Double | skipNaN option, false by default | |
| | BigInteger -> BigDecimal? | null instead of NaN in output | |
| | BigDecimal -> BigDecimal? | null instead of NaN in output | |
| | Number -> Double | skipNaN option, false by default | |
| | Nothing / no values -> Double (NaN) | | |
| var (want to add?) | same as std | | |
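A few rows of the table can be sketched in plain Kotlin to show how the rules interact. The helpers below (`meanOf`, `sumOfShorts`, `stdOf`) are hypothetical names invented for this illustration, not the library's API. The default `ddof = 1` is also an assumption, since the table only says that a DDoF argument exists.

```kotlin
import kotlin.math.sqrt

// Hypothetical helpers illustrating a few table rows; names and signatures are assumptions.

/** mean: nulls are filtered out; NaN propagates unless skipNaN = true; no values -> NaN. */
fun meanOf(values: Iterable<Double?>, skipNaN: Boolean = false): Double {
    var sum = 0.0
    var count = 0
    for (v in values) {
        if (v == null) continue
        if (v.isNaN()) {
            if (skipNaN) continue else return Double.NaN
        }
        sum += v
        count++
    }
    return if (count == 0) Double.NaN else sum / count
}

/** sum: Short -> Int; nulls are filtered out; zero if there are no values. */
fun sumOfShorts(values: Iterable<Short?>): Int {
    var sum = 0
    for (v in values) {
        if (v != null) sum += v
    }
    return sum
}

/** std: divides by (n - ddof); NaN if too few values remain. Default ddof is an assumption. */
fun stdOf(values: Iterable<Double?>, ddof: Int = 1): Double {
    val xs = values.filterNotNull()
    val n = xs.size
    if (n - ddof <= 0) return Double.NaN
    val mean = xs.sum() / n
    val squares = xs.sumOf { (it - mean) * (it - mean) }
    return sqrt(squares / (n - ddof))
}

fun main() {
    println(meanOf(listOf(1.0, null, 3.0)))                       // 2.0
    println(meanOf(listOf(1.0, Double.NaN, 3.0)))                 // NaN
    println(meanOf(listOf(1.0, Double.NaN, 3.0), skipNaN = true)) // 2.0
    println(meanOf(emptyList()))                                  // NaN
    println(sumOfShorts(listOf(1.toShort(), null, 2.toShort())))  // 3
    println(stdOf(listOf(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0), ddof = 0)) // 2.0
}
```

The BigInteger / BigDecimal rows are left out of this sketch because they return null instead of NaN and would need separate overloads.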
Jolanrensen added the bug and ☂ umbrella issue labels Nov 21, 2024
Jolanrensen added this to the 0.16.0 milestone Nov 21, 2024
Jolanrensen self-assigned this Nov 21, 2024