You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
[Data] Use Approximate Quantile for RobustScaler Preprocessor (#58371)
## Description
Currently Ray Data has a preprocessor called `RobustScaler`. This scales
the data based on given quantiles. Calculating the quantiles involves
sorting the entire dataset by column for each column (C sorts for C
number of columns), which, for a large dataset, will require a lot of
calculations.
** MAJOR EDIT **: had to replace the original `tdigest` with `ddsketch`
as I couldn't actually find well-maintained tdigest libraries for
python. ddsketch is better maintained.
** MAJOR EDIT 2 **: discussed offline to use `ApproximateQuantile`
aggregator
## Related issues
N/A
## Additional information
N/A
---------
Signed-off-by: kyuds <kyuseung1016@gmail.com>
Signed-off-by: Daniel Shin <kyuseung1016@gmail.com>
Co-authored-by: You-Cheng Lin <106612301+owenowenisme@users.noreply.github.com>
0 commit comments