[Fea] Data imputation limited by null conversion #2966
Labels: bug, Cython / Python, feature request
Is your feature request related to a problem? Please describe.
In sklearn, a fairly common data imputation workflow might look something like this
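As a rough illustration of the kind of workflow meant here (a minimal sketch, not the snippet originally attached to this issue; the column names and values are made up), mean imputation on a pandas DataFrame with missing entries could be written as:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data: NaN marks the missing entries.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

# Replace each missing entry with the mean of its column.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
imputed = imputer.fit_transform(df)  # NumPy array with no missing values
```

On the CPU side this works directly because pandas hands NaN-marked missing values to NumPy unchanged.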
The RAPIDS equivalent would look something like:
Under the hood, we try to convert the cudf DataFrame to a cupy array, which fails because of null values in the DataFrame. This severely limits the usefulness of our data imputation methods.
Describe the solution you'd like
We can fix this either in cuml through special handling of DataFrame input or in cudf by providing some infrastructure for dealing with null values when we convert to cupy, though that may also require cupy changes (possibly related: cudf/5754).
Describe alternatives you've considered
For floating point data, we can call fillna(cp.nan) before running data imputation. For integers, we would have to either know of an integer value which cannot appear in the data or generate one.
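The integer case can be sketched on the CPU, using pandas and NumPy as stand-ins for cudf and cupy. The helper name `pick_unused_integer` is hypothetical and not part of any library; it only illustrates one way to generate a value that is guaranteed not to appear in the data:

```python
import numpy as np
import pandas as pd


def pick_unused_integer(values, dtype=np.int64):
    """Return an integer representable in `dtype` that is absent from `values`.

    Hypothetical helper: scans upward from the smallest representable value,
    so it needs at most len(values) + 1 steps.
    """
    info = np.iinfo(dtype)
    present = {int(v) for v in values}
    candidate = info.min
    while candidate in present:
        candidate += 1
        if candidate > info.max:
            raise ValueError("every value of the dtype already occurs in the data")
    return candidate


# Nullable integer column (pandas analogue of a cudf column with nulls).
col = pd.Series([3, None, 7, 3], dtype="Int64")

# Choose a sentinel from the non-null values, then fill the nulls with it.
sentinel = pick_unused_integer(col.dropna().to_numpy(dtype=np.int64))
dense = col.fillna(sentinel).to_numpy(dtype=np.int64)  # no nulls remain
```

The resulting dense array could then be given to an imputer configured with `missing_values=sentinel`, which scikit-learn's `SimpleImputer` accepts; whether the same holds for a given cuml version is an assumption to verify.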