Describe the bug
When a cuDF.DataFrame is passed to cuML's ColumnTransformer, the call raises a KeyError: 'dtype' and subsequently an AttributeError: 'DataFrame' object has no attribute dtype. The issue seems to occur when a SimpleImputer is one of the transformers in the ColumnTransformer.
Steps/Code to reproduce bug
Below is a minimal reproducible example:
Expected behavior
The ColumnTransformer should process the cuDF.DataFrame correctly without raising errors. Missing values should be imputed using the defined SimpleImputer strategies, and the output should be consistent with the column definitions in the ColumnTransformer.
Actual behavior
KeyError: 'dtype'
AttributeError: 'DataFrame' object has no attribute dtype
Environment details (please complete the following information):
Environment location: Bare-metal
Linux Distro/Architecture: Ubuntu 20.04 amd64
GPU Model/Driver: NVIDIA A100 and driver 470.57
CUDA Version: 11.8
Method of cuML install: conda
conda list output
Output from conda list truncated for brevity
cudf 23.02
cuml 23.02
Additional context
This issue seems to be caused by an incompatibility between cuDF.DataFrame and the underlying sklearn-based SimpleImputer, which expects a pandas-like interface. Specifically:
The SimpleImputer attempts to access the dtype attribute, which is not directly available for cuDF.DataFrame.
The ColumnTransformer does not appear to handle this case, so the error propagates to the caller.
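To illustrate why a KeyError and an AttributeError can show up together, here is a minimal pure-Python sketch. It assumes the frame resolves unknown attributes as column names, as pandas-like frames commonly do. `FrameLike` and `describe_failure` are hypothetical stand-ins written for this illustration; they are not cuDF or cuML code, and the real internals may differ.

```python
class FrameLike:
    """Hypothetical stand-in for a DataFrame whose columns are reachable as attributes."""

    def __init__(self, columns):
        # Mapping of column name -> list of values.
        self._columns = dict(columns)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so a missing
        # attribute such as `dtype` is first tried as a column name.
        try:
            return self.__dict__["_columns"][name]
        except KeyError as exc:
            # The column lookup fails with KeyError, which is then
            # re-raised as the AttributeError the caller sees.
            raise AttributeError(
                f"'DataFrame' object has no attribute {name}"
            ) from exc


def describe_failure(frame):
    """Return (cause, error) type names raised when frame-level `dtype` is read."""
    try:
        frame.dtype  # what a dtype-expecting transformer would do
    except AttributeError as err:
        return type(err.__cause__).__name__, type(err).__name__
    return None


frame = FrameLike({"age": [1, None], "income": [None, 2.5]})
print(frame.age)                # column access works
print(describe_failure(frame))  # frame-level dtype does not
```

If this assumption holds, any code path that reads `dtype` on the whole frame rather than on a single column would produce the pair of errors reported above.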