Is your feature request related to a problem? Please describe.
Pandas and CuPy don't handle None in integer-typed arrays, so to work around this cuDF fills None with -1. With the ongoing work to support unsigned dtypes this will be a problem, since -1 is not representable in an unsigned type.
Describe the solution you'd like
Pandas 1.0 has a nullable integer type, so we could depend on that and wait.
(Note: we currently depend on Numba to convert the device array to a host array, and I'm not sure how to handle None there.)
Or we can use the max of that dtype, or 0, as the fill value for unsigned, similar to the -1 used for signed integers.
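To make the sentinel idea concrete, here is a rough NumPy-only sketch. `fill_value_for` is a hypothetical helper, not an existing cuDF function, and the "True means valid" mask convention is an assumption for illustration.

```python
import numpy as np

def fill_value_for(dtype):
    """Hypothetical helper: choose a sentinel to stand in for nulls."""
    dtype = np.dtype(dtype)
    if dtype.kind == "u":
        # -1 is not representable in an unsigned dtype, so use the max value
        return np.iinfo(dtype).max
    return -1

values = np.array([1, 2, 3], dtype="uint8")
valid_mask = np.array([True, False, True])  # assumed: True means "not null"

filled = values.copy()
filled[~valid_mask] = fill_value_for(values.dtype)
```

The downside of either choice is the same as with -1: the sentinel is also a legal value of the dtype, so a real 255 (or 0) becomes indistinguishable from a null after conversion.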
Open for discussion and suggestions.
Here we use the mask to figure out where to place the resulting Nones. In 1.0+, the plan is to more or less do this with pd.NA. In this case we just allow Numba to do whatever it wants with those values, and mask them out later.
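A minimal sketch of that mask-based placement, on the host side only: copy whatever values came back, then overwrite the masked-out slots with None. `to_host_with_nones` is a hypothetical name and the "True means valid" convention is assumed; this is not the actual cuDF code path.

```python
import numpy as np
import pandas as pd

def to_host_with_nones(values, valid_mask):
    """Hypothetical helper: place None wherever the mask marks a null."""
    out = values.astype("object")  # object dtype can hold None alongside ints
    out[~valid_mask] = None        # whatever was stored at null slots is discarded
    return pd.Series(out, dtype="object")

values = np.array([1, 99, 3], dtype="uint8")  # 99 stands in for an arbitrary value
valid_mask = np.array([True, False, True])
series = to_host_with_nones(values, valid_mask)
```

Because the null slots are overwritten after the copy, it does not matter what the underlying buffer contained there, which is why no fill value is needed in this approach.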
In general, I think to_pandas should result in a pandas object with the appropriate nullable datatype (pd.Int64Dtype, pd.BooleanDtype, etc). However doing this cleanly might involve using those nullable types as the dtype of our cuDF objects, and that would be a fairly sweeping change in which we might as well just invent our own cuDF dtype.
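For reference, this is roughly what a nullable result could look like on the pandas side, assuming pandas 1.0 or later. It builds the extension array directly from a value buffer and a mask using `pd.arrays.IntegerArray`, whose mask uses True for missing; how cuDF would actually produce these buffers is left open.

```python
import numpy as np
import pandas as pd

values = np.array([1, 2, 3], dtype="uint8")
valid_mask = np.array([True, False, True])  # assumed: True means "not null"

# IntegerArray expects True where the value is missing, so invert the mask
nullable = pd.Series(pd.arrays.IntegerArray(values, ~valid_mask))
```

The resulting series keeps an unsigned integer dtype (`UInt8` here) and reports the masked slot as missing, so no sentinel is needed.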
@kkraus14 suggested the idea to route our to_pandas and from_pandas through Arrow, so that to_pandas() basically does to_arrow().to_pandas(). This frees us up to worry only about how to translate cuDF objects to Arrow objects. If/when Arrow decides to convert to Pandas nullable types, we get that "for free".