[DISCUSSION] `None` conversion to pandas #5388

rgsl888prabhu · 2020-06-04T18:40:51Z

Is your feature request related to a problem? Please describe.
Pandas and CuPy don't support None in integer arrays, so to accommodate this cuDF fills nulls with -1 when converting. With the ongoing work to port unsigned dtypes, this will become a problem, because -1 cannot be represented in an unsigned integer type.
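Here is a quick illustration of both halves of the problem. It uses plain NumPy and pandas and is a sketch, not cuDF code. A non-nullable pandas integer column is upcast to float64 when it contains None, and a -1 sentinel lies outside the range of every unsigned integer dtype.

```python
import numpy as np
import pandas as pd

# A plain (non-nullable) pandas Series cannot keep an integer dtype with None:
# pandas upcasts to float64 and stores NaN in place of None.
s = pd.Series([1, None, 3])
print(s.dtype)  # float64

# A -1 sentinel fits in a signed dtype but falls outside any unsigned range.
for dt in (np.int32, np.uint32):
    info = np.iinfo(dt)
    print(np.dtype(dt).name, info.min <= -1 <= info.max)
# int32 True
# uint32 False
```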

Describe the solution you'd like

Pandas 1.0 has nullable integer types, so we could depend on those and wait.
(Note: we currently depend on Numba to copy the device array to a host array, and I'm not sure how None would be handled there.)
Alternatively, for unsigned integers we could fill nulls with the maximum value of the dtype, or with 0, similar to the -1 used for signed integers.
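The following is a minimal sketch of the sentinel-fill option. It assumes the column's validity is already available on the host as a boolean array (`True` = valid). `sentinel_for` and `fill_nulls` are hypothetical helpers for illustration, not cuDF functions. The unsigned branch uses the dtype's maximum; 0 would be the other option mentioned above.

```python
import numpy as np


def sentinel_for(dtype):
    """Hypothetical helper: choose the fill value for nulls of an integer dtype."""
    dtype = np.dtype(dtype)
    if dtype.kind == "i":
        return -1  # the existing behavior for signed integers
    if dtype.kind == "u":
        return np.iinfo(dtype).max  # proposed alternative; 0 is the other choice
    raise TypeError(f"not an integer dtype: {dtype}")


def fill_nulls(values, valid):
    """Hypothetical helper: copy `values`, overwriting invalid slots with the sentinel."""
    out = values.copy()
    out[~valid] = sentinel_for(values.dtype)
    return out


print(fill_nulls(np.array([7, 0, 9], dtype=np.uint8),
                 np.array([True, False, True])))  # [  7 255   9]
```

One drawback this makes visible: after filling, a genuine 255 in a `uint8` column is indistinguishable from a null, just as a genuine -1 already is for signed columns.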

Open for discussion and suggestions.

The text was updated successfully, but these errors were encountered:

brandon-b-miller · 2020-06-04T19:14:56Z

The plan for Pandas 1.0 nullable integer support, at least so far, is to do something like this:

https://github.com/rapidsai/cudf/blob/branch-0.15/python/cudf/cudf/core/column/column.py#L124-L135

Here we use the mask to figure out where to place the resulting Nones. In 1.0+, the plan is to do more or less the same thing with pd.NA. In this approach we let Numba copy whatever values happen to be in the null slots, and mask them out afterwards.
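The mask-based approach can be sketched on the host side as follows. This is not the code at the linked line range. It assumes the column's data buffer has already been copied to a NumPy array, with arbitrary values in the null slots. It also assumes nulls are described by an Arrow-style validity bitmask: one bit per row, least-significant bit first, with 1 meaning valid. `host_column_to_pandas` is a hypothetical name. The result is a pandas nullable integer Series whose missing entries are `pd.NA`.

```python
import numpy as np
import pandas as pd


def host_column_to_pandas(data, null_mask, size):
    """Hypothetical helper: combine host data with a validity bitmask."""
    # Expand the packed bitmask (LSB first, 1 = valid) to one bool per row.
    valid = np.unpackbits(null_mask, bitorder="little")[:size].astype(bool)
    # pandas' IntegerArray takes a mask that is True where a value is MISSING,
    # so whatever garbage sits in the null slots is simply ignored.
    return pd.Series(pd.arrays.IntegerArray(data[:size], ~valid))


data = np.array([10, 123456, 30], dtype=np.int64)  # slot 1 holds garbage
null_mask = np.array([0b101], dtype=np.uint8)      # rows 0 and 2 are valid
print(host_column_to_pandas(data, null_mask, 3).tolist())
```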

In general, I think to_pandas should return a pandas object with the appropriate nullable dtype (pd.Int64Dtype, pd.BooleanDtype, etc.). However, doing this cleanly might involve using those nullable types as the dtypes of our cuDF objects, and that would be a fairly sweeping change, at which point we might as well just invent our own cuDF dtype.
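Here is a rough sketch of what returning nullable dtypes from `to_pandas` could look like. It assumes the values and a boolean validity array are already on the host. The `NULLABLE_DTYPES` table and `to_nullable_series` are hypothetical and cover only the integer and boolean cases named above.

```python
import numpy as np
import pandas as pd

# Hypothetical mapping from NumPy dtypes to pandas' nullable extension dtypes.
NULLABLE_DTYPES = {
    np.dtype("int8"): pd.Int8Dtype(),
    np.dtype("int16"): pd.Int16Dtype(),
    np.dtype("int32"): pd.Int32Dtype(),
    np.dtype("int64"): pd.Int64Dtype(),
    np.dtype("uint8"): pd.UInt8Dtype(),
    np.dtype("uint16"): pd.UInt16Dtype(),
    np.dtype("uint32"): pd.UInt32Dtype(),
    np.dtype("uint64"): pd.UInt64Dtype(),
    np.dtype("bool"): pd.BooleanDtype(),
}


def to_nullable_series(values, valid):
    """Hypothetical helper: wrap values in a nullable dtype, marking invalid rows pd.NA."""
    s = pd.Series(values, dtype=NULLABLE_DTYPES[values.dtype])
    s[~valid] = pd.NA
    return s
```

This confines the nullable dtypes to the conversion step. It does not address the larger change of using such dtypes inside cuDF itself.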

shwina · 2020-06-12T13:10:09Z

@kkraus14 suggested routing our to_pandas and from_pandas through Arrow, so that to_pandas() basically does to_arrow().to_pandas(). This frees us to worry only about how to translate cuDF objects to Arrow objects. If/when Arrow decides to convert to Pandas nullable types, we get that "for free".

kkraus14 · 2020-09-23T14:48:20Z

Defer to #5754 instead

rgsl888prabhu added the feature request and Needs Triage labels and removed the Needs Triage label on Jun 4, 2020

rgsl888prabhu mentioned this issue in [REVIEW] Adding support for unsigned int #5431 (merged) on Jun 11, 2020

kkraus14 added the Python label on Jun 15, 2020

kkraus14 closed this as completed Sep 23, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[DISCUSSION] `None` conversion to pandas #5388

[DISCUSSION] `None` conversion to pandas #5388

rgsl888prabhu commented Jun 4, 2020

brandon-b-miller commented Jun 4, 2020 •

edited

Loading

shwina commented Jun 12, 2020

kkraus14 commented Sep 23, 2020

[DISCUSSION] None conversion to pandas #5388

[DISCUSSION] None conversion to pandas #5388

Comments

rgsl888prabhu commented Jun 4, 2020

brandon-b-miller commented Jun 4, 2020 • edited Loading

shwina commented Jun 12, 2020

kkraus14 commented Sep 23, 2020

[DISCUSSION] `None` conversion to pandas #5388

[DISCUSSION] `None` conversion to pandas #5388

brandon-b-miller commented Jun 4, 2020 •

edited

Loading