[BUG] factorize returning incorrect results for CategoricalDtype #13979

Closed
galipremsagar opened this issue Aug 28, 2023 · 0 comments · Fixed by #13980

Comments

@galipremsagar
Contributor

Describe the bug
When `factorize` is called on a Series with a categorical dtype, every code comes back as -1 and the returned index contains only nulls.

Steps/Code to reproduce bug

In [1]: import cudf

In [2]: s = cudf.Series(['a', 'b', 'c', 'd'])

In [4]: s.factorize()
Out[4]: 
(array([0, 1, 2, 3], dtype=int8),
 StringIndex(['a' 'b' 'c' 'd'], dtype='object'))

In [5]: s = s.astype('category')

In [6]: s.factorize()
Out[6]: 
(array([-1, -1, -1, -1], dtype=int8),
 CategoricalIndex([<NA>, <NA>, <NA>, <NA>], categories=['a', 'b', 'c', 'd'], ordered=False, dtype='category'))

In [7]: s.to_pandas().factorize()
Out[7]: 
(array([0, 1, 2, 3]),
 CategoricalIndex(['a', 'b', 'c', 'd'], categories=['a', 'b', 'c', 'd'], ordered=False, dtype='category'))

Expected behavior

In [7]: s.factorize()
Out[7]: 
(array([0, 1, 2, 3]),
 CategoricalIndex(['a', 'b', 'c', 'd'], categories=['a', 'b', 'c', 'd'], ordered=False, dtype='category'))
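
For reference, the contract the expected output follows can be sketched in plain Python. This is only an illustration of the codes/uniques semantics shown by the pandas result above (uniques in order of first appearance, nulls mapped to -1); it is not how cuDF or pandas implement `factorize`, and the standalone `factorize` function here is a hypothetical helper.

```python
def factorize(values):
    """Return (codes, uniques): uniques in first-seen order, None encoded as -1."""
    uniques = []   # distinct non-null values, in order of first appearance
    index_of = {}  # value -> position in `uniques`
    codes = []
    for v in values:
        if v is None:
            codes.append(-1)  # sentinel for a null entry
            continue
        if v not in index_of:
            index_of[v] = len(uniques)
            uniques.append(v)
        codes.append(index_of[v])
    return codes, uniques


codes, uniques = factorize(["a", "b", "c", "d"])
print(codes, uniques)  # [0, 1, 2, 3] ['a', 'b', 'c', 'd']
```

Under this contract, a code of -1 should only appear where the input itself is null, which is why an all -1 result for a Series with no nulls indicates a bug.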

Environment overview (please complete the following information)

  • Environment location: [Bare-metal]
  • Method of cuDF install: [from source]
@galipremsagar galipremsagar added bug Something isn't working Python Affects Python cuDF API. labels Aug 28, 2023
@galipremsagar galipremsagar self-assigned this Aug 28, 2023
rapids-bot bot pushed a commit that referenced this issue Aug 28, 2023
…3980)

closes #13979 

This PR restores column type metadata after the `dropna` call. The absence of this restoration was causing an issue with `CategoricalColumn.dropna`, which the `factorize` API depends on.

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Bradley Dice (https://github.com/bdice)

URL: #13980
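
The general pattern the commit message describes (re-attaching type metadata after a generic operation) can be sketched with a toy model. This is plain Python under stated assumptions, not cuDF's code: `ToyCategorical`, `drop_null_codes`, and this `dropna` are hypothetical names, and the sketch does not claim to reproduce the exact failure in the issue.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyCategorical:
    """Toy categorical column: integer codes plus the categories they index."""
    codes: List[Optional[int]]  # None marks a null entry
    categories: List[str]       # the "type metadata" in this toy model

    def values(self):
        return [None if c is None else self.categories[c] for c in self.codes]


def drop_null_codes(codes):
    """Generic helper: knows nothing about categories, returns bare codes."""
    return [c for c in codes if c is not None]


def dropna(col):
    bare = drop_null_codes(col.codes)
    # Re-attach the categories that the generic helper does not carry along.
    return ToyCategorical(codes=bare, categories=col.categories)


col = ToyCategorical([0, None, 2, 1], ["a", "b", "c"])
print(dropna(col).values())  # ['a', 'c', 'b']
```

In this toy model, skipping the re-attachment step would leave the result as bare integers with no categories to decode them against.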
@github-project-automation github-project-automation bot moved this from In Progress to Done in cuDF/Dask/Numba/UCX Aug 28, 2023