[BUG] Categorify combo doesn't work on list columns #1676

Closed
bschifferer opened this issue Sep 9, 2022 · 2 comments · Fixed by #1685
Labels: bug (Something isn't working), P1


bschifferer commented Sep 9, 2022

Describe the bug
As a user, I want to jointly Categorify two columns, where one is a list column and the other is a regular (scalar) column. Use case: I have item interactions in which the scalar column is the current item to predict and the list column holds the historical items.

Code that triggers the error:

import cudf
import nvtabular as nvt

df = cudf.DataFrame({
    'col1': [0,1,2,3,4,5],
    'col2': [[0,1],[1,2],[2,3],[3,4],[4],[5]]
})
dataset = nvt.Dataset(df)
cols = [['col1', 'col2']] >> nvt.ops.Categorify()
workflow = nvt.Workflow(cols)
workflow.fit(dataset)
workflow.transform(dataset).to_ddf().compute()

Error:

File /usr/local/lib/python3.8/dist-packages/pandas/core/dtypes/common.py:1619, in _is_dtype_type(arr_or_dtype, condition)
   1615         return condition(type(None))
   1617     return False
-> 1619 return condition(tipo)

File /usr/local/lib/python3.8/dist-packages/pandas/core/dtypes/common.py:146, in classes.<locals>.<lambda>(tipo)
    144 def classes(*klasses) -> Callable:
    145     """evaluate if the tipo is a subclass of the klasses"""
--> 146     return lambda tipo: issubclass(tipo, klasses)

TypeError: issubclass() arg 1 must be a class
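
For readers unfamiliar with this message: the last frame shows `issubclass(tipo, klasses)` being called with a `tipo` that is not a class. The sketch below reproduces only that generic Python behaviour. It does not claim to know which object NVTabular or cudf actually passes in for the list column, and `is_subclass_of_number` is a hypothetical stand-in for the pandas check.

```python
# Hypothetical stand-in for the pandas dtype check in the traceback.
def is_subclass_of_number(tipo):
    return issubclass(tipo, (int, float))

# Passing a class works as intended.
ok = is_subclass_of_number(int)

# Passing an instance (here a list, chosen only for illustration)
# raises the same TypeError as in the traceback.
try:
    is_subclass_of_number([0, 1])
    message = None
except TypeError as exc:
    message = str(exc)

print(ok)
print(message)
```

If the dtype of the list column reaches this check as an object rather than a class, this is the failure one would expect to see.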

What works:
Categorify without joint encoding (each column encoded separately)

import cudf
import nvtabular as nvt

df = cudf.DataFrame({
    'col1': [0,1,2,3,4,5],
    'col2': [[0,1],[1,2],[2,3],[3,4],[4],[5]]
})
dataset = nvt.Dataset(df)
cols = ['col1', 'col2'] >> nvt.ops.Categorify()
workflow = nvt.Workflow(cols)
workflow.fit(dataset)
workflow.transform(dataset).to_ddf().compute()

Joint Categorify with non-list columns

import cudf
import nvtabular as nvt

df = cudf.DataFrame({
    'col1': [0,1,2,3,4,5],
    'col2':  [1,2,3,4,4,5],
})
dataset = nvt.Dataset(df)
cols = [['col1', 'col2']] >> nvt.ops.Categorify()
workflow = nvt.Workflow(cols)
workflow.fit(dataset)
workflow.transform(dataset).to_ddf().compute()
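
To make the requested behaviour concrete, here is a minimal pure-Python sketch of what a joint encoding over a scalar column and a list column could look like: one shared vocabulary, applied to both. This is not NVTabular's implementation. The helper names `fit_joint_vocab` and `transform_joint` are hypothetical, and reserving id 0 for unseen values is an assumption of this sketch; the ids NVTabular assigns may differ.

```python
def fit_joint_vocab(scalar_col, list_col):
    """Build one shared value -> id mapping over both columns (hypothetical helper)."""
    values = set(scalar_col)
    for row in list_col:
        values.update(row)  # flatten the list column into the same vocabulary
    # Assumption of this sketch: ids start at 1, and 0 is kept for unseen values.
    return {value: idx for idx, value in enumerate(sorted(values), start=1)}


def transform_joint(vocab, scalar_col, list_col):
    """Encode both columns with the same vocabulary (hypothetical helper)."""
    encoded_scalar = [vocab.get(value, 0) for value in scalar_col]
    encoded_list = [[vocab.get(value, 0) for value in row] for row in list_col]
    return encoded_scalar, encoded_list


# Same data as in the failing example above.
col1 = [0, 1, 2, 3, 4, 5]
col2 = [[0, 1], [1, 2], [2, 3], [3, 4], [4], [5]]

vocab = fit_joint_vocab(col1, col2)
enc1, enc2 = transform_joint(vocab, col1, col2)
print(enc1)
print(enc2)
```

The point of the shared vocabulary is that the same item receives the same id whether it appears as the current item in `col1` or in the history in `col2`.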
bschifferer added the bug (Something isn't working), P0, and P1 labels and removed the P0 label on Sep 9, 2022

rnyak commented Sep 12, 2022

@rjzamora Hello, is this something you can take a look at? Thanks.

@rjzamora

> @rjzamora hello. is this something you can take a look? thanks.

Sorry for the delay - I can look into this.
