[BUG] Filter followed by Groupby throws TypeError for categorical column #1767

Closed
rcalland opened this issue Feb 19, 2023 · 0 comments · Fixed by NVIDIA-Merlin/core#226
Labels: bug (Something isn't working), P1

Describe the bug
Applying a Filter op followed by a Groupby op raises the following error in my environment:

TypeError: Dtype discrepancy detected for column category_list: operator Groupby reported dtype `int64` but returned dtype `float64`.

Steps/Code to reproduce bug

import pandas as pd
import nvtabular as nvt

# dummy data
_session = ["a", "a", "a", "b"]
_category = ["x", "x", "x", "y"]
_event_type = ["start", "start", "stop", "start"]
input_df = pd.DataFrame(
    {"session": _session, "category": _category, "event_type": _event_type}
)
print(input_df.head())

# graph
cat_feats = ["category"] >> nvt.ops.Categorify()

features = ["session", "event_type"] + cat_feats

# This is the problematic line
features = features >> nvt.ops.Filter(f=lambda df: df["event_type"] == "start")

groupby_features = features >> nvt.ops.Groupby(
    groupby_cols=["session"],
    aggs={
        "category": ["list", "count"],
        "event_type": ["list"],
    },
)

processor = nvt.Workflow(groupby_features)
dataset = nvt.Dataset(input_df)

output_df = processor.fit_transform(dataset)
print(output_df.head())

Expected behavior
The output dataframe should contain nested fields with rows filtered out depending on event_type.
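
For reference, the expected result can be sketched with plain pandas. This is only an illustration of the intended filter-then-groupby output, not NVTabular code: it keeps the raw category strings instead of the integer codes that Categorify would produce, and the output column names (`category_list`, `category_count`, `event_type_list`) are assumed to follow the `<column>_<agg>` pattern seen in the error message.

```python
import pandas as pd

# Same dummy data as in the reproduction script.
input_df = pd.DataFrame(
    {
        "session": ["a", "a", "a", "b"],
        "category": ["x", "x", "x", "y"],
        "event_type": ["start", "start", "stop", "start"],
    }
)

# Keep only the "start" events, mirroring the Filter op.
filtered = input_df[input_df["event_type"] == "start"]

# Aggregate per session, mirroring the "list" and "count" aggs of the Groupby op.
# Column names here are an assumption, chosen to match the error message.
expected = (
    filtered.groupby("session")
    .agg(
        category_list=("category", list),
        category_count=("category", "count"),
        event_type_list=("event_type", list),
    )
    .reset_index()
)
print(expected)
```

Session "a" should keep two of its three events (the "stop" row is dropped), and session "b" should keep its single event.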

Environment details (please complete the following information):

  • Environment location: Bare metal (Apple M1), Python 3.10.9
  • Method of NVTabular install: pip (version 1.8.1)

Additional context
I've just started using NVTabular, so it's possible I am overlooking something. I would appreciate some help in understanding this error, thanks!
