Categorical variables cause NaN values. #51

Open
JelkeW opened this issue Nov 19, 2024 · 0 comments
JelkeW commented Nov 19, 2024

Hi,
I ran into a problem where the resampled data contained many NaN values, which caused SMOGN to stop with an error.
For anyone who has seen it, the message is: oops! synthetic data contains missing values
While debugging, I found that the NaN values occur only in categorical variables.

Two fixes for anyone encountering the problem:

  1. Fix on data side
    Change every column in your dataframe of type category to type object:
    data[column] = data[column].astype("object")
  2. Fix on SMOGN side
    In smogn.over_sampling change
    nom_dtypes = ["object", "bool", "datetime64"]
    to
    nom_dtypes = ["object", "bool", "datetime64", "category"]
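
For the data-side fix, here is a minimal sketch that converts every category column in one go instead of one column at a time. The helper name `categoricals_to_object` and the toy dataframe are only illustrative assumptions, not part of SMOGN; the sketch uses plain pandas and does not call SMOGN itself.

```python
import pandas as pd


def categoricals_to_object(data: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `data` with all category columns cast to object.

    Hypothetical helper for illustration; not part of SMOGN.
    """
    out = data.copy()
    # select_dtypes(include="category") picks only pandas categorical columns
    cat_cols = out.select_dtypes(include="category").columns
    for column in cat_cols:
        out[column] = out[column].astype("object")
    return out


# Toy dataframe (made-up values) with one categorical and one numeric column
df = pd.DataFrame(
    {
        "colour": pd.Categorical(["red", "blue", "red"]),
        "size": [1.0, 2.0, 3.0],
    }
)

converted = categoricals_to_object(df)
print(converted.dtypes)
```

The converted dataframe can then be passed to SMOGN in place of the original. Working on a copy keeps the original category columns intact in case they are needed elsewhere.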

Took me a bit of time to figure it out. Hope it helps 😊
