Hi,
I ran into an issue where the resampled data contained lots of NaN values, so SMOGN would not run.
For anyone who has seen it, the error message is: oops! synthetic data contains missing values
While debugging, I figured out that the NaN values only occur in categorical variables.
Two fixes for anyone encountering the problem:
Fix on data side
Change every column of type `category` in your dataframe to type `object`: `data[column] = data[column].astype("object")`
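As a minimal sketch of the data-side fix, the conversion can be applied to all categorical columns at once. The helper name `categories_to_object` and the example dataframe are made up for illustration; only the `astype("object")` call comes from the fix above.

```python
import pandas as pd


def categories_to_object(data: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `data` with every category column cast to object.

    Hypothetical helper, not part of smogn.
    """
    out = data.copy()
    # select_dtypes picks out only the columns whose dtype is "category"
    for column in out.select_dtypes(include="category").columns:
        out[column] = out[column].astype("object")
    return out


# Illustrative data: one categorical column, one numeric column
data = pd.DataFrame({
    "color": pd.Categorical(["red", "blue", "red"]),
    "size": [1.0, 2.5, 3.0],
})

converted = categories_to_object(data)
print(converted.dtypes)
```

Working on a copy leaves the original dataframe untouched, so you can still compare it against the resampled result afterwards.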
Fix on SMOGN side
In `smogn.over_sampling`, change `nom_dtypes = ["object", "bool", "datetime64"]`
to `nom_dtypes = ["object", "bool", "datetime64", "category"]`
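To see why adding `"category"` to the list could matter, here is a small pandas-only sketch. It assumes that the list is matched against the column's dtype name; how smogn actually uses `nom_dtypes` internally is not shown in this issue, so treat the membership test as an illustration rather than the library's code.

```python
import pandas as pd

# Illustrative categorical column
data = pd.DataFrame({"color": pd.Categorical(["red", "blue", "red"])})

# pandas reports the dtype name of a categorical column as "category"
dtype_name = str(data["color"].dtype)

old_nom_dtypes = ["object", "bool", "datetime64"]
new_nom_dtypes = ["object", "bool", "datetime64", "category"]

# Assumption: nominal columns are detected by a dtype-name lookup like this
matched_before = dtype_name in old_nom_dtypes
matched_after = dtype_name in new_nom_dtypes

print(dtype_name, matched_before, matched_after)
```

Under that assumption, a `category` column is not recognized as nominal with the original list and is recognized once `"category"` is added.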
Took me a bit of time to figure it out. Hope it helps 😊