Error : An issue when running EDA. could not convert string to float: 'virginica' #511

Closed
nyan314sn opened this issue Jan 23, 2022 · 4 comments

@nyan314sn

I tried to follow this multiclass classification example: https://github.com/mljar/mljar-examples/blob/master/Iris_classification/Iris_classification.ipynb

I am getting this error message: "supervised.preprocessing.eda ERROR There was an issue when running EDA. could not convert string to float: 'virginica'"

I notice that the EDA folder is empty as well.
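
For reference, the tail of that log line has the same wording as the ValueError Python itself raises when a non-numeric string is converted to a float. A minimal sketch, independent of MLJAR and Seaborn (the variable names are illustrative only):

```python
# Converting a class label such as 'virginica' to a float fails with a
# ValueError whose text matches the end of the logged EDA error.
label = "virginica"

try:
    float(label)
except ValueError as exc:
    message = str(exc)

print(message)
```

This suggests that some step in the EDA treated the string target as numeric, which is consistent with the report that the target column holds string labels.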

Per Issue 508, I was under the impression that MLJAR would handle the Y column as categorical since the column contains strings.

Is it necessary for the user to set an ordered or unordered category, per this pandas documentation?

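import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from supervised.automl import AutoML

data = datasets.load_iris()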

X = pd.DataFrame(data["data"], columns=data["feature_names"])
y = pd.Series(data["target"], name="target").map({i:v for i, v in enumerate(data["target_names"])})

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.3)


automl = AutoML(total_time_limit=5*60)
automl.fit(X_train, y_train)

y_predicted = automl.predict(X_test)

result = pd.DataFrame({"Predicted": y_predicted, "Target": np.array(y_test)})
filtro = result.Predicted == result.Target
print(filtro.value_counts(normalize=True))
@pplonski
Contributor

@nyan314sn thank you for reporting. You can disable EDA by setting explain_level=1 in the AutoML constructor.

@JonasDHomburg

The problem is introduced in this commit: 5116d03

The error is raised by Seaborn: https://github.com/mwaskom/seaborn/blob/master/seaborn/categorical.py#L431

It can be solved if this line: https://github.com/mljar/mljar-supervised/blob/v0.11.0/supervised/preprocessing/eda.py#L79
is changed to: sns.countplot(x=y, color=BLUE)

I'm not exactly sure whether that produces the behavior the mljar framework intends, but it works and created an acceptable plot for me.
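
The proposed call can be tried outside of mljar with a short standalone sketch. The assumptions here: the toy string labels are made up for illustration, and BLUE is a placeholder hex color, since the actual constant defined in eda.py is not shown in this thread.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, no display required
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

BLUE = "#1f77b4"  # placeholder color, NOT the value used in mljar's eda.py

# Toy string target standing in for the mapped iris labels.
y = pd.Series(
    ["setosa", "versicolor", "virginica"] * 5 + ["virginica"] * 2,
    name="target",
)

fig, ax = plt.subplots()
# Passing the series by keyword as x lets seaborn treat it as categorical
# and draw one bar per distinct label.
sns.countplot(x=y, color=BLUE, ax=ax)

# Render to an in-memory buffer instead of a file on disk.
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)

counts = y.value_counts()
print(counts)
```

Printing the value counts next to the plot gives a quick check that the bars correspond to the expected number of rows per class.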

@pplonski
Contributor

pplonski commented Feb 6, 2022

Hi @JonasDHomburg,

thank you for looking into this! This might fix the problem.

However, I'm thinking about disabling EDA for AutoML. Personally, for EDA I'm using pandas_profiling and I think this package should be used for EDA.

Thank you!

@pplonski pplonski added this to the 0.11.1 milestone Feb 14, 2022
pplonski added a commit that referenced this issue Feb 16, 2022
@pplonski
Contributor

The EDA is disabled in AutoML. I strongly recommend using pandas_profiling or sweetviz for auto-EDA.

3 participants