Error : An issue when running EDA. could not convert string to float: 'virginica' #511

nyan314sn · 2022-01-23T23:57:41Z

I tried to follow this multiclass classification example: https://github.com/mljar/mljar-examples/blob/master/Iris_classification/Iris_classification.ipynb

I am getting the error message "supervised.preprocessing.eda ERROR There was an issue when running EDA. could not convert string to float: 'virginica'"

I noticed that the EDA folder is empty as well.

Per issue #508, I was under the impression that MLJAR would handle the y column as categorical, since the column contains strings.

Is it necessary for the user to set an ordered or unordered category dtype per this pandas documentation?

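import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from supervised.automl import AutoML

# Load Iris and map the integer targets to their string class names
data = datasets.load_iris()

X = pd.DataFrame(data["data"], columns=data["feature_names"])
y = pd.Series(data["target"], name="target").map({i:v for i, v in enumerate(data["target_names"])})

# Stratified 70/30 train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.3)

# Train AutoML with a 5-minute time limit; the EDA error is logged during fit
automl = AutoML(total_time_limit=5*60)
automl.fit(X_train, y_train)

y_predicted = automl.predict(X_test)

# Share of predictions that match the true labels
result = pd.DataFrame({"Predicted": y_predicted, "Target": np.array(y_test)})
filtro = result.Predicted == result.Target
print(filtro.value_counts(normalize=True))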

pplonski · 2022-01-24T12:52:52Z

@nyan314sn thank you for reporting. You can disable EDA by setting explain_level=1 in the AutoML constructor.

JonasDHomburg · 2022-02-02T10:55:08Z

The problem is introduced in this commit: 5116d03

The error is raised by Seaborn: https://github.com/mwaskom/seaborn/blob/master/seaborn/categorical.py#L431

It can be solved if this line: https://github.com/mljar/mljar-supervised/blob/v0.11.0/supervised/preprocessing/eda.py#L79
is changed to: sns.countplot(x=y, color=BLUE)

I'm not exactly sure if that will create the desired behavior for the mljar framework but it works and created an acceptable plot for me.
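
For readers wondering where the message comes from: it is the text Python's built-in float() produces for a non-numeric string, which suggests the string labels reached a code path that expected numbers. The sketch below is only an illustration with the standard library, not the seaborn or mljar code; the labels list and the try_float helper are made up for this example. It also shows that counting rows per class, which is what a count plot displays, needs no numeric conversion at all.

```python
from collections import Counter

# Hypothetical target labels, standing in for the string-valued y column.
labels = ["setosa", "versicolor", "virginica", "virginica"]


def try_float(value):
    """Return the error message raised by float(value), or None on success."""
    try:
        float(value)
    except ValueError as exc:
        return str(exc)
    return None


# Converting a class name to a number fails with the message from the issue title.
error_message = try_float("virginica")
print(error_message)

# Counting occurrences per class works directly on the string labels.
class_counts = Counter(labels)
print(class_counts)
```

Treating y as a categorical axis, as the proposed change does, corresponds to the second half of the sketch: the labels are only grouped and counted, never converted.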

pplonski · 2022-02-06T17:40:58Z

Hi @JonasDHomburg,

thank you for looking into this! This might fix the problem.

However, I'm thinking about disabling EDA for AutoML. Personally, for EDA I'm using pandas_profiling and I think this package should be used for EDA.

Thank you!

pplonski · 2022-02-16T09:46:39Z

The EDA is disabled in AutoML. I strongly recommend using pandas_profiling or sweetviz for auto-EDA.

pplonski added this to the 0.11.1 milestone Feb 14, 2022

pplonski added a commit that referenced this issue Feb 16, 2022: Merge pull request #517 from MaciekEO/master (ec309aa), Disabled EDA (#511)

pplonski closed this as completed Feb 16, 2022
