Question Paper #583
-
When running experiments, sometimes AutoML frameworks experience failures. You may encounter something like this in your results file (simplified for readability):
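For illustration, a failed run might appear as a row roughly like the one below; the column names and error message here are assumptions sketched from the shape of the benchmark's results file, not the actual contents:

```
framework    task  fold  metric  result  info
NaiveAutoML  kick  1     auc     NaN     Framework failed to produce predictions
```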
Here, NaiveAutoML failed for whatever reason to create predictions, so no result is available. However, when we compare results across tasks and folds, such as when creating the critical difference diagrams, we need some performance measure for every (framework, task, fold). Concretely, when creating critical difference plots we first rank each framework by its mean score on each task, so we have to decide how to calculate a mean score for NaiveAutoML on kick.

To get scores for our constant predictor, we use scikit-learn's DummyClassifier and DummyRegressor. It simply predicts the mean response of the training data (regression) or its empirical class probabilities (classification). We run that on each (task, fold), and can then look up the score of the constant predictor on fold 1 of kick.
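As a minimal sketch of how such a baseline score can be computed for a single fold (the data below is synthetic; in the benchmark the train/test split comes from the OpenML task):

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for one (task, fold) split of 'kick'.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(100, 5)), rng.integers(0, 2, size=100)
X_test, y_test = rng.normal(size=(50, 5)), rng.integers(0, 2, size=50)

# strategy="prior" predicts the empirical class probabilities of the
# training data, i.e. the same probability vector for every row.
clf = DummyClassifier(strategy="prior").fit(X_train, y_train)

# Identical scores for all rows carry no ranking information, so AUC is 0.5.
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
print(auc)  # 0.5
```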
The constant predictor has an AUC of 0.5 for fold 1 of kick (as expected), so we impute the result of the NaiveAutoML experiment on fold 1 of kick with that score (0.5).
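Putting the two together, the imputation itself amounts to filling each missing score with the constant predictor's score on the same (task, fold). A rough sketch, with column names assumed for illustration:

```python
import pandas as pd

# 'results' holds framework scores per (task, fold), with NaN where a
# framework failed; 'baseline' holds the constant predictor's score for
# every (task, fold).
results = pd.DataFrame({
    "framework": ["NaiveAutoML", "AutoGluon"],
    "task": ["kick", "kick"],
    "fold": [1, 1],
    "result": [float("nan"), 0.78],
})
baseline = pd.DataFrame({"task": ["kick"], "fold": [1], "result": [0.5]})

# Fill each missing score with the constant predictor's score on the same
# (task, fold); per-task mean scores can then be ranked as usual.
merged = results.merge(baseline, on=["task", "fold"], suffixes=("", "_constant"))
merged["result"] = merged["result"].fillna(merged["result_constant"])
```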
-
Thanks! So one more question: if even the constant predictor keeps producing errors, did you impute the value for that task as 0?
-
Hello, I didn't quite understand this excerpt from the article:

"Instead, we impute missing values with the constant predictor, or prior. This baseline returns the empirical class distribution for classification and the empirical mean for regression. This is a very penalizing imputation strategy, as the constant predictor is often much worse than results obtained by the AutoML frameworks that produce predictions for the task or fold. However, we feel this penalty for ill-behaved systems is appropriate and fairer towards the well-behaved frameworks and hope that it encourages a standard of robust, well-behaved AutoML frameworks."

Could you explain in more detail how this imputation was performed? Thank you in advance. I'm analyzing my experiments, and some failures resulted in missing values, so I'd like to understand better how you handled them.