PGijsbers changed the title from "Mismatches between the Confidences and the Prediction of a Flow in Predictions.arff Files" to "OpenML Python runs may have swapped truth and prediction labels (at least for classification, regression)" on Feb 24, 2023.
Description
For a small set of flows, the predictions.arff files of some runs contain faulty entries. In these entries, the prediction does not correspond to the class with the highest confidence. As far as I was able to find out, all affected flows are sklearn pipelines that were published/uploaded using openml-python.
Moreover, the confidences of these pipelines should, to the best of my knowledge, be representative of the prediction (unlike, for example, the confidences of an SVM).
Furthermore, the confidences are off by a large margin; this is not a result of two or more classes having almost equal confidences, nor a numerical-precision problem.
Example
Flow 19039 with Run 10581112 and the associated predictions file.
Expected Results
The predictions in the predictions.arff should correspond to the class with the highest confidence in the same predictions.arff.
Actual Results
The predictions in the predictions.arff correspond to the class with the second-highest confidence. In other cases, the prediction does not correspond to a high-confidence class at all but seems to be chosen at random.
Affected Flows
In my research, I have found the following list of flows to run into this problem at least once: [19030, 19037, 19039, 19035, 18818, 17839, 17761].
These include sklearn pipelines using decision trees (19030, 18818), Gradient Boosting (19037, 19039), KNN (19035), SGD (17839), and LDA (17761).
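The mismatch described above can be detected mechanically. Below is a minimal sketch of such a consistency check: it flags every row of a predictions.arff-style table whose stored prediction is not the class with the highest confidence. The "confidence.&lt;class&gt;" column naming mirrors the OpenML predictions format, but the exact schema and the helper name `find_faulty_rows` are assumptions for illustration, not openml-python API.

```python
# Hypothetical consistency check: flag rows where the stored prediction
# is not the argmax over the per-class confidence columns.

def find_faulty_rows(rows, classes):
    """Return indices of rows whose 'prediction' differs from the argmax class."""
    faulty = []
    for i, row in enumerate(rows):
        confidences = {c: row["confidence." + c] for c in classes}
        argmax_class = max(confidences, key=confidences.get)
        if row["prediction"] != argmax_class:
            faulty.append(i)
    return faulty

# Toy example: the second row shows the reported symptom, where the
# prediction does not match the class with the highest confidence.
classes = ["A", "B", "C"]
rows = [
    {"confidence.A": 0.7, "confidence.B": 0.2, "confidence.C": 0.1, "prediction": "A"},
    {"confidence.A": 0.1, "confidence.B": 0.8, "confidence.C": 0.1, "prediction": "C"},
]
print(find_faulty_rows(rows, classes))  # → [1]
```

Running the same check against the rows of an actual predictions file would make the "off by a large margin" claim reproducible per run.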
Versions
I assume that the flows [19030, 19037, 19039, 1903] used the newest version of openml-python, based on their upload dates and feedback gathered from the original uploader. For the other flows, I am not certain which version was used.
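The retitled issue suggests the truth and prediction labels may have been swapped when the run was serialized. A minimal illustration of how such a swap would produce the observed symptom (all names here are hypothetical, not the actual openml-python writer):

```python
# Hypothetical illustration of the suspected bug: if a writer swaps the
# truth ("correct") and "prediction" columns when serializing a row, the
# stored prediction becomes the ground-truth label, which need not match
# the argmax of the model's confidences.

def serialize_row(truth, prediction, confidences, swapped=False):
    """Build one output row; swapped=True simulates the suspected bug."""
    if swapped:
        truth, prediction = prediction, truth
    row = {"correct": truth, "prediction": prediction}
    for cls, conf in confidences.items():
        row["confidence." + cls] = conf
    return row

# The model predicts "B" with high confidence, but the true label is "A".
row = serialize_row("A", "B", {"A": 0.3, "B": 0.7}, swapped=True)
argmax_class = max(("A", "B"), key=lambda c: row["confidence." + c])
print(row["prediction"], argmax_class)  # → A B
```

Whenever the model misclassifies an instance, the swapped writer stores a prediction that disagrees with the confidence columns, which matches the seemingly random mismatches reported above.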