Fix: correctly order the ground truth and prediction for ARFF files in run.data_content #1209

LennartPurucker · 2023-02-20T16:48:44Z

Bug - What does this PR fix?

The current implementation mixes up the order of the ground truth and the predictions, both where a run stores its prediction data and where it uses that data to build an ARFF file.

As a result, the ground truth is treated as the predictions, and the predictions are treated as the ground truth.
Consequently, publishing a run uploads the wrong values for these columns to the OpenML server.
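
To illustrate how such a mix-up can be avoided, here is a minimal sketch that builds one prediction row with keyword-only arguments, so the two values cannot be swapped by position. The helper `build_prediction_row` and the `COLUMNS` list are hypothetical and not part of the OpenML Python API; the column order is assumed to be the one this PR adopts (prediction before truth).

```python
from typing import Dict, List, Union

Value = Union[int, float, str]

# Hypothetical column layout (assumption): prediction comes before truth.
COLUMNS = ["repeat", "fold", "row_id", "prediction", "truth"]


def build_prediction_row(
    *, repeat: int, fold: int, row_id: int, prediction: Value, truth: Value
) -> List[Value]:
    """Return one data row ordered according to COLUMNS."""
    values: Dict[str, Value] = {
        "repeat": repeat,
        "fold": fold,
        "row_id": row_id,
        "prediction": prediction,
        "truth": truth,
    }
    # The order is driven by COLUMNS, not by the order of the arguments.
    return [values[name] for name in COLUMNS]


row = build_prediction_row(repeat=0, fold=1, row_id=7, prediction="2", truth="1")
print(row)
```

Because the arguments are keyword-only, a call such as `build_prediction_row(0, 1, 7, "1", "2")` raises a `TypeError` instead of silently writing the values into the wrong columns.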

Impact

This is not validated on the server side (to my understanding). Hence, all ARFF files of predictions uploaded using the OpenML Python API are most likely wrong. Moreover, evaluations of such runs also report the wrong scores. This might have impacted the results of papers that used scores of runs uploaded by the Python Client for meta-analysis.
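
As an illustration of why swapped columns can change reported scores, the sketch below computes precision, recall, and accuracy on a toy example, once with the correct argument order and once with truth and prediction exchanged. The data and helper functions are made up for this example; which metrics the server actually computes is not shown here. The point is only that asymmetric metrics change under a swap, while symmetric ones such as accuracy do not.

```python
from typing import Sequence, Tuple


def precision_recall(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float]:
    """Compute precision and recall for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of positions where both sequences agree."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data (made up): three positives, the model finds only one of them.
truth = [1, 1, 1, 0, 0, 0]
pred = [1, 0, 0, 0, 0, 0]

correct = precision_recall(truth, pred)  # arguments in the intended order
swapped = precision_recall(pred, truth)  # truth and prediction exchanged

print("correct order:", correct)
print("swapped order:", swapped)
print("accuracy:", accuracy(truth, pred), accuracy(pred, truth))
```

In this toy case precision and recall trade places when the columns are swapped, whereas accuracy stays the same.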

Reference Issues

Multiple issues exist as a result of this. The following issues are likely related to this problem: #1197, ~~#559~~, openml/OpenML#1185

Fix

I changed the order in the appropriate places and added a test for the (IMO) expected/correct behavior. Additionally, I changed the tests that checked for the old order to the new order.

New Order

The order I used follows the order of ARFF files uploaded by the R Client API. I used the following code snippet to find the order.

from openml.runs import get_run

# 1875133 is a run from mlr 
print(get_run(1875133).predictions_url)

# After downloading and opening the file stored under the predictions_url, one can see that the file's order is:
# @attribute 'repeat' numeric
# @attribute 'fold' numeric
# @attribute 'row_id' numeric
# @attribute 'prediction' {'1','2'}
# @attribute 'truth' {'1','2'}
# @attribute 'confidence.1' numeric
# @attribute 'confidence.2' numeric
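
The manual check above could also be scripted. The following is a minimal sketch that extracts attribute names from ARFF header text and compares the positions of `prediction` and `truth`. The function `attribute_names` is a hypothetical helper, not part of any OpenML client, and it assumes attribute names contain no whitespace; the sample header is copied from the listing above rather than downloaded.

```python
from typing import List

# Header lines copied from the listing above (not fetched from the server).
SAMPLE_HEADER = """\
@attribute 'repeat' numeric
@attribute 'fold' numeric
@attribute 'row_id' numeric
@attribute 'prediction' {'1','2'}
@attribute 'truth' {'1','2'}
@attribute 'confidence.1' numeric
@attribute 'confidence.2' numeric
"""


def attribute_names(arff_text: str) -> List[str]:
    """Return the attribute names of an ARFF header in declaration order.

    Assumes names contain no whitespace (a simplification).
    """
    names = []
    for line in arff_text.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("@attribute"):
            # The second whitespace-separated token is the (quoted) name.
            name = stripped.split(None, 2)[1]
            names.append(name.strip("'\""))
    return names


names = attribute_names(SAMPLE_HEADER)
print(names)
print(names.index("prediction") < names.index("truth"))
```

For a real file, the text passed to `attribute_names` would be the content downloaded from the `predictions_url`.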

Open Questions

~~The contribution guidelines mention that I should change the progress.rst. I am unsure if this would be below 13.1 and what I should mention there. What do you think?~~ I updated the progress.rst for the newest version.

Required (Server-Side) Follow-up Actions

This bug affects a lot of already published runs on OpenML.
We might need to change/adjust the uploaded ARFF files and re-evaluate all these runs.
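
What such an adjustment could look like for a single file is sketched below: exchanging the values of two named columns in every data row. This is only an assumption about one possible repair step; the helper `swap_columns`, the column names, and the sample rows are hypothetical, and which files are affected and how they should be corrected is not settled here.

```python
from typing import List, Sequence


def swap_columns(
    header: Sequence[str], rows: Sequence[Sequence[object]], first: str, second: str
) -> List[List[object]]:
    """Return copies of the rows with the values of two columns exchanged."""
    i, j = header.index(first), header.index(second)
    swapped = []
    for row in rows:
        new_row = list(row)  # copy, so the input rows stay untouched
        new_row[i], new_row[j] = new_row[j], new_row[i]
        swapped.append(new_row)
    return swapped


# Hypothetical header and rows, for illustration only.
header = ["repeat", "fold", "row_id", "prediction", "truth"]
rows = [[0, 0, 10, "1", "2"], [0, 0, 11, "2", "2"]]

fixed = swap_columns(header, rows, "prediction", "truth")
print(fixed)
```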

tests/test_runs/test_run.py

new unit test for run consistency and bug fixed in read from xml

mfeurer

Looks good to me, let's wait for the unit tests to work again and then merge this.

PGijsbers · 2023-02-22T18:11:12Z

doc/progress.rst

@@ -9,8 +9,7 @@ Changelog
 0.13.1
 ~~~~~~

- * Add new contributions here.
-
+ * FIX #1197 #559 #1131: Fix the order of ground truth and predictions in the ``OpenMLRun`` object and in ``format_prediction``.


Can you add that it is specifically about regression tasks? thanks

The switched order is a problem for regression and classification tasks (and maybe learning curve).


* Add sklearn marker
* Mark tests that use scikit-learn
* Only run scikit-learn tests multiple times. The generic tests that don't use scikit-learn should only be tested once (per platform).
* Rename for correct variable
* Add sklearn mark for filesystem test
* Remove quotes around sklearn
* Instead include sklearn in the matrix definition
* Update jobnames
* Add explicit false to jobname
* Remove space
* Add function inside of expression?
* Do string testing instead
* Add missing ${{
* Add explicit true to old sklearn tests
* Add instruction to add pytest marker for sklearn tests


LennartPurucker added 5 commits February 20, 2023 15:59

add test and fix for switch of ground truth and predictions

4956a51

undo import optimization

fc642c1

fix bug with model passing to function

2da1109

fix order in other tests

0583668

Merge branch 'openml:develop' into develop

1fe8bc9

LennartPurucker marked this pull request as ready for review February 20, 2023 16:49

update progress.rst

14cbd04

mfeurer reviewed Feb 21, 2023


LennartPurucker added 6 commits February 21, 2023 10:44

new unit test for run consistency and bug fixed

ceb1d53

clarify new assert

37500a7

Merge pull request #1 from LennartPurucker/develop_ext

921cf10

new unit test for run consistency and bug fixed in read from xml

Merge branch 'openml:develop' into develop

3e97992

minor loop refactor

9f47b91

Merge remote-tracking branch 'origin/develop' into develop

14d4299

mfeurer approved these changes Feb 22, 2023


PGijsbers reviewed Feb 22, 2023


refactor default to None

8686317

PGijsbers reviewed Feb 23, 2023


PGijsbers reviewed Feb 23, 2023


LennartPurucker and others added 2 commits February 23, 2023 09:34

directly test prediction data equal

8adb0bd

Update tests/test_runs/test_run.py

04ca611

Co-authored-by: Pieter Gijsbers <p.gijsbers@tue.nl>

PGijsbers approved these changes Feb 23, 2023


LennartPurucker and others added 7 commits February 23, 2023 14:42

Merge branch 'develop' into develop

f996c0a

add test and fix for switch of ground truth and predictions

1bf8c0e

undo import optimization

74e9c38

Merge branch 'develop' of https://github.com/openml/openml-python into develop # Conflicts: # tests/test_runs/test_run.py

794cce8

fix mask error resulting from rebase

b4c2030

make dummy classifier strategy consistent to avoid problems as a result of the random state problems for sklearn < 0.24

3c5ff3e

mfeurer approved these changes Feb 24, 2023


mfeurer merged commit bbf09b3 into openml:develop Feb 24, 2023

This was referenced Feb 24, 2023

Can't upload rmse,mae and other results to openml server #1197

Closed

OpenML Python runs may have swapped truth and prediction labels (at least for classification, regression) openml/OpenML#1185

Open

float precision #559

Closed
