
Fix: correctly order the ground truth and prediction for ARFF files in run.data_content #1209

Merged · 22 commits · Feb 24, 2023

Conversation

@LennartPurucker (Contributor) commented Feb 20, 2023

Bug - What does this PR fix?

The order of the ground truth and the predictions is mixed up in the current implementation, both in how a run stores prediction data and in how it uses that data to build an ARFF file.

As a result, the ground truth is treated as the predictions, and the predictions are treated as the ground truth.
Consequently, publishing a run uploads the wrong values for these columns to the OpenML server.
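To make the swap concrete, here is an illustrative sketch of how reading stored fields in the wrong order exchanges the two columns. The tuple layout and names here are assumptions for illustration, not the actual openml-python internals:

```python
# Illustrative only; the tuple layout and names are assumptions, not the
# library's actual internals.
stored_row = (0, 0, 12, "dog", "cat")  # (repeat, fold, row_id, truth, prediction)

# Buggy unpacking: 'prediction' and 'truth' are read in the opposite order,
# so the ground truth ends up in the prediction column and vice versa.
repeat, fold, row_id, prediction, truth = stored_row
buggy = {"prediction": prediction, "truth": truth}

# Correct unpacking matches the stored layout.
repeat, fold, row_id, truth, prediction = stored_row
fixed = {"prediction": prediction, "truth": truth}

print(buggy)  # {'prediction': 'dog', 'truth': 'cat'} -> columns swapped
print(fixed)  # {'prediction': 'cat', 'truth': 'dog'}
```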

Impact

This is not validated on the server side (to my understanding). Hence, all ARFF files of predictions uploaded using the OpenML Python API are most likely wrong. Moreover, evaluations of such runs also report the wrong scores. This might have impacted the results of papers that used scores of runs uploaded by the Python Client for meta-analysis.

Reference Issues

Multiple issues exist as a result of this. The following issues are likely related to this problem: #1197, #559, openml/OpenML#1185

Fix

I changed the order in the appropriate places and added a test for the (IMO) expected/correct behavior. Additionally, I updated the tests that checked for the old order so that they expect the new order.

New Order

The order I used follows the order of ARFF files uploaded by the R Client API. I used the following code snippet to find the order.

from openml.runs import get_run

# 1875133 is a run from mlr 
print(get_run(1875133).predictions_url)

# After downloading and opening the file stored under the predictions_url, one can see that the file's order is:
# @attribute 'repeat' numeric
# @attribute 'fold' numeric
# @attribute 'row_id' numeric
# @attribute 'prediction' {'1','2'}
# @attribute 'truth' {'1','2'}
# @attribute 'confidence.1' numeric
# @attribute 'confidence.2' numeric
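As a sanity check, the column order above can be encoded in a small helper. This is a hedged sketch with made-up names, not code from the client:

```python
# Columns as they appear in the R client's ARFF files (see snippet above).
ARFF_COLUMNS = [
    "repeat", "fold", "row_id", "prediction", "truth",
    "confidence.1", "confidence.2",
]

def build_row(repeat, fold, row_id, prediction, truth, confidences):
    """Assemble one prediction row; 'prediction' precedes 'truth'."""
    return [repeat, fold, row_id, prediction, truth, *confidences]

row = build_row(0, 2, 17, "1", "2", [0.8, 0.2])
assert len(row) == len(ARFF_COLUMNS)
assert row[ARFF_COLUMNS.index("prediction")] == "1"
assert row[ARFF_COLUMNS.index("truth")] == "2"
```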

Open Questions

The contribution guidelines mention that I should change progress.rst. I am unsure whether this belongs under 0.13.1 and what I should mention there. What do you think? Edit: I updated progress.rst for the newest version.

Required (Server-Side) Follow-up Actions

This bug affects a lot of already published runs on OpenML.
We might need to change/adjust the uploaded ARFF files and re-evaluate all these runs.

@LennartPurucker LennartPurucker marked this pull request as ready for review February 20, 2023 16:49
tests/test_runs/test_run.py — 3 review threads (resolved)
@mfeurer (Collaborator) left a comment

Looks good to me, let's wait for the unit tests to work again and then merge this.

@@ -9,8 +9,7 @@ Changelog
0.13.1
~~~~~~

* Add new contributions here.

* FIX #1197 #559 #1131: Fix the order of ground truth and predictions in the ``OpenMLRun`` object and in ``format_prediction``.
Collaborator commented:

Can you add that it is specifically about regression tasks? thanks

@LennartPurucker (Author) replied:

The switched order is a problem for regression and classification tasks (and maybe learning curve).

openml/runs/functions.py — review thread (resolved)
LennartPurucker and others added 2 commits February 23, 2023 09:34
Co-authored-by: Pieter Gijsbers <p.gijsbers@tue.nl>
LennartPurucker and others added 7 commits February 23, 2023 14:42
* Add sklearn marker

* Mark tests that use scikit-learn

* Only run scikit-learn tests multiple times

The generic tests that don't use scikit-learn should only be tested once
(per platform).

* Rename for correct variable

* Add sklearn mark for filesystem test

* Remove quotes around sklearn

* Instead include sklearn in the matrix definition

* Update jobnames

* Add explicit false to jobname

* Remove space

* Add function inside of expression?

* Do string testing instead

* Add missing ${{

* Add explicit true to old sklearn tests

* Add instruction to add pytest marker for sklearn tests
…o develop

# Conflicts:
#	tests/test_runs/test_run.py
…lt of the random state problems for sklearn < 0.24
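The commit series above introduces a pytest marker so that only scikit-learn tests run multiple times. A minimal sketch of registering such a marker in a `conftest.py` (the exact marker text in this repository may differ):

```python
# conftest.py sketch: register the 'sklearn' marker so pytest does not warn
# about an unknown marker; marked tests can then be selected with
# `pytest -m sklearn` (or excluded with `pytest -m "not sklearn"`).
def pytest_configure(config):
    config.addinivalue_line("markers", "sklearn: test requires scikit-learn")
```

A test is then marked by decorating it with `@pytest.mark.sklearn`.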