-
-
Notifications
You must be signed in to change notification settings - Fork 144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix: correctly order the ground truth and prediction for ARFF files in run.data_content #1209
Conversation
new unit test for run consistency and bug fixed in read from xml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, let's wait for the unit tests to work again and then merge this.
@@ -9,8 +9,7 @@ Changelog | |||
0.13.1 | |||
~~~~~~ | |||
|
|||
* Add new contributions here. | |||
|
|||
* FIX #1197 #559 #1131: Fix the order of ground truth and predictions in the ``OpenMLRun`` object and in ``format_prediction``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add that it is specifically about regression tasks? thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The switched order is a problem for regression and classification tasks (and maybe learning curve).
Co-authored-by: Pieter Gijsbers <p.gijsbers@tue.nl>
* Add sklearn marker * Mark tests that use scikit-learn * Only run scikit-learn tests multiple times The generic tests that don't use scikit-learn should only be tested once (per platform). * Rename for correct variable * Add sklearn mark for filesystem test * Remove quotes around sklearn * Instead include sklearn in the matrix definition * Update jobnames * Add explicit false to jobname * Remove space * Add function inside of expression? * Do string testing instead * Add missing ${{ * Add explicit true to old sklearn tests * Add instruction to add pytest marker for sklearn tests
…o develop # Conflicts: # tests/test_runs/test_run.py
…lt of the random state problems for sklearn < 0.24
Bug - What does this PR fix?
The order of the ground truth and predictions is mixed up in the current implementation of how a run stores prediction data and how it uses the prediction data to build an ARFF file.
As a result, the ground truth is treated as the predictions, and the predictions are treated as the ground truth.
Consequently, publishing a run to uploads the wrong values for these columns to the OpenML server.
Impact
This is not validated on the server side (to my understanding). Hence, all ARFF files of predictions uploaded using the OpenML Python API are most likely wrong. Moreover, evaluations of such runs also report the wrong scores. This might have impacted the results of papers that used scores of runs uploaded by the Python Client for meta-analysis.
Reference Issues
Multiple issues exist as a result of this. The following issues are likely related to this problem: #1197,
#559, openml/OpenML#1185Fix
I changed the order in the appropiate places and added a test for the (IMO) expected/correct behavior. Additionally, I changed the tests that checked for the old order to the new order.
New Order
The order I used follows the order of ARFF files uploaded by the R Client API. I used the following code snippet to find the order.
Open Questions
The contribution guidelines mention that I should change the progress.rst. I am unsure if this would be below 13.1 and what I should mention there. What do you think?I updated the progress.rst for the newest version.Required (Server-Side) Follow-up Actions
This bug affects a lot of already published runs on OpenML.
We might need to change/adjust the uploaded ARFF files and re-evaluate all these runs.