Regression #560
Conversation
Codecov Report

@@            Coverage Diff             @@
##           develop     #560      +/-   ##
===========================================
-  Coverage    89.76%   89.59%   -0.17%
===========================================
   Files           32       32
   Lines         3077     3133      +56
===========================================
+  Hits          2762     2807      +45
-  Misses         315      326      +11

Continue to review the full report at Codecov.
Hey @joaquinvanschoren, any news on this? Do you think you can finish this in the first weeks of the new year?
Update: I ran into a very strange error when running the tests. There is a global variable 'cachedirectory' in config.py, and for some reason, in the regression branch it is set to the directory I run the tests from on the command line instead of the default cache. This does not happen in the develop branch. Does that ring any bells? Otherwise, I'll just keep looking. I should have some time next week to finish this.
No, this has never happened to me so far. You could check the file testing.py in the openml directory; it performs some directory operations prior to starting the actual test, and I expect that this is the most likely culprit.
Hey, thanks a lot! This looks great! I think my comments are rather minor, so we should be able to merge this fairly soon :)
openml/runs/functions.py (outdated)
if ProbaY.shape[1] != len(task.class_labels):
    warnings.warn("Repeat %d Fold %d: estimator only predicted for %d/%d classes!" % (rep_no, fold_no, ProbaY.shape[1], len(task.class_labels)))
# TODO: Is it OK to move predict_proba outside of the runtime measurement?
Fine by me.
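For context, a minimal sketch (not part of this PR) of how the missing-class case flagged by the warning above could be handled, by re-aligning the probability columns to the task's class labels. It assumes the fitted estimator exposes classes_ containing the same label values as task.class_labels; all names other than ProbaY and the class labels are hypothetical.

```python
import numpy as np

def align_probabilities(ProbaY, estimator_classes, class_labels):
    # Pad with zero columns for classes the estimator never saw, so the
    # columns of the result line up with task.class_labels.
    aligned = np.zeros((ProbaY.shape[0], len(class_labels)))
    for col, cls in enumerate(estimator_classes):
        aligned[:, class_labels.index(cls)] = ProbaY[:, col]
    return aligned
```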
arff_dict['attributes'] = [('repeat', 'NUMERIC'),
                           ('fold', 'NUMERIC'),
                           ('row_id', 'NUMERIC'),
                           ('cluster', 'NUMERIC')]
An else: raise NotImplementedError would be great here.
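A minimal sketch of the suggested fallback, assuming the surrounding code builds arff_dict['attributes'] by dispatching on the task type. The helper name, the enum members used in the branches, and the placeholder branch bodies are assumptions, not the PR's actual code.

```python
from openml.tasks import TaskTypeEnum

def _prediction_attributes(task_type_id):
    # Columns shared by every task type.
    attributes = [('repeat', 'NUMERIC'),
                  ('fold', 'NUMERIC'),
                  ('row_id', 'NUMERIC')]
    if task_type_id == TaskTypeEnum.CLUSTERING:  # enum member name assumed
        attributes.append(('cluster', 'NUMERIC'))
    elif task_type_id in (TaskTypeEnum.SUPERVISED_CLASSIFICATION,
                          TaskTypeEnum.SUPERVISED_REGRESSION):
        ...  # prediction/correct columns as built elsewhere in the PR
    else:
        # The reviewer's suggestion: fail loudly on anything unhandled.
        raise NotImplementedError('Task type %s is not supported.'
                                  % task_type_id)
    return attributes
```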
openml/runs/run.py (outdated)

    return arff_dict

    def get_metric_fn(self, sklearn_fn, kwargs={}):
-       """Calculates metric scores based on predicted values. Assumes the
+       """Calculates metric scores based on prnedicted values. Assumes the
I'm afraid that you introduced a typo here ;)
openml/runs/run.py (outdated)

        raise ValueError('Attribute "correct" should be set')
    if 'prediction' not in attribute_names:
        raise ValueError('Attribute "predict" should be set')
    if task.task_type_id == TaskTypeEnum.SUPERVISED_CLASSIFICATION and \
This also holds for learning curves, right?
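If that is the case, a minimal sketch of how the check could cover both task types, assuming TaskTypeEnum also defines a LEARNING_CURVE member. The helper is hypothetical and only illustrates the membership test; the real condition continues beyond the truncated line shown above.

```python
from openml.tasks import TaskTypeEnum

def _requires_class_labels(task_type_id):
    # Classification-style predictions are produced for both task types.
    return task_type_id in (TaskTypeEnum.SUPERVISED_CLASSIFICATION,
                            TaskTypeEnum.LEARNING_CURVE)
```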
# should take at least one millisecond (?)
'usercpu_time_millis': (0, max_time_allowed)}

print(task_type)
No print statements please.
print(task_type)

if task_type == "Supervised Classification" or \
Could you please use the task type enum here as well?
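A minimal sketch of the suggested change, assuming TaskTypeEnum can be imported from openml.tasks; the exact set of members in the tuple is illustrative and should mirror whatever the original string comparison covered.

```python
from openml.tasks import TaskTypeEnum

# Compare against enum members instead of free-form strings.
if task_type in (TaskTypeEnum.SUPERVISED_CLASSIFICATION,
                 TaskTypeEnum.LEARNING_CURVE):
    ...
```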
task_type="Supervised Classification") | ||
pass | ||
|
||
def _run_and_upload_regression(self, clf, rsv): |
I think it would be great if you could rename that other function to _run_and_upload_classification.
pass

def _run_and_upload_regression(self, clf, rsv):
    def determine_grid_size(param_grid):
Could this function be generalized as it is also used for classification?
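A minimal sketch of what a shared, module-level helper could look like, assuming param_grid follows the sklearn convention of either a dict mapping parameter names to candidate values or a list of such dicts; the function body is illustrative, not the PR's actual code.

```python
def determine_grid_size(param_grid):
    """Count the candidate settings in a parameter grid or list of grids."""
    if isinstance(param_grid, dict):
        grid_size = 1
        for values in param_grid.values():
            grid_size *= len(values)
        return grid_size
    elif isinstance(param_grid, list):
        # A list of grids is searched grid by grid.
        return sum(determine_grid_size(sub_grid) for sub_grid in param_grid)
    else:
        raise TypeError('Unsupported param_grid type: %s' % type(param_grid))
```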
run = self._perform_run(task_id, num_test_instances, clf,
                        random_state_value=rsv)

# obtain accuracy scores using get_metric_score:
Could you please rename all references to classification (such as accuracy) to reference regression?
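For illustration (any names not visible in the diff are assumptions), the regression counterpart of the accuracy check would recompute mean absolute error from the stored predictions and compare it against the run's per-fold evaluations, as in the snippet below:

```python
import sklearn.metrics

# Per-fold MAE recomputed client-side from the run's predictions.
mae_scores = run.get_metric_fn(sklearn.metrics.mean_absolute_error)
```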
    run.fold_evaluations['mean_absolute_error'][rep][fold])
self.assertEqual(sum(mae_scores_provided), sum(mae_scores))

if isinstance(clf, BaseSearchCV):
This could be generalized, too. After thinking about this, I think the whole function should be generalized.
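A rough sketch of the kind of generalization meant here: one helper parameterized by task id and metric, shared by the classification and regression tests. Everything not visible in the diff (the signature and the metric arguments in particular) is an assumption.

```python
def _run_and_upload(self, clf, rsv, task_id, num_test_instances,
                    sklearn_metric_fn, openml_metric_name):
    run = self._perform_run(task_id, num_test_instances, clf,
                            random_state_value=rsv)

    # Scores recomputed locally from the stored predictions ...
    scores = run.get_metric_fn(sklearn_metric_fn)

    # ... should match the per-fold evaluations attached to the run.
    scores_provided = []
    for rep in run.fold_evaluations[openml_metric_name].values():
        scores_provided.extend(rep.values())
    self.assertAlmostEqual(sum(scores_provided), sum(scores))
    return run
```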
Done everything. Testing now.
Looks (almost) good to me. Could you please check my final comments?
I'd merge once they are resolved, despite the failing unit tests, since all changes made after the last passing run were cosmetic.
if type(val_1) == type(val_2):
    self.assertEqual(val_1, val_2)
elif type(val_1) == float or type(val_2) == float:
    self.assertTrue(abs(float(val_1) - float(val_2)) < 0.00001)
Please replace this by self.assertAlmostEqual.
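For illustration, the drop-in replacement for the assertTrue line above (places=5 is an assumption chosen to roughly match the 1e-5 threshold):

```python
# unittest's built-in tolerance check instead of a hand-rolled threshold.
self.assertAlmostEqual(float(val_1), float(val_2), places=5)
```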
Cool, done.
elif type(val_1) == float or type(val_2) == float:
    self.assertTrue(abs(float(val_1) - float(val_2)) < 0.00001)
else:
    self.assertEqual(str(val_1), str(val_2))
Why this? What type cannot be compared by a simple equality check?
If this needs to stay, please add a comment explaining what type(s) you expect here.
This is indeed not really necessary. Removed.
Thanks a lot! This will be a huge step forward!
What does this PR implement/fix? Explain your changes.
Extends the API to support regression runs
How should this PR be tested?
test_run_functions.py::TestRun::test_run_and_upload_linear_regression
Example run: https://test.openml.org/r/22694
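For example, it can be invoked as pytest test_run_functions.py::TestRun::test_run_and_upload_linear_regression (the node-id syntax assumes pytest; running against the test server also requires a configured API key, so treat this command as illustrative).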
Any other comments?
Currently, it cannot output the exact correct values because they were converted to floats. I opened an issue to discuss what to do here: #559