
Regression #560

Merged: 56 commits merged into develop on Feb 19, 2019

Conversation

joaquinvanschoren (Contributor)

What does this PR implement/fix? Explain your changes.

Extends the API to support regression runs

How should this PR be tested?

test_run_functions.py::TestRun::test_run_and_upload_linear_regression

Example run: https://test.openml.org/r/22694
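For orientation, the flow exercised by that test looks roughly like this (a sketch; the estimator and task ID are illustrative placeholders, and the exact run_model_on_task argument order has varied across openml-python versions):

import openml
from sklearn.linear_model import LinearRegression

# Fetch a supervised regression task (the task ID here is a placeholder).
task = openml.tasks.get_task(2295)

# Run the estimator locally on the task's predefined splits,
# collecting per-fold predictions and timing information.
run = openml.runs.run_model_on_task(LinearRegression(), task)

# Upload the per-fold predictions to the server.
run = run.publish()
print(run.run_id)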

Any other comments?

Currently, it can't output the exact original values because they were converted to floats. I opened an issue to discuss what to do here: #559

joaquinvanschoren changed the base branch from master to develop on September 27, 2018
codecov-io commented Sep 30, 2018

Codecov Report

Merging #560 into develop will decrease coverage by 0.16%.
The diff coverage is 81.9%.


@@             Coverage Diff             @@
##           develop     #560      +/-   ##
===========================================
- Coverage    89.76%   89.59%   -0.17%     
===========================================
  Files           32       32              
  Lines         3077     3133      +56     
===========================================
+ Hits          2762     2807      +45     
- Misses         315      326      +11
Impacted Files              Coverage   Δ
openml/tasks/__init__.py    100%       <ø> (ø) ⬆️
openml/runs/functions.py    86.43%     <75%> (-0.9%) ⬇️
openml/runs/run.py          88.97%     <83.63%> (-0.55%) ⬇️
openml/tasks/functions.py   88.07%     <90.9%> (ø) ⬆️
openml/tasks/task.py        95.87%     <95.34%> (-0.43%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update cefd097...e23233e.

mfeurer (Collaborator) commented Dec 21, 2018

Hey @joaquinvanschoren any news on this? Do you think you can finish this in the first weeks of the new year?

joaquinvanschoren (Contributor, Author)

Update: I ran into a very strange error when running the tests. There is a global variable 'cachedirectory' in config.py, and for some reason, in the regression branch, it is set to the directory I run the tests from on the command line instead of the default cache. It does not happen in the develop branch. Does that ring any bells?

Otherwise, I'll just keep on looking. I should have some time next week to finish this.

mfeurer (Collaborator) commented Dec 21, 2018

No, this has never happened to me so far. You could check the file testing.py in the openml directory: it performs some directory operations prior to starting the actual tests, and I expect that is the most likely culprit.

mfeurer (Collaborator) left a comment

Hey, thanks a lot! This looks great! I think my comments are rather minor, and we should be able to merge this soonish :)


if ProbaY.shape[1] != len(task.class_labels):
    warnings.warn("Repeat %d Fold %d: estimator only predicted for %d/%d classes!"
                  % (rep_no, fold_no, ProbaY.shape[1], len(task.class_labels)))
# TODO: Is it OK to move predict_proba outside of the runtime measurement?
mfeurer (Collaborator):

Fine by me.
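That would mean timing only the fit, roughly like so (a sketch; variable names are illustrative):

import time

start = time.process_time()
model.fit(X_train, y_train)
usercpu_time_millis = 1000.0 * (time.process_time() - start)

# predict_proba now runs outside the timed region
proba_y = model.predict_proba(X_test)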

arff_dict['attributes'] = [('repeat', 'NUMERIC'),
                           ('fold', 'NUMERIC'),
                           ('row_id', 'NUMERIC'),
                           ('cluster', 'NUMERIC')]
mfeurer (Collaborator):

An else: raise NotImplementedError would be great here.
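Roughly like this (a sketch; the helper name is hypothetical, and TaskTypeEnum is assumed importable from openml.tasks):

from openml.tasks import TaskTypeEnum

def _prediction_attributes(task):
    # Sketch: the clustering branch shown above, plus the suggested guard.
    if task.task_type_id == TaskTypeEnum.CLUSTERING:
        return [('repeat', 'NUMERIC'),
                ('fold', 'NUMERIC'),
                ('row_id', 'NUMERIC'),
                ('cluster', 'NUMERIC')]
    else:
        raise NotImplementedError(
            'Task type %s is not supported.' % task.task_type_id)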

return arff_dict

def get_metric_fn(self, sklearn_fn, kwargs={}):
"""Calculates metric scores based on predicted values. Assumes the
"""Calculates metric scores based on prnedicted values. Assumes the
mfeurer (Collaborator):

I'm afraid that you introduced a typo here ;)

raise ValueError('Attribute "correct" should be set')
if 'prediction' not in attribute_names:
raise ValueError('Attribute "predict" should be set')
if task.task_type_id == TaskTypeEnum.SUPERVISED_CLASSIFICATION and \
mfeurer (Collaborator):

This also holds for learning curves, right?
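i.e., the check would become something like (a sketch; assumes TaskTypeEnum also defines a LEARNING_CURVE member):

if task.task_type_id in (TaskTypeEnum.SUPERVISED_CLASSIFICATION,
                         TaskTypeEnum.LEARNING_CURVE):
    # learning-curve tasks also predict class labels, so the same
    # 'correct'/'prediction' attribute checks apply
    ...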

# should take at least one millisecond (?)
'usercpu_time_millis': (0, max_time_allowed)}

print(task_type)
mfeurer (Collaborator):

No print statements please.


print(task_type)

if task_type == "Supervised Classification" or \
mfeurer (Collaborator):

Could you please use the task type enum here as well?
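i.e., comparing against the enum member rather than the display string (a sketch; assumes the numeric task_type_id is available in scope, and the second operand of the original `or` is not visible in the diff, so it is elided here):

if task_type_id == TaskTypeEnum.SUPERVISED_CLASSIFICATION:
    ...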

task_type="Supervised Classification")
pass

def _run_and_upload_regression(self, clf, rsv):
mfeurer (Collaborator):

I think it would be great if you could rename that other function to _run_and_upload_classification.

pass

def _run_and_upload_regression(self, clf, rsv):
def determine_grid_size(param_grid):
mfeurer (Collaborator):

Could this function be generalized as it is also used for classification?
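For example, hoisted to module level so the classification and regression tests share one helper (a sketch; the list-of-grids branch follows sklearn's param_grid convention and is an assumption):

def determine_grid_size(param_grid):
    # Number of candidate settings in an sklearn-style param grid.
    if isinstance(param_grid, dict):
        grid_size = 1
        for values in param_grid.values():
            grid_size *= len(values)
        return grid_size
    elif isinstance(param_grid, list):
        # sklearn also accepts a list of grids; their sizes add up
        return sum(determine_grid_size(sub_grid) for sub_grid in param_grid)
    else:
        raise TypeError('Unsupported param_grid type: %s' % type(param_grid))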

run = self._perform_run(task_id, num_test_instances, clf,
random_state_value=rsv)

# obtain accuracy scores using get_metric_score:
mfeurer (Collaborator):

Could you please rename all references to classification (such as accuracy) to reference regression?

run.fold_evaluations['mean_absolute_error'][rep][fold])
self.assertEqual(sum(mae_scores_provided), sum(mae_scores))

if isinstance(clf, BaseSearchCV):
mfeurer (Collaborator):

This could be generalized, too. After thinking about this, I think the whole function should be generalized.

joaquinvanschoren (Contributor, Author):

Done everything. Testing now.

mfeurer (Collaborator) left a comment

Looks (almost) good to me. Could you please check my final comments?

I'd merge once they are resolved, despite the failing unit tests, since all changes made since the tests last ran were cosmetic.

if type(val_1) == type(val_2):
    self.assertEqual(val_1, val_2)
elif type(val_1) == float or type(val_2) == float:
    self.assertTrue(abs(float(val_1) - float(val_2)) < 0.00001)
mfeurer (Collaborator):

Please replace this by self.assertAlmostEqual.
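i.e. (a sketch; delta makes the previous 1e-5 tolerance explicit):

self.assertAlmostEqual(float(val_1), float(val_2), delta=0.00001)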

joaquinvanschoren (Contributor, Author):

Cool, done.

elif type(val_1) == float or type(val_2) == float:
    self.assertTrue(abs(float(val_1) - float(val_2)) < 0.00001)
else:
    self.assertEqual(str(val_1), str(val_2))
mfeurer (Collaborator):

Why this? What type cannot be compared with a simple equality check?

mfeurer (Collaborator):

If this needs to stay, please add a comment explaining what type(s) you expect here.

joaquinvanschoren (Contributor, Author):

This is indeed not really necessary. Removed.

mfeurer (Collaborator) left a comment

Thanks a lot! This will be a huge step forward!

mfeurer merged commit 3ed08f0 into develop on Feb 19, 2019
mfeurer deleted the regression branch on February 19, 2019