Regression #560
Conversation
Codecov Report

@@            Coverage Diff             @@
##           develop     #560      +/-   ##
===========================================
-  Coverage    89.76%   89.59%   -0.17%
===========================================
   Files           32       32
   Lines         3077     3133      +56
===========================================
+  Hits          2762     2807      +45
-  Misses         315      326      +11

Continue to review the full report at Codecov.
Hey @joaquinvanschoren, any news on this? Do you think you can finish this in the first weeks of the new year?
Update: I ran into a very strange error when running the tests. There is a global variable 'cachedirectory' in config.py, and for some reason, in the regression branch it is set to the directory I run the tests from on the command line instead of the default cache. This does not happen in the develop branch. Does that ring any bells? Otherwise, I'll just keep looking. I should have some time next week to finish this.
No, this has never happened to me so far. You could check the file testing.py in the openml directory; it performs some directory operations prior to starting the actual test, and I expect that this is the most likely culprit.
Hey, thanks a lot! This looks great! I think my comments are rather minor, so we should be able to merge this fairly soon :)
openml/runs/functions.py (outdated)
if ProbaY.shape[1] != len(task.class_labels):
    warnings.warn("Repeat %d Fold %d: estimator only predicted for %d/%d classes!" % (rep_no, fold_no, ProbaY.shape[1], len(task.class_labels)))
# TODO: Is it OK to move predict_proba outside of the runtime measurement?
Fine by me.
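For context, a minimal sketch (not part of this PR) of how the missing-class case flagged by the warning above could be handled, by re-aligning the probability columns to the task's class labels. It assumes the fitted estimator exposes classes_ containing the same label values as task.class_labels; all names other than ProbaY and the class labels are hypothetical.

```python
import numpy as np

def align_probabilities(ProbaY, estimator_classes, class_labels):
    # Pad with zero columns for classes the estimator never saw, so the
    # columns of the result line up with task.class_labels.
    aligned = np.zeros((ProbaY.shape[0], len(class_labels)))
    for col, cls in enumerate(estimator_classes):
        aligned[:, class_labels.index(cls)] = ProbaY[:, col]
    return aligned
```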
arff_dict['attributes'] = [('repeat', 'NUMERIC'),
                           ('fold', 'NUMERIC'),
                           ('row_id', 'NUMERIC'),
                           ('cluster', 'NUMERIC')]
An else: raise NotImplementedError would be great here.
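A minimal sketch of the suggested fallback, assuming the surrounding code builds arff_dict['attributes'] by dispatching on the task type. The helper name, the enum members used in the branches, and the placeholder branch bodies are assumptions, not the PR's actual code.

```python
from openml.tasks import TaskTypeEnum

def _prediction_attributes(task_type_id):
    # Columns shared by every task type.
    attributes = [('repeat', 'NUMERIC'),
                  ('fold', 'NUMERIC'),
                  ('row_id', 'NUMERIC')]
    if task_type_id == TaskTypeEnum.CLUSTERING:  # enum member name assumed
        attributes.append(('cluster', 'NUMERIC'))
    elif task_type_id in (TaskTypeEnum.SUPERVISED_CLASSIFICATION,
                          TaskTypeEnum.SUPERVISED_REGRESSION):
        ...  # prediction/correct columns as built elsewhere in the PR
    else:
        # The reviewer's suggestion: fail loudly on anything unhandled.
        raise NotImplementedError('Task type %s is not supported.'
                                  % task_type_id)
    return attributes
```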
openml/runs/run.py (outdated)

    return arff_dict

    def get_metric_fn(self, sklearn_fn, kwargs={}):
-       """Calculates metric scores based on predicted values. Assumes the
+       """Calculates metric scores based on prnedicted values. Assumes the
I'm afraid that you introduced a typo here ;)
openml/runs/run.py (outdated)

        raise ValueError('Attribute "correct" should be set')
    if 'prediction' not in attribute_names:
        raise ValueError('Attribute "predict" should be set')
    if task.task_type_id == TaskTypeEnum.SUPERVISED_CLASSIFICATION and \
This also holds for learning curves, right?
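If that is the case, a minimal sketch of how the check could cover both task types, assuming TaskTypeEnum also defines a LEARNING_CURVE member. The helper is hypothetical and only illustrates the membership test; the real condition continues beyond the truncated line shown above.

```python
from openml.tasks import TaskTypeEnum

def _requires_class_labels(task_type_id):
    # Classification-style predictions are produced for both task types.
    return task_type_id in (TaskTypeEnum.SUPERVISED_CLASSIFICATION,
                            TaskTypeEnum.LEARNING_CURVE)
```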
# should take at least one millisecond (?)
'usercpu_time_millis': (0, max_time_allowed)}

print(task_type)
No print statements please.
print(task_type)

if task_type == "Supervised Classification" or \
Could you please use the task type enum here as well?
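A minimal sketch of the suggested change, assuming TaskTypeEnum can be imported from openml.tasks; the exact set of members in the tuple is illustrative and should mirror whatever the original string comparison covered.

```python
from openml.tasks import TaskTypeEnum

# Compare against enum members instead of free-form strings.
if task_type in (TaskTypeEnum.SUPERVISED_CLASSIFICATION,
                 TaskTypeEnum.LEARNING_CURVE):
    ...
```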
task_type="Supervised Classification") | ||
pass | ||
|
||
def _run_and_upload_regression(self, clf, rsv): |
I think it would be great if you could rename that other function to _run_and_upload_classification.
pass

def _run_and_upload_regression(self, clf, rsv):
    def determine_grid_size(param_grid):
Could this function be generalized as it is also used for classification?
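A minimal sketch of what a shared, module-level helper could look like, assuming param_grid follows the sklearn convention of either a dict mapping parameter names to candidate values or a list of such dicts; the function body is illustrative, not the PR's actual code.

```python
def determine_grid_size(param_grid):
    """Count the candidate settings in a parameter grid or list of grids."""
    if isinstance(param_grid, dict):
        grid_size = 1
        for values in param_grid.values():
            grid_size *= len(values)
        return grid_size
    elif isinstance(param_grid, list):
        # A list of grids is searched grid by grid.
        return sum(determine_grid_size(sub_grid) for sub_grid in param_grid)
    else:
        raise TypeError('Unsupported param_grid type: %s' % type(param_grid))
```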
run = self._perform_run(task_id, num_test_instances, clf,
                        random_state_value=rsv)

# obtain accuracy scores using get_metric_score:
Could you please rename all references to classification (such as accuracy) to reference regression?
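For illustration (any names not visible in the diff are assumptions), the regression counterpart of the accuracy check would recompute mean absolute error from the stored predictions and compare it against the run's per-fold evaluations, as in the snippet below:

```python
import sklearn.metrics

# Per-fold MAE recomputed client-side from the run's predictions.
mae_scores = run.get_metric_fn(sklearn.metrics.mean_absolute_error)
```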
    run.fold_evaluations['mean_absolute_error'][rep][fold])
self.assertEqual(sum(mae_scores_provided), sum(mae_scores))

if isinstance(clf, BaseSearchCV):
This could be generalized, too. After thinking about this, I think the whole function should be generalized.
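A rough sketch of the kind of generalization meant here: one helper parameterized by task id and metric, shared by the classification and regression tests. Everything not visible in the diff (the signature and the metric arguments in particular) is an assumption.

```python
def _run_and_upload(self, clf, rsv, task_id, num_test_instances,
                    sklearn_metric_fn, openml_metric_name):
    run = self._perform_run(task_id, num_test_instances, clf,
                            random_state_value=rsv)

    # Scores recomputed locally from the stored predictions ...
    scores = run.get_metric_fn(sklearn_metric_fn)

    # ... should match the per-fold evaluations attached to the run.
    scores_provided = []
    for rep in run.fold_evaluations[openml_metric_name].values():
        scores_provided.extend(rep.values())
    self.assertAlmostEqual(sum(scores_provided), sum(scores))
    return run
```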
Done everything. Testing now.
Looks (almost) good to me. Could you please check my final comments?
I'd merge once they are resolved, despite the failing unit tests, since all changes made after the last passing run were cosmetic.
if type(val_1) == type(val_2):
    self.assertEqual(val_1, val_2)
elif type(val_1) == float or type(val_2) == float:
    self.assertTrue(abs(float(val_1) - float(val_2)) < 0.00001)
Please replace this by self.assertAlmostEqual.
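For illustration, the drop-in replacement for the assertTrue line above (places=5 is an assumption chosen to roughly match the 1e-5 threshold):

```python
# unittest's built-in tolerance check instead of a hand-rolled threshold.
self.assertAlmostEqual(float(val_1), float(val_2), places=5)
```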
Cool, done.
elif type(val_1) == float or type(val_2) == float:
    self.assertTrue(abs(float(val_1) - float(val_2)) < 0.00001)
else:
    self.assertEqual(str(val_1), str(val_2))
Why this? What type cannot be compared by a simple equality check?
If this needs to stay, please add a comment explaining what type(s) you expect here.
This is indeed not really necessary. Removed.
Thanks a lot! This will be a huge step forward!
What does this PR implement/fix? Explain your changes.
Extends the API to support regression runs
How should this PR be tested?
test_run_functions.py::TestRun::test_run_and_upload_linear_regression
Example run: https://test.openml.org/r/22694
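For example, it can be invoked as pytest test_run_functions.py::TestRun::test_run_and_upload_linear_regression (the node-id syntax assumes pytest; running against the test server also requires a configured API key, so treat this command as illustrative).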
Any other comments?
Currently, it cannot output the exact correct values because they were converted to floats. I opened an issue to discuss what to do here: #559