
Warning if fitted sklearn model being used #989

Merged
merged 9 commits on Nov 3, 2020

Conversation

Neeratyoy
Contributor

Reference Issue

Addresses #968.

What does this PR implement/fix? Explain your changes.

Prints a warning if the user passes an already fitted sklearn model to run_model_on_task.

How should this PR be tested?

import openml
from sklearn.ensemble import RandomForestClassifier

task = openml.tasks.get_task(23)
clf = RandomForestClassifier()

# unfitted model: no warning expected
run = openml.runs.run_model_on_task(clf, task)
print("Unfit model run")

# fit the model first, then run it again: a warning should now be printed
X, y = task.get_X_and_y()
clf.fit(X, y)
run = openml.runs.run_model_on_task(clf, task)
print("Fit model run")

@Neeratyoy Neeratyoy requested review from mfeurer and PGijsbers and removed request for mfeurer October 30, 2020 13:40
openml/extensions/sklearn/extension.py (outdated diff)
)
except NotFittedError:
    # model is not fitted, as is required
    pass
Collaborator

I thought in the Python call we discussed that perhaps we would check this at the first call to run_model_on_task?
In either case I would extract this to a separate method _raise_warning_if_fitted to make sure the functions don't get too big (they already are).

Contributor Author

Alright, will make it into a function and push.

As for its placement: regardless of whether run_model_on_task or run_flow_on_task is called, this is the function the call ultimately reduces to, so I went ahead and placed the snippet here.

Collaborator

run_model_on_task actually calls run_flow_on_task. That said, we would then need to add a function to the extension interface that indicates whether a model is already fit, otherwise we can't check it in a general way.
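
A minimal sketch of what such a hook on the extension interface could look like (the method name check_if_model_fitted shows up in the traceback later in this thread; the exact signature and docstring here are assumptions):

from abc import ABC, abstractmethod
from typing import Any


class Extension(ABC):
    # excerpt of the extension interface, showing only the new hook

    @abstractmethod
    def check_if_model_fitted(self, model: Any) -> bool:
        """Return True if the model has already been fitted/trained."""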

Collaborator

@mfeurer do you think this is something we should want? Or do we just leave it to the extension devs to implement a warning if they see fit?

Collaborator

@mfeurer mfeurer Nov 2, 2020

I like the idea of having this function as a callback that can be implemented by the extension devs. And yes, I expected this function to be called from the run_model_on_task function.
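
For illustration, a rough sketch (not necessarily the PR's exact code; the warning text and the RuntimeWarning category are assumptions) of how run_model_on_task could invoke that callback before handing off to run_flow_on_task:

import warnings

from openml.extensions import get_extension_by_model
from openml.runs import run_flow_on_task


def run_model_on_task(model, task, **kwargs):
    # sketch: warn if the model was already fitted, then run it as usual
    extension = get_extension_by_model(model)
    if extension.check_if_model_fitted(model):
        warnings.warn(
            "The model is already fitted. Running it on the task will refit it.",
            RuntimeWarning,
        )
    flow = extension.model_to_flow(model)
    return run_flow_on_task(flow, task, **kwargs)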

Collaborator

@mfeurer mfeurer left a comment

This PR looks good now, but you accidentally committed a .orig file. Could you please remove that again?

@Neeratyoy
Contributor Author

This PR looks good now, but you accidentally committed a .orig file. Could you please remove that again?

Done

@mfeurer
Collaborator

mfeurer commented Nov 2, 2020

Hey, it's a bit hard to see, but among all the failures it says:

self = <openml.extensions.sklearn.extension.SklearnExtension object at 0x0000009C4334DA58>
model = RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
                       max_depth=None, max...                     n_jobs=None, oob_score=False, random_state=42, verbose=0,
                       warm_start=False)
    def check_if_model_fitted(self, model: Any) -> bool:
        """Returns True/False denoting if the model has already been fitted/trained
    
        Parameters
        ----------
        model : Any
    
        Returns
        -------
        bool
        """
        try:
            # check if model is fitted
            from sklearn.exceptions import NotFittedError
            from sklearn.utils.validation import check_is_fitted
    
>           check_is_fitted(model)  # raises a NotFittedError if the model has not been trained
E           TypeError: check_is_fitted() missing 1 required positional argument: 'attributes'
openml\extensions\sklearn\extension.py:1556: TypeError

Could you please have a look?

@joaquinvanschoren
Contributor

@all-contributors please add @Neeratyoy for code

@allcontributors
Contributor

@joaquinvanschoren

I've put up a pull request to add @Neeratyoy! 🎉

Collaborator

@PGijsbers PGijsbers left a comment

Looks good to me, if the failing unit test that @mfeurer pointed out is fixed.

@PGijsbers
Collaborator

@all-contributors please add @Neeratyoy for code

(sorry for the spam)

@allcontributors
Contributor

@PGijsbers

I've put up a pull request to add @Neeratyoy! 🎉

@Neeratyoy
Contributor Author

Could you please have a look?

Given the error message, it looks like check_is_fitted behaved differently in older versions of sklearn.

I updated the design of the check to be agnostic to the sklearn version and to the kind of model passed.
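
For reference, one version-agnostic way to implement such a check (a sketch under the assumption that this is roughly what the fix does, not necessarily the exact code that landed): newer sklearn releases accept check_is_fitted(model) without the attributes argument, while older releases raise the TypeError shown above, so the fallback below looks for learned attributes ending in an underscore, the same convention check_is_fitted relies on internally.

from typing import Any


def check_if_model_fitted(model: Any) -> bool:
    """Return True if the sklearn model appears to have been fitted already."""
    from sklearn.exceptions import NotFittedError
    from sklearn.utils.validation import check_is_fitted

    try:
        try:
            # sklearn >= 0.22: the `attributes` argument is optional
            check_is_fitted(model)
        except TypeError:
            # older sklearn: emulate the check by looking for learned
            # attributes, which by convention end with a single underscore
            fitted_attrs = [
                attr for attr in vars(model)
                if attr.endswith("_") and not attr.startswith("__")
            ]
            if not fitted_attrs:
                raise NotFittedError("Model has no fitted attributes.")
        return True
    except NotFittedError:
        return False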

@codecov-io

codecov-io commented Nov 3, 2020

Codecov Report

Merging #989 into develop will decrease coverage by 0.05%.
The diff coverage is 71.42%.


@@             Coverage Diff             @@
##           develop     #989      +/-   ##
===========================================
- Coverage    87.91%   87.86%   -0.06%     
===========================================
  Files           36       36              
  Lines         4551     4565      +14     
===========================================
+ Hits          4001     4011      +10     
- Misses         550      554       +4     
Impacted Files Coverage Δ
openml/runs/functions.py 83.16% <50.00%> (-0.17%) ⬇️
openml/extensions/sklearn/extension.py 90.84% <70.00%> (-0.24%) ⬇️
openml/extensions/extension_interface.py 91.66% <100.00%> (+0.49%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@PGijsbers
Collaborator

As far as I can tell these failures are a combination of timeouts and server issues, am I overlooking something?

@mfeurer
Collaborator

mfeurer commented Nov 3, 2020

Yes, except for:

  • pytest workers crashing. This is because the pytest timeout is not longer than what the unit test itself waits for (the timeout is set to 600s, so if the unit test now waits for 600s, it runs over the timeout).
  • a known failure that previously only existed on Windows. I already asked @Neeratyoy to look into this.

In my opinion, this PR can be merged.

@PGijsbers
Collaborator

pytest workers crashing.

Yes this is what I referred to with timeouts (not server timeouts), my bad for leaving it ambiguous.

In my opinion, this PR can be merged.

Will do.
