
Adding sklearn docstring to flow #756

Merged
merged 15 commits into from
Sep 13, 2019

Conversation

Neeratyoy
Contributor

Reference Issue

Addresses #175.

What does this PR implement/fix? Explain your changes.

Adds the sklearn model's docstring description to the OpenML flow description for flows built from sklearn models. It also parses the docstring's Parameters section and fills in the data types and parameter descriptions for the flow's parameters_meta_info attribute.
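As a rough sketch of the parsing described above: pull the numpydoc-style Parameters section out of a model's docstring and map each parameter name to its data type and description. This is a hypothetical helper for illustration, not the code merged in this PR.

```python
import re

def parse_numpydoc_parameters(docstring):
    # Hypothetical sketch, not the actual implementation in this PR.
    # Capture everything between the underlined 'Parameters' header and
    # the next underlined section header (or the end of the docstring).
    match = re.search(
        r"Parameters\n\s*-+\n(.*?)(?=\n\s*\w+\n\s*-+\n|\Z)",
        docstring,
        re.DOTALL,
    )
    if match is None:
        return {}
    params = {}
    name = None
    for line in match.group(1).splitlines():
        if not line.strip():
            continue
        if not line.startswith(" " * 8):
            # A 'name : type' header line: everything before the first
            # colon is the name, everything after it the data type.
            head, sep, dtype = line.strip().partition(":")
            name = head.strip()
            params[name] = {"data_type": dtype.strip() if sep else None,
                            "description": []}
        elif name is not None:
            params[name]["description"].append(line.strip())
    return {
        key: {"data_type": value["data_type"],
              "description": " ".join(value["description"])}
        for key, value in params.items()
    }

# Toy docstring in the sklearn/numpydoc layout, used for illustration.
DOC = """Stochastic gradient descent classifier.

    Parameters
    ----------
    learning_rate : str, default='optimal'
        The learning rate schedule.
    alpha : float
        Constant that multiplies the regularization term.
    """

meta_info = parse_numpydoc_parameters(DOC)
print(meta_info["learning_rate"]["data_type"])
```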

How should this PR be tested?

```python
import openml
from sklearn.linear_model import SGDClassifier

openml.config.start_using_configuration_for_example()
task = openml.tasks.get_task(403)
clf = SGDClassifier()
run, flow = openml.runs.run_model_on_task(clf, task, return_flow=True)
print(flow.description)
print(flow.parameters_meta_info['learning_rate']['data_type'])
print(flow.parameters_meta_info['learning_rate']['description'])
```

@Neeratyoy
Contributor Author

@mfeurer
Hope this is in line with the requirements of the issue.

A couple of things though:

  • In the example given above, when I go to the test server page for the corresponding flow_id generated after run_model_on_task, the description loads and then throws a 'Not Found' error. However, the parameter descriptions come up fine.

  • Currently, I am updating the description for all sklearn flows. However, I am wondering where this will break, given that most of the popular sklearn models already exist as flows. I have noticed that if I set flow_id=None and publish() a flow which already exists, it raises a ValueError for the description and parameter descriptions too.

@mfeurer
Collaborator

mfeurer commented Aug 7, 2019

the description loads and then throws a 'Not Found' error

This is a frontend error on the test server and nothing to worry about here.

Currently, I am updating the description for all sklearn flows

Do you mean on the test or live server? If yes, I don't think there's a reason to do so. In my opinion it's sufficient if this information is added in the future.

Collaborator

@mfeurer mfeurer left a comment


Thanks, this already looks good. I have a few comments which could make the code a bit more resilient. Could you please post a link to a "new" flow and also start adding unit tests?

openml/extensions/sklearn/extension.py (8 resolved review comments)
@Neeratyoy Neeratyoy requested a review from mfeurer August 7, 2019 14:58
@Neeratyoy
Contributor Author

@mfeurer there will be quite a few errors, namely AttributeError and KeyError, which I need to take a look at. The problem I discussed with you should raise ValueErrors specifically.

@mfeurer
Collaborator

mfeurer commented Aug 7, 2019

Hm, it appears that we need to ignore the descriptions for these unit tests.

@Neeratyoy
Contributor Author

Neeratyoy commented Aug 7, 2019

Hm, it appears that we need to ignore the descriptions for these unit tests.

This function checks whether the local flow matches the flow on the server.
Shall I edit it to exempt description and parameters_meta_info from the check?

The result would be that no exceptions are raised and the server flow stays intact, while the local flow carries the newer sklearn documentation.
For new flows, it shouldn't be a problem either way.
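A minimal sketch of such an exemption, in the spirit of the flags already used elsewhere. The signature and dict-based flows here are hypothetical; the real helper in openml.flows.functions operates on flow objects and takes more flags.

```python
def assert_flows_equal(local, server, ignore_parameter_documentation=False):
    # Compare two flows field by field, optionally skipping the
    # documentation fields behind a keyword flag (hypothetical sketch).
    ignored = ({"description", "parameters_meta_info"}
               if ignore_parameter_documentation else set())
    for field in local:
        if field in ignored:
            continue
        if local.get(field) != server.get(field):
            raise ValueError("Flow field %r does not match." % field)

local_flow = {"name": "sklearn.SGDClassifier",
              "description": "new docstring text",
              "parameters_meta_info": {"alpha": {"data_type": "float"}}}
server_flow = {"name": "sklearn.SGDClassifier",
               "description": "old text",
               "parameters_meta_info": {}}

# Passes despite the differing documentation fields:
assert_flows_equal(local_flow, server_flow,
                   ignore_parameter_documentation=True)
```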

More edge cases:

  1. Please see the algorithm parameter, whose description lacks a str data type. If I include more characters in my regex search, it breaks. This is an edge case.

  2. Another one is the test case here, where the estimator used is another object with its own set of parameters, which do not appear in the docstring of the main sklearn object.

Shall the check/assignment be a little looser, so that if no parameter match is found we simply don't add a description for it? This works in the large majority of cases; the remaining edge cases need to be tackled, but specific manual fixes may break in the future, since we rely heavily on the sklearn docs.
I could explicitly call model.get_params() and add the missing parameters as keys with description and data_type set to None.
Kindly advise.
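The get_params() fallback proposed above could be sketched roughly like this, with a plain dict standing in for the result of model.get_params() and a hypothetical helper name:

```python
def complete_meta_info(model_params, parsed_docs):
    # Ensure every parameter the model exposes appears in the flow's
    # parameters_meta_info, falling back to None entries when the
    # docstring had no matching description (hypothetical sketch).
    return {
        name: parsed_docs.get(name, {"data_type": None, "description": None})
        for name in model_params
    }

parsed = {"alpha": {"data_type": "float", "description": "Regularization."}}
model_params = {"alpha": 0.0001, "estimator": None}  # as from get_params()
filled = complete_meta_info(model_params, parsed)
```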

@mfeurer
Collaborator

mfeurer commented Aug 9, 2019

Shall I edit this to exempt the check for description and parameters_meta_info?

Yes, please, but use a flag, as with the other exemptions.

Regarding edge case number 1, I'm not sure why adding more characters to the regex breaks this. However, you could simply split the string into the part before and after the first colon and use these?
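The split suggested above can be sketched in a few lines: everything before the first colon is the parameter name, everything after it the data type, and entries without a colon (like the algorithm edge case) simply get no type. The helper name is made up for illustration.

```python
def split_param_header(header):
    # Partition on the first colon only; sep is '' when no colon exists,
    # in which case we report no data type rather than failing.
    name, sep, dtype = header.partition(":")
    return name.strip(), (dtype.strip() if sep else None)

print(split_param_header("learning_rate : str, default='optimal'"))
print(split_param_header("algorithm"))
```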

Regarding edge case number 2, I think we should just add no description. And yes, the whole thing is under the assumption that 3rd-party packages use the same documentation as scikit-learn. That's a bit unfortunate. Could you please add a comment to the scikit-learn extension describing this?

@codecov-io

codecov-io commented Aug 25, 2019

Codecov Report

Merging #756 into develop will increase coverage by 0.2%.
The diff coverage is 90.19%.


```
@@            Coverage Diff             @@
##           develop     #756     +/-   ##
==========================================
+ Coverage    88.06%   88.26%   +0.2%     
==========================================
  Files           36       36             
  Lines         4114     4278    +164     
==========================================
+ Hits          3623     3776    +153     
- Misses         491      502     +11
```
Impacted Files Coverage Δ
openml/flows/functions.py 88.07% <88.88%> (+0.1%) ⬆️
openml/extensions/sklearn/extension.py 91.24% <90.47%> (-0.11%) ⬇️
openml/utils.py 91.13% <0%> (-0.64%) ⬇️
openml/runs/functions.py 82.07% <0%> (-0.29%) ⬇️
openml/evaluations/functions.py 92.23% <0%> (-0.08%) ⬇️
openml/testing.py 95.92% <0%> (+0.63%) ⬆️
openml/_api_calls.py 88.31% <0%> (+3.89%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@Neeratyoy Neeratyoy marked this pull request as ready for review August 26, 2019 18:45
openml/flows/functions.py (2 resolved review comments)
@Neeratyoy
Contributor Author

Neeratyoy commented Sep 2, 2019

@mfeurer
I am not able to reproduce the failing tests locally; the descriptions match consistently.
The docs for the older sklearn versions 0.19 and 0.20 differ slightly from the current stable release.
Could this be a version issue?

@mfeurer
Collaborator

mfeurer commented Sep 4, 2019

Could this be a version issue?

Most likely. You could do a version switch here and check the fixture for 0.21, and the output of the function (which you had before) for the others?
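The version switch suggested above could look roughly like this; the helper name and the comparison-target labels are made up, and the version string would come from sklearn.__version__ in the real tests.

```python
def doc_comparison_target(sklearn_version):
    # Compare against the hand-written fixture only on sklearn >= 0.21;
    # on older versions, fall back to checking the parser output
    # directly (hypothetical sketch of the test switch).
    major, minor = (int(part) for part in sklearn_version.split(".")[:2])
    return "fixture" if (major, minor) >= (0, 21) else "parser output"

print(doc_comparison_target("0.21.3"))
print(doc_comparison_target("0.18.2"))
```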

@Neeratyoy
Contributor Author

@mfeurer it appears that we are testing with a really old sklearn version, 0.18.2.
Why don't we test with the latest ones, since that is probably what users would be using?
