Adding sklearn docstring to flow #756
@mfeurer A couple of things though:
This is a frontend error on the test server and nothing to worry about here.
Do you mean on the test or live server? If yes, I don't think there's a reason to do so. In my opinion it's sufficient if this information is added in the future.
Thanks, this already looks good. I have a few comments which could make the code a bit more resilient. Could you please post a link to a "new" flow and also start adding unit tests?
@mfeurer there will be quite a few errors, namely
Hm, it appears that we need to ignore the descriptions for these unit tests.
This function checks whether the flow on the server and the local flow match. The result would be that no exceptions are raised and the server flow stays intact, whereas the local flow will have the newer sklearn documentation. More edge cases:
Should the check/assignment be a little looser, so that if no match is found for a parameter, we simply don't add a description string for it? This already works for the most part. There are a few edge cases that need to be tackled, but specific manual fixes may break in the future, since we are relying heavily on the sklearn docs.
Yes, please, but use a flag like for the other exemptions. Regarding edge case number 1, I'm not sure why adding more characters to the regex breaks this. However, you could simply split the string into the part before and after the first colon and use these? Regarding edge case number 2, I think we should just add no description. And yes, the whole thing is under the assumption that 3rd-party packages use the same documentation as scikit-learn. That's a bit unfortunate. Could you please add a comment to the scikit-learn extension describing this?
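A minimal sketch of the "split at the first colon" suggestion, combined with the "add no description if nothing matches" fallback. It assumes numpydoc-style header lines of the form `name : type`; the helper names `split_param_line` and `match_data_types` are hypothetical and are not part of the extension.

```python
from typing import Dict, Iterable, Optional, Tuple


def split_param_line(line: str) -> Optional[Tuple[str, str]]:
    """Split a 'name : type' header line at the first colon only."""
    name, sep, type_spec = line.partition(":")
    if not sep or not name.strip():
        # No colon, or nothing before it: not a parameter header line.
        return None
    return name.strip(), type_spec.strip()


def match_data_types(
    header_lines: Iterable[str], param_names: Iterable[str]
) -> Dict[str, Optional[str]]:
    """Map each model parameter to its documented type, or None if unmatched."""
    parsed = dict(filter(None, map(split_param_line, header_lines)))
    # Parameters without a matching docstring entry get no description (None).
    return {name: parsed.get(name) for name in param_names}
```

Because only the first colon is used, a type specification that itself contains colons (for example a dict literal) stays in one piece, which is the case a stricter regex can trip over.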
Codecov Report
@@            Coverage Diff             @@
##           develop     #756     +/-   ##
==========================================
+ Coverage    88.06%   88.26%   +0.2%
==========================================
  Files           36       36
  Lines         4114     4278    +164
==========================================
+ Hits          3623     3776    +153
- Misses         491      502     +11
tests/test_extensions/test_sklearn_extension/test_sklearn_extension.py
Most likely. You could do a version switch here and check the fixture for 0.21, and the output of the function (which you had before) for the others?
@mfeurer it appears that we are testing with a really old sklearn version, 0.18.2.
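The version switch suggested here could look roughly like the sketch below. The helper names and the way the fixture is passed in are assumptions for illustration; in the real test the version string would come from `sklearn.__version__`. The parser only handles plain numeric `major.minor` prefixes.

```python
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '0.21.2'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def expected_description(sklearn_version: str, fixture_021: str, generated: str) -> str:
    """Pick what the test compares against for the given sklearn version.

    For 0.21 the hard-coded fixture is used; for every other version the
    output of the description-building function is used instead.
    """
    if major_minor(sklearn_version) == (0, 21):
        return fixture_021
    return generated
```

This keeps the strict fixture check for the one version the fixture was written against, while older versions such as 0.18.2 only check that the function output is used consistently.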
Reference Issue
Addresses #175.
What does this PR implement/fix? Explain your changes.
Adds the sklearn model description to the OpenML Flow description for a flow that uses a sklearn model. Also parses the docstring and adds the data types and parameter descriptions for the parameters_meta_info attribute of a flow.
How should this PR be tested?
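import openml
from sklearn.linear_model import SGDClassifier

# Point the client at the OpenML test server
openml.config.start_using_configuration_for_example()
task = openml.tasks.get_task(403)
clf = SGDClassifier()
# return_flow=True also returns the flow generated for the model
run, flow = openml.runs.run_model_on_task(clf, task, return_flow=True)
# Flow description taken from the sklearn docstring
print(flow.description)
# Data type and description parsed for a single parameter
print(flow.parameters_meta_info['learning_rate']['data_type'])
print(flow.parameters_meta_info['learning_rate']['description'])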