Standardize use of n_jobs and reporting of computation time #1038
Conversation
tests/test_extensions/test_sklearn_extension/test_sklearn_extension.py
Outdated
Hey, I renamed the file so I could render it on my local machine. Also, I left a few more comments :)
Hey, I think the example is coming along well. Based on its current status, I'm wondering three things:
Do you mean their interaction and behaviour? I guess it would be useful, but my only concern is what exactly to summarize.
Though from the OpenML API standpoint, we could also ignore it. If the duty of parallelization is delegated to scikit-learn or OpenML (as we show in the example), I feel it is likely that the user may not set the backend explicitly. On second thought, this might necessitate mentioning how the backends change across the entire stack of function calls going through OpenML to the scikit-learn pipelines. That is obviously quite complicated.
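For illustration, a minimal sketch of what setting the backend explicitly could look like (the task id below is an arbitrary placeholder, not one from this PR):

```python
import joblib
import openml
from sklearn.ensemble import RandomForestClassifier

# Let scikit-learn parallelize internally via n_jobs ...
clf = RandomForestClassifier(n_jobs=2)

# ... while pinning the joblib backend for everything the run executes.
task = openml.tasks.get_task(59)  # placeholder task id
with joblib.parallel_backend("threading", n_jobs=2):
    run = openml.runs.run_model_on_task(clf, task)
```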
In the latest SGDClassifier documentation, the parallelization happens through `n_jobs`.
I think the main caveats to pay attention to are captured in the gist of #895.
I probably mixed something up. But an example of how this can be changed might be a good idea!
Yes, but that's for fitting multiple classifiers in a "one vs all" scheme. I was thinking more about something like the neural network, where the number of cores used can't be set via the API.
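For concreteness, a sketch of the difference as I understand current scikit-learn behaviour:

```python
from sklearn.linear_model import SGDClassifier
from sklearn.neural_network import MLPClassifier

# SGDClassifier exposes n_jobs, but it only parallelizes the
# one-vs-all fits of a multiclass problem; a binary fit gains nothing.
sgd = SGDClassifier(n_jobs=4)

# MLPClassifier exposes no n_jobs at all; how many cores it uses is
# decided by the BLAS/OpenMP layer (e.g. the OMP_NUM_THREADS variable).
mlp = MLPClassifier()
```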
################################################################################
# Summary
# *******
# OpenML records model runtimes for the CPU-clock and the wall-clock times. The above
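For context, a rough sketch of how those recorded times can be read off a run object (key names follow the runtime tutorial; the task id is a placeholder):

```python
import openml
from sklearn.tree import DecisionTreeClassifier

task = openml.tasks.get_task(59)  # placeholder task id
run = openml.runs.run_model_on_task(DecisionTreeClassifier(), task)

# Per-repeat/per-fold timings collected by the scikit-learn extension.
print(run.fold_evaluations["usercpu_time_millis_training"])
print(run.fold_evaluations["wall_clock_time_millis_training"])
```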
Hm, should we explicitly say "the scikit-learn extension"? Everything we're writing here is exclusively about the scikit-learn extension, so it could be confusing otherwise.
I still can't annotate all my comments... anyway, linear SVM also releases the GIL: https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/svm/_liblinear.pyx#L61 Maybe naive Bayes doesn't release the GIL?
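A quick way to probe this empirically (a sketch, not part of the PR): if fitting releases the GIL, concurrent fits on the threading backend finish in well under four sequential fit times.

```python
import time
from joblib import Parallel, delayed
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=5000, n_features=50, random_state=0)

start = time.time()
# liblinear releases the GIL during fit, so these four fits can
# overlap even though the threads share one Python interpreter.
Parallel(n_jobs=4, backend="threading")(
    delayed(LinearSVC().fit)(X, y) for _ in range(4)
)
print(f"four threaded LinearSVC fits: {time.time() - start:.2f}s")
```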
Runtime measurement example updates (openml#1052):
* Minor reshuffling
* Update examples/30_extended/fetch_runtimes_tutorial.py

Co-authored-by: Neeratyoy Mallik <neeratyoy@gmail.com>

Squashed commits:
* Unit test to test existence of refit time
* Measuring runtime always
* Removing redundant check in unit test
* Updating docs with runtimes
* Adding more utilities to new example
* Removing refit_time + fetching trace runtime in example
* Rename example
* Reiterating with changes to example from @mfeurer suggestions
* Including refit time and other minor formatting
* Adding more cases + a concluding summary
* Cosmetic changes
* Adding 5th case with no release of GIL
* Removing debug code
* Runtime measurement example updates (openml#1052)

Co-authored-by: Neeratyoy Mallik <neeratyoy@gmail.com>
Co-authored-by: Matthias Feurer <feurerm@informatik.uni-freiburg.de>
Reference Issue
Addresses #895.