Why is computation time not reported if n_jobs != 1 or != None? #895
Comments
ping @janvanrijn @mfeurer ;)
Should we add a
The reason it is wrong for
I'll come back to you after the ICML deadline.
Thanks for raising this issue, it seems that there are indeed one or two problems here. I believe the reason why the wallclock time is not reported if the number of cores is -1 is because we can't figure out on how many cores it was executed and the number then only makes limited sense. Currently, this is a very restrictive assumption that can be circumvented in plenty of ways (as you showed). Do you have any suggestions on how to improve on this?
That exists and is computed if In order to get the times of each base run you can check the optimization trace, which should have the time for each model fit. However, we currently don't seem to store the refit time correctly (or at all?), which to me currently seems like the biggest bug here.
I think you can use
Yet another issue we have to think about is the recent use of OpenMP in scikit-learn, which might make it harder for us to get a useful estimate of the used time.
Sorry that this has stalled for so long, but now it's finally time to pick this up and finish it! I think we basically have the following cases here which we need to consider:
and IIRC we can measure the following things:
That means we can do the following things for cases 1-4:
As @NicolasHug pointed out, one can override the behavior via a context manager. Another caveat is that when using a server-worker system such as dask one does not necessarily get all available CPUs or the jobs might just be in the queue, making the wallclock time of the overall run completely useless. Therefore, I propose to do the following:
What do you think about this, @NicolasHug @amueller @PGijsbers?
I'd be careful not to spend too much time on this, as it will become a very complicated or even impossible project of its own (we would have to account for different parallelization strategies and packages, and would also need to start capturing hardware information, etc.). However, making the proposed changes and then clearly documenting what is measured under which conditions, and how to interpret this data, still seems like a worthwhile change to me.
We followed the suggestion of @NicolasHug to just log the CPU and wallclock time and leave it to the user to interpret them. To simplify matters we added a lengthy example.
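For readers who want to see what "log both and let the user interpret" can look like, here is a minimal standard-library sketch. It is not the openml-python implementation; the helper name `timed_call` and the dictionary keys are made up for illustration. It records wallclock time with `time.perf_counter()` and CPU time with `time.process_time()`.

```python
import time


def timed_call(func, *args, **kwargs):
    """Run func and return (result, timings).

    Hypothetical helper for illustration only, not part of openml-python.
    """
    wall_start = time.perf_counter()   # elapsed real time
    cpu_start = time.process_time()    # user + system CPU time of this process
    result = func(*args, **kwargs)
    cpu_elapsed = time.process_time() - cpu_start
    wall_elapsed = time.perf_counter() - wall_start
    return result, {
        "wall_clock_time_s": wall_elapsed,
        "cpu_time_s": cpu_elapsed,
    }


# A sleeping call uses wallclock time but almost no CPU time.
_, sleep_times = timed_call(time.sleep, 0.05)

# A busy call uses CPU time roughly comparable to its wallclock time.
total, busy_times = timed_call(sum, range(200_000))
```

One caveat when interpreting the numbers: `time.process_time()` only covers the current process (all of its threads), so work done in separate worker processes is not included in the CPU figure, while it still shows up in the wallclock figure.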
I'm running a big benchmark suite with `RandomizedSearchCV(n_jobs=-1)`. Unfortunately, computation time is reported only if `n_jobs` is `None` or `1`. I don't understand the reason (#229). Why isn't the interpretation left to the user?
As a side note: `n_jobs=None` can be overridden with a context manager, which is equivalent to just calling `RandomizedSearchCV(n_jobs=-1)`. With the latter, openml won't report computation time, but as far as I understand, the former will run just fine and report the computation time. So it seems that the check isn't properly enforced anyway.
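The snippet from the original post did not survive, so the following is only a sketch of the mechanism, assuming joblib's `parallel_backend` context manager is what was meant. It uses plain joblib rather than scikit-learn, the `threading` backend and `n_jobs=2` are arbitrary choices for the example, and `square` is a made-up function. Newer joblib releases also offer `parallel_config` for the same purpose.

```python
from joblib import Parallel, delayed, effective_n_jobs, parallel_backend


def square(x):
    # Stand-in for real work such as fitting one candidate model.
    return x * x


# The caller never passes n_jobs explicitly (it stays None), yet the
# context manager decides how many workers are used.
with parallel_backend("threading", n_jobs=2):
    n_inside = effective_n_jobs(None)  # resolved from the active context
    results = Parallel()(delayed(square)(i) for i in range(4))
```

An estimator left at `n_jobs=None` would pass a check that only inspects the parameter value, even though the work inside the `with` block runs on several workers.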
CC @amueller