
Why is computation time not reported if n_jobs != 1 or != None? #895

Closed · NicolasHug opened this issue Jan 17, 2020 · 10 comments

@NicolasHug commented Jan 17, 2020
I'm running a big benchmark suite with RandomizedSearchCV(n_jobs=-1).

Unfortunately, computation time is reported only if n_jobs is None or 1.

I don't understand the reasoning in #229. Why isn't the interpretation left to the user?


As a side note: n_jobs=None can be overridden with a context manager:

from joblib import parallel_backend
from sklearn.model_selection import RandomizedSearchCV

# estimator, param_distributions, X, y are placeholders
with parallel_backend('loky', n_jobs=-1):
    RandomizedSearchCV(estimator, param_distributions, n_jobs=None).fit(X, y)

This is equivalent to just calling RandomizedSearchCV(n_jobs=-1).

With the latter, openml won't report computation time, but as far as I understand, the former will run just fine and report the computation time. So it seems that the check isn't properly enforced anyway.

CC @amueller

@amueller (Contributor) commented Feb 3, 2020

ping @janvanrijn @mfeurer ;)

@amueller (Contributor) commented Feb 3, 2020

Should we maybe additionally add a wallclock_time_millis_training, which can always be computed?

@amueller (Contributor) commented Feb 3, 2020

The reason it is wrong for n_jobs != 1 is that internally it uses process_time, which does not count any time spent in subprocesses, and it is not measuring wall-clock time.
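For illustration, here is a minimal sketch of the discrepancy (the `busy` helper is made up for the example):

```python
import time
from joblib import Parallel, delayed

def busy(n=10_000_000):
    # Burn some CPU so the workers have measurable work to do.
    total = 0
    for i in range(n):
        total += i
    return total

if __name__ == "__main__":
    cpu0, wall0 = time.process_time(), time.perf_counter()
    Parallel(n_jobs=2)(delayed(busy)() for _ in range(4))
    # process_time only counts CPU time of this (parent) process; the work
    # done in the worker subprocesses is invisible to it.
    print("process_time:", time.process_time() - cpu0)   # close to zero
    print("wall clock  :", time.perf_counter() - wall0)  # real elapsed time
```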

@mfeurer (Collaborator) commented Feb 3, 2020

> ping @janvanrijn @mfeurer ;)

I'll come back to you after the ICML deadline.

@mfeurer (Collaborator) commented Feb 10, 2020

Thanks for raising this issue, it seems that there are indeed one or two problems here.

I believe the wallclock time is not reported when the number of cores is -1 because we can't figure out how many cores the run was actually executed on, so the number is of limited value. Admittedly, this is a very restrictive check that can be circumvented in plenty of ways (as you showed). Do you have any suggestions on how to improve on this?

> Should we maybe additionally add a wallclock_time_millis_training, which can always be computed?

That exists and is computed if n_jobs != -1.

To get the times of each base run you can check the optimization trace, which should have the time for each model fit. However, we currently don't seem to store the refit time correctly (or at all?), which to me seems like the biggest bug here.
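For reference, scikit-learn itself already tracks both quantities on a fitted search object, so the information is there to be stored; a quick illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=200, random_state=0)
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    {"n_estimators": [10, 50, 100]},
    n_iter=3,
    random_state=0,
).fit(X, y)

print(search.cv_results_["mean_fit_time"])  # per-candidate fit times (seconds)
print(search.refit_time_)                   # time to refit the best model (seconds)
```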

@NicolasHug (Author)

> Do you have any suggestions on how to improve on this?

I think you can use effective_n_jobs from joblib: https://github.com/joblib/joblib/blob/master/joblib/parallel.py#L366
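For illustration, `effective_n_jobs` asks the active joblib backend to resolve an `n_jobs` value to a concrete worker count:

```python
from joblib import effective_n_jobs

print(effective_n_jobs(1))   # 1
print(effective_n_jobs(4))   # 4
print(effective_n_jobs(-1))  # resolves to the number of available CPUs
```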

@mfeurer (Collaborator) commented Jul 31, 2020

Yet another issue we have to think about is the recent use of OpenMP in scikit-learn, which might make it harder for us to get a useful estimate of the time used.
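For what it's worth, the threadpoolctl package (which scikit-learn itself relies on) can at least introspect and pin the native thread pools; a sketch, with `clf`, `X`, `y` as placeholders:

```python
from threadpoolctl import threadpool_info, threadpool_limits

# List the native (OpenMP/BLAS) thread pools loaded into this process:
for pool in threadpool_info():
    print(pool["user_api"], pool["num_threads"])

# Pin them to one thread so that CPU time stays interpretable:
with threadpool_limits(limits=1):
    clf.fit(X, y)  # clf, X, y are placeholders
```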

@mfeurer (Collaborator) commented Feb 1, 2021

Sorry that this has stalled for so long, but now it's finally time to pick this up and finish it!

I think we basically have the following cases here which we need to consider:

  1. estimators that don't involve any parallelism, for example simple decision trees
  2. estimators that do parallelization inside themselves via BLAS or OpenMP, for example SGD or HistGradientBoosting
  3. estimators that do parallelization via joblib, for example RandomForest
  4. HPO algorithms that call an underlying algorithm multiple times via joblib

and IIRC we can measure the following things:

  1. CPU time for the whole run
  2. Wallclock time for the whole run
  3. CPU time for each individual run in HPO
  4. Wallclock time for each individual run in HPO

That means we can do the following things for cases 1-4:

  1. Easy, we can measure both CPU time and wallclock time
  2. Tricky. We can measure both CPU time and wallclock time, but for wallclock time we won't know how many CPUs were involved. If OpenML is run across several processes or machines (no idea whether this is realistic), we also no longer get reliable estimates of the process usage.
  3. Hard. We can easily measure CPU and wallclock time as long as n_jobs == 1. We can still measure wallclock time for any explicit n_jobs >= 1, as we'd know how many cores are used. With n_jobs == -1 we won't know how many cores are being used, but we could use effective_n_jobs to get an estimate. CPU time is not measurable once subprocesses are involved, as we don't have access to the CPU time of the individual jobs.
  4. Easier again. As each individual job measures its time itself, we can gather all the individual times at the end and add them up to obtain the total time taken (see the sketch after this list).
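A minimal sketch of the per-job measurement in case 4 (`timed_fit` is a hypothetical helper, not an OpenML API):

```python
import time

def timed_fit(estimator, X, y):
    """Fit and return the estimator plus its CPU and wall-clock seconds.

    process_time is meaningful here because it is called inside the worker
    process that actually performs the fit.
    """
    cpu0, wall0 = time.process_time(), time.perf_counter()
    estimator.fit(X, y)
    return estimator, time.process_time() - cpu0, time.perf_counter() - wall0

# Each joblib job runs timed_fit itself; afterwards the per-job times can
# simply be summed to obtain the total time taken across all jobs.
```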

As @NicolasHug pointed out, one can override the behavior via a context manager. Another caveat is that when using a server-worker system such as dask, one does not necessarily get all available CPUs, or the jobs might simply sit in a queue, making the wallclock time of the overall run completely useless.

Therefore, I propose to do the following:

  1. Document what we're doing
  2. Make the check robust against being circumvented via the context manager
  3. Implement storing the refit time for HPO, as asked for in Store runtime of instances of BaseSearchCV #248
  4. Figure out what to do with dask - can we somehow store which backend was used? (see the backend sketch after this list)
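For point 4, joblib does expose the active backend, which could be recorded alongside the run; note that `get_active_backend` lives in `joblib.parallel` and is not part of the documented public API, so treat this as a sketch of joblib's internals:

```python
from joblib import parallel_backend
from joblib.parallel import get_active_backend  # semi-private, may change

backend, n_jobs = get_active_backend()
print(type(backend).__name__, n_jobs)  # LokyBackend 1 by default

with parallel_backend('threading', n_jobs=2):
    backend, n_jobs = get_active_backend()
    print(type(backend).__name__, n_jobs)  # ThreadingBackend 2
```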

What do you think about this, @NicolasHug @amueller @PGijsbers?

@PGijsbers (Collaborator)

I'd be careful not to spend too much time on this, as it could become a very complicated, if not impossible, project on its own (we would have to account for different parallelization strategies and packages, and would also need to start capturing hardware information, etc.). However, making the proposed changes and then clearly documenting under which conditions what is measured, and how to interpret this data, still seems like a worthwhile change to me.

@mfeurer (Collaborator) commented Apr 13, 2021

We followed @NicolasHug's suggestion to simply log the CPU and wallclock times and leave it to the user to interpret them. To help with this, we added a lengthy example.

mfeurer closed this as completed on Apr 13, 2021