
Let's add a suggested_num_workers() method? #2196

Closed
williamFalcon opened this issue Jun 15, 2020 · 13 comments · Fixed by #18591
Assignees
Labels
design (Includes a design discussion) · feature (Is an improvement or enhancement) · good first issue (Good for newcomers)

Comments

@williamFalcon
Contributor

V1 could be:

import multiprocessing

def suggest_num_workers(num_accelerators):
    # One worker per CPU core, scaled by the number of accelerators.
    num_cpus = multiprocessing.cpu_count()
    return num_cpus * num_accelerators

@PyTorchLightning/core-contributors
Any other heuristics you guys use?

@williamFalcon williamFalcon added the feature Is an improvement or enhancement label Jun 15, 2020
@williamFalcon williamFalcon added this to the 0.9.0 milestone Jun 15, 2020
@justusschock
Member

@williamFalcon I tend to do some stress testing, e.g. repeatedly loading samples from my dataset with an increasing number of workers while monitoring disk I/O. Especially when loading large files (e.g. medical DICOMs), the CPU is not the bottleneck; disk I/O is.
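
For illustration, a minimal sketch of that kind of stress test (assuming psutil is available for the disk I/O counters and `dataset` is an ordinary PyTorch Dataset; all names here are illustrative):

import time
import psutil  # third-party; used only for the disk I/O counters
from torch.utils.data import DataLoader

def io_stress_test(dataset, num_workers, num_batches=50, batch_size=8):
    # Load a fixed number of batches and report elapsed time plus
    # bytes read from disk, to see whether I/O rather than CPU saturates.
    loader = DataLoader(dataset, batch_size=batch_size, num_workers=num_workers)
    read_before = psutil.disk_io_counters().read_bytes
    start = time.perf_counter()
    for i, _ in enumerate(loader):
        if i + 1 >= num_batches:
            break
    elapsed = time.perf_counter() - start
    bytes_read = psutil.disk_io_counters().read_bytes - read_before
    return elapsed, bytes_read

Calling this for num_workers = 0, 1, 2, ... and watching where throughput stops improving would surface an I/O-bound loader quickly.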

@williamFalcon
Contributor Author

but there's some upper bound on num_workers based on num_cpus, no?

maybe we can do something like the learning rate finder, but for num_workers?
@SkafteNicki

@justusschock
Member

Technically there isn't. If you have too many workers/threads, they simply get scheduled by your OS; there is no hard limit. Even with more workers than cores, it can sometimes pay off to have another process context loading in the background and communicating with the main process, since loading itself sometimes takes a while.

I'm also not sure whether cpu_count gives you logical or physical cores.
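
(For reference, `multiprocessing.cpu_count()` and `os.cpu_count()` both report logical cores; getting the physical count portably needs something like psutil:)

import multiprocessing
import psutil  # third-party

print(multiprocessing.cpu_count())      # logical cores, hyper-threads included
print(psutil.cpu_count(logical=False))  # physical cores (may be None on some platforms)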

@SkafteNicki
Member

@williamFalcon I had the same idea when I saw your PR yesterday.
We could iteratively increase the number of workers and log the training time over a few batches. I'd guess that beyond a certain number of workers the training time will plateau (not decrease any further), and we could suggest the num_workers value just before that happens.
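
A minimal sketch of that search (the function name, thresholds, and defaults here are illustrative, not an agreed API):

import time
from torch.utils.data import DataLoader

def find_num_workers(dataset, batch_size=32, num_batches=20, tol=0.05, max_workers=16):
    # Increase num_workers until the time for a fixed number of batches
    # stops improving by more than `tol` (relative), then return the
    # last setting that still gave a real speed-up.
    best_time, best_workers = float("inf"), 0
    for workers in range(max_workers + 1):
        loader = DataLoader(dataset, batch_size=batch_size, num_workers=workers)
        start = time.perf_counter()
        for i, _ in enumerate(loader):
            if i + 1 >= num_batches:
                break
        elapsed = time.perf_counter() - start
        if elapsed < best_time * (1 - tol):
            best_time, best_workers = elapsed, workers
        else:
            break  # timing has plateaued
    return best_workers

Note that worker startup cost is included in each measurement, which slightly penalizes higher worker counts; a warm-up pass per setting would make the comparison fairer.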

@williamFalcon
Contributor Author

exactly. that would be ideal.

All the more reason to get this Tuner object separated @tullie

@tullie
Contributor


Yeah that’d be awesome. I often do a small benchmark on this when I create a new data loader.

@SkafteNicki
Member

So I guess the first step is to get v1 of the tuner merged to master (PR #1998).
Then we can extend it with this functionality. (I will be happy to do it, but if you have some code to get me started, @tullie, it would be more than welcome.)

@tullie
Contributor


Yep, that sounds good. BTW, a related PR to the Tuner is the one I sent out last week, #2107. It shows another potential way of decomposing the trainer and having shared arguments.

@Borda Borda added the good first issue Good for newcomers label Aug 4, 2020
@edenlightning edenlightning modified the milestones: 0.9.0, 0.9.x Aug 18, 2020
@edenlightning edenlightning modified the milestones: 0.9.x, 1.1 Sep 22, 2020
@edenlightning edenlightning added design Includes a design discussion v1.0 post labels Sep 22, 2020
@Borda Borda removed the v1.0 post label Oct 13, 2020
@edenlightning edenlightning modified the milestones: 1.1, 1.0.3, 1.2 Oct 19, 2020
@edenlightning edenlightning modified the milestones: 1.2, 1.3 Feb 8, 2021
@edenlightning
Contributor

@SkafteNicki still relevant?

@edenlightning edenlightning removed this from the 1.3 milestone Feb 22, 2021
@SkafteNicki
Member

Let's keep it alive; I already have a partial implementation ready.

@SkafteNicki SkafteNicki self-assigned this Feb 24, 2021
@williamFalcon
Contributor Author

we already show a warning, no? is the implementation different from that?

@GeorgePearse

I'd be keen to try to improve this a bit if no one's got code they're happy to contribute already.

@justusschock
Member

@GeorgePearse feel free to give it a try. This obviously hasn't been a priority for us.

Just remember that you want to keep one CPU core/thread free for the main process so it doesn't get scheduled too badly.

Ideally you'd also consider RAM, as it usually scales linearly with the number of workers (each worker gets a copy of the dataset and loads data at the same time). However, this is something that can come later.

Also, make sure to open a draft PR as soon as you have something to discuss, so you get feedback as early as possible.
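
Putting those constraints together, a starting heuristic might look like the following (illustrative only: `dataset_size_bytes` is a hypothetical rough estimate supplied by the caller, and the memory check assumes psutil):

import os
import psutil  # third-party; used for the available-memory check

def suggest_num_workers(dataset_size_bytes):
    # Keep one core free for the main process so it isn't starved.
    cpu_bound = max(1, (os.cpu_count() or 1) - 1)
    # Each worker holds its own copy of the dataset, so RAM usage
    # scales roughly linearly with the number of workers.
    ram_bound = psutil.virtual_memory().available // max(1, dataset_size_bytes)
    return max(1, min(cpu_bound, int(ram_bound)))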
