No multiprocessing/low CPU utilization on CPU and/or Windows #9271

nicklasb opened this issue Jan 10, 2025 · 9 comments
@nicklasb

Search before asking

  • I have searched the question and found no related answer.

Please ask your question

Hi,

I am training a small PicoDet model, and I am doing it using the CPU on a Windows box.
The odd thing is that no matter what I do, I never get more than 5-6 percent CPU load from the training Python process, and only one process is ever running, even though I have set worker_num to all kinds of values. I have also set use_multiprocess: True, and so on.

I tried to run it in a Debian Docker container, and while I had some issues there, that did not seem to start multiple processes either.

Is there something I am missing? What settings should be enabled?

@Bobholamovic
Member

Such inefficiency is expected when training deep learning models on CPUs. Training comprises several distinct steps: data loading, data preprocessing, the model forward pass, loss computation, the model backward pass, and so on. Increasing worker_num adds worker processes that accelerate data loading and preprocessing, but it does not improve the subsequent steps: no matter how many workers handle loading and preprocessing, only one process runs everything else. In addition, CPUs are simply not optimized for training deep learning models and are much less efficient at it. To sum up, I strongly discourage training on CPUs, as the time required could be prohibitively long; I recommend using GPUs or other dedicated devices to expedite your training.
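
(For illustration, a minimal sketch using plain PaddlePaddle APIs, not the actual PaddleDetection training loop; it assumes worker_num ultimately maps to the DataLoader's num_workers. Only the data loading is multi-process here; the forward pass, loss, backward pass, and optimizer step all run in the single main process.)

import numpy as np
import paddle
from paddle.io import Dataset, DataLoader

class ToyDataset(Dataset):
    # Synthetic samples; __getitem__ is the part that extra workers parallelize.
    def __len__(self):
        return 1000

    def __getitem__(self, idx):
        image = np.random.random([784]).astype(np.float32)
        label = np.random.randint(0, 10, (1,)).astype(np.int64)
        return image, label

if __name__ == "__main__":
    model = paddle.nn.Linear(784, 10)
    opt = paddle.optimizer.SGD(learning_rate=0.01, parameters=model.parameters())

    # num_workers only parallelizes the loader; everything below it runs in one process.
    loader = DataLoader(ToyDataset(), batch_size=32, num_workers=4)
    for image, label in loader:
        logits = model(image)                                     # forward pass
        loss = paddle.nn.functional.cross_entropy(logits, label)  # loss computation
        loss.backward()                                           # backward pass
        opt.step()                                                # optimizer update
        opt.clear_grad()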

@Bobholamovic
Member

Sorry, I just misunderstood your question. This looks like a bug. Please provide the versions of PaddlePaddle and PaddleDetection you are using, so that we can locate the problem.

@nicklasb
Author

Hi!
No problem, of course. It is Friday. :-)
I am running from the master branch of PaddleDetection and PaddlePaddle 2.6.2 from pip.

WRT not using a GPU, I have a comparatively small dataset and train for a specific purpose, so I just run the training overnight. But don't worry, I will get a really badass setup later. :-)

@Bobholamovic
Member

I found this in the PaddlePaddle 2.6 source code. Unfortunately, it looks like the latest version of PaddlePaddle (3.0.0b2) still does not support multi-process data loaders on Windows. As for the low utilization issue, one reason could be that data loading takes too much time and the code lacks concurrency. I would suggest setting up some timers in the main training loop to measure the time taken for each step; this will help identify the bottleneck, allowing us to design targeted strategies to improve performance.
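
(For illustration, one hedged way to do that timing; timed_steps, the fake loader, and the fake train step below are made-up names for this sketch, not part of PaddleDetection:)

import time

def timed_steps(loader, train_step):
    # Wraps any data loader and per-batch training function with simple timers,
    # so you can see whether time goes to data loading or to the compute steps.
    data_start = time.perf_counter()
    for step, batch in enumerate(loader):
        data_time = time.perf_counter() - data_start        # waiting on the loader
        compute_start = time.perf_counter()
        train_step(batch)                                    # forward/backward/optimizer
        compute_time = time.perf_counter() - compute_start
        print(f"step {step}: data {data_time:.3f}s, compute {compute_time:.3f}s")
        data_start = time.perf_counter()

if __name__ == "__main__":
    # Tiny demo with a fake loader and a fake train step that just sleeps.
    fake_loader = ([0.0] * 784 for _ in range(4))
    timed_steps(fake_loader, lambda batch: time.sleep(0.01))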

@nicklasb
Author

nicklasb commented Jan 10, 2025

Hm. Unfortunate indeed.
Oddly, I am not getting that warning as far as I can see; perhaps it does not identify the platform correctly?
Would I benefit from running it in something like a Linux Docker container then? (I tried and had some issues, but obviously I will pursue that if it is a possibility.)

WRT efficiency, I'll see what I find.

@nicklasb
Author

nicklasb commented Jan 11, 2025

Oddly, (in a Debian Docker container) it still doesn't start several processes, and CPU usage still sits at about 5 percent.
I am beginning to suspect something else, like antivirus (though then I would expect that to take a lot of CPU, and it doesn't). I have also played around with process priorities, to no avail.
However, I am not having this issue with Torch or any other ML framework. Really strange.

@nicklasb
Author

I set a large number of workers in the Docker image, and there it does seem to create workers; I got this when I Ctrl-C'd that process:
DataLoader 36 workers exit unexpectedly, pids: 619, 621, 623, 625, 627, 629, 631, 633, 635, 637, 639, 641, 643, 645, 647, 649, 651, 653, 655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 677, 679, 681, 683, 685, 687, 689

But I am not sure that DataLoader workers are the same as training workers; I am not that well versed in PaddleDetection.

@nicklasb
Author

OK, I solved the efficiency issue under Windows 11.
The operating system had to be set to performance mode, and sometimes it put the VSCode process in "Efficiency mode", which constrained the ability of that process, and the processes it spawns, to use the CPU. With those settings changed, the Python script is able to fully use a logical core (not a physical one).

However, the issue with the docker instance is odd.
Running top, I see a lot of worker processes, but they are not doing anything; they are completely idle.

@Bobholamovic
Member

However, the issue with the docker instance is odd.
Running top, I see a lot of worker processes, but they are not doing anything; they are completely idle.

I guess this is because the underlying host OS is still Windows, and the PaddlePaddle framework does not offer a workable solution for multiprocessing on Windows. To verify this, I wrote a simple code snippet:

import numpy as np
from paddle.io import IterableDataset, DataLoader

class RandomDataset(IterableDataset):
    def __iter__(self):
        # Yield random samples forever; the worker processes run this generator.
        while True:
            image = np.random.random([784]).astype(np.float32)
            label = np.random.randint(0, 9, (1,)).astype(np.int64)
            yield image, label

# Four worker processes should be spawned for data loading.
for i, _ in enumerate(DataLoader(RandomDataset(), num_workers=4)):
    print(i)

You can run this snippet inside the Docker container to check if 4 worker processes are running with relatively high utilization, as would be expected (I have verified this on my Linux machine). If there are still idle worker processes, the issue is likely with PaddlePaddle, possibly due to the lack of support for multiprocessing on Windows. In that case, I would suggest opening an issue here.
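
(If it helps, here is a rough sketch using psutil — an extra dependency this thread does not otherwise assume — to print the CPU usage of a process and its children, such as the DataLoader workers:)

import sys
import psutil

def report_children(pid):
    # Samples CPU usage of the given process and all of its children over one second.
    parent = psutil.Process(pid)
    procs = [parent] + parent.children(recursive=True)
    for p in procs:
        p.cpu_percent(interval=None)      # first call just primes the counters
    psutil.cpu_percent(interval=1.0)      # sleep for the sampling window
    for p in procs:
        print(p.pid, p.name(), f"{p.cpu_percent(interval=None):.1f}%")

if __name__ == "__main__":
    report_children(int(sys.argv[1]))     # pass the PID of the training process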
