No multiprocessing/low CPU utilization on CPU and/or Windows #9271
I would say that such inefficiency is expected when training deep learning models on a CPU. The training process comprises several distinct steps: data loading, data preprocessing, model forward pass, loss computation, model backward pass, and so on. To accelerate data loading and preprocessing, you can increase the number of data-loader workers.
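As a rough, framework-agnostic illustration of why more loader workers help when per-sample loading is I/O-bound, here is a stdlib-only sketch (the function names and timings are made up for illustration; this is not PaddleDetection's API):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def load_sample(i):
    # Simulate per-sample disk read / decode latency.
    time.sleep(0.01)
    return i

n = 8

# Sequential loading: latencies add up.
t0 = time.perf_counter()
serial = [load_sample(i) for i in range(n)]
t_serial = time.perf_counter() - t0

# Four workers loading concurrently: latencies overlap.
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as ex:
    parallel = list(ex.map(load_sample, range(n)))
t_parallel = time.perf_counter() - t0

print(f"serial: {t_serial:.2f}s, 4 workers: {t_parallel:.2f}s")
```

The same idea is behind `worker_num` / `num_workers` settings: when loading waits on I/O, several workers keep the pipeline fed while the main process trains.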
Sorry, I just misunderstood your question. This looks like a bug. Please provide the versions of PaddlePaddle and PaddleDetection you are using, so that we can locate the problem.
Hi! WRT not using a GPU: I have a comparatively small dataset and train for a specific purpose, so I just run the training overnight. But don't worry, I will get a really badass setup later. :-)
I found this in the PaddlePaddle 2.6 source code. Unfortunately, it looks like the latest version of PaddlePaddle (3.0.0b2) still does not support multi-process data loaders on Windows. As for the low-utilization issue, one reason could be that data loading takes too much time and the code lacks concurrency. I would suggest setting up some timers in the main training loop to measure the time taken by each step. This will help identify the bottleneck, so we can design targeted strategies to improve performance.
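The per-step timing could be sketched like this; the phase names and the dummy workloads are placeholders for the real PaddleDetection pipeline stages, not its actual API:

```python
import time

# Accumulated wall-clock time per training phase.
timings = {"data": 0.0, "forward": 0.0, "backward": 0.0}

def timed(phase, fn, *args):
    # Run fn, charging its wall-clock time to the given phase.
    t0 = time.perf_counter()
    out = fn(*args)
    timings[phase] += time.perf_counter() - t0
    return out

# Dummy stand-ins for the real pipeline stages.
def load_batch():
    time.sleep(0.002)  # pretend I/O + preprocessing

def forward(_batch):
    time.sleep(0.001)

def backward():
    time.sleep(0.001)

for _ in range(10):
    batch = timed("data", load_batch)
    timed("forward", forward, batch)
    timed("backward", backward)

total = sum(timings.values())
for phase, t in timings.items():
    print(f"{phase}: {t:.3f}s ({100 * t / total:.0f}%)")
```

If the "data" phase dominates, the bottleneck is the loader rather than the model, which is exactly the case where more workers would help.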
Hm. Unfortunate indeed. WRT efficiency, I'll see what I find. |
Oddly, even in a Debian Docker container, it still doesn't start several processes and still uses only about 5 percent CPU.
I set a large number of workers in the Docker image, and there it does seem to create workers; I got this when I Ctrl-C'd that process: But I am not sure whether data-loader workers are the same as training workers; I am not that well versed in PaddleDetection.
OK, I solved the efficiency issue under Windows 11. However, the behavior of the Docker instance is odd.
I guess this is because the underlying host OS is still Windows, and the PaddlePaddle framework does not offer a workable solution for multiprocessing on Windows. To verify this, I wrote a simple code snippet:

```python
import numpy as np
from paddle.io import IterableDataset, DataLoader

class RandomDataset(IterableDataset):
    def __iter__(self):
        # Endless stream of random (image, label) pairs,
        # so the loader workers stay busy indefinitely.
        while True:
            image = np.random.random([784]).astype(np.float32)
            label = np.random.randint(0, 9, (1,)).astype(np.int64)
            yield image, label

for i, _ in enumerate(DataLoader(RandomDataset(), num_workers=4)):
    print(i)
```

You can run this snippet inside the Docker container to check whether 4 worker processes are running with relatively high utilization, as would be expected (I have verified this on my Linux machine). If the worker processes are still idle, the issue is likely with PaddlePaddle, possibly due to the lack of multiprocessing support on Windows. In that case, I would suggest opening an issue here.
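Besides watching utilization in `top`, one framework-agnostic way to confirm that distinct worker processes really do the work is to collect the worker PIDs. This sketch uses the stdlib `multiprocessing` module with the `fork` start method (so Linux/Docker only, which is the environment in question); the function name is illustrative:

```python
import multiprocessing as mp
import os

def load_sample(i):
    # Stand-in for per-sample loading/augmentation work; report
    # which process handled the sample.
    return (i, os.getpid())

# "fork" is the default on Linux and matches what a container would use.
with mp.get_context("fork").Pool(processes=4) as pool:
    results = pool.map(load_sample, range(16))

pids = {pid for _, pid in results}
print(f"samples: {len(results)}, distinct worker PIDs: {len(pids)}")
```

If every sample reports the same PID, the work is not actually being distributed, which mirrors the single-process behavior seen in the container.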
Hi,
I am training a small PicoDet model, and I am doing it using the CPU on a Windows box.
The odd thing is that, whatever I do, I never get more than 5-6 percent CPU load from the training Python process, and it is also only one process, even though I have set worker_num to all kinds of values. I have also set use_multiprocess: True, and so on.
I tried to run it in a Debian Docker container, and while I have some issues there, that didn't seem to start multiple processes either.
Is there something I am missing? What settings should be enabled?
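For reference, these knobs usually live in the reader section of the PaddleDetection config. Section and key names vary across PaddleDetection versions, so treat this as an illustrative sketch rather than a drop-in config:

```yaml
# Illustrative fragment only; key names may differ by version.
worker_num: 4               # number of data-loader worker processes
TrainReader:
  batch_size: 8
  use_shared_memory: true   # faster inter-process transfer on Linux
```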