Process hangs on 'Setting up PyTorch plugin "bias_act_plugin"...' when using multiple GPUs #41

Closed
markemus opened this issue Feb 18, 2021 · 6 comments

markemus commented Feb 18, 2021

I added these lines to train.py as lines 13 and 14 (right under import os):

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0,2,3,4"

I tested the process with --gpus 1 and it spent a few minutes on Setting up PyTorch plugin "bias_act_plugin"... before proceeding to train. However, with --gpus 4 it has been hanging on this line for an hour and a half:

Creating output directory...
Launching processes...
Loading training set...

Num images:  505487
Image shape: [3, 256, 256]
Label shape: [0]

Constructing networks...
Setting up PyTorch plugin "bias_act_plugin"...

Here's the nvidia-smi printout as well. As you can see, three of the GPUs (2, 3, 4) show 100% utilization while the first one (0) shows 0%. The memory usage does not seem to be changing.


+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.102.04   Driver Version: 450.102.04   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:1A:00.0 Off |                    0 |
| N/A   33C    P0    57W / 300W |   2088MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  Off  | 00000000:1B:00.0 Off |                    0 |
| N/A   34C    P0    59W / 300W |  31147MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  Off  | 00000000:3D:00.0 Off |                    0 |
| N/A   35C    P0    68W / 300W |   4261MiB / 32510MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  Off  | 00000000:3E:00.0 Off |                    0 |
| N/A   31C    P0    68W / 300W |   4345MiB / 32510MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   4  Tesla V100-SXM2...  Off  | 00000000:88:00.0 Off |                    0 |
| N/A   33C    P0    71W / 300W |   4201MiB / 32510MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

Do I just need to be more patient? On a single GPU it really only took a couple of minutes to begin training.

EDIT: note that the selected GPUs (0, 2, 3, 4) are not consecutive.

nurpax (Contributor) commented Feb 18, 2021

No, it definitely shouldn't take that long; it should take about the same time as on a single GPU.

I'd try a couple of things if the problem persists:

  1. Set CUDA_VISIBLE_DEVICES in the shell before starting the Python process, just so that there isn't anything funky going on with multiprocessing.
  2. This could be a case of a stale multiprocess lock in ~/.cache/torch_extensions (default on Linux). Try rm -rf'ing the torch_extensions directory and rerun.

If you're running Docker, you should NOT need to set CUDA_VISIBLE_DEVICES separately; I think it's enough to configure the available devices with the --gpus parameter. Also, within Docker, CUDA_VISIBLE_DEVICES might in fact need to list consecutive devices (or probably doesn't need to be specified inside the container at all), and you should specify the real device mapping when you start Docker. I'm a bit on thin ice here, as I've never run with such a configuration.
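
For context on why suggestion 2 helps: PyTorch serializes custom-extension builds with a lock file handled by torch.utils.file_baton.FileBaton (the same class custom_ops.py imports), so a lock left behind by a killed or crashed run makes every later process wait on it indefinitely. Below is a minimal sketch of that mechanism, using a made-up demo lock path:

import os
import tempfile

from torch.utils.file_baton import FileBaton

# Hypothetical demo path; the real lock lives inside the bias_act_plugin build
# directory under the torch_extensions cache.
lock_path = os.path.join(tempfile.gettempdir(), "bias_act_demo.lock")
baton = FileBaton(lock_path)

if baton.try_acquire():      # we own the lock: the lock file was just created
    try:
        print("build would happen here")
    finally:
        baton.release()      # removes the lock file so other processes can proceed
else:
    # Another process holds the lock. If that process died without releasing it,
    # this wait never returns, which is exactly the hang described above.
    baton.wait()

Deleting the cache directory (or just the stale lock file) breaks that wait, and the plugin is rebuilt from scratch on the next run.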

markemus (Author) commented:

@nurpax Thank you! rm -rf ~/.cache/torch_extensions solved the issue and it's training now on 4 GPUs.

For posterity: I left the CUDA_VISIBLE_DEVICES definition in the code. This is not running in Docker; it's running in Anaconda.
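
For anyone copying this approach, here is a sketch of those two lines with comments. The placement matters: CUDA_VISIBLE_DEVICES is only honored if it is set before anything in the process initializes CUDA, and the per-GPU worker processes spawned by train.py inherit the parent's environment.

import os

# Set right under `import os`, before torch is imported or any CUDA work happens.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"    # enumerate GPUs in PCI bus order, matching nvidia-smi
os.environ["CUDA_VISIBLE_DEVICES"] = "0,2,3,4"    # expose only these four GPUs to this process and its workers

Equivalently, export the two variables in the shell before running python train.py, as suggested above.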

tasinislam21 commented:

How do you do this on Windows? It used to work perfectly, but when I ran the project a few days later it got stuck.

nurpax (Contributor) commented Feb 21, 2021

I'm not sure what the exact location is, and I don't have Windows access right now, but here's how you should be able to figure it out:

Change torch_utils/custom_ops.py as follows:

diff --git a/torch_utils/custom_ops.py b/torch_utils/custom_ops.py
index 4cc4e43..4dfcef7 100755
--- a/torch_utils/custom_ops.py
+++ b/torch_utils/custom_ops.py
@@ -20,7 +20,7 @@ from torch.utils.file_baton import FileBaton
 #----------------------------------------------------------------------------
 # Global options.
 
-verbosity = 'brief' # Verbosity level: 'none', 'brief', 'full'
+verbosity = 'full' # Verbosity level: 'none', 'brief', 'full'
 
 #----------------------------------------------------------------------------
 # Internal helper funcs.

Then run, for example, generate.py with default options and check the logs. On my computer, it prints something like this:

Using /scratch/.cache/torch_extensions as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /scratch/.cache/torch_extensions/bias_act_plugin/build.ninja...

This should reveal the Windows location for you.
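
If you'd rather not edit custom_ops.py at all, the sketch below asks PyTorch for the same location directly. Note that _get_build_directory is a private helper, so this is a best-effort approach that may differ between PyTorch versions (and it may create the directory if it doesn't exist yet); the TORCH_EXTENSIONS_DIR environment variable, if set, overrides the default root.

from torch.utils.cpp_extension import _get_build_directory

# Prints the build directory PyTorch would use for the bias_act_plugin extension.
print(_get_build_directory("bias_act_plugin", verbose=False))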

tasinislam21 commented Feb 22, 2021

Thank you! On Windows the cache can be found at 'C:\Users\<user_name>\AppData\Local\torch_extensions\torch_extensions\Cache'. I was able to delete it, but I also had to reinstall ninja so that bias_act_plugin could be rebuilt. In the end, it worked.
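
Putting the reported locations together, here is a small cross-platform cleanup sketch. It assumes the defaults mentioned in this thread (Linux: ~/.cache/torch_extensions, Windows: %LOCALAPPDATA%\torch_extensions\torch_extensions\Cache) and honors TORCH_EXTENSIONS_DIR if it is set. Deleting the cache is harmless in the sense that the plugins are simply rebuilt on the next run.

import os
import shutil
from pathlib import Path

def torch_extensions_root() -> Path:
    # TORCH_EXTENSIONS_DIR overrides the default cache location if it is set.
    override = os.environ.get("TORCH_EXTENSIONS_DIR")
    if override:
        return Path(override)
    if os.name == "nt":
        local = os.environ.get("LOCALAPPDATA", str(Path.home() / "AppData" / "Local"))
        return Path(local) / "torch_extensions" / "torch_extensions" / "Cache"
    return Path.home() / ".cache" / "torch_extensions"

root = torch_extensions_root()
if root.exists():
    shutil.rmtree(root)
    print(f"Removed {root}; PyTorch plugins will be rebuilt on the next run.")
else:
    print(f"No torch_extensions cache found at {root}.")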

tedschw commented Nov 26, 2023

Removing the stale lock file ~/.cache/torch_extensions/py310_cu121/bias_act_plugin/3cb576a0039689487cfba59279dd6d46-nvidia-geforce-rtx-2060/lock worked for me.
