[ENH] Multiproc always looks for GPUs making it impossible to run pipelines on CPU-only machines #3717
Comments
How would you handle the node failure? I mean, if we skip the check in the plugin and force nodes to act as if a GPU were available, I'm afraid the execution will fail. Edit: maybe I misunderstood your point. The implementation checks whether a node is a "GPU node" and handles a separate process queue for those. So if a pipeline does not include GPU nodes, or handles different paths for CUDA and non-CUDA systems, the workflow should run without problems.
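Roughly, the kind of check I mean looks like the sketch below; the helper name and the use_cuda/use_gpu trait names are illustrative, not necessarily what the plugin actually does:

```python
def is_gpu_node(node):
    """Illustrative check: does this node ask for a GPU via its input traits?"""
    return bool(getattr(node.inputs, "use_cuda", False)) or bool(
        getattr(node.inputs, "use_gpu", False)
    )

# The scheduler could then keep two queues and only throttle the GPU one:
# queue = gpu_queue if is_gpu_node(node) else cpu_queue
```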
Seems like a minimal reproducible example would be helpful here.
Moreover, a further step would be updating interfaces and nodes to support inputs.use_gpu = true/false when the tool has a GPU version.
Yes exactly! That is indeed my thinking -- if I know that none of the nodes in my pipeline are using GPUs, then I should just be able to turn off GPU checking.
I do not have a minimal example as of now, but all the code for the pipeline I am working on is here: https://github.com/man-shu/diffusion-preprocessing. Here's the error I get on our CPU-only HPC:
Traceback (most recent call last):
File "/storage/store3/work/haggarwa/diffusion/diffusion-preprocessing/runners/run_tracto_drago_downsampled.py", line 33, in <module>
tracto.run(plugin="MultiProc", plugin_args={"n_procs": 60})
File "/storage/store3/work/haggarwa/nipype/nipype/pipeline/engine/workflows.py", line 610, in run
runner = plugin_mod(plugin_args=plugin_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/storage/store3/work/haggarwa/nipype/nipype/pipeline/plugins/multiproc.py", line 136, in __init__
self.n_gpus_visible = gpu_count()
^^^^^^^^^^^
File "/storage/store3/work/haggarwa/nipype/nipype/pipeline/plugins/tools.py", line 187, in gpu_count
return len(GPUtil.getGPUs())
^^^^^^^^^^^^^^^^
File "/data/parietal/store3/work/haggarwa/miniconda3/envs/dwiprep/lib/python3.12/site-packages/GPUtil/GPUtil.py", line 102, in getGPUs
deviceIds = int(vals[i])
^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running."
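That said, I believe any workflow run through MultiProc on a machine where nvidia-smi cannot reach a driver should reproduce this, since gpu_count() is called in the plugin constructor before any node runs. A sketch of a minimal example (the Function node body is arbitrary):

```python
from nipype import Node, Workflow
from nipype.interfaces.utility import Function

def add_one(x):
    return x + 1

node = Node(
    Function(input_names=["x"], output_names=["y"], function=add_one),
    name="add_one",
)
node.inputs.x = 1

wf = Workflow(name="cpu_only_wf", base_dir="/tmp")
wf.add_nodes([node])

# On a CPU-only machine with no working NVIDIA driver, this fails at plugin
# initialization with the GPUtil ValueError shown above.
wf.run(plugin="MultiProc", plugin_args={"n_procs": 2})
```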
I am using FSL's BEDPOSTX and ProbTrackX2, and at least BEDPOSTX has such a parameter.
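For example, with nipype's FSL wrappers I can already pin that choice on the interface itself (assuming the BEDPOSTX5 interface and its use_gpu trait; the file names below are placeholders):

```python
from nipype.interfaces import fsl

bedpostx = fsl.BEDPOSTX5()
bedpostx.inputs.dwi = "data.nii.gz"                # placeholder inputs
bedpostx.inputs.mask = "nodif_brain_mask.nii.gz"
bedpostx.inputs.bvecs = "bvecs"
bedpostx.inputs.bvals = "bvals"
bedpostx.inputs.use_gpu = False                    # stick to the CPU implementation
```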
What's the rest of the traceback? Just running on my local (CPU-only) system:
I think the fix should probably be in
Sorry, the last line somehow kept disappearing in my tmux window. I've updated the error in my previous comment.
Got it. Well, it looks like https://github.com/anderskm/gputil has gone unmaintained. We may want to consider vendoring just the bits we need. In particular, I think we need these patches:
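Roughly, the vendored helper I have in mind would query nvidia-smi directly and treat any failure as zero GPUs; a sketch, not the actual patches:

```python
import shutil
import subprocess


def gpu_count():
    """Return the number of visible NVIDIA GPUs, or 0 if that cannot be determined.

    Sketch of a vendored replacement for GPUtil.getGPUs(): a missing binary,
    driver error, or unparsable output is treated as "no GPUs" instead of raising.
    """
    nvidia_smi = shutil.which("nvidia-smi")
    if nvidia_smi is None:
        return 0
    try:
        out = subprocess.run(
            [nvidia_smi, "--query-gpu=uuid", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=10, check=True,
        )
    except (OSError, subprocess.SubprocessError):
        return 0
    return len([line for line in out.stdout.splitlines() if line.strip()])
```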
Everything outside of
I do have an old personal machine with Windows on it. Will try this on that over the weekend.
Hello,
I noticed that in #3642 GPU support has been added, which is indeed much appreciated.
However, the current implementation always checks for GPU availability and there's no way to turn this off. Users (like me) might need to do so, for example, when trying to run a pipeline on a CPU-only HPC (where NVIDIA-SMI is not installed).
Could we maybe have another parameter that would skip checking for GPUs? I would be up for making a PR if you think this makes sense. Please let me know.
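For concreteness, something like this is what I have in mind (check_gpu is just a placeholder name; the call mirrors the one from my traceback above):

```python
# Hypothetical opt-out, if such a plugin argument were added:
tracto.run(
    plugin="MultiProc",
    plugin_args={"n_procs": 60, "check_gpu": False},  # skip gpu_count() entirely
)
```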
Thanks!