-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow specification for GPU device index #96
Conversation
Here is the test that I used:
If run on my laptop (CPU-only), I get the output
which confirms that the CPU case works, but obviously the GPU case isn't going to work. If I run on Wilkes3 with four GPUs and four MPI processes, I get the output
which confirms that the GPU case works, too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing!
At a quick glance this looks good - I'll do a detailed review later when I have some time.
One quick comment before then - can you provide some simple instructions on how I can check/verify this is working on CSD3/elsewhere?
We will probably want an example adding to the examples/
and some info adding to the docs once the code is settled before it goes in.
Sure. I created a new branch to demonstrate the testing: 85_gpu_device_number_test. Would you like me to include the Slurm scripts, too? |
afa93d8
to
188b305
Compare
Okay, this is ready for re-review! I added some docs and managed to get example 3 working on Wilkes3, giving the following output for 2 GPUs:
Will test it for 4 GPUs, too, but don't anticipate any issues. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great addition @jwallwork23
The new docs and example read really well.
Added a couple of points that I feel would make things clearer for me as an external reader, feel free to incorporate or not.
Once we've resolved these I think we're good to go!
Co-authored-by: jatkinson1000 <109271713+jatkinson1000@users.noreply.github.com>
Thanks @jatkinson1000, this is now ready for re-review.
I can confirm that this worked (with the updated
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jwallwork23 This is a great addition!
All looking good to me now so I'll squash and merge shortly.
* Have get_device use torch::Device * Add device_number arg for get_device * Throw error if device_number used in CPU-only case * Disallow negative device number * Actually use the device number * Use device number for torch_zeros * Use device number for torch_ones * Use device number for torch_empty * Use device number for torch_from_blob * Device and device number args for torch_module_load * Pass device and device number to torch_jit_load by value * Make device number argument to torch_module_load optional * Make device number argument to torch_tensor_from_array optional * Make device number argument to other subroutines optional * Make device argument to torch_module_load optional * Add function for determining device_index * Rename device number as index * Rename device as device type * Device index defaults to -1 on CPU and 0 on GPU * Make device type and index optional on C++ side * Fix typo in torch_model_load * Fix typos in example 1 * Initial draft of example 3_MultiGPU * Differentiate between errors and warnings in C++ code * Formatting * Add mpi4py to requirements for example 3 * Use mpi4py to differ inputs in simplenet_infer_python * Raise ValueError for Python inference with invalid device * Print rank in Python case; updates to README * Setup MPI for simplenet_infer_fortran, too * Write formatting for example 3 * Add note on building with Make * Print before and after; mpi_finalise; output on CPU; comments * Docs: device->device_type for consistency * Add docs on MultiGPU * Update warning text for defaulting to 0 Co-authored-by: jatkinson1000 <109271713+jatkinson1000@users.noreply.github.com> * Mention MPI in requirements * Update outputs for example 3 * Use NP rather than 4 GPUs * Implement SimpleNet in example 3 but with a twist * Add code snippets for multi-GPU doc section * Add note about multiple GPU support to README.md. --------- Co-authored-by: jatkinson1000 <109271713+jatkinson1000@users.noreply.github.com> Co-authored-by: Jack Atkinson <jwa34@cam.ac.uk>
Closes #85.
The main change associated with this PR is allowing the GPU device index to be specified for the following functions and subroutines:
torch_zeros
(C++) /torch_tensor_zeros
(Fortran)torch_ones
(C++) /torch_tensor_zeros
(Fortran)torch_empty
(C++)torch_from_blob
(C++) /torch_tensor_from_blob
(Fotran)torch_jit_load
(C++) /torch_module_load
(Fortran)torch_tensor_from_array_${PREC}$_${RANK}$d
(Fortran)To avoid confusion/ambiguity,
device
is replaced bydevice_type
in several places in the code, asdevice_type
anddevice_index
are consistent with the naming used in CUDA.The GPU device index is specified using an additional argument, although this is made optional both in C++ and Fortran to ensure that the examples can be run without modification. In the case of
torch_jit_load
/torch_module_load
, thedevice_type
also needed to be added as an optional argument to support the new functionality.If unset:
device_type
defaults totorch_kCPU
device_index
defaults to -1 ifdevice_type
istorch_kCPU
and 0 ifdevice_type
istorch_kGPU
.New functions called
torch_tensor_get_device_index
are introduced so that we can test the new functionality.