Longer term, I think it would be good if we could naively use multiple GPUs in an embarrassingly parallel way. That is, the responsibility for choosing the device and for placing user GPU data on the correct device stays outside of cufinufft, with the caller. What makes it a bit complicated is how that is managed. One approach would require a notion of a driver context that is passed through (the device and memory are bound to that context and owned/managed by the caller). This would obviously be optional.
There are more involved ways to do multi gpu, but I think this would be a good balance, and the least disruptive.
For a production-size ASPIRE workload, this would be a great performance opportunity. Currently the testing sizes are limited by GPU memory. One can imagine taking the largest batch that will fit on a single GPU and running one such batch concurrently on each of two or four separate GPUs (common configurations for HPC nodes).
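To make the idea concrete, here is a rough sketch of the caller-side pattern I have in mind. It is not cufinufft code: `split_batch`, `run_on_devices`, and the `transform_on_device` callback are hypothetical names, and the callback stands in for whatever the caller would do to select a device, place its data there, and run the transform. The sketch only shows the batch being split evenly and each piece being dispatched concurrently, one worker per device.

```python
from concurrent.futures import ThreadPoolExecutor


def split_batch(n_items, n_devices):
    """Split range(n_items) into n_devices contiguous, near-equal slices."""
    base, extra = divmod(n_items, n_devices)
    slices, start = [], 0
    for dev in range(n_devices):
        # The first `extra` devices take one additional item each.
        stop = start + base + (1 if dev < extra else 0)
        slices.append(slice(start, stop))
        start = stop
    return slices


def run_on_devices(data, n_devices, transform_on_device):
    """Run transform_on_device(device_id, chunk) on each chunk concurrently.

    transform_on_device is a hypothetical, caller-supplied function. In a
    real setup it would bind to the given device, move the chunk there,
    and call the GPU transform; none of that is modeled here.
    """
    slices = split_batch(len(data), n_devices)
    with ThreadPoolExecutor(max_workers=n_devices) as pool:
        futures = [
            pool.submit(transform_on_device, dev, data[s])
            for dev, s in enumerate(slices)
        ]
        # Collect in device order so the output matches the input order.
        results = [f.result() for f in futures]
    merged = []
    for chunk_result in results:
        merged.extend(chunk_result)
    return merged
```

The chunks are independent, so there is no communication between devices; all device and memory ownership stays with the caller, which is the point of the proposal.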
This is not anything immediate, just something I think is worth considering.
That is, cufinufft keeps the responsibility of managing device choice and user gpu data (on correct device) outside of cufinufft.
I don't quite understand. Are you saying that cufinufft is not responsible for coordinating data between devices, but that it's up to the user? In that case, cufinufft just needs to play nice and not get confused when different devices or contexts are used?