
CUDA – memory spanning multiple devices and a question about cuda_global_memory #335

Open
NeutralKaon opened this issue Aug 9, 2024 · 1 comment

Comments

NeutralKaon commented Aug 9, 2024

Hi there,

Thanks for making a fantastic bit of code. I've got a question about what to do if your problem won't fit into GPU RAM on one device, but will on two. You make reference to using cudaMallocManaged, which I naïvely understand would handle host/GPU page faulting in case the problem is too large, and I realise there is much pain (!) to be had in spanning multiple devices and keeping them synchronised.
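
For context, this is the kind of managed allocation I have in mind (a minimal sketch only; the buffer size is purely illustrative):

```c
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    // Illustrative size only: a buffer larger than a single GPU's memory,
    // relying on unified-memory paging to oversubscribe the device.
    size_t n = ((size_t)110 << 30) / sizeof(float);

    float* buf = NULL;
    cudaError_t err = cudaMallocManaged((void**)&buf, n * sizeof(float), cudaMemAttachGlobal);

    if (cudaSuccess != err) {
        fprintf(stderr, "cudaMallocManaged failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Pages migrate between host and device on demand as kernels touch them.
    // ... launch kernels on buf here ...

    cudaFree(buf);
    return 0;
}
```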

a) You have explicitly disabled this and gone for manual memory management by hardcoding a parameter (cuda_global_memory) to false. Is this for performance reasons? I do find that CPU-only approaches are indeed much faster.

b) Do you have any plans to permit multi-GPU usage and to span devices with some sort of NUMA-style architecture? My problem is about 110 GB in RAM – don't ask! – and I realise this is a huge amount of work and the answer is probably 'no'.

Thanks for your help,

mblum94 (Contributor) commented Aug 12, 2024

Hi there,

a) In my experience, there are some rare situations where global memory is slower than pure GPU memory; for example, host-to-device copies do not overlap with compute tasks. Previously, some tools used global memory and some didn't. We unified this and turned it off by default. However, you can activate global memory via an environment variable by setting BART_GPU_GLOBAL_MEMORY=1.
This does not seem to be documented yet, but we will add it soon.
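
Conceptually, the switch just selects between the two allocation paths. A simplified sketch of that idea (not the actual BART source) would be:

```c
#include <cuda_runtime.h>
#include <stdlib.h>
#include <string.h>

// Simplified sketch: pick managed ("global") memory or plain device memory
// depending on the environment variable. Illustrates the idea, not BART's code.
static void* gpu_alloc(size_t size)
{
    void* ptr = NULL;
    const char* env = getenv("BART_GPU_GLOBAL_MEMORY");

    if ((NULL != env) && (0 == strcmp(env, "1")))
        cudaMallocManaged(&ptr, size, cudaMemAttachGlobal); // pageable, can oversubscribe the GPU
    else
        cudaMalloc(&ptr, size);                             // resident on a single device

    return ptr;
}
```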

b) We have support for multi-GPU based on MPI. So far, it is available in the pics tool via command-line options and for training in the deep-learning tools (reconet and nlinvnet). If you tell us more about your use case, your problem may already be covered.
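
As a rough illustration of the general pattern (not our actual implementation), one MPI rank per GPU, each holding its own slice of the data, looks like this:

```c
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);

    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Bind each rank to one GPU (round-robin if there are more ranks than devices).
    int ndev = 0;
    cudaGetDeviceCount(&ndev);

    if (0 == ndev) {
        fprintf(stderr, "No CUDA devices found.\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    cudaSetDevice(rank % ndev);

    // Each rank allocates only its share of the full problem.
    size_t total_elems = (size_t)1 << 30;                   // illustrative total size
    size_t local_elems = (total_elems + size - 1) / size;

    float* local = NULL;
    cudaMalloc((void**)&local, local_elems * sizeof(float));

    // ... each rank works on its slice; partial results are combined with MPI reductions ...

    cudaFree(local);
    MPI_Finalize();
    return 0;
}
```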

Best,
Moritz
