
how to carry out multi-GPU with lightning-pose? #138

Open
Wulin-Tan opened this issue Apr 7, 2024 · 7 comments · Fixed by #206
Assignees: themattinthehatt
Labels: enhancement (New feature or request)

Comments

@Wulin-Tan

Hi, Lightning Pose team,
I am very interested in this super cool tool.
How do I do multi-GPU training with Lightning Pose?
I can't find any information on multi-GPU in the tutorial.

@themattinthehatt
Collaborator

We do not currently support multi-GPU training, but it should not be difficult to implement with our current setup (at least for the supervised models). We'll be happy to look into this, but it might take us a week or so; we'll keep you updated.

@themattinthehatt themattinthehatt self-assigned this Apr 19, 2024
@themattinthehatt themattinthehatt added the enhancement New feature or request label Apr 19, 2024
@YitingChang

I'm also interested in multi-GPU training and would like to follow this. Thank you!

@ksikka
Collaborator

ksikka commented Oct 7, 2024

Hi @Wulin-Tan and @YitingChang, I am starting to investigate multi-GPU support for Lightning Pose. Could you provide some more context about your goals with multi-GPU, i.e., are you requesting this to accelerate training or inference, or are you running into memory limitations with one GPU? This will help guide our development. Thanks!

@YitingChang

Hi @ksikka ,

Thank you for following up! I'm encountering memory limitations with a single GPU, so I'd like to explore running Lightning Pose with multiple GPUs.

To give you more context on my current setup: as I add more features—such as unsupervised losses and temporal context networks—the memory demands increase. I've had to significantly resize images and reduce batch sizes to fit within the memory constraints. Currently, I’m running Lightning Pose on a cluster with one GPU that has 40 GB of memory. Allocating more GPUs shouldn't be an issue if Lightning Pose can support multi-GPU functionality.

@Wulin-Tan
Author

@ksikka Yes, I need more GPU memory, as well as faster training if possible.

@ksikka
Collaborator

ksikka commented Oct 17, 2024

Hi @YitingChang and @Wulin-Tan, support for multi-GPU was just added for supervised training. This works for both heatmap and heatmap_mhcrnn models, as long as losses_to_use: [].

To use it, set num_gpus in the config to the number of GPUs on your machine. train_batch_size and val_batch_size must be divisible by num_gpus.
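For example, a minimal sketch of the relevant settings (the values and the 4-GPU machine are illustrative assumptions; check the configuration file docs for where each key lives in the config):

```yaml
# Illustrative values only, assuming a machine with 4 GPUs.
num_gpus: 4            # number of GPUs on your machine
train_batch_size: 16   # must be divisible by num_gpus (16 / 4 = 4 per GPU)
val_batch_size: 32     # must be divisible by num_gpus
losses_to_use: []      # supervised training only; multi-GPU with unsupervised losses is WIP
```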

So the request is fulfilled for supervised training. Support for multi-GPU when using unsupervised losses is a work in progress.

@ksikka
Collaborator

ksikka commented Oct 17, 2024

Until multi-GPU is supported for the unsupervised losses, you might have success with gradient accumulation (see accumulate_grad_batches in the configuration file docs). Say you want an effective batch size of 8 but can only fit batch size 2 in memory; then you could set train_batch_size: 2 and accumulate_grad_batches: 4. This is an experimental feature, so if you do try it, please report back with your feedback. Thanks!

Update: also reduce sequence_length by the same factor as you reduce train_batch_size, since sequence_length is the batch size for unlabeled frames when using unsupervised losses. However, when using the context model (heatmap_mhcrnn), do not reduce it below 5, as the number of predictions is n - 4 for the context model. I'll add this to the docs in the next PR.
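For concreteness, a sketch of the settings described above (values are illustrative; where each key sits in the config file may differ, so check the configuration docs):

```yaml
# Illustrative values for an effective labeled batch size of 8 when only 2 fit in memory.
train_batch_size: 2          # reduced 4x to fit in GPU memory
accumulate_grad_batches: 4   # 2 x 4 = effective batch size of 8
sequence_length: 8           # reduce by the same 4x factor; keep >= 5 for heatmap_mhcrnn
```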
