
[QST] An odd, sudden OOM using NVT Dataloaders with TorchRec #842

Open · shoyasaxa opened this issue Feb 27, 2023 · 3 comments
Labels: P1 Priority 1, question (Further information is requested)

Comments

shoyasaxa commented Feb 27, 2023

❓ Questions & Help

Hello, I was just playing around with using NVT dataloaders with TorchRec, and it was working fine for the most part. However, when I tried batch inference on a large dataset, I ran into a peculiar bug: the script would run perfectly fine for about an hour with stable GPU memory usage (around 94% on the first GPU), then suddenly, at a random point, the memory on the first of the four V100s I used would start creeping up towards 100% and quickly OOM. Oddly, I am no longer able to reproduce this issue, but I was still wondering if anyone had any ideas on why this could happen.

One possible idea that @rnyak suggested was that perhaps the data partitions are not evenly split, and one of the files happens to have bigger partitions than the others; when that file gets loaded, the GPU memory usage shoots up.
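
In case it helps with debugging, here is a rough sketch of how one could check whether the partitions are evenly sized (the path and the part_size value below are placeholders, not my actual setup):

```python
# Rough sketch: check whether the parquet partitions are evenly sized.
# The path and part_size are placeholders for illustration.
import nvtabular as nvt

ds = nvt.Dataset("/path/to/preprocessed/*.parquet", engine="parquet", part_size="128MB")
ddf = ds.to_ddf()

# One row count per dask partition; a large spread here would explain a sudden
# memory spike when the biggest partition gets loaded onto one GPU.
rows_per_partition = ddf.map_partitions(len).compute()
print(rows_per_partition.min(), rows_per_partition.max())
```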

Also, I am using NVTabular to preprocess the data. One feature request I have is for NVTabular to output an optimal number of files when preprocessing (currently, if I use 4 GPUs to preprocess a humongous dataset without setting an out_files_per_proc parameter, it spits out 4 humongous files).
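
For reference, this is roughly how I set out_files_per_proc explicitly today to get more, smaller output files (the preprocessing ops, column names, and paths below are placeholders):

```python
# Sketch: write the preprocessed dataset with an explicit out_files_per_proc so
# the output is split into more (smaller) files. Ops, columns, and paths are
# placeholders for illustration.
import nvtabular as nvt
from nvtabular import ops

dataset = nvt.Dataset("/path/to/raw/*.parquet", engine="parquet")
features = ["user_id", "item_id"] >> ops.Categorify()  # placeholder preprocessing graph
workflow = nvt.Workflow(features)

workflow.fit(dataset)
workflow.transform(dataset).to_parquet(
    output_path="/path/to/preprocessed",
    out_files_per_proc=8,  # e.g. 4 workers x 8 files each -> 32 smaller files
)
```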

shoyasaxa added the question (Further information is requested) label on Feb 27, 2023
rnyak (Contributor) commented Feb 28, 2023

@shoyasaxa thanks for creating the ticket.

Just to clarify: I thought you were doing batch inference on multiple GPUs, not on a single GPU? Can you please confirm/clarify that?

My suggestion was particularly for the multi-GPU training case, meaning that if you train your model with multiple GPUs, we expect the number of partitions per parquet file to be divisible by the number of GPUs. That is, if you are using 4 GPUs at the same time for model training (or inference) via torch.nn.parallel() or torch.distributed, your parquet files should have 4, 8, 12, 16, ... partitions so that they can be evenly distributed over the GPUs.
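
As an illustration, here is a rough sketch of repartitioning a dataset so its partition count is a multiple of the GPU count before writing it back out (the paths and the target partition count are placeholders, not a prescription):

```python
# Sketch: repartition the preprocessed data so the number of dask partitions is a
# multiple of the number of GPUs used for training/inference. Paths and the
# target partition count are placeholders.
import nvtabular as nvt

NUM_GPUS = 4
ds = nvt.Dataset("/path/to/preprocessed/*.parquet", engine="parquet")

ddf = ds.to_ddf().repartition(npartitions=4 * NUM_GPUS)  # 16 partitions
nvt.Dataset(ddf).to_parquet("/path/to/repartitioned")
```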

rnyak added the P1 Priority 1 label on Feb 28, 2023
shoyasaxa (Author) commented

Yes - I am doing batch inference on multiple GPUs (one instance with 4 V100 GPUs).

And yes - I also do the preprocessing using 4 GPUs, so the number of files outputted is a multiple of 4 as well.

rnyak (Contributor) commented Mar 9, 2023

> (currently, if I use 4 GPUs to preprocess a humongous dataset without setting an out_files_per_proc parameter, it spits out 4 humongous files)

We have a WIP PR that will hopefully address your request.
