Calculating FLOPs and training memory per GPU #16

Open
arghavan-kpm opened this issue Jul 25, 2024 · 3 comments

@arghavan-kpm

Hi,

Could you please give some instructions on how you calculated the FLOPs for a single forward pass during training, as well as the training memory per GPU? Thanks.

@devzhk
Contributor

devzhk commented Jul 25, 2024

Hi,

We used fvcore from Facebook Research to automatically estimate FLOPs. For the GPU memory usage, we reported the maximum GPU memory occupied by tensors, as given by torch.cuda.max_memory_allocated.
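
For reference, a minimal sketch of how both measurements could be wired up. This is not the repository's actual training script; the model and input shape below are stand-ins for illustration:

```python
import torch
import torch.nn as nn
from fvcore.nn import FlopCountAnalysis

# Stand-in model and input shape; substitute your own model and a real batch here.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Flatten(),
    nn.Linear(16 * 32 * 32, 10),
).cuda()
x = torch.randn(1, 3, 32, 32, device="cuda")

# FLOPs for a single forward pass (fvcore counts multiply-adds per its own conventions).
flops = FlopCountAnalysis(model, x)
print(f"Forward FLOPs: {flops.total() / 1e9:.3f} G")

# Peak GPU memory occupied by tensors during one training step.
torch.cuda.reset_peak_memory_stats()
loss = model(x).sum()
loss.backward()
print(f"Peak memory: {torch.cuda.max_memory_allocated() / 2**30:.3f} GiB")
```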

@arghavan-kpm
Author

Thanks so much for your response. I tried fvcore, but I keep getting this error message: "RuntimeError: Detected that you are using FX to torch.jit.trace a dynamo-optimized function. This is not supported at the moment." Could you please share the details of where you call FlopCountAnalysis() and what you pass as the input args?

@devzhk
Contributor

devzhk commented Jul 25, 2024

Just disable torch.compile().
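
In other words, run FlopCountAnalysis on the plain eager nn.Module rather than on the module returned by torch.compile(), since fvcore jit-traces the model and cannot trace a dynamo-optimized function. A hedged sketch of what that might look like (the module and shapes here are placeholders):

```python
import torch
import torch.nn as nn
from fvcore.nn import FlopCountAnalysis

model = nn.Linear(128, 128).cuda()        # stand-in module
x = torch.randn(4, 128, device="cuda")

# Count FLOPs on the eager (uncompiled) module ...
flops = FlopCountAnalysis(model, x)
print(f"Forward FLOPs: {flops.total() / 1e6:.1f} M")

# ... and only compile the model afterwards for the actual training run, if desired.
compiled_model = torch.compile(model)
out = compiled_model(x)
```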
