Calculating FLOPs and training memory per GPU #16

Open
arghavan-kpm opened this issue Jul 25, 2024 · 3 comments

@arghavan-kpm

Hi,

Could you please give some instructions on how you calculated the FLOPs for a single forward pass during training, as well as the training memory per GPU? Thanks.

@devzhk
Contributor

devzhk commented Jul 25, 2024

Hi,

We used fvcore from Facebook Research to automatically estimate FLOPs. For the GPU memory usage, we reported the maximum GPU memory occupied by tensors, as given by torch.cuda.max_memory_allocated.
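
For reference, a minimal sketch of how both measurements could be wired up. This is not the repository's actual training script; the model and input shape below are stand-ins for illustration:

```python
import torch
import torch.nn as nn
from fvcore.nn import FlopCountAnalysis

# Stand-in model and input shape; substitute your own model and a real batch here.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Flatten(),
    nn.Linear(16 * 32 * 32, 10),
).cuda()
x = torch.randn(1, 3, 32, 32, device="cuda")

# FLOPs for a single forward pass (fvcore counts multiply-adds per its own conventions).
flops = FlopCountAnalysis(model, x)
print(f"Forward FLOPs: {flops.total() / 1e9:.3f} G")

# Peak GPU memory occupied by tensors during one training step.
torch.cuda.reset_peak_memory_stats()
loss = model(x).sum()
loss.backward()
print(f"Peak memory: {torch.cuda.max_memory_allocated() / 2**30:.3f} GiB")
```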

@arghavan-kpm
Author

Thanks so much for your response. I tried fvcore, but I keep getting this error message: "RuntimeError: Detected that you are using FX to torch.jit.trace a dynamo-optimized function. This is not supported at the moment." Could you please share the details of where you call FlopCountAnalysis() and what you pass as the input args?

@devzhk
Contributor

devzhk commented Jul 25, 2024

Just disable torch.compile().
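
In other words, run FlopCountAnalysis on the plain eager nn.Module rather than on the module returned by torch.compile(), since fvcore jit-traces the model and cannot trace a dynamo-optimized function. A hedged sketch of what that might look like (the module and shapes here are placeholders):

```python
import torch
import torch.nn as nn
from fvcore.nn import FlopCountAnalysis

model = nn.Linear(128, 128).cuda()        # stand-in module
x = torch.randn(4, 128, device="cuda")

# Count FLOPs on the eager (uncompiled) module ...
flops = FlopCountAnalysis(model, x)
print(f"Forward FLOPs: {flops.total() / 1e6:.1f} M")

# ... and only compile the model afterwards for the actual training run, if desired.
compiled_model = torch.compile(model)
out = compiled_model(x)
```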
