
empty Cache after logps_per_token #2686

Open
shirinyamani opened this issue Jan 29, 2025 · 3 comments
Labels
🐛 bug Something isn't working 🏋 GRPO Related to GRPO

Comments

@shirinyamani commented Jan 29, 2025

Shouldn't we call `torch.cuda.empty_cache()` after computing the logprobs in `get_per_token_logps(model, input_ids, num_logits_to_keep)`, and also after the loss computation?

@github-actions github-actions bot added 🏋 GRPO Related to GRPO 🐛 bug Something isn't working labels Jan 29, 2025
@qgallouedec
Member

I usually refer to https://discuss.pytorch.org/t/about-torch-cuda-empty-cache/34232/2 when I consider emptying the cache. My understanding is that when you leave the function, the internal variables no longer exist, so the tensors stop being referenced and the underlying memory is freed.

But I might be wrong.
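The Python-level part of this claim can be checked in plain Python, without CUDA: once a function returns, its locals are dropped and the objects they referenced are freed as soon as their refcount hits zero. A minimal sketch, using a hypothetical `FakeTensor` stand-in and a `weakref` to observe collection (note this only demonstrates the refcounting point; for CUDA tensors the freed memory goes back to PyTorch's caching allocator, which is what `empty_cache()` acts on):

```python
import gc
import weakref


class FakeTensor:
    """Stand-in for a large tensor; lets us observe when it is collected."""


def compute():
    # 'local' only exists inside this function, like the intermediate
    # tensors inside get_per_token_logps.
    local = FakeTensor()
    # A weakref does not keep the object alive; it just lets us check
    # whether the object still exists.
    return weakref.ref(local)


ref = compute()
gc.collect()  # CPython frees on refcount drop; collect() just makes it deterministic
print(ref() is None)  # → True: the object was freed once the function returned
```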

@shirinyamani
Author

Right, but we wipe the cache in the same scenario in PPO.

I'm not sure either!

@qgallouedec
Member

qgallouedec commented Jan 29, 2025

It's probably worth doing some tests/profiling with a toy example to see the difference.
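A toy profile along these lines could compare `torch.cuda.memory_allocated()` (memory held by live tensors) with `torch.cuda.memory_reserved()` (blocks the caching allocator keeps around) before and after `empty_cache()`. This is a sketch, not the GRPO code path; the tensor size and function name are illustrative:

```python
import torch


def profile_empty_cache():
    # Hypothetical toy profile: after `del`, allocated memory drops but the
    # block stays cached (reserved); empty_cache() returns it to the driver.
    if not torch.cuda.is_available():
        print("CUDA not available; nothing to profile")
        return
    x = torch.randn(4096, 4096, device="cuda")  # ~64 MiB of float32
    del x
    print(f"allocated: {torch.cuda.memory_allocated() / 2**20:.1f} MiB")
    print(f"reserved:  {torch.cuda.memory_reserved() / 2**20:.1f} MiB")
    torch.cuda.empty_cache()
    print(f"reserved after empty_cache: {torch.cuda.memory_reserved() / 2**20:.1f} MiB")


profile_empty_cache()
```

The cost of `empty_cache()` is that subsequent allocations must go back through the CUDA driver, so calling it every step can slow training; that trade-off is what the profiling would quantify.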
