
empty Cache after logps_per_token #2686

Open
shirinyamani opened this issue Jan 29, 2025 · 3 comments
Labels
🐛 bug Something isn't working 🏋 GRPO Related to GRPO

Comments

@shirinyamani commented Jan 29, 2025

Shouldn't we call `torch.cuda.empty_cache()` after computing the logprobs in `get_per_token_logps(model, input_ids, num_logits_to_keep)`, and also after the loss computation?

@github-actions github-actions bot added 🏋 GRPO Related to GRPO 🐛 bug Something isn't working labels Jan 29, 2025
@qgallouedec
Member

I usually refer to https://discuss.pytorch.org/t/about-torch-cuda-empty-cache/34232/2 when I consider emptying the cache. My understanding is that when you leave the function, the internal variables no longer exist, so the tensors stop being referenced and the underlying memory is freed.

But I might be wrong.
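The Python-level part of this claim can be checked in plain Python, without CUDA: once a function returns, its locals are dropped and the objects they referenced are freed as soon as their refcount hits zero. A minimal sketch, using a hypothetical `FakeTensor` stand-in and a `weakref` to observe collection (note this only demonstrates the refcounting point; for CUDA tensors the freed memory goes back to PyTorch's caching allocator, which is what `empty_cache()` acts on):

```python
import gc
import weakref


class FakeTensor:
    """Stand-in for a large tensor; lets us observe when it is collected."""


def compute():
    # 'local' only exists inside this function, like the intermediate
    # tensors inside get_per_token_logps.
    local = FakeTensor()
    # A weakref does not keep the object alive; it just lets us check
    # whether the object still exists.
    return weakref.ref(local)


ref = compute()
gc.collect()  # CPython frees on refcount drop; collect() just makes it deterministic
print(ref() is None)  # → True: the object was freed once the function returned
```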

@shirinyamani
Author

Right, but we wipe the cache in the same scenario in PPO.

I'm not sure either!

@qgallouedec
Member

qgallouedec commented Jan 29, 2025

It's probably worth doing some tests/profiling with a toy example to see the difference.
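A toy profile along these lines could compare `torch.cuda.memory_allocated()` (memory held by live tensors) with `torch.cuda.memory_reserved()` (blocks the caching allocator keeps around) before and after `empty_cache()`. This is a sketch, not the GRPO code path; the tensor size and function name are illustrative:

```python
import torch


def profile_empty_cache():
    # Hypothetical toy profile: after `del`, allocated memory drops but the
    # block stays cached (reserved); empty_cache() returns it to the driver.
    if not torch.cuda.is_available():
        print("CUDA not available; nothing to profile")
        return
    x = torch.randn(4096, 4096, device="cuda")  # ~64 MiB of float32
    del x
    print(f"allocated: {torch.cuda.memory_allocated() / 2**20:.1f} MiB")
    print(f"reserved:  {torch.cuda.memory_reserved() / 2**20:.1f} MiB")
    torch.cuda.empty_cache()
    print(f"reserved after empty_cache: {torch.cuda.memory_reserved() / 2**20:.1f} MiB")


profile_empty_cache()
```

The cost of `empty_cache()` is that subsequent allocations must go back through the CUDA driver, so calling it every step can slow training; that trade-off is what the profiling would quantify.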
