@JackXu0 JackXu0 commented Jan 20, 2026

There is a TODO in rollout.py to add loss mask compression. I think it makes sense: loss masks are binary and usually consist of long runs of identical values, so a high compression rate is expected.

Detailed actions:

  • Add compress_loss_mask and decompress_loss_mask utilities
  • Compress masks in RolloutManager before sending
  • Decompress in FSDP and Megatron data processing
  • Add comprehensive tests for RLE functions

Reduce memory usage and network overhead by compressing binary loss masks
using run-length encoding (RLE) when transferring data between rollout
and training components.
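
For illustration, here is a minimal sketch of how RLE on a binary mask could work. The function names match the utilities listed in this PR, but the signatures and the `(first_value, run_lengths)` encoding are assumptions made for this example, not necessarily what the PR implements.

```python
from typing import List, Tuple


def compress_loss_mask(mask: List[int]) -> Tuple[int, List[int]]:
    """Encode a binary mask as (first_value, run_lengths).

    Hypothetical encoding: because values alternate between runs,
    only the first value and the length of each run are stored.
    """
    if not mask:
        return 0, []
    runs: List[int] = []
    current = mask[0]
    count = 0
    for value in mask:
        if value not in (0, 1):
            raise ValueError(f"loss mask must be binary, got {value!r}")
        if value == current:
            count += 1
        else:
            # A run ended: record its length and start the next one.
            runs.append(count)
            current = value
            count = 1
    runs.append(count)
    return mask[0], runs


def decompress_loss_mask(first_value: int, run_lengths: List[int]) -> List[int]:
    """Rebuild the original mask from (first_value, run_lengths)."""
    mask: List[int] = []
    value = first_value
    for length in run_lengths:
        mask.extend([value] * length)
        value = 1 - value  # runs alternate between 0 and 1
    return mask


# Example: a prompt of 5 masked tokens, 3 response tokens, 2 masked tokens.
example_mask = [0] * 5 + [1] * 3 + [0] * 2
first, runs = compress_loss_mask(example_mask)
restored = decompress_loss_mask(first, runs)
```

With this encoding, a mask of N tokens containing k runs is stored as k integers plus one starting value, so the saving grows with the length of the prompt and response segments.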
