Skip to content

Conversation

@lilei199908
Copy link
Collaborator

No description provided.

Copilot AI review requested due to automatic review settings December 12, 2025 05:31
@lilei199908 lilei199908 changed the title fix acotr init bugs fix actor init bugs Dec 12, 2025
@zhuzilin zhuzilin merged commit ba83b3a into THUDM:main Dec 12, 2025
7 checks passed
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR moves the torch_memory_saver initialization logic from the generic TrainRayActor initialization to the Megatron-specific initialization function, addressing an initialization timing issue.

Key changes:

  • Removed torch_memory_saver import and configuration from slime/ray/train_actor.py
  • Added torch_memory_saver import and configuration to slime/backends/megatron_utils/initialize.py

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
slime/ray/train_actor.py Removed torch_memory_saver import and memory margin configuration logic from TrainRayActor.init()
slime/backends/megatron_utils/initialize.py Added torch_memory_saver import and memory margin configuration to Megatron initialization function

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +102 to +105
if args.offload_train:
if (x := args.train_memory_margin_bytes) > 0:
logger.info(f"Set torch_memory_saver.memory_margin_bytes to {x}")
torch_memory_saver.memory_margin_bytes = x
Copy link

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic has been inverted compared to the original implementation. In the original code in train_actor.py, if train_memory_margin_bytes was greater than 0, it would assert that offload_train must be true, raising an error if it wasn't. The new implementation silently ignores train_memory_margin_bytes when offload_train is false. Consider whether this silent failure is intentional or if validation should still occur when train_memory_margin_bytes is set without offload_train enabled.

Copilot uses AI. Check for mistakes.
lilei199908 added a commit to lilei199908/slime that referenced this pull request Dec 12, 2025
Fengzdadi pushed a commit to Fengzdadi/slime that referenced this pull request Dec 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants