
Conversation

@lilei199908
Collaborator

No description provided.


Copilot AI left a comment


Pull request overview

This PR bumps the project to version 0.2.1: it updates the patches for the sglang and megatron dependencies, upgrades PyTorch and related libraries, and documents the new stable version.

Key Changes:

  • Updates PyTorch from 2.8.0 to 2.9.1, along with related packages (cuda-python, transformer_engine, etc.)
  • Updates sglang to the nightly-dev-20251208 build (5e2cda6158e670e64b926a9985d65826c537ac82)
  • Adds new patch files for sglang v0.5.6 and megatron with extensive changes, including MTP support with packed sequences, security fixes, and API compatibility updates
  • Updates documentation to reflect the new stable version

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| build_conda.sh | Updates dependency versions (PyTorch 2.9.1, cuda-python 13.1.0, transformer_engine 2.10.0), changes the sglang checkout to a newer nightly, updates the torch_memory_saver commit, adds a cudnn fix, and updates patch paths to v0.5.6 |
| docker/README.md | Updates the current stable version to nightly-dev-20251208 and moves the previous v0.5.5.post1 to history |
| docker/patch/v0.5.6/sglang.patch | Large patch file (2053 lines) adding routed-experts capture, MTP improvements, RMSNorm changes, VLM optimizations, and various bug fixes |
| docker/patch/v0.5.6/megatron.patch | Large patch file (869 lines) adding MTP packed-sequence support, gated attention, checkpoint compatibility fixes, and API updates for PyTorch 2.9.1 |


@@ -21,13 +21,13 @@ micromamba install -n slime cuda cuda-nvtx cuda-nvtx-dev nccl -c nvidia/label/cu
micromamba install -n slime -c conda-forge cudnn -y

# prevent installing cuda 13.0 for sglang

Copilot AI Dec 12, 2025


The comment on line 23 says "prevent installing cuda 13.0 for sglang", but the package actually being installed is cuda-python==13.1.0. The comment is misleading: either update it to describe the version actually pinned, or, if the intent really is to prevent a cuda 13.0 installation, correct the package version.

Suggested change
# prevent installing cuda 13.0 for sglang
# install cuda-python==13.1.0 to ensure compatibility with sglang and avoid cuda 13.0
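For concreteness, a minimal sketch of the pin the comment describes; the installer command and its placement in build_conda.sh are assumptions, since the hunk is abridged here:

```bash
# Pin cuda-python explicitly so dependency resolution for sglang cannot
# drift to another CUDA 13.x release (version taken from the review comment).
pip install "cuda-python==13.1.0"
```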

return msc.torch.load(load_path, map_location='cpu')
else:
- return torch.load(load_path, map_location='cpu')
+ return torch.load(load_path, map_location='cpu', weights_only=False)

Copilot AI Dec 12, 2025


Setting weights_only=False in torch.load is a security concern. This allows arbitrary Python code execution during deserialization and is explicitly discouraged by PyTorch. Unless there's a specific reason to load pickled Python objects, consider using weights_only=True or adding a comment explaining why this is necessary and safe in this context.

Suggested change
+ return torch.load(load_path, map_location='cpu', weights_only=False)
+ return torch.load(load_path, map_location='cpu', weights_only=True)
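A minimal sketch of the safer pattern the comment suggests, assuming the checkpoint is a plain tensor state dict; the helper name and error message are illustrative, not from the patch:

```python
import torch

def load_checkpoint(load_path):
    # Prefer weights_only=True: tensors deserialize, arbitrary pickled
    # objects do not, so a tampered checkpoint cannot execute code here.
    try:
        return torch.load(load_path, map_location="cpu", weights_only=True)
    except Exception as err:
        # Fail loudly instead of silently falling back to the unsafe path.
        raise RuntimeError(
            f"{load_path} could not be loaded with weights_only=True; "
            "re-save it as a plain state dict, or opt in to "
            "weights_only=False only after reviewing its provenance."
        ) from err
```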

Comment on lines +50 to +59
- raise KeyError(
- f"{sh_ten.key} from model not in state dict:"
- f" {sorted(metadata.state_dict_metadata.keys())}"
- )
+ # raise KeyError(
+ # f"{sh_ten.key} from model not in state dict:"
+ # f" {sorted(metadata.state_dict_metadata.keys())}"
+ # )
+ print(f"{sh_ten.key} from model not in state dict, will skip")
+ continue

Copilot AI Dec 12, 2025


The error handling here is commented out and replaced with a print statement and a continue, which silently skips missing keys in the state dict instead of raising an error. This could mask real problems during model loading. Consider emitting a warning instead of a print, or adding a configuration option to control this behavior rather than always skipping silently.
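A hedged sketch of the alternative described above; the `strict` knob and the helper name are assumptions for illustration, not part of the patch:

```python
import warnings

def filter_missing_keys(sharded_tensors, metadata, strict=False):
    """Yield only tensors whose keys exist in the checkpoint metadata."""
    for sh_ten in sharded_tensors:
        if sh_ten.key not in metadata.state_dict_metadata:
            msg = (
                f"{sh_ten.key} from model not in state dict: "
                f"{sorted(metadata.state_dict_metadata.keys())}"
            )
            if strict:
                raise KeyError(msg)  # preserve the original fail-fast behavior
            warnings.warn(msg)  # surfaces in logs, unlike a bare print
            continue
        yield sh_ten
```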

Comment on lines +255 to +273
+ torch.distributed.isend, tensor_send_prev, prev_pipeline_rank,
)
ops.append(send_prev_op)
if tensor_recv_prev is not None:
recv_prev_op = torch.distributed.P2POp(
- torch.distributed.irecv, tensor_recv_prev, prev_pipeline_rank, group
+ torch.distributed.irecv, tensor_recv_prev, prev_pipeline_rank,
)
ops.append(recv_prev_op)
if tensor_send_next is not None:
send_next_op = torch.distributed.P2POp(
- torch.distributed.isend, tensor_send_next, next_pipeline_rank, group
+ torch.distributed.isend, tensor_send_next, next_pipeline_rank,
)
ops.append(send_next_op)
if tensor_recv_next is not None:
recv_next_op = torch.distributed.P2POp(
- torch.distributed.irecv, tensor_recv_next, next_pipeline_rank, group
+ torch.distributed.irecv, tensor_recv_next, next_pipeline_rank,

Copilot AI Dec 12, 2025


The 'group' parameter is being removed from all torch.distributed.P2POp calls. This change might break distributed pipeline parallelism if the group parameter is required by newer versions of PyTorch. Ensure that the default group behavior matches the intended communication group, or verify that this API change is compatible with the target PyTorch version (2.9.1).

Suggested change
+ torch.distributed.isend, tensor_send_prev, prev_pipeline_rank, group
)
ops.append(send_prev_op)
if tensor_recv_prev is not None:
recv_prev_op = torch.distributed.P2POp(
+ torch.distributed.irecv, tensor_recv_prev, prev_pipeline_rank, group
)
ops.append(recv_prev_op)
if tensor_send_next is not None:
send_next_op = torch.distributed.P2POp(
+ torch.distributed.isend, tensor_send_next, next_pipeline_rank, group
)
ops.append(send_next_op)
if tensor_recv_next is not None:
recv_next_op = torch.distributed.P2POp(
+ torch.distributed.irecv, tensor_recv_next, next_pipeline_rank, group

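To make the concern concrete, a small sketch: `P2POp` still accepts `group` in recent PyTorch releases, so dropping the argument silently falls back to the default (WORLD) process group. `pp_group` below is a hypothetical pipeline-parallel group handle, not a name from the patch:

```python
import torch.distributed as dist

def build_send_recv_ops(tensor_send_next, tensor_recv_prev,
                        next_rank, prev_rank, pp_group=None):
    # group=None means the default (WORLD) group is used; if the pipeline
    # ranks are meant relative to a pipeline-parallel group, keep passing it.
    ops = []
    if tensor_send_next is not None:
        ops.append(dist.P2POp(dist.isend, tensor_send_next, next_rank, group=pp_group))
    if tensor_recv_prev is not None:
        ops.append(dist.P2POp(dist.irecv, tensor_recv_prev, prev_rank, group=pp_group))
    return ops
```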
+ decoder_input = decoder_input.detach()

- hidden_states = make_viewless_tensor(inp=hidden_states, requires_grad=True, keep_graph=True)
+ hidden_states = make_viewless_tensor(inp=hidden_states, requires_grad=True, keep_graph=False)

Copilot AI Dec 12, 2025


The change from keep_graph=True to keep_graph=False in make_viewless_tensor could break gradient computation if the computational graph needs to be retained for backward passes. This is a potentially breaking change that should be carefully reviewed to ensure it doesn't affect gradient backpropagation in multi-token prediction scenarios.

Suggested change
+ hidden_states = make_viewless_tensor(inp=hidden_states, requires_grad=True, keep_graph=False)
+ hidden_states = make_viewless_tensor(inp=hidden_states, requires_grad=True, keep_graph=True)
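If the relaxed setting is intentional for MTP, one way to keep it reviewable is to gate it behind a config knob; `mtp_detach_graph` below is an invented name, and this is only a sketch of the idea:

```python
from megatron.core.utils import make_viewless_tensor

def viewless_hidden_states(hidden_states, config):
    # keep_graph=False drops the autograd view bookkeeping; that is only
    # safe when the backward pass does not need the view chain. Gating it
    # behind a config option keeps the behavior reviewable per model.
    keep_graph = not getattr(config, "mtp_detach_graph", False)  # assumed knob
    return make_viewless_tensor(
        inp=hidden_states, requires_grad=True, keep_graph=keep_graph
    )
```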

Comment on lines +722 to +724
- assert (
- packed_seq_params is None
- ), f"multi token prediction + sequence packing is not yet supported."

Copilot AI Dec 12, 2025


The assertion that prevented MTP from being combined with sequence packing has been removed. While this enables the feature, removing the assertion alone does not show that the underlying implementation has been fully tested with packed sequences. Consider adding a warning or documenting the testing status of this feature combination.

Suggested change
- assert (
- packed_seq_params is None
- ), f"multi token prediction + sequence packing is not yet supported."
+ if packed_seq_params is not None:
+ import warnings
+ warnings.warn(
+ "multi token prediction + sequence packing is enabled, but this combination has not been fully tested. Use with caution.",
+ UserWarning
+ )

Comment on lines +848 to +852
+ group.add_argument('--post-self-attn-layernorm', action='store_true',
+ help='If set, use post self attention layernorm.')
+ group.add_argument('--post-mlp-layernorm', action='store_true',
+ help='If set, use post MLP layernorm.')
+ group.add_argument('--use-gated-attention', action='store_true',

Copilot AI Dec 12, 2025


Setting trust_remote_code=True by default when constructing the Hugging Face tokenizer means AutoTokenizer.from_pretrained will execute arbitrary Python code from any remote model repository passed in pretrained_model_name_or_path. If an attacker can point this code at a malicious or compromised Hugging Face repo (or if an upstream repo is compromised), they can achieve remote code execution in the training/inference environment. To mitigate this, keep trust_remote_code disabled by default (or restrict it to a small allowlist of trusted repos) and require explicit, security-reviewed opt-in before enabling remote code execution for tokenizers.
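A minimal sketch of the opt-in pattern the comment recommends; the flag name and the allowlist are assumptions for illustration, not part of the patch:

```python
from transformers import AutoTokenizer

# Repos explicitly vetted to run repository-supplied Python code.
TRUSTED_REPOS = {"example-org/vetted-model"}  # hypothetical allowlist

def build_tokenizer(name_or_path, allow_remote_code=False):
    trust = allow_remote_code or name_or_path in TRUSTED_REPOS
    # Default is False: an arbitrary Hugging Face repo cannot execute
    # code in this process unless the caller explicitly opts in.
    return AutoTokenizer.from_pretrained(name_or_path, trust_remote_code=trust)
```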


- pip install git+https://github.com/fzyzcjy/torch_memory_saver.git@9b8b788fdeb9c2ee528183214cef65a99b71e7d5 --no-cache-dir --force-reinstall
+ pip install git+https://github.com/fzyzcjy/torch_memory_saver.git@dc6876905830430b5054325fa4211ff302169c6b --no-cache-dir --force-reinstall
pip install git+https://github.com/fzyzcjy/Megatron-Bridge.git@dev_rl --no-build-isolation

Copilot AI Dec 12, 2025


The build script installs Megatron-Bridge directly from a mutable Git branch (git+https://github.com/fzyzcjy/Megatron-Bridge.git@dev_rl) with --no-build-isolation, so every build executes whatever code the remote branch currently contains. If that GitHub repository or branch is ever compromised, an attacker can inject arbitrary code into the build environment with access to credentials and artifacts. To reduce this supply-chain risk, pin this dependency to an immutable identifier (e.g., a specific commit SHA or a signed release artifact) and avoid installing from mutable branches in automated builds.
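A sketch of the suggested pinning; `<commit-sha>` is a placeholder for an audited commit, not an identifier from this PR:

```bash
# Pin to an immutable commit instead of the mutable dev_rl branch so the
# build cannot silently pick up new (possibly compromised) code.
pip install "git+https://github.com/fzyzcjy/Megatron-Bridge.git@<commit-sha>" --no-build-isolation
```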

@lilei199908 changed the title from "0.2.1" to "[release] bump to v0.2.1" on Dec 12, 2025
@lilei199908 merged commit 0934a0e into THUDM:main on Dec 12, 2025
8 of 9 checks passed
@fzyzcjy
Collaborator

fzyzcjy commented Dec 15, 2025

@lilei199908 Hi, a tiny nit: it would be great to use squash instead of merge when clicking the merge button in github to be consistent w/ other commits :)


@lilei199908
Collaborator Author

> @lilei199908 Hi, a tiny nit: it would be great to use squash instead of merge when clicking the merge button in github to be consistent w/ other commits :)


Sorry about that. Good to know! Thanks.

@fzyzcjy
Collaborator

fzyzcjy commented Dec 15, 2025

It's OK and no worries!

Fengzdadi pushed a commit to Fengzdadi/slime that referenced this pull request Dec 19, 2025
