add ckpt load save ci by lilei199908 · Pull Request #1104 · THUDM/slime

lilei199908 · 2025-12-12T13:40:29Z

No description provided.

Copilot

Pull request overview

This PR adds a new CI test to verify checkpoint save and load functionality for the Qwen3-4B model. The test exercises the ability to save checkpoints during training and subsequently load them for resumption.

Key changes:

New test file tests/test_qwen3_4B_ckpt.py, which exercises checkpoint save/load by running training twice: first in "save" mode, then in "load" mode
Updated GitHub Actions workflow configuration (both the template and the generated file) to add the new test to the short e2e test suite
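The two-pass structure described above can be sketched as follows. This is a minimal, hypothetical illustration, not the actual contents of tests/test_qwen3_4B_ckpt.py: the helper `build_train_args`, the `CKPT_DIR` path, the placeholder `NUM_GPUS`/`MODEL_TYPE` values, and the use of Megatron-style `--save`/`--load`/`--save-interval` flags are all assumptions, and a recording stub stands in for slime's `U.execute_train`.

```python
# Placeholder values; the real test defines its own (assumption).
NUM_GPUS = 8
MODEL_TYPE = "qwen3-4B"
# Hypothetical checkpoint directory shared by the two passes.
CKPT_DIR = "/tmp/qwen3-4B-ckpt"


def build_train_args(mode: str) -> str:
    """Build the checkpoint-related train args for one pass.

    Assumes Megatron-style --save/--load/--save-interval flags.
    """
    if mode == "save":
        # First pass: train and write checkpoints to CKPT_DIR.
        return f"--save {CKPT_DIR} --save-interval 1"
    if mode == "load":
        # Second pass: resume training from the checkpoints in CKPT_DIR.
        return f"--load {CKPT_DIR}"
    raise ValueError(f"unknown mode: {mode!r}")


class RecordingRunner:
    """Hypothetical stand-in for U.execute_train that only records its train args."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def execute_train(self, train_args: str, num_gpus_per_node: int, megatron_model_type: str) -> None:
        self.calls.append(train_args)


def execute(mode: str, runner: RecordingRunner) -> None:
    train_args = build_train_args(mode)
    # Exactly one training run per mode.
    runner.execute_train(
        train_args=train_args,
        num_gpus_per_node=NUM_GPUS,
        megatron_model_type=MODEL_TYPE,
    )


if __name__ == "__main__":
    runner = RecordingRunner()
    # Save first, then resume from the checkpoint written by the first pass.
    for mode in ("save", "load"):
        execute(mode, runner)
    for args in runner.calls:
        print(args)
```

The ordering matters: the "load" pass only has something to resume from because the "save" pass ran first against the same directory.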

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File	Description
tests/test_qwen3_4B_ckpt.py	New test file that prepares the Qwen3-4B model and runs training in checkpoint save and load modes to verify checkpoint functionality
.github/workflows/pr-test.yml.j2	Template file updated to include the new checkpoint test in the e2e-test-short job configuration
.github/workflows/pr-test.yml	Generated workflow file updated with the new test added to the test matrix


Copilot · 2025-12-12T13:43:55Z

tests/test_qwen3_4B_ckpt.py

+    U.execute_train(
+        train_args=train_args,
+        num_gpus_per_node=NUM_GPUS,
+        megatron_model_type=MODEL_TYPE,
+    )


The execute function contains two identical calls to U.execute_train with the same train_args. This appears to be a copy-paste error. The second call should likely be removed since the checkpoint save/load testing is already handled by calling execute twice with different modes ("save" and "load") from the main block.

Suggested change (removes the duplicated call):

-    U.execute_train(
-        train_args=train_args,
-        num_gpus_per_node=NUM_GPUS,
-        megatron_model_type=MODEL_TYPE,
-    )
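To make the effect of the duplicated call concrete, the hypothetical sketch below contrasts the flagged version of `execute` with the suggested fix. A stub in place of `U.execute_train` only counts invocations; the stub, the placeholder `NUM_GPUS`/`MODEL_TYPE` values, and the placeholder train-args strings are assumptions for illustration, not code from the PR.

```python
NUM_GPUS = 8  # placeholder value (assumption)
MODEL_TYPE = "qwen3-4B"  # placeholder value (assumption)


class CountingStub:
    """Hypothetical stand-in for U.execute_train; records each run's train args."""

    def __init__(self):
        self.runs = []

    def execute_train(self, train_args, num_gpus_per_node, megatron_model_type):
        self.runs.append(train_args)


def execute_duplicated(mode, U):
    # Flagged version: the same call appears twice.
    train_args = f"<{mode} args>"  # placeholder args
    U.execute_train(train_args=train_args, num_gpus_per_node=NUM_GPUS, megatron_model_type=MODEL_TYPE)
    U.execute_train(train_args=train_args, num_gpus_per_node=NUM_GPUS, megatron_model_type=MODEL_TYPE)


def execute_fixed(mode, U):
    # Suggested fix: one call per mode.
    train_args = f"<{mode} args>"  # placeholder args
    U.execute_train(train_args=train_args, num_gpus_per_node=NUM_GPUS, megatron_model_type=MODEL_TYPE)


def count_runs(execute_fn):
    """Run the save-then-load sequence and return how many trainings were launched."""
    U = CountingStub()
    for mode in ("save", "load"):
        execute_fn(mode, U)
    return len(U.runs)


print(count_runs(execute_duplicated))  # prints 4
print(count_runs(execute_fixed))  # prints 2
```

With the duplicated call, the save-then-load sequence launches four training runs instead of the intended two.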

add ckpt load save ci

21738e7

Copilot AI review requested due to automatic review settings December 12, 2025 13:40

Copilot started reviewing on behalf of lilei199908 December 12, 2025 13:40

add ckpt load save ci

a2d514c

lilei199908 added the run-ci-ckpt label Dec 12, 2025

Copilot AI reviewed Dec 12, 2025


lilei199908 requested a review from zhuzilin December 12, 2025 17:39

zhuzilin merged commit c525704 into THUDM:main Dec 13, 2025
10 checks passed

Fengzdadi pushed a commit to Fengzdadi/slime that referenced this pull request Dec 19, 2025

add ckpt load save ci (THUDM#1104)

229225e

zhuzilin merged 2 commits into THUDM:main from lilei199908:add_ckpt_ci

2 participants

Conversation

lilei199908 commented Dec 12, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants