fix_load_ckpt#1095

Merged
lilei199908 merged 1 commit into THUDM:main from lilei199908:fix_load_ckpt
Dec 12, 2025

Conversation

@lilei199908
Collaborator

No description provided.

Copilot AI review requested due to automatic review settings December 12, 2025 03:24

Copilot AI left a comment

Pull request overview

This PR fixes a checkpoint-loading issue by adding a defensive check to the Megatron optimizer patch, preventing a KeyError when the "step" key is missing from a parameter group. The nightly development version is also bumped.

  • Added defensive check for "step" key existence before accessing it in parameter groups
  • Updated nightly development version
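
The patch itself is not shown in this conversation, so the following is only a minimal sketch of the kind of guard described above, assuming parameter groups are plain dictionaries as in PyTorch optimizers. The helper name `collect_steps` and the sample groups are hypothetical and are not taken from megatron.patch.

```python
from typing import Any


def collect_steps(param_groups: list[dict[str, Any]]) -> list[Any]:
    """Return the "step" value of every param group that defines one."""
    steps = []
    for group in param_groups:
        # Guarded access: an unconditional group["step"] would raise
        # KeyError for a group that was saved without a "step" entry.
        if "step" in group:
            steps.append(group["step"])
    return steps


# Hypothetical param groups: the second one has no "step" key.
groups = [{"lr": 1e-4, "step": 120}, {"lr": 1e-4}]
print(collect_steps(groups))
```

With an unguarded `group["step"]`, the second group in this toy input would raise a KeyError; the membership test simply skips it.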

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| docker/version.txt | Bumps nightly development version from 20251209d to 20251212a |
| docker/patch/latest/megatron.patch | Adds safety check to verify "step" key exists in param_group before accessing it, preventing potential KeyError during checkpoint loading |

@lilei199908 lilei199908 merged commit 3645cfa into THUDM:main Dec 12, 2025
8 checks passed
Birch-san added a commit to NovelAI/Megatron-LM that referenced this pull request Dec 12, 2025
…HUDM/slime#1095 "fix_load_ckpt".

This is an ad hoc patch for Megatron. The underlying issue is that Megatron assigns step = None for the GPU optimizer inside the hybrid optimizer, which slime uses as a pure CPU optimizer. slime does not use the GPU optimizer most of the time, so this patch only fixes the checkpoint-loading part.
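
To illustrate the situation described in this commit message, here is a hedged sketch built on toy dictionaries rather than Megatron's actual hybrid optimizer classes. The names `first_valid_step`, `cpu_groups`, and `gpu_groups` are hypothetical; the assumption is that a loader wants a usable step value and should skip any group whose "step" is missing or None.

```python
from typing import Any, Optional


def first_valid_step(param_groups: list[dict[str, Any]]) -> Optional[int]:
    """Return the first "step" that is present and not None, else None."""
    for group in param_groups:
        # dict.get returns None for a missing key instead of raising KeyError.
        step = group.get("step")
        if step is not None:
            return step
    return None


# Hypothetical stand-ins for the two halves of a hybrid optimizer:
# the CPU side carries a real step, the GPU side has step = None.
cpu_groups = [{"lr": 1e-5, "step": 300}]
gpu_groups = [{"lr": 1e-5, "step": None}]

print(first_valid_step(gpu_groups + cpu_groups))
print(first_valid_step(gpu_groups))
```

This only models the lookup; how the real patch treats a None step during checkpoint loading is not visible in this thread.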
Fengzdadi pushed a commit to Fengzdadi/slime that referenced this pull request Dec 19, 2025