Skip to content

Conversation

jlamypoirier
Copy link
Collaborator

@jlamypoirier jlamypoirier commented Oct 6, 2025

✨ Description

A small follow-up to #370, turning the language model itself into a module just like any other. Also move the hidden size to the language model config to enforce that module input/output dimensions are set by their parent module.

Also added small safety tweaks.

@jlamypoirier jlamypoirier marked this pull request as ready for review October 6, 2025 21:09
Copy link
Collaborator

@tscholak tscholak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

# TODO: Prevent unsafe by default
yield from torch.load(path)
# TODO: Confirm that loading works with `weights_only=True`
yield from torch.load(path, weights_only=True)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for making our sec folks happy

ModelTestingGroup.distributed: ModelTestingGroupAction.unimportant,
},
)
del MODEL_CONFIGS["starcoder_2"].config_dict["model"]["base_model"]["embeddings"]["num_position_embeddings"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unused but leaving it in the dict caused errors when comparing with converted configs.

# Set a dummy default user so we don't run in root by default.
# The image is still compatible with any user id.
RUN useradd user
USER user
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

@jlamypoirier jlamypoirier merged commit 2446a17 into main Oct 6, 2025
4 checks passed
@jlamypoirier jlamypoirier deleted the jlp/language_model_block branch October 6, 2025 22:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants