[Core] Support weight_loader_v2 for `UnquantizedLinearMethod` #23036

kylesayrs · 2025-08-16T21:24:01Z

Purpose

Modernize UnquantizedLinearMethod by allowing its weight parameter to support weight_loader_v2.
This is useful for quantization methods that wrap unquantized parameters, such as transforms. See [Transform] [Quantization] Add transforms to compressed tensors #22486.
- These changes are not strictly required for compressed-tensors (CT) transforms support, but they will be required for supporting transforms together with tensor parallelism (TP).

Changes

Change the weight Parameter to an instance of ModelWeightParameter
Add UnquantizedLinearMethod to list of weight_loader_v2 supported methods
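
The second change can be pictured as a name-based gate. The following is a minimal sketch, assuming the loader is chosen by checking the linear method's class name against a list of supported names. The names `WEIGHT_LOADER_V2_SUPPORTED`, `select_weight_loader`, and the stub classes and loaders are illustrative assumptions for this sketch, not code copied from the PR.

```python
from typing import Callable, List

# Assumed registry of linear-method class names that opt in to the v2 loader.
WEIGHT_LOADER_V2_SUPPORTED: List[str] = ["UnquantizedLinearMethod"]


class UnquantizedLinearMethod:
    """Stand-in for the real linear method; only its class name matters here."""


class LegacyLinearMethod:
    """Stand-in for a method that has not been added to the list."""


def weight_loader_v1(name: str) -> str:
    return f"v1 loaded {name}"


def weight_loader_v2(name: str) -> str:
    return f"v2 loaded {name}"


def select_weight_loader(quant_method: object) -> Callable[[str], str]:
    """Pick the v2 loader only when the method's class name is registered."""
    if type(quant_method).__name__ in WEIGHT_LOADER_V2_SUPPORTED:
        return weight_loader_v2
    return weight_loader_v1
```

Under this assumption, adding a class name to the list is all that is needed to route that method's weights through the v2 path.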

Caveats

Unfortunately, torch.compile does not support capturing Parameter subclasses. I spent a couple of hours trying to patch in support, and while there is theoretically support for Tensor subclasses, Parameter subclasses are very difficult to support and patches lead to cascading errors.
Typically, linear methods solve this by replacing the BasevLLMParameter with a torch.nn.Parameter after weight loading. However, this prevents the weight from being reloaded (see test_reload_weights_before_load_model).
The simplest solution is to get a pointer to the weight data for execution and leave the original parameter untouched. Note that accessing .data during execution itself leads to another torch.compile error, so the pointer has to be captured beforehand.

# required by torch.compile
# do not overwrite with Parameter class to preserve weight reloading
self.layer_weight_data = layer.weight.data
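
The aliasing idea can be illustrated without torch. This is a toy sketch, assuming that weight reloading writes into the existing storage in place. `FakeParameter`, `FakeLayer`, and `reload_in_place` are hypothetical stand-ins and not vLLM or PyTorch classes.

```python
from typing import List


class FakeParameter:
    """Hypothetical stand-in for a Parameter subclass that owns its storage."""

    def __init__(self, values: List[float]) -> None:
        self.data = values

    def reload_in_place(self, new_values: List[float]) -> None:
        # Overwrite the existing storage instead of rebinding `self.data`,
        # so every alias of the storage observes the new values.
        self.data[:] = new_values


class FakeLayer:
    def __init__(self, weight: FakeParameter) -> None:
        self.weight = weight            # original parameter, left untouched
        self.weight_data = weight.data  # plain alias used during execution

    def forward(self, x: float) -> List[float]:
        # Execution reads only the alias, never `self.weight.data`.
        return [w * x for w in self.weight_data]


layer = FakeLayer(FakeParameter([1.0, 2.0]))
before = layer.forward(2.0)
layer.weight.reload_in_place([3.0, 4.0])
after = layer.forward(2.0)
```

Because the original parameter object is never replaced, it keeps its type and can still be reloaded. If a reload were to rebind the storage instead of writing into it, the alias would go stale and would need to be refreshed.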

For most parameters, weight_loader_v2 is only an option for weight parameters, whereas bias parameters always use weight_loader_v1. Unfortunately, in the QKVCrossParallelLinear class, bias and weight parameters share the same weight loader function. Supporting weight_loader_v2 for bias parameters is out of scope for this PR, so the simplest solution is to implement a special bias_weight_loader for QKVCrossParallelLinear which is guaranteed to use weight_loader_v1.

def bias_weight_loader(self,
                       param: torch.nn.Parameter,
                       loaded_weight: torch.Tensor,
                       loaded_shard_id: Optional[str] = None):
    # just like all other parameters, does not yet
    # support loading bias with weight_loader_v2
    layer = (self.q_proj_decoder
             if loaded_shard_id == "q" else self.kv_proj_encoder)
    target_param = self.select_proj_params(layer, param)
    shard_id_args = (loaded_shard_id, ) if loaded_shard_id != "q" else ()
    layer.weight_loader(target_param, loaded_weight, *shard_id_args)
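
The routing in `bias_weight_loader` can be exercised in isolation. The sketch below assumes stub layers that merely record their calls. `RecordingLayer` and `CrossLinearSketch` are hypothetical, and parameter selection is reduced to returning the parameter unchanged.

```python
from typing import Any, List, Optional, Tuple


class RecordingLayer:
    """Hypothetical stub that records the arguments its loader receives."""

    def __init__(self) -> None:
        self.calls: List[Tuple[Any, ...]] = []

    def weight_loader(self, param: Any, loaded_weight: Any,
                      *shard_id: Any) -> None:
        self.calls.append((param, loaded_weight, *shard_id))


class CrossLinearSketch:
    def __init__(self) -> None:
        self.q_proj_decoder = RecordingLayer()
        self.kv_proj_encoder = RecordingLayer()

    def select_proj_params(self, layer: RecordingLayer, param: Any) -> Any:
        # Simplification for this sketch: return the parameter unchanged.
        return param

    def bias_weight_loader(self, param: Any, loaded_weight: Any,
                           loaded_shard_id: Optional[str] = None) -> None:
        # "q" shards go to the decoder projection; everything else goes to
        # the fused encoder projection, which also needs the shard id.
        layer = (self.q_proj_decoder
                 if loaded_shard_id == "q" else self.kv_proj_encoder)
        target_param = self.select_proj_params(layer, param)
        shard_id_args = (loaded_shard_id, ) if loaded_shard_id != "q" else ()
        layer.weight_loader(target_param, loaded_weight, *shard_id_args)


module = CrossLinearSketch()
module.bias_weight_loader("bias", "q_tensor", "q")
module.bias_weight_loader("bias", "k_tensor", "k")
```

After these two calls, the decoder stub has received a call without a shard id and the encoder stub has received one with the shard id `"k"`.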

Typically, a parameter's weight loader is passed from its parent module. However, some models override the weight loader. This is a bad pattern, as it is not compatible with quantization. Unfortunately, rectifying each of these models is out of scope for this PR, so the simplest solution is to support mutation of the weight_loader property with a note to begin supporting Model.load_weights.

@property
def weight_loader(self) -> Callable:
    # NOTE(@ksayers) some models such as mamba_mixer2 override the
    # weight loader to support custom loading. In the future, model-specific
    # weight loading should be implemented via Model.load_weights. In the
    # meantime, support deleting and overriding the `weight_loader` attribute
    if self._weight_loader is None:
        raise AttributeError(f"{self.__class__.__name__} weight_loader "
                             "attribute has been deleted")
    return self._weight_loader

@weight_loader.setter
def weight_loader(self, value: Callable):
    self._weight_loader = value

@weight_loader.deleter
def weight_loader(self):
    self._weight_loader = None
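
The getter, setter, and deleter can be tried out on a plain class. This is a minimal sketch, assuming a stand-in `ParamSketch` class and two dummy loaders. None of these names come from the PR.

```python
from typing import Any, Callable, Optional, Tuple


class ParamSketch:
    """Hypothetical stand-in showing the getter/setter/deleter trio."""

    def __init__(self, weight_loader: Callable) -> None:
        self._weight_loader: Optional[Callable] = weight_loader

    @property
    def weight_loader(self) -> Callable:
        if self._weight_loader is None:
            raise AttributeError(f"{self.__class__.__name__} weight_loader "
                                 "attribute has been deleted")
        return self._weight_loader

    @weight_loader.setter
    def weight_loader(self, value: Callable) -> None:
        self._weight_loader = value

    @weight_loader.deleter
    def weight_loader(self) -> None:
        self._weight_loader = None


def default_loader(x: Any) -> Tuple[str, Any]:
    return ("default", x)


def custom_loader(x: Any) -> Tuple[str, Any]:
    return ("custom", x)


param = ParamSketch(default_loader)
first = param.weight_loader(1)        # uses the loader passed at construction
param.weight_loader = custom_loader   # a model overrides the loader
second = param.weight_loader(1)
del param.weight_loader               # a model deletes the loader
deleted = not hasattr(param, "weight_loader")
```

Raising AttributeError from the getter means `hasattr` reports the attribute as missing after deletion, so callers that probe for a loader with `hasattr` see the same result as if the attribute had really been removed.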

Testing

Tested meta-llama/Llama-3.1-8B-Instruct with TP ∈ {1, 2}
Tested Qwen/Qwen1.5-MoE-A2.7B with TP ∈ {1, 2}
CI testing

github-actions · 2025-08-16T21:24:12Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

gemini-code-assist

Code Review

This pull request modernizes UnquantizedLinearMethod by adding support for weight_loader_v2. This is achieved by using ModelWeightParameter during weight creation and then converting it back to a standard torch.nn.Parameter after loading to ensure torch.compile compatibility. The changes are logical, well-contained, and correctly enable the new functionality. The implementation appears solid and aligns with the project's existing patterns.

yewentao256

Let's run the CI and see the results

vllm/compilation/decorators.py

mergify · 2025-09-15T16:49:51Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @kylesayrs.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify · 2025-09-21T01:06:33Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @kylesayrs.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

kylesayrs mentioned this pull request Aug 16, 2025

[Transform] [Quantization] Add transforms to compressed tensors #22486

Merged

gemini-code-assist bot reviewed Aug 16, 2025

yewentao256 added the ready label Aug 18, 2025

yewentao256 reviewed Aug 18, 2025

mgoin self-assigned this Aug 18, 2025

kylesayrs force-pushed the kylesayrs/unquantized-vllm-param branch from 46f1244 to 142e9e6 Compare August 19, 2025 20:45

mergify bot added the ci/build label Aug 19, 2025

kylesayrs force-pushed the kylesayrs/unquantized-vllm-param branch from 142e9e6 to 08368f2 Compare August 19, 2025 20:46

mergify bot removed the needs-rebase label Sep 3, 2025

kylesayrs commented Sep 15, 2025

mergify bot added the needs-rebase label Sep 15, 2025

github-project-automation bot added this to gpt-oss Issues & Enhancements Sep 15, 2025

github-project-automation bot moved this to To Triage in gpt-oss Issues & Enhancements Sep 15, 2025

kylesayrs force-pushed the kylesayrs/unquantized-vllm-param branch from 7e75806 to 28e38df Compare September 17, 2025 08:16

mergify bot removed the needs-rebase label Sep 17, 2025

mergify bot added the needs-rebase label Sep 21, 2025

kylesayrs added 2 commits September 22, 2025 15:14

break out function, gate torch

087ba6e

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

fix rebase

68abefe

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

kylesayrs force-pushed the kylesayrs/unquantized-vllm-param branch from 28e38df to 68abefe Compare September 22, 2025 19:17

mergify bot removed the needs-rebase label Sep 22, 2025

kylesayrs added 2 commits September 22, 2025 15:24

add docstring

803499f

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

Merge branch 'main' into kylesayrs/unquantized-vllm-param

859bb61

mgoin approved these changes Sep 24, 2025

github-project-automation bot moved this from To Triage to Ready in gpt-oss Issues & Enhancements Sep 24, 2025

mgoin merged commit de94289 into vllm-project:main Sep 24, 2025
43 checks passed

github-project-automation bot moved this to Done in Tool Calling Sep 24, 2025

github-project-automation bot moved this from Ready to Done in gpt-oss Issues & Enhancements Sep 24, 2025

mgoin mentioned this pull request Sep 24, 2025

[Bug]: Dynamo Unsupported due to BasevLLMParameter.torch_function calling disabled super() #25604

Closed

1 task

FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025

[Core] Support weight_loader_v2 for UnquantizedLinearMethod (vllm-p…

b763c38

…roject#23036) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

yewentao256 pushed a commit that referenced this pull request Oct 3, 2025

[Core] Support weight_loader_v2 for UnquantizedLinearMethod (#23036)

4ed6b67

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> Signed-off-by: yewentao256 <zhyanwentao@126.com>

gjc0824 pushed a commit to gjc0824/vllm that referenced this pull request Oct 10, 2025

[Core] Support weight_loader_v2 for UnquantizedLinearMethod (vllm-p…

3c3dc9f

…roject#23036) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> Signed-off-by: gaojc <1055866782@qq.com>

xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025

[Core] Support weight_loader_v2 for UnquantizedLinearMethod (vllm-p…

6bfd6c3

…roject#23036) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>

choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025

[Core] Support weight_loader_v2 for UnquantizedLinearMethod (vllm-p…

47ac725

…roject#23036) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

kylesayrs mentioned this pull request Oct 15, 2025

[Compressed Tensors] Remove parameter conversion for sparse24 #26947

Draft

lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025

[Core] Support weight_loader_v2 for UnquantizedLinearMethod (vllm-p…

9227fec

…roject#23036) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025

[Core] Support weight_loader_v2 for UnquantizedLinearMethod (vllm-p…

03a8288

…roject#23036) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
