DeepSeek fix: awq x mergedreplicatedlinear #23764
Conversation
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
Code Review
This pull request aims to fix an issue with AWQ quantization in MergedReplicatedLinear layers by refactoring the weight loading to use a dedicated method, load_merged_column_weight. This is a good approach for handling specialized logic in custom parameter classes. However, the current implementation introduces a critical regression for unquantized models: it unconditionally calls param.load_merged_column_weight, but for unquantized layers the parameter is a standard torch.nn.Parameter which lacks this method, so model loading would fail with an AttributeError. A check on the parameter type is needed to maintain backward compatibility for unquantized models.
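A minimal sketch of the guarded loading path the review is asking for (the function wrapper and the plain-copy fallback are illustrative, not the exact vLLM code):

    from vllm.model_executor.parameter import BasevLLMParameter

    def _load_merged_shard(param, loaded_weight, loaded_shard_id,
                           shard_offset, shard_size):
        if isinstance(param, BasevLLMParameter):
            # Custom vLLM parameters (AWQ, FP8, ...) know how to place a merged
            # column shard themselves, so delegate to their loader.
            param.load_merged_column_weight(loaded_weight=loaded_weight,
                                            shard_id=loaded_shard_id,
                                            shard_offset=shard_offset,
                                            shard_size=shard_size,
                                            tp_rank=0)
        else:
            # Plain torch.nn.Parameter (unquantized path): copy the shard directly.
            param.data[shard_offset:shard_offset + shard_size] = loaded_weight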
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Mickaël Seznec <mickael.seznec@gmail.com>
It would be great if you could assign a reviewer to get it applied quickly!
I have also checked that this PR works for AWQ and FP8, thanks for the fix.
BTW, we may not need the isinstance(param, PerTensorScaleParameter) check now?
param.data[shard_offset:shard_offset + shard_size] = loaded_weight
if isinstance(param, BasevLLMParameter):
    param.load_merged_column_weight(loaded_weight=loaded_weight,
                                    shard_id=loaded_shard_id,
                                    shard_offset=shard_offset,
                                    shard_size=shard_size,
                                    tp_rank=0)
While I do think calling the kwarg tp_rank is misleading, it does the trick and I can't come up with a better name.
@mickaelseznec We can ping the corresponding reviewers by prepending
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
Don't know who would be best to review; @mgoin @robertgshaw2-redhat because it's quantization related? Feel free to dispatch to others as well :)
@tlrmchlsmth @yewentao256 This PR fixes the weight loading logic of DeepSeek V2/V3 AWQ-quantized models, which was broken by an oversight in the fused MLA qkv kernel update #21116.
elif isinstance(param, PerTensorScaleParameter):
    shard_offset = loaded_shard_id
    shard_size = 1
Why remove this case?
PerTensorScaleParameter is a subclass of BasevLLMParameter, and PerTensorScaleParameter.load_merged_column_weight does the right weight loading, so I reckon this is a code quality improvement.

TLDR: We previously set shard_offset and shard_size to values that don't match what their names suggest, just to make the subsequent weight-overwriting line (param.data[shard_offset:shard_offset + shard_size] = loaded_weight) work, but PerTensorScaleParameter.load_merged_column_weight, which is just BasevLLMParameter._assert_and_load, does exactly the same thing.
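For reference, a rough sketch of what that amounts to, assuming the BasevLLMParameter helper simply verifies the shape and copies the tensor in place (simplified; the actual implementation may differ):

    # Simplified view of BasevLLMParameter._assert_and_load (illustrative only).
    def _assert_and_load(self, loaded_weight):
        assert self.data.shape == loaded_weight.shape
        self.data.copy_(loaded_weight)

So for per-tensor scales, the dedicated loader and the old manual slice assignment end up writing the same data.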
Can we get this PR reviewed and merged in 0.10.2? The fix is not too complicated; if e2e validation (beyond what is already given in the PR body) is needed for the review, I can help.
LGTM
Hello, I would like to know which version of vllm this PR will be merged into. |
+1 |
Fix should already be there with #23024 |
Purpose
Fixing #23530
Test Plan
Check manually that DeepSeek AWQ output is correct
Test Result
vllm (pretrained=/models/DeepSeek-R1-AWQ,tensor_parallel_size=8,trust_remote_code=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: auto
Essential Elements of an Effective PR Description Checklist
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.