
Conversation

@mickaelseznec (Contributor) commented Aug 27, 2025

Purpose

Fixes #23530

Test Plan

Check manually that DeepSeek AWQ output is correct

Test Result

vllm (pretrained=/models/DeepSeek-R1-AWQ,tensor_parallel_size=8,trust_remote_code=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: auto

| Tasks | Version | Filter           | n-shot | Metric      | Value  | Stderr   |
|-------|---------|------------------|--------|-------------|--------|----------|
| gsm8k | 3       | flexible-extract | 5      | exact_match | 0.9522 | ± 0.0059 |
|       |         | strict-match     | 5      | exact_match | 0.9515 | ± 0.0059 |
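
For context, a hypothetical reconstruction of the evaluation that would print the header above, using the lm-eval harness Python API (the PR shows only the printed results, so the exact invocation, likely the lm_eval CLI, is an assumption):

```python
# Hypothetical reconstruction of the run above (not taken from the PR).
# The printed header suggests these settings for the lm-eval vllm backend.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="vllm",
    model_args=("pretrained=/models/DeepSeek-R1-AWQ,"
                "tensor_parallel_size=8,trust_remote_code=True"),
    tasks=["gsm8k"],
    num_fewshot=5,
    batch_size="auto",
)
print(results["results"]["gsm8k"])
```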

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Mickael Seznec <mickael@mistral.ai>

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request aims to fix an issue with AWQ quantization in MergedReplicatedLinear layers by refactoring the weight loading to use a dedicated method, load_merged_column_weight. This is a good approach for handling specialized logic in custom parameter classes. However, the current implementation introduces a critical regression for unquantized models: it unconditionally calls param.load_merged_column_weight, but for unquantized layers the parameter is a standard torch.nn.Parameter that lacks this method, so model loading will fail with an AttributeError. A check on the parameter type is needed to maintain backward compatibility with unquantized models.
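
One way to implement the suggested check (an illustrative sketch only, assuming the surrounding weight_loader context in MergedReplicatedLinear; the else branch preserves the original copy path for plain torch.nn.Parameter):

```python
# Sketch of the suggested type guard (illustrative, not the exact PR code).
if isinstance(param, BasevLLMParameter):
    # Custom vLLM parameters implement the merged-column loading logic.
    param.load_merged_column_weight(loaded_weight=loaded_weight,
                                    shard_id=loaded_shard_id,
                                    shard_offset=shard_offset,
                                    shard_size=shard_size,
                                    tp_rank=0)
else:
    # Unquantized layers use a plain torch.nn.Parameter: keep the old
    # in-place slice copy so they continue to load correctly.
    param.data[shard_offset:shard_offset + shard_size] = loaded_weight
```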

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Mickaël Seznec <mickael.seznec@gmail.com>

@ashgold commented Aug 28, 2025

It would be great if you could assign a reviewer to get it applied quickly!

@cjackal (Contributor) left a comment

I have also checked that this PR works for awq and fp8, thanks for the fix.

BTW we may not need the isinstance(param, PerTensorScaleParameter) check now?

Comment on lines -441 to +446:

```diff
-param.data[shard_offset:shard_offset + shard_size] = loaded_weight
+if isinstance(param, BasevLLMParameter):
+    param.load_merged_column_weight(loaded_weight=loaded_weight,
+                                    shard_id=loaded_shard_id,
+                                    shard_offset=shard_offset,
+                                    shard_size=shard_size,
+                                    tp_rank=0)
```

Contributor commented:

While I do think calling the kwarg tp_rank is misleading, it does the trick, and I can't come up with a better name.

@cjackal (Contributor) commented Aug 28, 2025

@mickaelseznec FYI, we can ping the corresponding reviewers by prepending "DeepSeek" to the title.

@mickaelseznec changed the title from "fix: awq x mergedreplicatedlinear" to "DeepSeek fix: awq x mergedreplicatedlinear" on Aug 29, 2025
@mergify bot added the "deepseek (Related to DeepSeek models)" label on Aug 29, 2025
Signed-off-by: Mickael Seznec <mickael@mistral.ai>

@mickaelseznec (Contributor, Author) commented Aug 29, 2025

Don't know who would be best to review; maybe @mgoin @robertgshaw2-redhat, since it's quantization-related? Feel free to dispatch to others as well :)

@cjackal (Contributor) commented Aug 30, 2025

@tlrmchlsmth @yewentao256 This PR fixes the weight loading logic for DeepSeek V2/V3 AWQ-quantized models, which was broken by an oversight in the fused MLA qkv kernel update #21116.

Comment on lines -434 to -436:

```diff
-elif isinstance(param, PerTensorScaleParameter):
-    shard_offset = loaded_shard_id
-    shard_size = 1
```

Member commented:

Why remove this case?

Contributor replied:

PerTensorScaleParameter is a subclass of BasevLLMParameter, and PerTensorScaleParameter.load_merged_column_weight does the right weight loading, so I reckon this is a code-quality improvement.

TL;DR: we previously set shard_offset and shard_size to values that don't match what their names suggest, just to make the subsequent weight-overwriting line (param.data[shard_offset:shard_offset + shard_size] = loaded_weight) work. But PerTensorScaleParameter.load_merged_column_weight, which is just BasevLLMParameter._assert_and_load, does exactly the same thing.
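
A rough sketch of the class relationship described above (a hypothetical simplification following this explanation, not the actual vllm/model_executor/parameter.py code):

```python
import torch

# Hypothetical simplification: per the explanation above, the merged-column
# path of PerTensorScaleParameter boils down to _assert_and_load, so the
# removed elif branch that faked shard_offset/shard_size was redundant.
class BasevLLMParameter(torch.nn.Parameter):
    def _assert_and_load(self, loaded_weight: torch.Tensor) -> None:
        # Shapes must already match; just copy the tensor in.
        assert self.data.shape == loaded_weight.shape
        self.data.copy_(loaded_weight)

class PerTensorScaleParameter(BasevLLMParameter):
    def load_merged_column_weight(self, loaded_weight: torch.Tensor,
                                  **kwargs) -> None:
        # A per-tensor scale carries no shard geometry; ignore the shard
        # kwargs and validate-and-copy, as described above.
        self._assert_and_load(loaded_weight)
```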


@cjackal (Contributor) commented Sep 6, 2025

Can we get this PR reviewed and merged for 0.10.2? The fix is not too complicated; if e2e validation (beyond what is already given in the PR body) is needed for the review, I can help.


@qdivan commented Sep 8, 2025

LGTM

@Bulc commented Sep 11, 2025

Hello, I would like to know which version of vllm this PR will be merged into.

@xjpang commented Sep 11, 2025

> Hello, I would like to know which version of vllm this PR will be merged into.

+1

@mickaelseznec (Contributor, Author) commented:

The fix should already be there with #23024.
