
Commit 792df44

ReinForce-II authored and yewentao256 committed
[bugfix] remove unused parameters to reduce unnecessary vram usage (vllm-project#26789)
Signed-off-by: Reinforce-II <fate@eastal.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
1 parent b409d66 commit 792df44

File tree

1 file changed

+2
-0
lines changed


vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe.py

Lines changed: 2 additions & 0 deletions
@@ -307,10 +307,12 @@ def process_weights_after_loading(self, layer: torch.nn.Module) -> None:
         layer.w13_weight = torch.nn.Parameter(
             layer.w13_weight_packed.data, requires_grad=False
         )
+        delattr(layer, "w13_weight_packed")

         layer.w2_weight = torch.nn.Parameter(
             layer.w2_weight_packed.data, requires_grad=False
         )
+        delattr(layer, "w2_weight_packed")

         # reorder GEMM1 weights and block scales for FlashInfer CUTLASS kernel.
         if self.allow_flashinfer:
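The change above follows a common pattern: after the loaded "packed" weight is re-bound under the attribute name the kernels actually use, the original attribute is deleted so the module does not keep a second, now-redundant binding registered on the layer. The sketch below illustrates the pattern with a plain stand-in class (`Layer` and the list "tensor" are placeholders, not vLLM's actual module or torch types):

```python
# Minimal sketch of the "re-bind, then delattr" pattern from this commit.
# Assumption: `Layer` stands in for a torch.nn.Module; a list stands in
# for a weight tensor.

class Layer:
    """Stand-in for a module holding weights loaded from a checkpoint."""
    pass


def process_weights_after_loading(layer: Layer) -> None:
    # Re-bind the packed data under the name downstream code expects ...
    layer.w13_weight = layer.w13_weight_packed
    # ... then drop the redundant attribute, mirroring
    # `delattr(layer, "w13_weight_packed")` in the diff, so only one
    # name on the layer still references the weight.
    delattr(layer, "w13_weight_packed")


layer = Layer()
layer.w13_weight_packed = [1, 2, 3]  # placeholder weight data
process_weights_after_loading(layer)

assert layer.w13_weight == [1, 2, 3]
assert not hasattr(layer, "w13_weight_packed")
```

Note that `torch.nn.Parameter(x.data)` shares storage with `x`, so the saving here plausibly comes from the layer no longer carrying the stale `*_weight_packed` registration through later processing, rather than from freeing a full duplicate copy; the diff alone does not spell this out.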
