[CI Failure]: deepseek_mtp_main_random_bf16 can't load, causes deepseek_mtp CI Failure. #20158

@umeiko

Description

Name of failing test

https://github.com/vllm-project/vllm-ascend/actions/runs/15890661413/job/44812465270?pr=1128

Basic information

  • Flaky test
  • Can reproduce locally
  • Caused by external libraries (e.g. bug in transformers)

🧪 Describe the failing test

Ascend 910B platform

pytest ./tests/e2e/long_term/spec_decode_v0/e2e/test_mtp_correctness.py
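A minimal standalone sketch of what this e2e test presumably exercises (the checkpoint path is a placeholder and the speculative settings are assumptions, not the actual test fixture): constructing an LLM with DeepSeek MTP speculative decoding is what forces CustomDeepSeekMTP.load_weights(), where the failure below is raised.

    from vllm import LLM, SamplingParams

    # Hypothetical repro: the model path is a placeholder for the random
    # bf16 test checkpoint; speculative_config uses vLLM's dict-based
    # interface to request one MTP draft token per step.
    llm = LLM(
        model="path/to/deepseek-v3-random-bf16",  # placeholder
        trust_remote_code=True,
        dtype="bfloat16",
        speculative_config={"num_speculative_tokens": 1},
    )
    out = llm.generate(["Hello"], SamplingParams(max_tokens=8))
    print(out[0].outputs[0].text)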

📝 History of failing test

vllm-empty/vllm/model_executor/model_loader/default_loader.py:269: in load_weights
    loaded_weights = model.load_weights(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = CustomDeepSeekMTP(
  (model): CustomDeepSeekMultiTokenPredictor(
    (layers): ModuleDict(
      (5): CustomDeepSeekMu...ogitsProcessor(vocab_size=129280, org_vocab_size=129280, scale=1.0, logits_as_input=False)
  )
  (sampler): Sampler()
)
weights = <generator object DefaultModelLoader.get_all_weights at 0xffff7dd9f060>

    def load_weights(self, weights: Iterable[tuple[str,
                                                   torch.Tensor]]) -> set[str]:
        stacked_params_mapping = [
            ("gate_up_proj", "gate_proj", 0),
            ("gate_up_proj", "up_proj", 1),
        ]
    
        expert_params_mapping = FusedMoE.make_expert_params_mapping(
            ckpt_gate_proj_name="gate_proj",
            ckpt_down_proj_name="down_proj",
            ckpt_up_proj_name="up_proj",
            num_experts=self.config.n_routed_experts)
    
        params_dict = dict(self.named_parameters())
        loaded_params: set[str] = set()
        for name, loaded_weight in weights:
            if "rotary_emb.inv_freq" in name:
                continue
            spec_layer = get_spec_layer_idx_from_weight_name(self.config, name)
            if spec_layer is None:
                continue
            name = self._rewrite_spec_layer_name(spec_layer, name)
            for (param_name, weight_name, shard_id) in stacked_params_mapping:
                # Skip non-stacked layers and experts (experts handled below).
                if weight_name not in name:
                    continue
                # We have mlp.experts[0].gate_proj in the checkpoint.
                # Since we handle the experts below in expert_params_mapping,
                # we need to skip here BEFORE we update the name, otherwise
                # name will be updated to mlp.experts[0].gate_up_proj, which
                # will then be updated below in expert_params_mapping
                # for mlp.experts[0].gate_gate_up_proj, which breaks load.
                if (("mlp.experts." in name) and name not in params_dict):
                    continue
                name = name.replace(weight_name, param_name)
                # Skip loading extra bias for GPTQ models.
                if name.endswith(".bias") and name not in params_dict:
                    continue
    
                param = params_dict[name]
                weight_loader = param.weight_loader
                weight_loader(param, loaded_weight, shard_id)
                break
            else:
                for mapping in expert_params_mapping:
                    param_name, weight_name, expert_id, shard_id = mapping
                    if weight_name not in name:
                        continue
                    name = name.replace(weight_name, param_name)
    
                    param = params_dict[name]
                    weight_loader = param.weight_loader
                    weight_loader(param,
                                  loaded_weight,
                                  name,
                                  shard_id=shard_id,
                                  expert_id=expert_id)
                    break
                else:
                    # Skip loading extra bias for GPTQ models.
                    if name.endswith(".bias") and name not in params_dict:
                        continue
    
                    # According to the DeepSeek-V3 Technical Report, MTP
                    # modules share the embedding layer, so we only load
                    # the first copy of these weights.
                    if (spec_layer != self.model.mtp_start_layer_idx
                            and ".layers" not in name):
                        continue
    
>                   param = params_dict[name]
E                   KeyError: 'model.embed_tokens.weight'
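The KeyError shows that, after _rewrite_spec_layer_name() and the shared-embedding handling above, the weight name resolves to model.embed_tokens.weight, a key that CustomDeepSeekMTP's params_dict does not contain. This looks like a mismatch between upstream vLLM's spec-layer name rewriting and the module layout of vllm-ascend's CustomDeepSeekMTP. A defensive sketch of the loop's tail (a hedged workaround, not the upstream fix) would skip checkpoint names that do not resolve to a parameter instead of indexing params_dict directly:

    from typing import Iterable

    import torch

    def load_present_weights(
        params_dict: dict[str, torch.nn.Parameter],
        weights: Iterable[tuple[str, torch.Tensor]],
    ) -> set[str]:
        # Mirrors the tail of load_weights() above, but skips names
        # missing from params_dict (e.g. the shared
        # "model.embed_tokens.weight" here) instead of raising KeyError.
        loaded: set[str] = set()
        for name, tensor in weights:
            param = params_dict.get(name)
            if param is None:
                continue  # the KeyError case from this traceback
            param.data.copy_(tensor)
            loaded.add(name)
        return loaded

Whether skipping is actually correct depends on where CustomDeepSeekMTP sources its embedding; if the MTP module shares the target model's embed_tokens, the real fix is to map the checkpoint name onto that shared parameter rather than drop it.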

CC List.

No response

Metadata

    Labels

    ci-failure: Issue about an unexpected test failure in CI
    stale: Over 90 days of inactivity


    Projects

    Status: Done
