
[Question]: When converting with convert_hf_to_gguf for llama.cpp deployment, why does modify_tensors need to transform some of the weights? #247

Open
FdyCN opened this issue Sep 25, 2024 · 1 comment

Comments


FdyCN commented Sep 25, 2024

In convert_hf_to_gguf.py, when converting a MiniCPM model, the class below overrides modify_tensors and only transforms q_proj.weight and k_proj.weight. Why is this transformation needed? Or, as the comment puts it, "HF models permute some of the tensors, so we need to undo that": where exactly does the HF model apply that permute? I haven't been able to work out the full story here, and would appreciate an explanation.

@Model.register("MiniCPMForCausalLM")
class MiniCPMModel(Model):
    model_arch = gguf.MODEL_ARCH.MINICPM

    def set_gguf_parameters(self):
        block_count = self.hparams["num_hidden_layers"]
        self.gguf_writer.add_context_length(self.hparams["max_position_embeddings"])
        self.gguf_writer.add_embedding_length(self.hparams["hidden_size"])
        self.gguf_writer.add_block_count(block_count)
        self.gguf_writer.add_feed_forward_length(self.hparams["intermediate_size"])
        self.gguf_writer.add_rope_dimension_count(self.hparams["hidden_size"] // self.hparams["num_attention_heads"])
        self.gguf_writer.add_head_count(self.hparams["num_attention_heads"])
        self.gguf_writer.add_head_count_kv(self.hparams["num_key_value_heads"])
        self.gguf_writer.add_layer_norm_rms_eps(self.hparams["rms_norm_eps"])
        self.gguf_writer.add_file_type(self.ftype)

    def set_vocab(self):
        self._set_vocab_llama_hf()

    def _reverse_hf_permute(self, weights: Tensor, n_head: int, n_kv_head: int | None = None) -> Tensor:
        if n_kv_head is not None and n_head != n_kv_head:
            n_head //= n_kv_head

        return (
            weights.reshape(n_head, 2, weights.shape[0] // n_head // 2, *weights.shape[1:])
            .swapaxes(1, 2)
            .reshape(weights.shape)
        )

    # Why is a permute needed here???
    def modify_tensors(self, data_torch: Tensor, name: str, bid: int | None) -> Iterable[tuple[str, Tensor]]:
        del bid  # unused

        n_head = self.hparams["num_attention_heads"]
        n_kv_head = self.hparams.get("num_key_value_heads")

        # HF models permute some of the tensors, so we need to undo that
        if name.endswith(("q_proj.weight")):
            data_torch = self._reverse_hf_permute(data_torch, n_head, n_head)
        if name.endswith(("k_proj.weight")):
            data_torch = self._reverse_hf_permute(data_torch, n_head, n_kv_head)

        return [(self.map_tensor_name(name), data_torch)]

I also read through the original modeling_minicpm.py and couldn't spot anything different there either.
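To double-check my understanding, I also wrote a tiny round-trip test. The hf_permute below is my own reconstruction of the "sliced rotary" permute that, as far as I can tell, transformers' convert_llama_weights_to_hf.py applies to q_proj/k_proj when the original checkpoints are converted to HF format (an assumption on my part, not code copied from either repo); the sketch only shows that _reverse_hf_permute is the exact inverse of such a permute:

import torch

n_head, head_dim = 4, 8  # toy sizes
hidden = n_head * head_dim

def hf_permute(w: torch.Tensor, n_heads: int) -> torch.Tensor:
    # Assumed HF-side permute: per head, reorder rows from interleaved RoPE
    # pairs (x0, y0, x1, y1, ...) into the half-split layout (x0, x1, ..., y0, y1, ...).
    dim1, dim2 = w.shape
    return w.view(n_heads, dim1 // n_heads // 2, 2, dim2).transpose(1, 2).reshape(dim1, dim2)

def reverse_hf_permute(w: torch.Tensor, n_head: int) -> torch.Tensor:
    # Same operation as MiniCPMModel._reverse_hf_permute above (n_kv_head == n_head case).
    return (
        w.reshape(n_head, 2, w.shape[0] // n_head // 2, *w.shape[1:])
        .swapaxes(1, 2)
        .reshape(w.shape)
    )

w = torch.arange(hidden * hidden, dtype=torch.float32).reshape(hidden, hidden)
assert torch.equal(reverse_hf_permute(hf_permute(w, n_head), n_head), w)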


FdyCN commented Sep 25, 2024

After a closer look at the logic in modeling_minicpm.py: in MiniCPMAttention, which is used in the default "eager" mode, when pretraining_tp > 1, q/k/v_proj.weight are all rearranged along the output (N) dimension via weights.reshape(b, 2, N/2, K).transpose(1, 2).reshape(N, K). This roughly lines up with the modify_tensors logic above (see the toy sketch after this list), but two questions remain:

  1. The default configuration_minicpm.py sets pretraining_tp = 1, so this weight rearrangement logic should never even be triggered.
  2. If the PyTorch weights really were rearranged, shouldn't modify_tensors in convert_hf_to_gguf.py permute all three of q/k/v_proj.weight? Why does it only handle q and k???
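A toy check of the rearrangement described above (my own sketch, using b = 1 head for simplicity, not the actual modeling code):

import torch

N, K = 8, 3  # single head: head_dim N = 8, arbitrary input dim K
w = torch.arange(N, dtype=torch.float32).reshape(N, 1).expand(N, K).clone()  # row i filled with value i

rearranged = w.reshape(1, 2, N // 2, K).transpose(1, 2).reshape(N, K)
print(rearranged[:, 0].tolist())
# -> [0.0, 4.0, 1.0, 5.0, 2.0, 6.0, 3.0, 7.0]
# The two halves (rows 0..3 and 4..7) get interleaved pairwise, which is the same
# per-head row reordering that _reverse_hf_permute performs in convert_hf_to_gguf.py.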
