Question about /hparams/MEMIT/llama3.2-3b.yaml #418

Closed
Darknessrky opened this issue Nov 12, 2024 · 3 comments
Labels
question Further information is requested

Comments

@Darknessrky

After printing the llama-3.2-3b model parameters:

Model Parameters and their Shapes:
model.embed_tokens.weight: torch.Size([128256, 3072])
model.layers.0.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.0.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.0.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.0.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.0.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.0.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.0.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.0.input_layernorm.weight: torch.Size([3072])
model.layers.0.post_attention_layernorm.weight: torch.Size([3072])
model.layers.1.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.1.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.1.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.1.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.1.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.1.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.1.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.1.input_layernorm.weight: torch.Size([3072])
model.layers.1.post_attention_layernorm.weight: torch.Size([3072])
model.layers.2.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.2.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.2.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.2.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.2.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.2.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.2.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.2.input_layernorm.weight: torch.Size([3072])
model.layers.2.post_attention_layernorm.weight: torch.Size([3072])
model.layers.3.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.3.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.3.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.3.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.3.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.3.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.3.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.3.input_layernorm.weight: torch.Size([3072])
model.layers.3.post_attention_layernorm.weight: torch.Size([3072])
model.layers.4.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.4.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.4.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.4.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.4.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.4.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.4.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.4.input_layernorm.weight: torch.Size([3072])
model.layers.4.post_attention_layernorm.weight: torch.Size([3072])
model.layers.5.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.5.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.5.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.5.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.5.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.5.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.5.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.5.input_layernorm.weight: torch.Size([3072])
model.layers.5.post_attention_layernorm.weight: torch.Size([3072])
model.layers.6.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.6.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.6.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.6.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.6.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.6.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.6.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.6.input_layernorm.weight: torch.Size([3072])
model.layers.6.post_attention_layernorm.weight: torch.Size([3072])
model.layers.7.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.7.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.7.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.7.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.7.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.7.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.7.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.7.input_layernorm.weight: torch.Size([3072])
model.layers.7.post_attention_layernorm.weight: torch.Size([3072])
model.layers.8.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.8.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.8.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.8.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.8.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.8.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.8.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.8.input_layernorm.weight: torch.Size([3072])
model.layers.8.post_attention_layernorm.weight: torch.Size([3072])
model.layers.9.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.9.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.9.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.9.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.9.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.9.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.9.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.9.input_layernorm.weight: torch.Size([3072])
model.layers.9.post_attention_layernorm.weight: torch.Size([3072])
model.layers.10.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.10.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.10.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.10.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.10.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.10.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.10.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.10.input_layernorm.weight: torch.Size([3072])
model.layers.10.post_attention_layernorm.weight: torch.Size([3072])
model.layers.11.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.11.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.11.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.11.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.11.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.11.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.11.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.11.input_layernorm.weight: torch.Size([3072])
model.layers.11.post_attention_layernorm.weight: torch.Size([3072])
model.layers.12.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.12.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.12.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.12.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.12.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.12.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.12.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.12.input_layernorm.weight: torch.Size([3072])
model.layers.12.post_attention_layernorm.weight: torch.Size([3072])
model.layers.13.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.13.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.13.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.13.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.13.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.13.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.13.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.13.input_layernorm.weight: torch.Size([3072])
model.layers.13.post_attention_layernorm.weight: torch.Size([3072])
model.layers.14.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.14.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.14.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.14.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.14.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.14.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.14.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.14.input_layernorm.weight: torch.Size([3072])
model.layers.14.post_attention_layernorm.weight: torch.Size([3072])
model.layers.15.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.15.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.15.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.15.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.15.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.15.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.15.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.15.input_layernorm.weight: torch.Size([3072])
model.layers.15.post_attention_layernorm.weight: torch.Size([3072])
model.layers.16.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.16.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.16.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.16.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.16.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.16.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.16.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.16.input_layernorm.weight: torch.Size([3072])
model.layers.16.post_attention_layernorm.weight: torch.Size([3072])
model.layers.17.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.17.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.17.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.17.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.17.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.17.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.17.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.17.input_layernorm.weight: torch.Size([3072])
model.layers.17.post_attention_layernorm.weight: torch.Size([3072])
model.layers.18.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.18.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.18.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.18.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.18.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.18.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.18.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.18.input_layernorm.weight: torch.Size([3072])
model.layers.18.post_attention_layernorm.weight: torch.Size([3072])
model.layers.19.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.19.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.19.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.19.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.19.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.19.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.19.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.19.input_layernorm.weight: torch.Size([3072])
model.layers.19.post_attention_layernorm.weight: torch.Size([3072])
model.layers.20.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.20.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.20.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.20.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.20.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.20.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.20.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.20.input_layernorm.weight: torch.Size([3072])
model.layers.20.post_attention_layernorm.weight: torch.Size([3072])
model.layers.21.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.21.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.21.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.21.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.21.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.21.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.21.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.21.input_layernorm.weight: torch.Size([3072])
model.layers.21.post_attention_layernorm.weight: torch.Size([3072])
model.layers.22.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.22.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.22.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.22.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.22.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.22.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.22.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.22.input_layernorm.weight: torch.Size([3072])
model.layers.22.post_attention_layernorm.weight: torch.Size([3072])
model.layers.23.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.23.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.23.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.23.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.23.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.23.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.23.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.23.input_layernorm.weight: torch.Size([3072])
model.layers.23.post_attention_layernorm.weight: torch.Size([3072])
model.layers.24.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.24.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.24.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.24.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.24.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.24.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.24.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.24.input_layernorm.weight: torch.Size([3072])
model.layers.24.post_attention_layernorm.weight: torch.Size([3072])
model.layers.25.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.25.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.25.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.25.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.25.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.25.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.25.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.25.input_layernorm.weight: torch.Size([3072])
model.layers.25.post_attention_layernorm.weight: torch.Size([3072])
model.layers.26.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.26.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.26.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.26.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.26.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.26.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.26.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.26.input_layernorm.weight: torch.Size([3072])
model.layers.26.post_attention_layernorm.weight: torch.Size([3072])
model.layers.27.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.27.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.27.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.27.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.27.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.27.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.27.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.27.input_layernorm.weight: torch.Size([3072])
model.layers.27.post_attention_layernorm.weight: torch.Size([3072])
model.norm.weight: torch.Size([3072])

There is no lm_head module as referenced by lm_head_module: "lm_head" in the hyperparameter file, and an error is raised at runtime.

@JizhanFang
Collaborator

Hi, I just printed the llama-3.2-3B model parameters on my side as well, and the output does include lm_head at the end. In fact, since it serves as the unembedding matrix, a model is very unlikely to have no lm_head.

model.layers.27.self_attn.q_proj: torch.Size([3072, 3072])
model.layers.27.self_attn.k_proj: torch.Size([1024, 3072])
model.layers.27.self_attn.v_proj: torch.Size([1024, 3072])
model.layers.27.self_attn.o_proj: torch.Size([3072, 3072])
model.layers.27.mlp.gate_proj: torch.Size([8192, 3072])
model.layers.27.mlp.up_proj: torch.Size([8192, 3072])
model.layers.27.mlp.down_proj: torch.Size([3072, 8192])
model.layers.27.input_layernorm: torch.Size([3072])
model.layers.27.post_attention_layernorm: torch.Size([3072])
model.norm: torch.Size([3072])
lm_head: torch.Size([128256, 3072])

And here is the code I used:

import torch
from transformers import AutoModelForCausalLM

model_name = "./Models/Llama-3.2-3B"
model = AutoModelForCausalLM.from_pretrained(model_name)

# Walk over the modules and print the shape of every weight they expose.
for name, module in model.named_modules():
    if hasattr(module, 'weight') and module.weight is not None:
        print(f"{name}: {module.weight.size()}")

My guess is that this is a problem with the transformers version in your environment, since the llama-3.2 series requires transformers 4.44.2 to be supported.
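
If it helps, here is a quick way to check the installed version (the version pin in the comment is just for illustration):

import transformers

# Print the installed transformers version; per the note above,
# the llama-3.2 series needs 4.44.2.
print(transformers.__version__)
# If it is older, upgrading should help, e.g.
#   pip install -U "transformers>=4.44.2"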

@Darknessrky
Author

Darknessrky commented Nov 12, 2024

Thanks for the reply; I found my mistake. The problem is that my printout was based on the named parameters, with the following code:

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("/data/renky/LLMs/models--meta-llama--Llama-3.2-3B")
# Iterating over named_parameters() is why lm_head never showed up in my printout.
for name, param in model.named_parameters():
    print(f"{name}: {param.shape}")

After running the code you provided, I do get the lm_head entry.
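
For reference, a small sketch (reusing the same checkpoint path as above, purely as an illustration) that contrasts the two views and matches the two printouts:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("/data/renky/LLMs/models--meta-llama--Llama-3.2-3B")

# named_parameters() vs. named_modules(): lm_head only shows up in the latter here.
param_names = {n for n, _ in model.named_parameters()}
module_names = {n for n, _ in model.named_modules()}
print("lm_head.weight in named_parameters():", "lm_head.weight" in param_names)  # False in my run
print("lm_head in named_modules():", "lm_head" in module_names)                  # True in my run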

However, the runtime error itself is still unresolved. Here is the code that triggers it:

import json
from easyeditor.editors.editor import BaseEditor
from easyeditor import MEMITHyperParams

# load zsre_data
edit_data = json.load(open('./data/ZsRE/ZsRE-test-all.json', 'r', encoding='utf-8'))[:100]
prompts = [edit_data_['prompt'] for edit_data_ in edit_data]
ground_truth = [edit_data_['ground_truth'][0] for edit_data_ in edit_data]  
subject = [edit_data_['subject'] for edit_data_ in edit_data]
target_new = [edit_data_['target_new'] for edit_data_ in edit_data]

# MEMIT
hparams = MEMITHyperParams.from_hparams('./hparams/MEMIT/llama3.2-3b.yaml')
editor = BaseEditor.from_hparams(hparams)
metrics, edited_model_false, _ = editor.edit(
    prompts=prompts,
    ground_truth=ground_truth,
    target_new=target_new,
    subject=subject,
    keep_original_weight=False
)
print(metrics)

Hyperparameter file:

alg_name: "MEMIT"
model_name: "./hugging_cache/llama3.2-3b"
stats_dir: "./data/stats"
device: 0
layers: [4, 5, 6, 7, 8]
clamp_norm_factor: 4
layer_selection: "all"
fact_token: "subject_last"
v_num_grad_steps: 25
v_lr: 5e-1
v_loss_layer: 27
v_weight_decay: 1e-3
kl_factor: 0.0625
mom2_adjustment: true
mom2_update_weight: 15000
rewrite_module_tmp: "model.layers.{}.mlp.down_proj"
layer_module_tmp: "model.layers.{}"
mlp_module_tmp: "model.layers.{}.mlp"
attn_module_tmp: "model.layers.{}.self_attn"
ln_f_module: "model.norm"
lm_head_module: "lm_head"
mom2_dataset: "wikipedia"
mom2_n_samples: 100000
mom2_dtype: "float32"
model_parallel: true

Terminal output and error:

2024-11-12 18:27:20,759 - easyeditor.editors.editor - INFO - Instantiating model
11/12/2024 18:27:20 - INFO - easyeditor.editors.editor -   Instantiating model
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:06<00:00,  3.44s/it]
2024-11-12 18:27:29,181 - easyeditor.editors.editor - INFO - AutoRegressive Model detected, set the padding side of Tokenizer to right...
11/12/2024 18:27:29 - INFO - easyeditor.editors.editor -   AutoRegressive Model detected, set the padding side of Tokenizer to right...
  0%|                                                                                                                                                                          | 0/100 [00:00<?, ?it/s]We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:05<00:00, 17.51it/s]
  0%|                                                                                                                                                                          | 0/100 [00:00<?, ?it/s]MEMIT request sample: [Which family does Epaspidoceras belong to?] -> [ Noctuidae]
Cached context templates [['{}'], ['The 2018-2019 school year. {}', 'Therefore, it can be seen that the two. {}', 'Because the majority of the time, we don. {}', 'I am not a huge fan of the term. {}', 'You may also wish to search for items by. {}']]
  0%|                                                                                                                                                                          | 0/100 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/data/renky/EasyEdit/test.py", line 21, in <module>
    metrics, edited_model_false, _ = editor.edit(
  File "/data/renky/EasyEdit/easyeditor/editors/editor.py", line 182, in edit
    return self.edit_requests(requests, sequential_edit, verbose, test_generation=test_generation, **kwargs)
  File "/data/renky/EasyEdit/easyeditor/editors/editor.py", line 370, in edit_requests
    edited_model, weights_copy, icl_examples = edit_func(request)
  File "/data/renky/EasyEdit/easyeditor/editors/editor.py", line 318, in edit_func
    edited_model, weights_copy = self.apply_algo(
  File "/data/renky/EasyEdit/easyeditor/models/memit/memit_main.py", line 46, in apply_memit_to_model
    deltas = execute_memit(model, tok, requests, hparams, cache_template=cache_template)
  File "/data/renky/EasyEdit/easyeditor/models/memit/memit_main.py", line 137, in execute_memit
    cur_z = compute_z(
  File "/data/renky/EasyEdit/easyeditor/models/memit/compute_z.py", line 28, in compute_z
    nethook.get_parameter(model, f"{hparams.lm_head_module}.weight").T,
  File "/data/renky/EasyEdit/easyeditor/util/nethook.py", line 372, in get_parameter
    raise LookupError(name)
LookupError: lm_head.weight

Where does the problem lie?

@JizhanFang
Collaborator

This one is indeed our fault: our MEMIT support for llama-3.2-3B was incomplete. The problem has already been fixed; you only need to change the def get_parameter(model, name): function in EasyEdit/easyeditor/util/nethook.py (where the error is raised) to the following code:

def get_parameter(model, name):
    """
    Finds the named parameter within the given model.
    """
    for n, p in model.named_parameters():
        if n == name:
            return p
    # lm_head is absent from named_parameters() for this model, but the module
    # itself is still present, so fall back to fetching it and returning its weight.
    if name == "lm_head.weight":
        lm_head_module = get_module(model, "lm_head")
        return lm_head_module.weight
    raise LookupError(name)

and it will run normally. The reason is that llama-3.2-3b's model.named_parameters() indeed does not contain the lm_head component, while model.named_modules() does include it, so we simply fetch lm_head with the get_module function and return lm_head_module.weight, which is equivalent to the weight you would obtain directly from model.named_parameters().
We will also fix this small issue in EasyEdit as soon as possible; thanks for the feedback.
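
For a quick sanity check after patching (a hypothetical snippet; model here is assumed to be the already-loaded llama-3.2-3B):

from easyeditor.util import nethook

# This is the lookup that raised LookupError in the traceback above
# (compute_z.py, line 28); after the patch it should resolve via get_module.
w = nethook.get_parameter(model, "lm_head.weight")
print(w.shape)  # expected torch.Size([128256, 3072]), matching the earlier printout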

@zxlzr added the question label Nov 12, 2024