Question about /hparams/MEMIT/llama3.2-3b.yaml #418

Closed
Darknessrky opened this issue Nov 12, 2024 · 3 comments
Labels
question Further information is requested

Comments

@Darknessrky

After printing the llama-3.2-3b model parameters:

Model Parameters and their Shapes:
model.embed_tokens.weight: torch.Size([128256, 3072])
model.layers.0.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.0.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.0.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.0.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.0.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.0.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.0.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.0.input_layernorm.weight: torch.Size([3072])
model.layers.0.post_attention_layernorm.weight: torch.Size([3072])
model.layers.1.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.1.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.1.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.1.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.1.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.1.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.1.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.1.input_layernorm.weight: torch.Size([3072])
model.layers.1.post_attention_layernorm.weight: torch.Size([3072])
model.layers.2.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.2.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.2.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.2.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.2.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.2.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.2.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.2.input_layernorm.weight: torch.Size([3072])
model.layers.2.post_attention_layernorm.weight: torch.Size([3072])
model.layers.3.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.3.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.3.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.3.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.3.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.3.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.3.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.3.input_layernorm.weight: torch.Size([3072])
model.layers.3.post_attention_layernorm.weight: torch.Size([3072])
model.layers.4.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.4.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.4.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.4.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.4.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.4.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.4.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.4.input_layernorm.weight: torch.Size([3072])
model.layers.4.post_attention_layernorm.weight: torch.Size([3072])
model.layers.5.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.5.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.5.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.5.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.5.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.5.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.5.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.5.input_layernorm.weight: torch.Size([3072])
model.layers.5.post_attention_layernorm.weight: torch.Size([3072])
model.layers.6.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.6.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.6.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.6.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.6.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.6.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.6.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.6.input_layernorm.weight: torch.Size([3072])
model.layers.6.post_attention_layernorm.weight: torch.Size([3072])
model.layers.7.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.7.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.7.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.7.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.7.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.7.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.7.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.7.input_layernorm.weight: torch.Size([3072])
model.layers.7.post_attention_layernorm.weight: torch.Size([3072])
model.layers.8.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.8.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.8.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.8.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.8.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.8.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.8.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.8.input_layernorm.weight: torch.Size([3072])
model.layers.8.post_attention_layernorm.weight: torch.Size([3072])
model.layers.9.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.9.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.9.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.9.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.9.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.9.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.9.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.9.input_layernorm.weight: torch.Size([3072])
model.layers.9.post_attention_layernorm.weight: torch.Size([3072])
model.layers.10.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.10.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.10.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.10.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.10.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.10.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.10.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.10.input_layernorm.weight: torch.Size([3072])
model.layers.10.post_attention_layernorm.weight: torch.Size([3072])
model.layers.11.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.11.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.11.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.11.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.11.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.11.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.11.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.11.input_layernorm.weight: torch.Size([3072])
model.layers.11.post_attention_layernorm.weight: torch.Size([3072])
model.layers.12.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.12.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.12.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.12.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.12.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.12.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.12.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.12.input_layernorm.weight: torch.Size([3072])
model.layers.12.post_attention_layernorm.weight: torch.Size([3072])
model.layers.13.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.13.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.13.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.13.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.13.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.13.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.13.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.13.input_layernorm.weight: torch.Size([3072])
model.layers.13.post_attention_layernorm.weight: torch.Size([3072])
model.layers.14.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.14.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.14.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.14.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.14.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.14.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.14.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.14.input_layernorm.weight: torch.Size([3072])
model.layers.14.post_attention_layernorm.weight: torch.Size([3072])
model.layers.15.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.15.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.15.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.15.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.15.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.15.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.15.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.15.input_layernorm.weight: torch.Size([3072])
model.layers.15.post_attention_layernorm.weight: torch.Size([3072])
model.layers.16.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.16.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.16.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.16.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.16.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.16.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.16.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.16.input_layernorm.weight: torch.Size([3072])
model.layers.16.post_attention_layernorm.weight: torch.Size([3072])
model.layers.17.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.17.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.17.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.17.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.17.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.17.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.17.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.17.input_layernorm.weight: torch.Size([3072])
model.layers.17.post_attention_layernorm.weight: torch.Size([3072])
model.layers.18.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.18.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.18.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.18.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.18.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.18.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.18.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.18.input_layernorm.weight: torch.Size([3072])
model.layers.18.post_attention_layernorm.weight: torch.Size([3072])
model.layers.19.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.19.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.19.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.19.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.19.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.19.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.19.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.19.input_layernorm.weight: torch.Size([3072])
model.layers.19.post_attention_layernorm.weight: torch.Size([3072])
model.layers.20.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.20.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.20.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.20.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.20.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.20.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.20.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.20.input_layernorm.weight: torch.Size([3072])
model.layers.20.post_attention_layernorm.weight: torch.Size([3072])
model.layers.21.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.21.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.21.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.21.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.21.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.21.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.21.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.21.input_layernorm.weight: torch.Size([3072])
model.layers.21.post_attention_layernorm.weight: torch.Size([3072])
model.layers.22.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.22.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.22.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.22.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.22.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.22.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.22.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.22.input_layernorm.weight: torch.Size([3072])
model.layers.22.post_attention_layernorm.weight: torch.Size([3072])
model.layers.23.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.23.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.23.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.23.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.23.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.23.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.23.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.23.input_layernorm.weight: torch.Size([3072])
model.layers.23.post_attention_layernorm.weight: torch.Size([3072])
model.layers.24.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.24.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.24.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.24.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.24.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.24.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.24.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.24.input_layernorm.weight: torch.Size([3072])
model.layers.24.post_attention_layernorm.weight: torch.Size([3072])
model.layers.25.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.25.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.25.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.25.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.25.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.25.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.25.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.25.input_layernorm.weight: torch.Size([3072])
model.layers.25.post_attention_layernorm.weight: torch.Size([3072])
model.layers.26.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.26.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.26.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.26.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.26.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.26.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.26.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.26.input_layernorm.weight: torch.Size([3072])
model.layers.26.post_attention_layernorm.weight: torch.Size([3072])
model.layers.27.self_attn.q_proj.weight: torch.Size([3072, 3072])
model.layers.27.self_attn.k_proj.weight: torch.Size([1024, 3072])
model.layers.27.self_attn.v_proj.weight: torch.Size([1024, 3072])
model.layers.27.self_attn.o_proj.weight: torch.Size([3072, 3072])
model.layers.27.mlp.gate_proj.weight: torch.Size([8192, 3072])
model.layers.27.mlp.up_proj.weight: torch.Size([8192, 3072])
model.layers.27.mlp.down_proj.weight: torch.Size([3072, 8192])
model.layers.27.input_layernorm.weight: torch.Size([3072])
model.layers.27.post_attention_layernorm.weight: torch.Size([3072])
model.norm.weight: torch.Size([3072])

There is no lm_head module as referenced by lm_head_module: "lm_head" in the hyperparameter file, and an error is raised at runtime.

@JizhanFang
Collaborator

Hi, I just printed the llama-3.2-3B model parameters on my side as well, and the output does include lm_head at the end. In fact, since it serves as the unembedding matrix, a model is very unlikely to have no lm_head.

model.layers.27.self_attn.q_proj: torch.Size([3072, 3072])
model.layers.27.self_attn.k_proj: torch.Size([1024, 3072])
model.layers.27.self_attn.v_proj: torch.Size([1024, 3072])
model.layers.27.self_attn.o_proj: torch.Size([3072, 3072])
model.layers.27.mlp.gate_proj: torch.Size([8192, 3072])
model.layers.27.mlp.up_proj: torch.Size([8192, 3072])
model.layers.27.mlp.down_proj: torch.Size([3072, 8192])
model.layers.27.input_layernorm: torch.Size([3072])
model.layers.27.post_attention_layernorm: torch.Size([3072])
model.norm: torch.Size([3072])
lm_head: torch.Size([128256, 3072])

And here is the code I used:

import torch
from transformers import AutoModelForCausalLM

model_name = "./Models/Llama-3.2-3B"
model = AutoModelForCausalLM.from_pretrained(model_name)

# Walk over the modules and print the shape of every weight they expose.
for name, module in model.named_modules():
    if hasattr(module, 'weight') and module.weight is not None:
        print(f"{name}: {module.weight.size()}")

My guess is that this is a problem with the transformers version in your environment, since the llama-3.2 series requires transformers 4.44.2 to be supported.
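
If it helps, here is a quick way to check the installed version (the version pin in the comment is just for illustration):

import transformers

# Print the installed transformers version; per the note above,
# the llama-3.2 series needs 4.44.2.
print(transformers.__version__)
# If it is older, upgrading should help, e.g.
#   pip install -U "transformers>=4.44.2"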

@Darknessrky
Author

Darknessrky commented Nov 12, 2024

Thanks for the reply; I found my mistake. The problem is that my printout was based on the named parameters, with the following code:

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("/data/renky/LLMs/models--meta-llama--Llama-3.2-3B")
# Iterating over named_parameters() is why lm_head never showed up in my printout.
for name, param in model.named_parameters():
    print(f"{name}: {param.shape}")

After running the code you provided, I do get the lm_head entry.
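
For reference, a small sketch (reusing the same checkpoint path as above, purely as an illustration) that contrasts the two views and matches the two printouts:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("/data/renky/LLMs/models--meta-llama--Llama-3.2-3B")

# named_parameters() vs. named_modules(): lm_head only shows up in the latter here.
param_names = {n for n, _ in model.named_parameters()}
module_names = {n for n, _ in model.named_modules()}
print("lm_head.weight in named_parameters():", "lm_head.weight" in param_names)  # False in my run
print("lm_head in named_modules():", "lm_head" in module_names)                  # True in my run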

However, the runtime error itself is still unresolved. Here is the code that triggers it:

import json
from easyeditor.editors.editor import BaseEditor
from easyeditor import MEMITHyperParams

# load zsre_data
edit_data = json.load(open('./data/ZsRE/ZsRE-test-all.json', 'r', encoding='utf-8'))[:100]
prompts = [edit_data_['prompt'] for edit_data_ in edit_data]
ground_truth = [edit_data_['ground_truth'][0] for edit_data_ in edit_data]  
subject = [edit_data_['subject'] for edit_data_ in edit_data]
target_new = [edit_data_['target_new'] for edit_data_ in edit_data]

# MEMIT
hparams = MEMITHyperParams.from_hparams('./hparams/MEMIT/llama3.2-3b.yaml')
editor = BaseEditor.from_hparams(hparams)
metrics, edited_model_false, _ = editor.edit(
    prompts=prompts,
    ground_truth=ground_truth,
    target_new=target_new,
    subject=subject,
    keep_original_weight=False
)
print(metrics)

Hyperparameter file:

alg_name: "MEMIT"
model_name: "./hugging_cache/llama3.2-3b"
stats_dir: "./data/stats"
device: 0
layers: [4, 5, 6, 7, 8]
clamp_norm_factor: 4
layer_selection: "all"
fact_token: "subject_last"
v_num_grad_steps: 25
v_lr: 5e-1
v_loss_layer: 27
v_weight_decay: 1e-3
kl_factor: 0.0625
mom2_adjustment: true
mom2_update_weight: 15000
rewrite_module_tmp: "model.layers.{}.mlp.down_proj"
layer_module_tmp: "model.layers.{}"
mlp_module_tmp: "model.layers.{}.mlp"
attn_module_tmp: "model.layers.{}.self_attn"
ln_f_module: "model.norm"
lm_head_module: "lm_head"
mom2_dataset: "wikipedia"
mom2_n_samples: 100000
mom2_dtype: "float32"
model_parallel: true

Terminal output and error:

2024-11-12 18:27:20,759 - easyeditor.editors.editor - INFO - Instantiating model
11/12/2024 18:27:20 - INFO - easyeditor.editors.editor -   Instantiating model
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:06<00:00,  3.44s/it]
2024-11-12 18:27:29,181 - easyeditor.editors.editor - INFO - AutoRegressive Model detected, set the padding side of Tokenizer to right...
11/12/2024 18:27:29 - INFO - easyeditor.editors.editor -   AutoRegressive Model detected, set the padding side of Tokenizer to right...
  0%|                                                                                                                                                                          | 0/100 [00:00<?, ?it/s]We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:05<00:00, 17.51it/s]
  0%|                                                                                                                                                                          | 0/100 [00:00<?, ?it/s]MEMIT request sample: [Which family does Epaspidoceras belong to?] -> [ Noctuidae]
Cached context templates [['{}'], ['The 2018-2019 school year. {}', 'Therefore, it can be seen that the two. {}', 'Because the majority of the time, we don. {}', 'I am not a huge fan of the term. {}', 'You may also wish to search for items by. {}']]
  0%|                                                                                                                                                                          | 0/100 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/data/renky/EasyEdit/test.py", line 21, in <module>
    metrics, edited_model_false, _ = editor.edit(
  File "/data/renky/EasyEdit/easyeditor/editors/editor.py", line 182, in edit
    return self.edit_requests(requests, sequential_edit, verbose, test_generation=test_generation, **kwargs)
  File "/data/renky/EasyEdit/easyeditor/editors/editor.py", line 370, in edit_requests
    edited_model, weights_copy, icl_examples = edit_func(request)
  File "/data/renky/EasyEdit/easyeditor/editors/editor.py", line 318, in edit_func
    edited_model, weights_copy = self.apply_algo(
  File "/data/renky/EasyEdit/easyeditor/models/memit/memit_main.py", line 46, in apply_memit_to_model
    deltas = execute_memit(model, tok, requests, hparams, cache_template=cache_template)
  File "/data/renky/EasyEdit/easyeditor/models/memit/memit_main.py", line 137, in execute_memit
    cur_z = compute_z(
  File "/data/renky/EasyEdit/easyeditor/models/memit/compute_z.py", line 28, in compute_z
    nethook.get_parameter(model, f"{hparams.lm_head_module}.weight").T,
  File "/data/renky/EasyEdit/easyeditor/util/nethook.py", line 372, in get_parameter
    raise LookupError(name)
LookupError: lm_head.weight

Where does the problem lie?

@JizhanFang
Collaborator

This one is indeed our fault: our MEMIT support for llama-3.2-3B was incomplete. The problem has already been fixed; you only need to change the def get_parameter(model, name): function in EasyEdit/easyeditor/util/nethook.py (where the error is raised) to the following code:

def get_parameter(model, name):
    """
    Finds the named parameter within the given model.
    """
    for n, p in model.named_parameters():
        if n == name:
            return p
    # lm_head is absent from named_parameters() for this model, but the module
    # itself is still present, so fall back to fetching it and returning its weight.
    if name == "lm_head.weight":
        lm_head_module = get_module(model, "lm_head")
        return lm_head_module.weight
    raise LookupError(name)

and it will run normally. The reason is that llama-3.2-3b's model.named_parameters() indeed does not contain the lm_head component, while model.named_modules() does include it, so we simply fetch lm_head with the get_module function and return lm_head_module.weight, which is equivalent to the weight you would obtain directly from model.named_parameters().
We will also fix this small issue in EasyEdit as soon as possible; thanks for the feedback.
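
For a quick sanity check after patching (a hypothetical snippet; model here is assumed to be the already-loaded llama-3.2-3B):

from easyeditor.util import nethook

# This is the lookup that raised LookupError in the traceback above
# (compute_z.py, line 28); after the patch it should resolve via get_module.
w = nethook.get_parameter(model, "lm_head.weight")
print(w.shape)  # expected torch.Size([128256, 3072]), matching the earlier printout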

@zxlzr added the question label Nov 12, 2024