Dev machine: Ubuntu 20.04, MNN 3.0.0
Models (Hugging Face): Qwen2.5-0.5B-Instruct and Qwen2.5-0.5B-Instruct-GPTQ-Int8
Export the ONNX model
$ python mnn/transformers/llm/export/llmexport.py --path pretrained_model/Qwen2.5-0.5B-Instruct --export onnx --dst_path mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3
✅ Done load pretrained model pretrained_model/Qwen2.5-0.5B-Instruct [ 1.10 s]
⠋ export tokenizer to 2024-11-20 15:21:53.270750: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-11-20 15:21:53.285959: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1732087313.300938 1727776 cuda_dnn.cc:8322] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1732087313.305363 1727776 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-11-20 15:21:53.322212: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
✅ Done export tokenizer to mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/tokenizer.txt[ 2.71 s]
✅ Done export embedding to mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/embeddings_bf16.bin[ 0.12 s]
✅ Done export onnx model to mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/onnx/llm.onnx[ 3.43 s]
✅ Done export model weight to mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/onnx/llm.onnx.data[ 3.19 s]
✅ Done export config to mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/llm_config.json[ 0.00 s]
Export the MNN model
$ mnn/build/MNNConvert --modelFile mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/onnx/llm.onnx --framework ONNX --MNNModel mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/llm.mnn --weightQuantBits 8 --weightQuantBlock 128 --weightQuantAsymmetric --saveExternalData --transformerFuse --allowCustomOp
The device supports: i8sdot:0, fp16:0, i8mm: 0, sve2: 0
Don't has bizCode, use MNNTest for default
Start to Convert Other Model Format To MNN Model..., target version: 3
[15:22:06] /work/mnn/tools/converter/source/onnx/onnxConverter.cpp:46: ONNX Model ir version: 8
[15:22:06] /work/mnn/tools/converter/source/onnx/onnxConverter.cpp:47: ONNX Model opset version: 15
Start to Optimize the MNN Net...
Fuse Attention as /Reshape_8_output_0
Fuse Attention as /Reshape_17_output_0
Fuse Attention as /Reshape_26_output_0
Fuse Attention as /Reshape_35_output_0
Fuse Attention as /Reshape_44_output_0
Fuse Attention as /Reshape_53_output_0
Fuse Attention as /Reshape_62_output_0
Fuse Attention as /Reshape_71_output_0
Fuse Attention as /Reshape_80_output_0
Fuse Attention as /Reshape_89_output_0
Fuse Attention as /Reshape_98_output_0
Fuse Attention as /Reshape_107_output_0
Fuse Attention as /Reshape_116_output_0
Fuse Attention as /Reshape_125_output_0
Fuse Attention as /Reshape_134_output_0
Fuse Attention as /Reshape_143_output_0
Fuse Attention as /Reshape_152_output_0
Fuse Attention as /Reshape_161_output_0
Fuse Attention as /Reshape_170_output_0
Fuse Attention as /Reshape_179_output_0
Fuse Attention as /Reshape_188_output_0
Fuse Attention as /Reshape_197_output_0
Fuse Attention as /Reshape_206_output_0
Fuse Attention as /Reshape_215_output_0
Remove past KV for presents
Save Weight to mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/llm.mnn.weight
inputTensors : [ input_ids, position_ids, attention_mask, past_key_values, ]
outputTensors: [ logits, presents, ]
Converted Success!
Convert the LoRA
$ python mnn/tools/script/apply_lora.py --base mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/base.json --lora /work/task_alpha/alpha_lora/checkpoint-800 --scale 2 --out mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/lora_alpha.json
Traceback (most recent call last):
  File "/work/mnn/tools/script/apply_lora.py", line 156, in <module>
    main(args)
  File "/work/mnn/tools/script/apply_lora.py", line 146, in main
    base.apply(lora, args.out)
  File "/work/mnn/tools/script/apply_lora.py", line 94, in apply
    self.apply_lora(op, lora)
  File "/work/mnn/tools/script/apply_lora.py", line 70, in apply_lora
    tag = names[1].split('.')[1] + names[3]
IndexError: list index out of range
After debugging: names = ['', 'mlp', 'gate_proj', 'FakeLinear_output_0__matmul_converted']. Since names[1] is 'mlp' and contains no '.', names[1].split('.') yields the one-element list ['mlp'], so the [1] index raises the IndexError.
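For reference, the crash reproduces outside the converter. A minimal sketch, assuming only the names value captured above; the fallback at the end is a hypothetical workaround, not the upstream fix:

# Reproduce apply_lora.py line 70 with the value observed in the debugger.
# The script apparently expects names[1] to carry a layer index such as
# 'layers.0', but for this GPTQ/FakeLinear op it is just 'mlp'.
names = ['', 'mlp', 'gate_proj', 'FakeLinear_output_0__matmul_converted']

try:
    tag = names[1].split('.')[1] + names[3]
except IndexError as e:
    print('reproduced:', e)  # reproduced: list index out of range

# Hypothetical defensive variant (an assumption, not the actual fix):
# fall back to an empty layer index when names[1] contains no '.'.
parts = names[1].split('.')
tag = (parts[1] if len(parts) > 1 else '') + names[3]
print(tag)  # FakeLinear_output_0__matmul_converted

Whether an empty layer tag is acceptable depends on how apply_lora.py uses tag downstream; the point is that op names from this GPTQ export do not match the 'layers.N' pattern the script assumes.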