Dev machine: Ubuntu 20.04, MNN 3.0.0
Models (Hugging Face): Qwen2.5-0.5B-Instruct and Qwen2.5-0.5B-Instruct-GPTQ-Int8
Export the ONNX model
$ python mnn/transformers/llm/export/llmexport.py --path pretrained_model/Qwen2.5-0.5B-Instruct --export onnx --dst_path mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3
✅ Done load pretrained model pretrained_model/Qwen2.5-0.5B-Instruct [ 1.10 s]
⠋ export tokenizer to 2024-11-20 15:21:53.270750: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-11-20 15:21:53.285959: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1732087313.300938 1727776 cuda_dnn.cc:8322] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1732087313.305363 1727776 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-11-20 15:21:53.322212: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
✅ Done export tokenizer to mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/tokenizer.txt[ 2.71 s]
✅ Done export embedding to mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/embeddings_bf16.bin[ 0.12 s]
✅ Done export onnx model to mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/onnx/llm.onnx[ 3.43 s]
✅ Done export model weight to mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/onnx/llm.onnx.data[ 3.19 s]
✅ Done export config to mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/llm_config.json[ 0.00 s]
Export the MNN model
$ mnn/build/MNNConvert --modelFile mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/onnx/llm.onnx --framework ONNX --MNNModel mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/llm.mnn --weightQuantBits 8 --weightQuantBlock 128 --weightQuantAsymmetric --saveExternalData --transformerFuse --allowCustomOp
The device supports: i8sdot:0, fp16:0, i8mm: 0, sve2: 0
Don't has bizCode, use MNNTest for default
Start to Convert Other Model Format To MNN Model..., target version: 3
[15:22:06] /work/mnn/tools/converter/source/onnx/onnxConverter.cpp:46: ONNX Model ir version: 8
[15:22:06] /work/mnn/tools/converter/source/onnx/onnxConverter.cpp:47: ONNX Model opset version: 15
Start to Optimize the MNN Net...
Fuse Attention as /Reshape_8_output_0
Fuse Attention as /Reshape_17_output_0
Fuse Attention as /Reshape_26_output_0
Fuse Attention as /Reshape_35_output_0
Fuse Attention as /Reshape_44_output_0
Fuse Attention as /Reshape_53_output_0
Fuse Attention as /Reshape_62_output_0
Fuse Attention as /Reshape_71_output_0
Fuse Attention as /Reshape_80_output_0
Fuse Attention as /Reshape_89_output_0
Fuse Attention as /Reshape_98_output_0
Fuse Attention as /Reshape_107_output_0
Fuse Attention as /Reshape_116_output_0
Fuse Attention as /Reshape_125_output_0
Fuse Attention as /Reshape_134_output_0
Fuse Attention as /Reshape_143_output_0
Fuse Attention as /Reshape_152_output_0
Fuse Attention as /Reshape_161_output_0
Fuse Attention as /Reshape_170_output_0
Fuse Attention as /Reshape_179_output_0
Fuse Attention as /Reshape_188_output_0
Fuse Attention as /Reshape_197_output_0
Fuse Attention as /Reshape_206_output_0
Fuse Attention as /Reshape_215_output_0
Remove past KV for presents
Save Weight to mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/llm.mnn.weight
inputTensors : [ input_ids, position_ids, attention_mask, past_key_values, ]
outputTensors: [ logits, presents, ]
Converted Success!
Convert the LoRA
$ python mnn/tools/script/apply_lora.py --base mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/base.json --lora /work/task_alpha/alpha_lora/checkpoint-800 --scale 2 --out mnn-output/basemodel_0.5b_instruct_q88_gptq_onnx_mnn_v3/lora_alpha.json
Traceback (most recent call last):
  File "/work/mnn/tools/script/apply_lora.py", line 156, in <module>
    main(args)
  File "/work/mnn/tools/script/apply_lora.py", line 146, in main
    base.apply(lora, args.out)
  File "/work/mnn/tools/script/apply_lora.py", line 94, in apply
    self.apply_lora(op, lora)
  File "/work/mnn/tools/script/apply_lora.py", line 70, in apply_lora
    tag = names[1].split('.')[1] + names[3]
IndexError: list index out of range
After debugging: names = ['', 'mlp', 'gate_proj', 'FakeLinear_output_0__matmul_converted']. Since names[1] is 'mlp' and contains no '.', names[1].split('.') yields the one-element list ['mlp'], so the [1] index raises the IndexError.
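For reference, the crash reproduces outside the converter. A minimal sketch, assuming only the names value captured above; the fallback at the end is a hypothetical workaround, not the upstream fix:

# Reproduce apply_lora.py line 70 with the value observed in the debugger.
# The script apparently expects names[1] to carry a layer index such as
# 'layers.0', but for this GPTQ/FakeLinear op it is just 'mlp'.
names = ['', 'mlp', 'gate_proj', 'FakeLinear_output_0__matmul_converted']

try:
    tag = names[1].split('.')[1] + names[3]
except IndexError as e:
    print('reproduced:', e)  # reproduced: list index out of range

# Hypothetical defensive variant (an assumption, not the actual fix):
# fall back to an empty layer index when names[1] contains no '.'.
parts = names[1].split('.')
tag = (parts[1] if len(parts) > 1 else '') + names[3]
print(tag)  # FakeLinear_output_0__matmul_converted

Whether an empty layer tag is acceptable depends on how apply_lora.py uses tag downstream; the point is that op names from this GPTQ export do not match the 'layers.N' pattern the script assumes.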