Segmentation fault on OPPO FindX7 Ultra (Snapdragon8Gen3) #117

Open
bingo787 opened this issue Aug 14, 2024 · 3 comments
bingo787 commented Aug 14, 2024

DDR size = 16GB
./main_qwen_npu -s 64 -c 1 -l 512

Below are the tail logs:

Memory Usage: 8910 MB(19036) at: execute graph: 94
chunk:1 execute qnn graph 95
model.layers.23.self_attn.or_split exe_time:0.064 ms
model.layers.23.self_attn.or_split-00_view_ exe_time:0.004 ms
model.layers.23.self_attn.or_split-01_view_ exe_time:0.003 ms
model.layers.23.self_attn.o_proj exe_time:0.002 ms
model.layers.23.self_attn.o_proj.dequantize exe_time:0.003 ms
model.layers.23.self_attn.o_proj.dequantize-00_view_ exe_time:0.002 ms
model.layers.23.self_attn.o_proj.dequantize-00_view_-00_add_ exe_time:0.003 ms
model.layers.23.post_attention_layernorm exe_time:0.003 ms
model.layers.23.mlp.up_proj.quantize exe_time:0.002 ms
model.layers.23.mlp.up_proj.quantize-00_view_ exe_time:0.002 ms
model.layers.23.mlp.gate_proj exe_time:0.002 ms
model.layers.23.mlp.up_proj exe_time:0.002 ms
model.layers.23.mlp.gate_proj.dequantize exe_time:0.002 ms
model.layers.23.mlp.up_proj.dequantize exe_time:0.002 ms
model.layers.23.mlp.silu exe_time:0.003 ms
model.layers.23.mlp.silu-00_mul_ exe_time:0.002 ms
model.layers.23.mlp.down_proj.quantize exe_time:0.003 ms
model.layers.23.mlp.down_proj exe_time:0.003 ms
model.layers.23.mlp.down_proj.dequantize exe_time:0.002 ms
model.layers.23.mlp.down_proj.dequantize-00_view_ exe_time:0.002 ms
model.layers.23.mlp.down_proj.dequantize-00_view_-00_add_ exe_time:0.001 ms
QNN execution time 12.683 ms
model.layers.23.self_attn.or_split exe_time:0.064 ms
model.layers.23.self_attn.or_split-00_view_ exe_time:0.004 ms
model.layers.23.self_attn.or_split-01_view_ exe_time:0.003 ms
model.layers.23.self_attn.o_proj exe_time:0.003 ms
model.layers.23.self_attn.o_proj.dequantize exe_time:0.003 ms
model.layers.23.self_attn.o_proj.dequantize-00_view_ exe_time:0.002 ms
model.layers.23.self_attn.o_proj.dequantize-00_view_-00_add_ exe_time:0.002 ms
model.layers.23.post_attention_layernorm exe_time:0.003 ms
model.layers.23.mlp.up_proj.quantize exe_time:0.002 ms
model.layers.23.mlp.up_proj.quantize-00_view_ exe_time:0.001 ms
model.layers.23.mlp.gate_proj exe_time:0.002 ms
model.layers.23.mlp.up_proj exe_time:0.002 ms
model.layers.23.mlp.gate_proj.dequantize exe_time:0.002 ms
model.layers.23.mlp.up_proj.dequantize exe_time:0.002 ms
model.layers.23.mlp.silu exe_time:0.003 ms
model.layers.23.mlp.silu-00_mul_ exe_time:0.002 ms
model.layers.23.mlp.down_proj.quantize exe_time:0.002 ms
model.layers.23.mlp.down_proj exe_time:0.002 ms
model.layers.23.mlp.down_proj.dequantize exe_time:0.002 ms
model.layers.23.mlp.down_proj.dequantize-00_view_ exe_time:0.002 ms
model.layers.23.mlp.down_proj.dequantize-00_view_-00_add_ exe_time:0.002 ms
QNN execution time 11.541 ms

Memory Usage: 8888 MB(19036) at: execute graph: 95
model.norm reshape:
|| Input input0-00 shape: 1 64 1 2048 (131072) |
|| Output outtensor-model.norm-00 shape: 1 64 1 2048 (131072) |
lm_head reshape:
|| Input outtensor-model.norm-00 shape: 1 64 1 2048 (131072) |
|| Output outtensor-lm_head-00 shape: 1 64 1 151936 (9723904) |
model.norm exe_time:2.76 ms
lm_head exe_time:2188.71 ms
Fre

load time: 1554.09 ms
token time: nan ms
inference speed: nan tokens/s
load time: 2773.47 ms
token time: nan ms
inference speed: nan tokens/s
0.0ms [WARNING] sg_stubPtr is not null, skip loadRemoteSymbols

Segmentation fault
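A side note on the `token time: nan ms` lines: in IEEE-754 arithmetic, dividing 0.0 by 0.0 yields NaN, so this pattern typically shows up when the decode statistics are computed before any token has been decoded (zero decoded tokens, zero elapsed decode time). A minimal Python sketch of that failure mode — the helper name is made up and is not the actual mllm code:

```python
import math

def token_time_ms(total_decode_ms: float, n_tokens: int) -> float:
    """Hypothetical stats helper. Mirrors the IEEE-754 behaviour of a
    C/C++ double division, where 0.0/0.0 silently produces NaN instead
    of raising an error."""
    if n_tokens == 0:
        return float("nan")  # what 0.0 / 0.0 yields in native code
    return total_decode_ms / n_tokens

print(token_time_ms(0.0, 0))      # nan, as in the log above
print(token_time_ms(807.179, 1))  # 807.179 ms per token
```

So the `nan` itself is not the crash; it just indicates the decode counters were never incremented before the stats were printed.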

liang1232018 (Collaborator) commented

Thank you for the detailed log.

It seems that the prefilling procedure completed correctly: the lm_head operator executed and the first token, Fre, was generated.

However, there is no decoding log even though the DEBUG option is enabled. Did you omit or comment out that part of the log?

Also, if the segmentation fault occurs at the very end of execution, this is a known issue: it results from the order in which mllm-NPU releases its QNN resources. We are working on a fix, but rest assured it does not affect the normal prefilling and decoding processes.

Thanks again for your valuable assistance!
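For readers unfamiliar with this class of bug, the release-order problem can be illustrated with a small, purely hypothetical sketch (the class names below are made up and do not reflect the real mllm or QNN API): a graph torn down after its backend touches already-freed state, which in native code surfaces as a segmentation fault rather than a clean error.

```python
class Backend:
    """Stand-in for a backend handle that owns native resources (hypothetical)."""
    def __init__(self):
        self.alive = True

    def release(self):
        self.alive = False


class Graph:
    """Stand-in for a graph whose resources are owned by the backend (hypothetical)."""
    def __init__(self, backend: Backend):
        self.backend = backend

    def release(self):
        # In native code this ordering violation would be a
        # use-after-free (SIGSEGV); here we raise explicitly instead.
        if not self.backend.alive:
            raise RuntimeError("graph released after its backend")


# Correct teardown order: graphs first, backend last.
backend = Backend()
graph = Graph(backend)
graph.release()
backend.release()
```

Reversing the two `release()` calls is the shape of bug described above: everything runs to completion, and the crash only appears during final cleanup.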

hustc12 (Contributor) commented Oct 1, 2024

I encountered a similar issue with the latest code; below is the detailed log:

1 TIME of CPU Graph 20: 0.692ms, End at 7150.72
1 TIME of QNN Graph 21: 5.765ms, End at 7156.56
1 TIME of CPU Graph 22: 4.249ms, End at 7161.17
1 TIME of QNN Graph 23: 11.696ms, End at 7172.97
1 TIME of CPU Graph 24: 2.154ms, End at 7175.43
1 TIME of QNN Graph 25: 6.093ms, End at 7181.59
1 TIME of CPU Graph 26: 16.432ms, End at 7202.67
1 TIME of QNN Graph 27: 3642.99ms, End at 10845.8
1 TIME of CPU Graph 28: 5.228ms, End at 10851.2
1 TIME of QNN Graph 29: 5.363ms, End at 10856.7
1 TIME of CPU Graph 30: 8.801ms, End at 10865.7
1 TIME of QNN Graph 31: 10ms, End at 10875.8
1 TIME of CPU Graph 32: 11.562ms, End at 10887.5
1 TIME of QNN Graph 33: 4.574ms, End at 10892.1
1 TIME of CPU Graph 34: 17.688ms, End at 10909.9
1 TIME of QNN Graph 35: 10.26ms, End at 10920.2
1 TIME of CPU Graph 36: 1.745ms, End at 10922.3
1 TIME of QNN Graph 37: 7.531ms, End at 10929.9
1 TIME of CPU Graph 38: 24.45ms, End at 10954.5
1 TIME of QNN Graph 39: 11.564ms, End at 10966.1
1 TIME of CPU Graph 40: 7.873ms, End at 10974.4
1 TIME of QNN Graph 41: 7.464ms, End at 10982.2
1 TIME of CPU Graph 42: 4.561ms, End at 10986.8
1 TIME of QNN Graph 43: 10.891ms, End at 10997.7
1 TIME of CPU Graph 44: 5.713ms, End at 11003.5
1 TIME of QNN Graph 45: 4.057ms, End at 11007.6
1 TIME of CPU Graph 46: 30.154ms, End at 11037.8
1 TIME of QNN Graph 47: 11.771ms, End at 11049.9
1 TIME of CPU Graph 48: 2.84ms, End at 11052.9
1 TIME of QNN Graph 49: 6.374ms, End at 11059.4
1 TIME of CPU Graph 50: 17.723ms, End at 11077.2
1 TIME of QNN Graph 51: 12.257ms, End at 11089.6
1 TIME of CPU Graph 52: 3.071ms, End at 11092.7
1 TIME of QNN Graph 53: 5.296ms, End at 11098.1
1 TIME of CPU Graph 54: 2.191ms, End at 11100.4
1 TIME of QNN Graph 55: 14.696ms, End at 11115.1
1 TIME of CPU Graph 56: 3.209ms, End at 11118.3
1 TIME of QNN Graph 57: 3.907ms, End at 11122.3
1 TIME of CPU Graph 58: 5.174ms, End at 11127.5
1 TIME of QNN Graph 59: 11.01ms, End at 11138.5
1 TIME of CPU Graph 60: 1.254ms, End at 11139.8
1 TIME of QNN Graph 61: 4.579ms, End at 11144.4
1 TIME of CPU Graph 62: 5.144ms, End at 11149.6
1 TIME of QNN Graph 63: 10.297ms, End at 11160
1 TIME of CPU Graph 64: 2.193ms, End at 11162.2
1 TIME of QNN Graph 65: 3.935ms, End at 11166.2
1 TIME of CPU Graph 66: 8.484ms, End at 11174.7
1 TIME of QNN Graph 67: 10.252ms, End at 11185.2
1 TIME of CPU Graph 68: 3.22ms, End at 11188.5
1 TIME of QNN Graph 69: 4.395ms, End at 11192.9
1 TIME of CPU Graph 70: 11.201ms, End at 11204.1
1 TIME of QNN Graph 71: 10.242ms, End at 11214.7
1 TIME of CPU Graph 72: 3.763ms, End at 11218.5
1 TIME of QNN Graph 73: 4.978ms, End at 11223.6
1 TIME of CPU Graph 74: 10.771ms, End at 11234.4
1 TIME of QNN Graph 75: 11.967ms, End at 11246.4
1 TIME of CPU Graph 76: 7.251ms, End at 11253.7
1 TIME of QNN Graph 77: 3.96ms, End at 11257.7
1 TIME of CPU Graph 78: 29.176ms, End at 11287
1 TIME of QNN Graph 79: 11.573ms, End at 11299.2
1 TIME of CPU Graph 80: 9.216ms, End at 11308.8
1 TIME of QNN Graph 81: 4.735ms, End at 11313.6
1 TIME of CPU Graph 82: 36.205ms, End at 11349.9
1 TIME of QNN Graph 83: 11.817ms, End at 11361.8
1 TIME of CPU Graph 84: 3.092ms, End at 11364.9
1 TIME of QNN Graph 85: 7.235ms, End at 11372.2
1 TIME of CPU Graph 86: 1.569ms, End at 11373.8
1 TIME of QNN Graph 87: 13.225ms, End at 11387.1
1 TIME of CPU Graph 88: 6.101ms, End at 11393.2
1 TIME of QNN Graph 89: 4.354ms, End at 11397.6
1 TIME of CPU Graph 90: 31.584ms, End at 11429.3
1 TIME of QNN Graph 91: 11.42ms, End at 11441.3
1 TIME of CPU Graph 92: 3.933ms, End at 11445.3
1 TIME of QNN Graph 93: 16.501ms, End at 11462
1 TIME of CPU Graph 94: 18.771ms, End at 11481
1 TIME of QNN Graph 95: 11.226ms, End at 11492.3
prefill time: 11492.5ms
Sure, 



Introducing yourself as a language model, I'm a brief overview of yourself and your capabilities and how you work<|im_end|>


I am a large language model created by Alibaba Cloud AI language model
<|im_start|>
<|im_start|>
<|im_start|>
<|im_start|>
<|im_start|>
<|im_start|>
<|im_start|>
<|im_start|>
<|im_start|>
<|im_start|>
<|im_start|>
<|endoftext|>不断地学习和改进,以

====================
load time: 9285.94 ms
token time: nan ms
inference speed: nan tokens/s
load time: 17700.2 ms
token time: 807.179 ms
inference speed: 1.23888 tokens/s
Segmentation fault 
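As a sanity check on the reported numbers: `token time` is milliseconds per decoded token, so the inference speed is simply its reciprocal scaled to seconds. A quick check against the figures in the log above:

```python
token_time_ms = 807.179               # "token time" from the log above
speed_tok_per_s = 1000.0 / token_time_ms
print(round(speed_tok_per_s, 5))      # 1.23888, matching the reported speed
```

This confirms the two decode metrics are consistent with each other; the segmentation fault again happens only after the stats are printed, matching the cleanup-order explanation above.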

bingo787 (Author) commented Oct 25, 2024

#117 (comment)
Now the error info is the same as in your log...
