
TVM RPC Error "PLS isn't existed" on Khadas VIM3 Pro (Amlogic A311D) #189

Open
leokuo725 opened this issue Oct 12, 2021 · 37 comments

@leokuo725

leokuo725 commented Oct 12, 2021

@sunshinemyson

I tried the VSI NPU as the TVM target and ran test_operations.py from TVM_FOLDER/tests/python/contrib/test_vsi_npu.
It failed with the error "PLS isn't existed" on the VIM3 Pro side. I found the previous issue about this, but setting "VSIMULATOR_CONFIG=VIPNANOQI_PID0X88" did not solve the problem.
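For reference, the host side of this test follows the standard TVM RPC pattern; below is a minimal sketch of that flow (the board address, port, input name, and model.so path are placeholders, not the exact values test_operations.py uses):

```python
# Minimal sketch of the host-side RPC flow (placeholder values, not the script's real ones).
import numpy as np
import tvm
from tvm import rpc
from tvm.contrib import graph_executor

remote = rpc.connect("192.168.1.100", 9090)   # board running python3 -m tvm.exec.rpc_server
remote.upload("/tmp/model.so")                # library exported by relay.build(...).export_library()
lib = remote.load_module("model.so")          # the board-side VsiNpuModule logs appear here

gmod = graph_executor.GraphModule(lib["default"](remote.cpu()))
gmod.set_input("data", np.ones((1, 56, 56, 32), dtype="int8"))
gmod.run()
vsi_out = gmod.get_output(0).numpy()
```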
The following is my environment:

Environment variable (Host)

export VSIMULATOR_CONFIG=VIPNANOQI_PID0X88 # This PID is provided by the Khadas documentation.
export VIV_VX_DEBUG_LEVEL=1 

Environment variable (VIM3 Pro)

export VIV_VX_DEBUG_LEVEL=1 
  • Model: Khadas VIM3 Pro

  • SoC: Amlogic A311D with 5 TOPS Performance NPU

  • OS information:


    khadas@Khadas:~$ uname -a
    Linux Khadas 4.9.241 #18 SMP PREEMPT Fri Jun 25 14:18:34 CST 2021 aarch64 aarch64 aarch64 GNU/Linux
    khadas@Khadas:~$ cat /etc/fenix-release
    # PLEASE DO NOT EDIT THIS FILE
    BOARD=VIM3
    VENDOR=Amlogic
    VERSION=1.0.7
    ARCH=arm64
    INITRD_ARCH=arm64
    INSTALL_TYPE=EMMC
    IMAGE_VERSION=V1.0.7-210625
    ################ GIT VERSION ################
    UBOOT_GIT_VERSION=khadas-vims-v1.0.5-release
    LINUX_GIT_VERSION=khadas-vims-v1.0.5-release-6-gc5aa6ab
    FENIX_GIT_VERSION=v1.0.7
    #############################################

  • NPU information:


    khadas@Khadas:~$ dpkg -l | grep npu
    ii  aml-npu                              6.4.4.3AAA-2                                                 arm64        Amlogic NPU libraries.
    ii  evtest                               1:1.34-1                                                     arm64        utility to monitor Linux input device events
    ii  libinput-bin                         1.15.5-1ubuntu0.2                                            arm64        input device management and event handling library - udev quirks
    ii  libinput10:arm64                     1.15.5-1ubuntu0.2                                            arm64        input device management and event handling library - shared library
    ii  libxi6:arm64                         2:1.7.10-0ubuntu1                                            arm64        X11 Input extension library
    khadas@Khadas:~$ lsmod
    Module                  Size  Used by
    cpufreq_powersave      16384  0
    cpufreq_userspace      16384  0
    cpufreq_conservative    16384  0
    cpufreq_ondemand       20480  0
    iv009_isp_sensor      270336  0
    iv009_isp_lens         69632  0
    iv009_isp_iq          544768  0
    galcore               462848  0
    vpu                    49152  0
    encoder                53248  0
    amvdec_avs2           192512  0
    amvdec_vp9            151552  0
    amvdec_vc1             53248  0
    amvdec_real            40960  0
    amvdec_mmpeg4          32768  0
    amvdec_mpeg4           53248  0
    amvdec_mmpeg12         40960  0
    amvdec_mpeg12          90112  0
    amvdec_mmjpeg          28672  0
    amvdec_mjpeg           36864  0
    amvdec_h265           135168  0
    amvdec_h264mvc         49152  0
    amvdec_mh264          151552  0
    amvdec_h264           118784  0
    amvdec_avs             61440  0
    stream_input          180224  10 amvdec_h265,amvdec_mh264,amvdec_h264mvc,amvdec_real,amvdec_vp9,amvdec_h264,amvdec_avs2,amvdec_mpeg12,amvdec_avs,amvdec_mmpeg12
    decoder_common        176128  17 amvdec_h265,amvdec_mjpeg,amvdec_mh264,amvdec_mmpeg4,amvdec_h264mvc,amvdec_mmjpeg,amvdec_real,stream_input,amvdec_vp9,amvdec_h264,encoder,amvdec_avs2,amvdec_mpeg12,amvdec_avs,amvdec_vc1,amvdec_mmpeg12,amvdec_mpeg4
    firmware               28672  18 amvdec_h265,amvdec_mjpeg,amvdec_mh264,amvdec_mmpeg4,amvdec_h264mvc,amvdec_mmjpeg,decoder_common,amvdec_real,stream_input,amvdec_vp9,amvdec_h264,encoder,amvdec_avs2,amvdec_mpeg12,amvdec_avs,amvdec_vc1,amvdec_mmpeg12,amvdec_mpeg4
    media_clock            45056  12 amvdec_h265,amvdec_mh264,decoder_common,vpu,firmware,stream_input,amvdec_vp9,amvdec_h264,encoder,amvdec_avs2,amvdec_mpeg12,amvdec_avs
    mali_kbase            475136  0
    iv009_isp             540672  2
    zram                   36864  4
    dhd                  1404928  0
    btrfs                1269760  0
    xor                    20480  1 btrfs
    raid6_pq              106496  1 btrfs
    khadas@Khadas:~$ ls /dev/galcore
    /dev/galcore
    khadas@Khadas:~$ sudo dmesg | grep Gal
    [   12.202405] Galcore version 6.4.4.3.310723AAA
    

TIM-VX Version: 1.1.32

TVM branch commit id: b822ec32702e2676dce1e430221e8efc05c98935

The output from running the TIM-VX unit_test program:


 khadas@Khadas:~/TIM-VX-1.1.32/install/bin$ ./unit_test 
 Running main() from /home/khadas/TIM-VX-1.1.32/_deps/googletest-src/googletest/src/gtest_main.cc
 [==========] Running 104 tests from 33 test suites.
 [----------] Global test environment set-up.
 [----------] 1 test from Context
 <Skip the PASS Items. >
 [----------] 1 test from Context (25 ms total)
 
 [----------] 2 tests from graph
 [ RUN      ] graph.gen_binary_graph_with_empty_graph
 E [_graph_optimization_convert_int8_to_uint8:792]CHECK STATUS(-1:A generic error code, used when no other describes the error.)
 E [vsi_nn_OptimizeGraph:827]CHECK STATUS(-1:A generic error code, used when no other describes the error.)
 [       OK ] graph.gen_binary_graph_with_empty_graph (3 ms)
 [ RUN      ] graph.gen_binary_graph_with_simple_add
 [       OK ] graph.gen_binary_graph_with_simple_add (8 ms)
 [----------] 2 tests from graph (11 ms total)
 
 [----------] 2 tests from Linear
 <Skip the PASS Items. >
 [----------] 2 tests from Linear (13 ms total)
 
 [----------] 3 tests from Conv1d
 <Skip the PASS Items. >
 [----------] 3 tests from Conv1d (22 ms total)
 
 [----------] 19 tests from Conv2d
 <Skip the PASS Items. >
 [----------] 19 tests from Conv2d (195 ms total)
 
 [----------] 2 tests from DeConv1d
 [ RUN      ] DeConv1d.no_bias_layout_whcn_depthwise_shape_3_2_1
 /home/khadas/TIM-VX-1.1.32/src/tim/vx/ops/deconv1d_test.cc:69: Failure
 Expected equality of these values:
   golden
     Which is: { 27, 81, 30, 9, 3, 21, 15, 27, 0, 0 }
   output_data
     Which is: { 48, 96, 57, 9, 3, 0, 0, 0, 0, 0 }
 Result mismatch
 [  FAILED  ] DeConv1d.no_bias_layout_whcn_depthwise_shape_3_2_1 (9 ms)
 <Skip the PASS Items. >
 [----------] 2 tests from DeConv1d (56 ms total)
 
 [----------] 2 tests from DeConv2d
 [ RUN      ] DeConv2d.shape_3_3_2_1_float_depthwise
 /home/khadas/TIM-VX-1.1.32/src/tim/vx/ops/deconv2d_test.cc:85: Failure
 Expected equality of these values:
   golden
     Which is: { 27, 72, 18, 24, 3, 81, 45, 90, 15, 21, 30, 26, 43, 22, 11, 9, 5, 25, 10, 14, 3, 2, 9, 4, 6, 21, 27, 52, 63, 7, 15, 6, ... }
   output_data
     Which is: { 48, 99, 70, 87, 10, 96, 51, 134, 29, 42, 57, 26, 168, 94, 33, 9, 5, 65, 26, 38, 3, 2, 81, 4, 22, 0, 0, 0, 0, 0, 0, 0, ... }
 Result mismatch
 [  FAILED  ] DeConv2d.shape_3_3_2_1_float_depthwise (9 ms)
 <Skip the PASS Items. >
 [----------] 2 tests from DeConv2d (18 ms total)
 
 [----------] 16 tests from DepthwiseConv
 <Skip the PASS Items. >
 [----------] 16 tests from DepthwiseConv (176 ms total)
 
 [----------] 3 tests from FloorDiv
 <Skip the PASS Items. >
 
 (10:0) : error : Error(0,10) : Cannot find the header file cl_viv_vx_ext.h.
 (255:0) : error : Error(0,255) : Cannot find the header file cl_viv_vx_ext.h.
 (27:0) : error : undefined identifier: 'COPY'
 (55:0) : error : undefined identifier: 'COPY'
 (257:0) : error : syntax error at 'VXC_512Bits'
 
 ERROR: Failed to compile vx shader. (error: FFFFFFFF)
 E [_gpu_register:476]Build program fail.
 E [vsi_nn_kernel_create_node:631]Register client kernel com.vivantecorp.extension.evis.floordiv_U8U8toU8_2D fail with -1.
 
 [       OK ] FloorDiv.shape_5_1_broadcast_uint8 (56 ms)
 [----------] 3 tests from FloorDiv (135 ms total)
 
 [----------] 3 tests from GroupedConv2d
 <Skip the PASS Items. >
 [----------] 3 tests from GroupedConv2d (29 ms total)
 
 [----------] 2 tests from InstanceNorm
 <Skip the PASS Items. >

 [----------] 2 tests from InstanceNorm (208 ms total)
 
 [----------] 2 tests from LayerNorm
 <Skip the PASS Items. >
 [----------] 2 tests from LayerNorm (117 ms total)
 
 [----------] 3 tests from LogSoftmax
 <Skip the PASS Items. >
 [ RUN      ] LogSoftmax.shape_3_6_1_uint8_axis_1
 
 (10:0) : error : Error(0,10) : Cannot find the header file cl_viv_vx_ext.h.
 (255:0) : error : Error(0,255) : Cannot find the header file cl_viv_vx_ext.h.
 (27:0) : error : undefined identifier: 'COPY'
 (55:0) : error : undefined identifier: 'COPY'
 (263:0) : error : syntax error at 'VXC_512Bits'
 
 ERROR: Failed to compile vx shader. (error: FFFFFFFF)
 E [_gpu_register:476]Build program fail.
 E [vsi_nn_kernel_create_node:631]Register client kernel com.vivantecorp.extension.evis.log_softmax_axis1_U8toU8_2D fail with -1.
 
 [       OK ] LogSoftmax.shape_3_6_1_uint8_axis_1 (70 ms)
 [----------] 3 tests from LogSoftmax (161 ms total)
 
 [----------] 3 tests from Matmul
 <Skip the PASS Items. >
 [ RUN      ] Matmul.shape_2_3_2_shape_2_3_2_uint8_transpose_a
 
 (10:0) : error : Error(0,10) : Cannot find the header file cl_viv_vx_ext.h.
 (255:0) : error : Error(0,255) : Cannot find the header file cl_viv_vx_ext.h.
 (27:0) : error : undefined identifier: 'COPY'
 (55:0) : error : undefined identifier: 'COPY'
 (261:0) : error : syntax error at 'VXC_512Bits'
 
 ERROR: Failed to compile vx shader. (error: FFFFFFFF)
 E [_gpu_register:476]Build program fail.
 E [vsi_nn_kernel_create_node:631]Register client kernel com.vivantecorp.extension.evis.gemm_transa_U8U8toU8 fail with -1.
 
 [       OK ] Matmul.shape_2_3_2_shape_2_3_2_uint8_transpose_a (30 ms)
 [----------] 3 tests from Matmul (113 ms total)
 
 [----------] 2 tests from MaxpoolWithArgmax
 <Skip the PASS Items. >
 [ RUN      ] MaxpoolWithArgmax.shape_4_4_1_uint8_kernel_2_stride_2
 
 (10:0) : error : Error(0,10) : Cannot find the header file cl_viv_vx_ext.h.
 (255:0) : error : Error(0,255) : Cannot find the header file cl_viv_vx_ext.h.
 (27:0) : error : undefined identifier: 'COPY'
 (55:0) : error : undefined identifier: 'COPY'
 (258:0) : error : syntax error at 'VXC_512Bits'
 
 ERROR: Failed to compile vx shader. (error: FFFFFFFF)
 E [_gpu_register:476]Build program fail.
 E [vsi_nn_kernel_create_node:631]Register client kernel com.vivantecorp.extension.evis.poolwithargmax_U8to_U8_U8_2D fail with -1.
 
 [       OK ] MaxpoolWithArgmax.shape_4_4_1_uint8_kernel_2_stride_2 (54 ms)
 [----------] 2 tests from MaxpoolWithArgmax (100 ms total)
 
 [----------] 2 tests from MaxUnpool2d
 <Skip the PASS Items. >
 [ RUN      ] MaxUnpool2d.shape_2_2_1_uint8_kernel_2_stride_2
 
(10:0) : error : Error(0,10) : Cannot find the header file cl_viv_vx_ext.h.
(256:0) : error : Error(0,256) : Cannot find the header file cl_viv_vx_ext.h.
(27:0) : error : undefined identifier: 'COPY'
(55:0) : error : undefined identifier: 'COPY'
(296:0) : error : undefined identifier: 'vxc_uchar8'
(296:0) : error : undefined identifier: 'vxc_uchar8'
(296:0) : error : undefined identifier: 'vxc_uchar16'
(296:0) : error : undefined identifier: 'vxc_uchar16'
(296:0) : error : undefined identifier: 'vxc_uchar16'
(296:0) : error : undefined identifier: 'vxc_uchar16'
(296:0) : error : undefined identifier: 'vxc_uchar16'
(296:0) : error : undefined identifier: 'vxc_uchar16'
(296:0) : error : undefined identifier: 'din'
(296:0) : error : undefined identifier: 'axisIn'
(296:0) : error : undefined identifier: 'dinExpand'
(296:0) : error : undefined identifier: 'axisInExpand'
(296:0) : error : undefined identifier: 'zpValue'
(296:0) : error : undefined identifier: 'constAxis'
(296:0) : error : undefined identifier: 'axisData'
(296:0) : error : undefined identifier: 'dout'
(296:0) : error : undefined identifier: 'dout'
(296:0) : error : undefined identifier: 'constAxis'
(296:0) : error : undefined identifier: 'axisData'
(296:0) : error : undefined identifier: 'dout'
(296:0) : error : undefined identifier: 'dout'
(308:0) : error : undefined identifier: 'vxc_uchar8'
(308:0) : error : undefined identifier: 'vxc_uchar8'
(308:0) : error : undefined identifier: 'vxc_uchar16'
(308:0) : error : undefined identifier: 'vxc_uchar16'
(308:0) : error : undefined identifier: 'vxc_uchar16'
(308:0) : error : undefined identifier: 'vxc_uchar16'
(308:0) : error : undefined identifier: 'vxc_uchar16'
(308:0) : error : undefined identifier: 'vxc_uchar16'
(308:0) : error : undefined identifier: 'din'
(308:0) : error : undefined identifier: 'axisIn'
(308:0) : error : undefined identifier: 'dinExpand'
(308:0) : error : undefined identifier: 'axisInExpand'
(308:0) : error : undefined identifier: 'zpValue'
(308:0) : error : undefined identifier: 'constAxis'
(308:0) : error : undefined identifier: 'axisData'
(308:0) : error : undefined identifier: 'dout'
(308:0) : error : undefined identifier: 'dout'
(308:0) : error : undefined identifier: 'constAxis'
(308:0) : error : undefined identifier: 'axisData'
(308:0) : error : undefined identifier: 'dout'
(308:0) : error : undefined identifier: 'dout'
(312:0) : error : syntax error at 'VXC_512Bits'

ERROR: Failed to compile vx shader. (error: FFFFFFFF)
E [_gpu_register:476]Build program fail.
E [vsi_nn_kernel_create_node:631]Register client kernel com.vivantecorp.extension.evis.upsample_U8_U8to_U8_SAME_2D fail with -1.

[       OK ] MaxUnpool2d.shape_2_2_1_uint8_kernel_2_stride_2 (60 ms)
[----------] 2 tests from MaxUnpool2d (108 ms total)

[----------] 2 tests from Moments
<Skip the PASS Items. >
[----------] 2 tests from Moments (100 ms total)

[----------] 1 test from Equal
[ RUN      ] Equal.shape_1_uint8

(1:0) : error : Error(0,1) : Cannot find the header file cl_viv_vx_ext.h.
(7:0) : error : syntax error at 'VXC_512Bits'

ERROR: Failed to compile vx shader. (error: FFFFFFFF)
E [_gpu_register:476]Build program fail.
E [vsi_nn_kernel_create_node:631]Register client kernel com.vivantecorp.extension.evis.equal_U8U8toBOOL8_2D fail with -1.

[       OK ] Equal.shape_1_uint8 (89 ms)
[----------] 1 test from Equal (89 ms total)

[----------] 1 test from NotEqual
<Skip the PASS Items. >
[----------] 1 test from NotEqual (66 ms total)

[----------] 1 test from Less
<Skip the PASS Items. >
[----------] 1 test from Less (64 ms total)

[----------] 1 test from GreaterOrEqual
<Skip the PASS Items. >
[----------] 1 test from GreaterOrEqual (63 ms total)

[----------] 1 test from Greater
<Skip the PASS Items. >
[----------] 1 test from Greater (63 ms total)

[----------] 1 test from LessOrEqual
<Skip the PASS Items. >
[----------] 1 test from LessOrEqual (63 ms total)

[----------] 2 tests from Reorg
<Skip the PASS Items. >
[----------] 2 tests from Reorg (10 ms total)

[----------] 3 tests from Resize1d
<Skip the PASS Items. >
[ RUN      ] Resize1d.shape_4_2_1_uint8_nearest_whcn

(10:0) : error : Error(0,10) : Cannot find the header file cl_viv_vx_ext.h.
(255:0) : error : Error(0,255) : Cannot find the header file cl_viv_vx_ext.h.
(27:0) : error : undefined identifier: 'COPY'
(55:0) : error : undefined identifier: 'COPY'
(257:0) : error : syntax error at 'VXC_512Bits'

ERROR: Failed to compile vx shader. (error: FFFFFFFF)
E [_gpu_register:476]Build program fail.
E [vsi_nn_kernel_create_node:631]Register client kernel com.vivantecorp.extension.evis.resize_1d_nearest_U8toU8_op fail with -1.

[       OK ] Resize1d.shape_4_2_1_uint8_nearest_whcn (37 ms)
[ RUN      ] Resize1d.shape_5_1_1_float_bilinear_align_corners_whcn

[       OK ] Resize1d.shape_5_1_1_float_bilinear_align_corners_whcn (32 ms)
[----------] 3 tests from Resize1d (98 ms total)

[----------] 2 tests from ScatterND
[ RUN      ] ScatterND.shape_4_4_4

[       OK ] ScatterND.shape_4_4_4 (41 ms)
[ RUN      ] ScatterND.shape_9

(10:0) : error : Error(0,10) : Cannot find the header file cl_viv_vx_ext.h.
(255:0) : error : Error(0,255) : Cannot find the header file cl_viv_vx_ext.h.
(27:0) : error : undefined identifier: 'COPY'
(55:0) : error : undefined identifier: 'COPY'
(257:0) : error : syntax error at 'VXC_512Bits'

ERROR: Failed to compile vx shader. (error: FFFFFFFF)
E [_gpu_register:476]Build program fail.
E [vsi_nn_kernel_create_node:631]Register client kernel com.vivantecorp.extension.evis.scatter_nd_U8toU8 fail with -1.

[       OK ] ScatterND.shape_9 (25 ms)
[----------] 2 tests from ScatterND (66 ms total)

[----------] 1 test from Floor
[ RUN      ] Floor.shape_5_1_fp32
[       OK ] Floor.shape_5_1_fp32 (5 ms)
[----------] 1 test from Floor (5 ms total)

[----------] 1 test from Cast
[ RUN      ] Cast.shape_5_1_fp32_to_int32

[       OK ] Cast.shape_5_1_fp32_to_int32 (35 ms)
[----------] 1 test from Cast (35 ms total)

[----------] 1 test from SpatialTransformer
[ RUN      ] SpatialTransformer.shape_1_3_3_1_u8
(10:0) : error : Error(0,10) : Cannot find the header file cl_viv_vx_ext.h.
(23:0) : error : undefined identifier: 'vxc_ushort8'
(26:0) : error : undefined identifier: 'src0'
(27:0) : error : undefined identifier: 'src1'
(29:0) : error : undefined identifier: 'dst'
(31:0) : error : undefined identifier: 'dst'

ERROR: Failed to compile vx shader. (error: FFFFFFFF)
E [vsi_nn_RegisterVXKernel:251][/home/khadas/TIM-VX-1.1.32/src/tim/vx/internal/src/libnnext/vsi_nn_vxkernel.c : 251] vxBuildProgram() Error!

E [vsi_nn_InitKernel:108]Add parameter 0 to kernel com.vivantecorp.extension.vxcTransform_setupThres_F16toF16 fail. with -12.
E [vsi_nn_InitKernel:121]Finalize kernel com.vivantecorp.extension.vxcTransform_setupThres_F16toF16 fail with -12.
E [vsi_nn_InitKernel:126]Remove kernel com.vivantecorp.extension.vxcTransform_setupThres_F16toF16 fail with -10.
E [vsi_nn_RegisterClientKernelAndNewNode:415]Register client kernel com.vivantecorp.extension.vxcTransform_setupThres_F16toF16 fail with -10.
E [compute_node:379]Create node[0] SPATIAL_TRANSFORMER fail
/home/khadas/TIM-VX-1.1.32/src/tim/vx/ops/spatial_transformer_test.cc:74: Failure
Expected equality of these values:
 values_golden
   Which is: { '\x2' (2), '\x3' (3), '\x2' (2), '\x2' (2), '\x3' (3), '\x2' (2), '\x2' (2), '\x3' (3), '\x2' (2) }
 output_values
   Which is: { '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0' }
[  FAILED  ] SpatialTransformer.shape_1_3_3_1_u8 (22 ms)
[----------] 1 test from SpatialTransformer (22 ms total)

[----------] 2 tests from Tile
[ RUN      ] Tile.shape_3_2_float_multiples_2_1

[       OK ] Tile.shape_3_2_float_multiples_2_1 (45 ms)
[ RUN      ] Tile.shape_3_2_1_int8_multiples_2_2_1

(1:0) : error : Error(0,1) : Cannot find the header file cl_viv_vx_ext.h.
(59:0) : error : undefined identifier: 'vxc_uchar8'
(59:0) : error : undefined identifier: 'src'
(59:0) : error : undefined identifier: 'src'
(59:0) : error : undefined identifier: 'src'
(60:0) : error : undefined identifier: 'vxc_uchar8'
(60:0) : error : undefined identifier: 'src'
(60:0) : error : undefined identifier: 'src'
(60:0) : error : undefined identifier: 'src'
(61:0) : error : undefined identifier: 'vxc_uchar8'
(61:0) : error : undefined identifier: 'src'
(61:0) : error : undefined identifier: 'src'
(61:0) : error : undefined identifier: 'src'
(62:0) : error : undefined identifier: 'vxc_uchar8'
(62:0) : error : undefined identifier: 'src'
(62:0) : error : undefined identifier: 'src'
(62:0) : error : undefined identifier: 'src'
(63:0) : error : undefined identifier: 'vxc_uchar8'
(63:0) : error : undefined identifier: 'src'
(63:0) : error : undefined identifier: 'src'
(63:0) : error : undefined identifier: 'src'
(64:0) : error : undefined identifier: 'vxc_uchar8'
(64:0) : error : undefined identifier: 'src'
(64:0) : error : undefined identifier: 'src'
(64:0) : error : undefined identifier: 'src'
(65:0) : error : undefined identifier: 'vxc_uchar8'
(65:0) : error : undefined identifier: 'src'
(65:0) : error : undefined identifier: 'src'
(65:0) : error : undefined identifier: 'src'
(66:0) : error : undefined identifier: 'vxc_uchar8'
(66:0) : error : undefined identifier: 'src'
(66:0) : error : undefined identifier: 'src'
(66:0) : error : undefined identifier: 'src'
(68:0) : error : undefined identifier: 'vxc_short8'
(68:0) : error : undefined identifier: 'src'
(68:0) : error : undefined identifier: 'src'
(68:0) : error : undefined identifier: 'src'
(69:0) : error : undefined identifier: 'vxc_short8'
(69:0) : error : undefined identifier: 'src'
(69:0) : error : undefined identifier: 'src'
(69:0) : error : undefined identifier: 'src'
(70:0) : error : undefined identifier: 'vxc_short8'
(70:0) : error : undefined identifier: 'src'
(70:0) : error : undefined identifier: 'src'
(70:0) : error : undefined identifier: 'src'
(71:0) : error : undefined identifier: 'vxc_short8'
(71:0) : error : undefined identifier: 'src'
(71:0) : error : undefined identifier: 'src'
(71:0) : error : undefined identifier: 'src'
(72:0) : error : undefined identifier: 'vxc_short8'
(72:0) : error : undefined identifier: 'src'
(72:0) : error : undefined identifier: 'src'
(72:0) : error : undefined identifier: 'src'
(73:0) : error : undefined identifier: 'vxc_short8'
(73:0) : error : undefined identifier: 'src'
(73:0) : error : undefined identifier: 'src'
(73:0) : error : undefined identifier: 'src'
(74:0) : error : undefined identifier: 'vxc_short8'
(74:0) : error : undefined identifier: 'src'
(74:0) : error : undefined identifier: 'src'
(74:0) : error : undefined identifier: 'src'
(75:0) : error : undefined identifier: 'vxc_short8'
(75:0) : error : undefined identifier: 'src'
(75:0) : error : undefined identifier: 'src'
(75:0) : error : undefined identifier: 'src'
(115:0) : error : undefined identifier: 'vxc_uchar8'
(115:0) : error : undefined identifier: 'src'
(115:0) : error : undefined identifier: 'src'
(115:0) : error : undefined identifier: 'src'
(116:0) : error : undefined identifier: 'vxc_uchar8'
(116:0) : error : undefined identifier: 'src'
(116:0) : error : undefined identifier: 'src'
(116:0) : error : undefined identifier: 'src'
(117:0) : error : undefined identifier: 'vxc_uchar8'
(117:0) : error : undefined identifier: 'src'
(117:0) : error : undefined identifier: 'src'
(117:0) : error : undefined identifier: 'src'
(118:0) : error : undefined identifier: 'vxc_uchar8'
(118:0) : error : undefined identifier: 'src'
(118:0) : error : undefined identifier: 'src'
(118:0) : error : undefined identifier: 'src'
(119:0) : error : undefined identifier: 'vxc_uchar8'
(119:0) : error : undefined identifier: 'src'
(119:0) : error : undefined identifier: 'src'
(119:0) : error : undefined identifier: 'src'
(120:0) : error : undefined identifier: 'vxc_uchar8'
(120:0) : error : undefined identifier: 'src'
(120:0) : error : undefined identifier: 'src'
(120:0) : error : undefined identifier: 'src'
(121:0) : error : undefined identifier: 'vxc_uchar8'
(121:0) : error : undefined identifier: 'src'
(121:0) : error : undefined identifier: 'src'
(121:0) : error : undefined identifier: 'src'
(122:0) : error : undefined identifier: 'vxc_uchar8'
(122:0) : error : undefined identifier: 'src'
(122:0) : error : undefined identifier: 'src'
(122:0) : error : undefined identifier: 'src'
(124:0) : error : undefined identifier: 'vxc_short8'
(124:0) : error : undefined identifier: 'src'
(124:0) : error : undefined identifier: 'src'

ERROR: Failed to compile vx shader. (error: FFFFFFFF)
E [_gpu_register:476]Build program fail.
E [vsi_nn_kernel_create_node:631]Register client kernel com.vivantecorp.extension.evis.tile_remain3_U8toU8_2D fail with -1.

[       OK ] Tile.shape_3_2_1_int8_multiples_2_2_1 (80 ms)
[----------] 2 tests from Tile (125 ms total)

[----------] 14 tests from TransposeConv2d
<Skip the PASS Items. >
[ RUN      ] TransposeConv2d.shape_4_4_1_1_int8_QuantizedPerChannelOneTest
Segmentation fault
khadas@Khadas:~/TIM-VX-1.1.32/install/bin$ 


The output from running TVM test_operations.py on the x86 host side:


python3 test_operations.py 
Testing QNN pattern                                       1. press any key and continue...
make MOD Done!

conv2d NHWC layout is not optimized for x86 with autotvm.
#[version = "0.0.5"]
def @main(%data: Tensor[(1, 56, 56, 32), int8], %weight: Tensor[(1, 1, 32, 64), int8], %add: Tensor[(64), int32]) {
  %0 = qnn.conv2d(%data, %weight, 0, 77, 0.023528f, 0.045283f, padding=[0, 0, 0, 0], channels=64, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32");
  %1 = nn.bias_add(%0, %add, axis=3);
  qnn.requantize(%1, 0.00106542f, 0, 0.0235285f, 0, out_dtype="int8")
}

get_ref_result
get_vsi_result
get_vsi_model:before relay.build

vsi_npu.py --> qnn.requantize

This is important----> name_node.value() == tvmgen_default_vsi_npu_0
GraphMakerImpl::Create
TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d

VsiNpuModule::GetFunction: get_symbol
VsiNpuModule::GetFunction: return early
VsiNpuModule::GetFunction: get_const_vars
VsiNpuModule::GetFunction: return early
VsiNpuModule::GetFunction: get_const_vars
VsiNpuModule::GetFunction: return early
VsiNpuModule::SaveToBinary
SaveToBinary: nbg size = 15552
SaveToBinary: input size = 1
SaveToBinary: output size = 1
VsiNpuModule : SerializeTensorSpec
VsiNpuModule : SerializeTensorSpec2
VsiNpuModule : SerializeTensorSpec
VsiNpuModule : SerializeTensorSpec2
VsiNpuModule::SaveToBinary2
/tmp/tmpamfs6yew/model.so
model.so
{'data': <tvm.nd.NDArray shape=(1, 56, 56, 32), cpu(0)>
array([[[[1, 1, 1, ..., 1, 1, 1],
         [1, 1, 1, ..., 1, 1, 1],
         [1, 1, 1, ..., 1, 1, 1],
         ...,
         [1, 1, 1, ..., 1, 1, 1],
         [1, 1, 1, ..., 1, 1, 1],
         [1, 1, 1, ..., 1, 1, 1]],

        [[1, 1, 1, ..., 1, 1, 1],
         [1, 1, 1, ..., 1, 1, 1],
         [1, 1, 1, ..., 1, 1, 1],
         ...,
         [1, 1, 1, ..., 1, 1, 1],
         [1, 1, 1, ..., 1, 1, 1],
         [1, 1, 1, ..., 1, 1, 1]],

        [[1, 1, 1, ..., 1, 1, 1],
         [1, 1, 1, ..., 1, 1, 1],
         [1, 1, 1, ..., 1, 1, 1],
         ...,
         [1, 1, 1, ..., 1, 1, 1],
         [1, 1, 1, ..., 1, 1, 1],
         [1, 1, 1, ..., 1, 1, 1]],

        ...,

        [[1, 1, 1, ..., 1, 1, 1],
         [1, 1, 1, ..., 1, 1, 1],
         [1, 1, 1, ..., 1, 1, 1],
         ...,
         [1, 1, 1, ..., 1, 1, 1],
         [1, 1, 1, ..., 1, 1, 1],
         [1, 1, 1, ..., 1, 1, 1]],

        [[1, 1, 1, ..., 1, 1, 1],
         [1, 1, 1, ..., 1, 1, 1],
         [1, 1, 1, ..., 1, 1, 1],
         ...,
         [1, 1, 1, ..., 1, 1, 1],
         [1, 1, 1, ..., 1, 1, 1],
         [1, 1, 1, ..., 1, 1, 1]],

        [[1, 1, 1, ..., 1, 1, 1],
         [1, 1, 1, ..., 1, 1, 1],
         [1, 1, 1, ..., 1, 1, 1],
         ...,
         [1, 1, 1, ..., 1, 1, 1],
         [1, 1, 1, ..., 1, 1, 1],
         [1, 1, 1, ..., 1, 1, 1]]]], dtype=int8)}
ref_out [[[[-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]
   ...
   [-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]]

  [[-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]
   ...
   [-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]]

  [[-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]
   ...
   [-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]]

  ...

  [[-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]
   ...
   [-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]]

  [[-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]
   ...
   [-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]]

  [[-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]
   ...
   [-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]]]]
vsi_out [[[[0 0 0 ... 0 0 0]
   [0 0 0 ... 0 0 0]
   [0 0 0 ... 0 0 0]
   ...
   [0 0 0 ... 0 0 0]
   [0 0 0 ... 0 0 0]
   [0 0 0 ... 0 0 0]]

  [[0 0 0 ... 0 0 0]
   [0 0 0 ... 0 0 0]
   [0 0 0 ... 0 0 0]
   ...
   [0 0 0 ... 0 0 0]
   [0 0 0 ... 0 0 0]
   [0 0 0 ... 0 0 0]]

  [[0 0 0 ... 0 0 0]
   [0 0 0 ... 0 0 0]
   [0 0 0 ... 0 0 0]
   ...
   [0 0 0 ... 0 0 0]
   [0 0 0 ... 0 0 0]
   [0 0 0 ... 0 0 0]]

  ...

  [[0 0 0 ... 0 0 0]
   [0 0 0 ... 0 0 0]
   [0 0 0 ... 0 0 0]
   ...
   [0 0 0 ... 0 0 0]
   [0 0 0 ... 0 0 0]
   [0 0 0 ... 0 0 0]]

  [[0 0 0 ... 0 0 0]
   [0 0 0 ... 0 0 0]
   [0 0 0 ... 0 0 0]
   ...
   [0 0 0 ... 0 0 0]
   [0 0 0 ... 0 0 0]
   [0 0 0 ... 0 0 0]]

  [[0 0 0 ... 0 0 0]
   [0 0 0 ... 0 0 0]
   [0 0 0 ... 0 0 0]
   ...
   [0 0 0 ... 0 0 0]
   [0 0 0 ... 0 0 0]
   [0 0 0 ... 0 0 0]]]]

Expected output: 
[[[[-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]
   ...
   [-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]]

  [[-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]
   ...
   [-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]]

  [[-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]
   ...
   [-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]]

  ...

  [[-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]
   ...
   [-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]]

  [[-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]
   ...
   [-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]]

  [[-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]
   ...
   [-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]
   [-128 -128 -128 ...  -67  -65  -64]]]]
Actual output: 

Not equal to tolerance rtol=0.001, atol=0.001

Mismatched elements: 200704 / 200704 (100%)
Max absolute difference: 127
Max relative difference: inf
 x: array([[[[-128, -128, -128, ...,  -67,  -65,  -64],
         [-128, -128, -128, ...,  -67,  -65,  -64],
         [-128, -128, -128, ...,  -67,  -65,  -64],...
 y: array([[[[0, 0, 0, ..., 0, 0, 0],
         [0, 0, 0, ..., 0, 0, 0],
         [0, 0, 0, ..., 0, 0, 0],...
FAIL


The output from running TVM test_operations.py on the VIM3 Pro side:


python3 -m tvm.exec.rpc_server --host 0.0.0.0 --port=9090
INFO:root:If you are running ROCM/Metal, fork will cause compiler internal error. Try to launch with arg ```--no-fork```
INFO:RPCServer:bind to 0.0.0.0:9090
INFO:RPCServer:connection from ('XXX.XXX.XXX.XXX', 53076)
VsiNpuModule::LoadFromBinary
LoadFromBinary: nbg size = 15552
LoadFromBinary: input size = 1
LoadFromBinary: output size = 1
VsiNpuModule : DeSerializeTensorSpec
VsiNpuModule : DeSerializeTensorSpec2
VsiNpuModule : DeSerializeTensorSpec
VsiNpuModule : DeSerializeTensorSpec2
INFO:RPCServer:load_module /tmp/tmpa5luf_rw/model.so
VsiNpuModule::GetFunction: _lookup_linked_param
VsiNpuModule::GetFunction: return early
VsiNpuModule::GetFunction: _lookup_linked_param
VsiNpuModule::GetFunction: return early
VsiNpuModule::GetFunction: _lookup_linked_param
VsiNpuModule::GetFunction: return early
VsiNpuModule::GetFunction: _lookup_linked_param
VsiNpuModule::GetFunction: return early
VsiNpuModule::GetFunction: tvmgen_default_vsi_npu_0
[     1] PLS isn't existed
Process Graph: 2 ms or 2363 us
VsiNpuModule::GetFunction: size: 2
INFO:RPCServer:Finish serving ('XXX.XXX.XXX.XXX', 53076)


Test Functions Passed in test_operations.py


test_qnn_add()
test_float_add()
test_float_relu()
test_uint8_relu()
test_float_leaky_relu()
test_uint8_leaky_relu()
test_float_softmax()
test_float_reshape()
test_float_tranpose()
test_float_relu6()
test_uint8_relu6()
test_dequantize()
test_quantize()
test_uint8_avg_pool()
test_uint8_softmax()
test_uint8_reshape()
test_uint8_concatenation()
test_uint8_max_pool()
test_float_mean()
test_uint8_argmax()
test_float_sigmoid()
test_uint8_sigmoid()
test_uint8_fullconnected()
test_uint8_argmin()
test_uint8_squeeze()
test_uint8_depthtospace()
test_qnn_sub()
test_qnn_multiply()
test_qnn_maximum()
test_qnn_minimum()
test_qnn_logical_and()
test_qnn_logical_or()
test_qnn_pad()
test_uint8_mean()
test_requantize()
test_uint8_transpose_conv2d_pattern()
test_uint8_transpose_conv2d_pattern2()
test_uint8_tanh()


Test Functions Failed in test_operations.py


test_float32_conv2d_permute()
#All vsi_out elements are 0; ref_out != vsi_out, mismatched elements: 100%
test_float32_depthwise_conv2d_permute()
#All vsi_out elements are 0; ref_out != vsi_out, mismatched elements: 100%
test_sample_model()
#All vsi_out elements are 0; ref_out != vsi_out, mismatched elements: 100%
test_float_avg_pool()
#All vsi_out elements are 0; ref_out != vsi_out, mismatched elements: 100%
test_float32_pattern()
#ref_out != vsi_out, mismatched elements: 100%
test_uint8_depthwiseconv2d_pattern()
#ref_out != vsi_out, mismatched elements: 515 / 864 (59.6%)
test_uint8_conv2d_pattern()
#All vsi_out elements are 0; ref_out != vsi_out, mismatched elements: 100%
test_uint8_resizeBilinear()
#AttributeError: module 'tvm.relay.op.image' has no attribute 'resize'
#Because relay.op.image.resize was removed in this TVM version (see the sketch after this list)
test_float_batch_norm()
#std::bad_alloc
test_uint8_resizeNear()
#AttributeError: module 'tvm.relay.op.image' has no attribute 'resize'
#Because relay.op.image.resize was removed in this TVM version (see the sketch after this list)
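For the two resize failures above, here is a hedged sketch of what the test would need to call instead of the removed relay.op.image.resize, assuming a TVM build around commit b822ec3 where the old op was split into resize1d/resize2d/resize3d; the shape and target size are illustrative only:

```python
# Hypothetical replacement for the removed relay.op.image.resize call.
import tvm
from tvm import relay

data = relay.var("data", shape=(1, 14, 14, 32), dtype="float32")
resized = relay.image.resize2d(data, size=(28, 28), layout="NHWC",
                               method="nearest_neighbor")
mod = tvm.IRModule.from_expr(relay.Function([data], resized))
print(relay.transform.InferType()(mod))  # confirm the op exists and type-checks
```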

If you need more debug messages, please let me know.
Thanks.

@thezha
Contributor

thezha commented Oct 12, 2021

It seems that the system is not able to locate the runtime compiler header file cl_viv_vx_ext.h. You can either copy this file to the current run directory, or set VIVANTE_SDK_DIR to point to the location that contains this header file (include/CL/cl_viv_vx_ext.h).

(10:0) : error : Error(0,10) : Cannot find the header file cl_viv_vx_ext.h.
(255:0) : error : Error(0,255) : Cannot find the header file cl_viv_vx_ext.h.
(27:0) : error : undefined identifier: 'COPY'
(55:0) : error : undefined identifier: 'COPY'
(257:0) : error : syntax error at 'VXC_512Bits'
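As a quick sanity check on the board (a minimal Python sketch, assuming the standard include/CL layout under VIVANTE_SDK_DIR):

```python
# Verify that VIVANTE_SDK_DIR points at a tree containing include/CL/cl_viv_vx_ext.h,
# which the runtime compiler needs to build the VX shader kernels.
import os

sdk_dir = os.environ.get("VIVANTE_SDK_DIR", "")
header = os.path.join(sdk_dir, "include", "CL", "cl_viv_vx_ext.h")
print(f"VIVANTE_SDK_DIR={sdk_dir!r} -> {header}:",
      "found" if os.path.isfile(header) else "MISSING")
```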

@leokuo725
Author

leokuo725 commented Oct 12, 2021

@thezha Thanks for your reply.
I found that cl_viv_vx_ext.h is located in /usr/include/CL/, so I did this:

export VIVANTE_SDK_DIR=/usr

Then I ran the TIM-VX unit_test again; the "Cannot find the header file cl_viv_vx_ext.h." errors are gone, but some errors remain. The full output follows:


Running main() from /home/khadas/TIM-VX-1.1.32/_deps/googletest-src/googletest/src/gtest_main.cc
[==========] Running 104 tests from 33 test suites.
[----------] Global test environment set-up.
[----------] 1 test from Context
[ RUN      ] Context.create
[       OK ] Context.create (43 ms)
[----------] 1 test from Context (43 ms total)

[----------] 2 tests from graph
[ RUN      ] graph.gen_binary_graph_with_empty_graph
E [_graph_optimization_convert_int8_to_uint8:792]CHECK STATUS(-1:A generic error code, used when no other describes the error.)
E [vsi_nn_OptimizeGraph:827]CHECK STATUS(-1:A generic error code, used when no other describes the error.)
[       OK ] graph.gen_binary_graph_with_empty_graph (6 ms)
[ RUN      ] graph.gen_binary_graph_with_simple_add
[       OK ] graph.gen_binary_graph_with_simple_add (20 ms)
[----------] 2 tests from graph (26 ms total)

[----------] 2 tests from Linear
[ RUN      ] Linear.shape_5_1_fp32
[       OK ] Linear.shape_5_1_fp32 (7 ms)
[ RUN      ] Linear.shape_5_1_fp32_omit_b
[       OK ] Linear.shape_5_1_fp32_omit_b (5 ms)
[----------] 2 tests from Linear (13 ms total)

[----------] 3 tests from Conv1d
[ RUN      ] Conv1d.shape_3_6_1_float_ksize_1_stride_1_weights_3_no_bias_whcn
[       OK ] Conv1d.shape_3_6_1_float_ksize_1_stride_1_weights_3_no_bias_whcn (14 ms)
[ RUN      ] Conv1d.shape_6_2_1_uint8_ksize_6_stride_1_weights_2_whcn
[       OK ] Conv1d.shape_6_2_1_uint8_ksize_6_stride_1_weights_2_whcn (7 ms)
[ RUN      ] Conv1d.shape_6_2_1_uint8_ksize_3_stride_1_pad_1_weights_2_no_bias_whcn
[       OK ] Conv1d.shape_6_2_1_uint8_ksize_3_stride_1_pad_1_weights_2_no_bias_whcn (6 ms)
[----------] 3 tests from Conv1d (27 ms total)

[----------] 19 tests from Conv2d
[ RUN      ] Conv2d.shape_4_2_1_1_float32_PaddingTest
[       OK ] Conv2d.shape_4_2_1_1_float32_PaddingTest (17 ms)
[ RUN      ] Conv2d.shape_4_2_2_2_float32_PointwiseTest
[       OK ] Conv2d.shape_4_2_2_2_float32_PointwiseTest (16 ms)
[ RUN      ] Conv2d.shape_4_2_1_2_float32_SimpleTest
[       OK ] Conv2d.shape_4_2_1_2_float32_SimpleTest (12 ms)
[ RUN      ] Conv2d.shape_4_2_2_2_float32_SimpleChannelsTest
[       OK ] Conv2d.shape_4_2_2_2_float32_SimpleChannelsTest (11 ms)
[ RUN      ] Conv2d.shape_6_3_1_1_float32_SimpleAnisotropicStridesTest
[       OK ] Conv2d.shape_6_3_1_1_float32_SimpleAnisotropicStridesTest (11 ms)
[ RUN      ] Conv2d.shape_4_3_1_1_float32_HandCalculatedTest
[       OK ] Conv2d.shape_4_3_1_1_float32_HandCalculatedTest (12 ms)
[ RUN      ] Conv2d.shape_4_3_1_1_float32_HandCalculatedConstFilterTest
[       OK ] Conv2d.shape_4_3_1_1_float32_HandCalculatedConstFilterTest (12 ms)
[ RUN      ] Conv2d.shape_4_3_1_1_float32_HandCalculatedBiasTest
[       OK ] Conv2d.shape_4_3_1_1_float32_HandCalculatedBiasTest (12 ms)
[ RUN      ] Conv2d.shape_4_3_1_1_float32_HandCalculatedValidTest
[       OK ] Conv2d.shape_4_3_1_1_float32_HandCalculatedValidTest (12 ms)
[ RUN      ] Conv2d.shape_4_2_2_2_float32_DisabledPointwiseMultifilterTest
[       OK ] Conv2d.shape_4_2_2_2_float32_DisabledPointwiseMultifilterTest (9 ms)
[ RUN      ] Conv2d.shape_9_9_1_1_float32_SimpleDilationTest
[       OK ] Conv2d.shape_9_9_1_1_float32_SimpleDilationTest (12 ms)
[ RUN      ] Conv2d.shape_4_2_1_2_float32_StrideTest
[       OK ] Conv2d.shape_4_2_1_2_float32_StrideTest (12 ms)
[ RUN      ] Conv2d.shape_4_2_1_2_float32_InputAndFilterSameWidthHeightTest
[       OK ] Conv2d.shape_4_2_1_2_float32_InputAndFilterSameWidthHeightTest (8 ms)
[ RUN      ] Conv2d.shape_4_2_1_2_uint8_QuantizedTest1
[       OK ] Conv2d.shape_4_2_1_2_uint8_QuantizedTest1 (6 ms)
[ RUN      ] Conv2d.shape_4_2_1_2_uint8_QuantizedTest2
[       OK ] Conv2d.shape_4_2_1_2_uint8_QuantizedTest2 (6 ms)
[ RUN      ] Conv2d.shape_6_3_1_1_uint8_AnisotropicStridesQuantizedTest
[       OK ] Conv2d.shape_6_3_1_1_uint8_AnisotropicStridesQuantizedTest (6 ms)
[ RUN      ] Conv2d.shape_9_9_1_1_uint8_DilationQuantizedTest
[       OK ] Conv2d.shape_9_9_1_1_uint8_DilationQuantizedTest (6 ms)
[ RUN      ] Conv2d.shape_3_2_2_1_int8_QuantizedPerTensorTest
[       OK ] Conv2d.shape_3_2_2_1_int8_QuantizedPerTensorTest (19 ms)
[ RUN      ] Conv2d.shape_3_2_2_1_int8_QuantizedPerChannelTest
[       OK ] Conv2d.shape_3_2_2_1_int8_QuantizedPerChannelTest (12 ms)
[----------] 19 tests from Conv2d (213 ms total)

[----------] 2 tests from DeConv1d
[ RUN      ] DeConv1d.no_bias_layout_whcn_depthwise_shape_3_2_1
/home/khadas/TIM-VX-1.1.32/src/tim/vx/ops/deconv1d_test.cc:69: Failure
Expected equality of these values:
  golden
    Which is: { 27, 81, 30, 9, 3, 21, 15, 27, 0, 0 }
  output_data
    Which is: { 48, 96, 57, 9, 3, 0, 0, 0, 0, 0 }
Result mismatch
[  FAILED  ] DeConv1d.no_bias_layout_whcn_depthwise_shape_3_2_1 (9 ms)
[ RUN      ] DeConv1d.layout_whcn_shape_3_1_1
[       OK ] DeConv1d.layout_whcn_shape_3_1_1 (92 ms)
[----------] 2 tests from DeConv1d (101 ms total)

[----------] 2 tests from DeConv2d
[ RUN      ] DeConv2d.shape_3_3_2_1_float_depthwise
/home/khadas/TIM-VX-1.1.32/src/tim/vx/ops/deconv2d_test.cc:85: Failure
Expected equality of these values:
  golden
    Which is: { 27, 72, 18, 24, 3, 81, 45, 90, 15, 21, 30, 26, 43, 22, 11, 9, 5, 25, 10, 14, 3, 2, 9, 4, 6, 21, 27, 52, 63, 7, 15, 6, ... }
  output_data
    Which is: { 48, 99, 70, 87, 10, 96, 51, 134, 29, 42, 57, 26, 168, 94, 33, 9, 5, 65, 26, 38, 3, 2, 81, 4, 22, 0, 0, 0, 0, 0, 0, 0, ... }
Result mismatch
[  FAILED  ] DeConv2d.shape_3_3_2_1_float_depthwise (9 ms)
[ RUN      ] DeConv2d.shape_3_3_1_1_float
[       OK ] DeConv2d.shape_3_3_1_1_float (9 ms)
[----------] 2 tests from DeConv2d (18 ms total)

[----------] 16 tests from DepthwiseConv
[ RUN      ] DepthwiseConv.shape_2_3_2_1_float32_SimpleTest
[       OK ] DepthwiseConv.shape_2_3_2_1_float32_SimpleTest (19 ms)
[ RUN      ] DepthwiseConv.shape_2_3_2_1_float32_StrideValidTest
[       OK ] DepthwiseConv.shape_2_3_2_1_float32_StrideValidTest (12 ms)
[ RUN      ] DepthwiseConv.shape_2_3_2_1_float32_StrideSameTest
[       OK ] DepthwiseConv.shape_2_3_2_1_float32_StrideSameTest (11 ms)
[ RUN      ] DepthwiseConv.shape_2_3_2_1_float32_StrideSameDilationTest
[       OK ] DepthwiseConv.shape_2_3_2_1_float32_StrideSameDilationTest (11 ms)
[ RUN      ] DepthwiseConv.shape_2_3_2_1_float32_PaddingTest
[       OK ] DepthwiseConv.shape_2_3_2_1_float32_PaddingTest (12 ms)
[ RUN      ] DepthwiseConv.shape_9_9_1_1_float32_DilationValidTest
[       OK ] DepthwiseConv.shape_9_9_1_1_float32_DilationValidTest (11 ms)
[ RUN      ] DepthwiseConv.shape_3_3_1_1_float32_DilationSameTest
[       OK ] DepthwiseConv.shape_3_3_1_1_float32_DilationSameTest (12 ms)
[ RUN      ] DepthwiseConv.shape_3_3_4_2_float32_BatchValidTest
[       OK ] DepthwiseConv.shape_3_3_4_2_float32_BatchValidTest (11 ms)
[ RUN      ] DepthwiseConv.shape_2_2_1_4_float32_BatchSameTest
[       OK ] DepthwiseConv.shape_2_2_1_4_float32_BatchSameTest (12 ms)
[ RUN      ] DepthwiseConv.shape_2_3_2_1_uint8_QuantizedTest
[       OK ] DepthwiseConv.shape_2_3_2_1_uint8_QuantizedTest (6 ms)
[ RUN      ] DepthwiseConv.shape_9_9_1_1_uint8_QuantizedDilationdValidTest
[       OK ] DepthwiseConv.shape_9_9_1_1_uint8_QuantizedDilationdValidTest (6 ms)
[ RUN      ] DepthwiseConv.shape_3_3_1_1_uint8_QuantizedDilationdSameTest
[       OK ] DepthwiseConv.shape_3_3_1_1_uint8_QuantizedDilationdSameTest (6 ms)
[ RUN      ] DepthwiseConv.shape_3_2_2_1_int8_PerTensorTest
[       OK ] DepthwiseConv.shape_3_2_2_1_int8_PerTensorTest (13 ms)
[ RUN      ] DepthwiseConv.shape_3_2_2_1_int8_PerAxisTest
[       OK ] DepthwiseConv.shape_3_2_2_1_int8_PerAxisTest (12 ms)
[ RUN      ] DepthwiseConv.shape_3_3_8_1_int8_PerChannelValidTest
[       OK ] DepthwiseConv.shape_3_3_8_1_int8_PerChannelValidTest (12 ms)
[ RUN      ] DepthwiseConv.shape_3_3_8_1_int8_PerChannelSameTest
[       OK ] DepthwiseConv.shape_3_3_8_1_int8_PerChannelSameTest (13 ms)
[----------] 16 tests from DepthwiseConv (181 ms total)

[----------] 3 tests from FloorDiv
[ RUN      ] FloorDiv.shape_1_fp32

[       OK ] FloorDiv.shape_1_fp32 (69 ms)
[ RUN      ] FloorDiv.shape_5_1_broadcast_float32

[       OK ] FloorDiv.shape_5_1_broadcast_float32 (38 ms)
[ RUN      ] FloorDiv.shape_5_1_broadcast_uint8

[       OK ] FloorDiv.shape_5_1_broadcast_uint8 (256 ms)
[----------] 3 tests from FloorDiv (364 ms total)

[----------] 3 tests from GroupedConv2d
[ RUN      ] GroupedConv2d.shape_3_3_6_1_float_group_1_no_bias_whcn
[       OK ] GroupedConv2d.shape_3_3_6_1_float_group_1_no_bias_whcn (7 ms)
[ RUN      ] GroupedConv2d.shape_3_3_6_1_float_group_2_whcn
[       OK ] GroupedConv2d.shape_3_3_6_1_float_group_2_whcn (7 ms)
[ RUN      ] GroupedConv2d.shape_3_3_6_1_uint8_group_6_whcn
[       OK ] GroupedConv2d.shape_3_3_6_1_uint8_group_6_whcn (15 ms)
[----------] 3 tests from GroupedConv2d (29 ms total)

[----------] 2 tests from InstanceNorm
[ RUN      ] InstanceNorm.shape_3_6_1_float


[       OK ] InstanceNorm.shape_3_6_1_float (125 ms)
[ RUN      ] InstanceNorm.shape_3_3_6_1_float


[       OK ] InstanceNorm.shape_3_3_6_1_float (80 ms)
[----------] 2 tests from InstanceNorm (205 ms total)

[----------] 2 tests from LayerNorm
[ RUN      ] LayerNorm.axis_0_shape_3_6_1_float

[       OK ] LayerNorm.axis_0_shape_3_6_1_float (60 ms)
[ RUN      ] LayerNorm.axis_0_shape_2_3_6_1_float

[       OK ] LayerNorm.axis_0_shape_2_3_6_1_float (58 ms)
[----------] 2 tests from LayerNorm (118 ms total)

[----------] 3 tests from LogSoftmax
[ RUN      ] LogSoftmax.shape_6_1_float_axis_0

[       OK ] LogSoftmax.shape_6_1_float_axis_0 (123 ms)
[ RUN      ] LogSoftmax.shape_3_6_1_float_axis_1

[       OK ] LogSoftmax.shape_3_6_1_float_axis_1 (48 ms)
[ RUN      ] LogSoftmax.shape_3_6_1_uint8_axis_1

[       OK ] LogSoftmax.shape_3_6_1_uint8_axis_1 (958 ms)
[----------] 3 tests from LogSoftmax (1129 ms total)

[----------] 3 tests from Matmul
[ RUN      ] Matmul.shape_2_6_shape_6_2_float

[       OK ] Matmul.shape_2_6_shape_6_2_float (38 ms)
[ RUN      ] Matmul.shape_2_3_2_shape_2_3_2_float_transpose_b

[       OK ] Matmul.shape_2_3_2_shape_2_3_2_float_transpose_b (42 ms)
[ RUN      ] Matmul.shape_2_3_2_shape_2_3_2_uint8_transpose_a

[       OK ] Matmul.shape_2_3_2_shape_2_3_2_uint8_transpose_a (169 ms)
[----------] 3 tests from Matmul (249 ms total)

[----------] 2 tests from MaxpoolWithArgmax
[ RUN      ] MaxpoolWithArgmax.shape_3_3_1_fp32_kernel_2_stride_2

[       OK ] MaxpoolWithArgmax.shape_3_3_1_fp32_kernel_2_stride_2 (49 ms)
[ RUN      ] MaxpoolWithArgmax.shape_4_4_1_uint8_kernel_2_stride_2

[       OK ] MaxpoolWithArgmax.shape_4_4_1_uint8_kernel_2_stride_2 (124 ms)
[----------] 2 tests from MaxpoolWithArgmax (173 ms total)

[----------] 2 tests from MaxUnpool2d
[ RUN      ] MaxUnpool2d.shape_2_2_1_fp32_kernel_2_stride_2

[       OK ] MaxUnpool2d.shape_2_2_1_fp32_kernel_2_stride_2 (52 ms)
[ RUN      ] MaxUnpool2d.shape_2_2_1_uint8_kernel_2_stride_2

[       OK ] MaxUnpool2d.shape_2_2_1_uint8_kernel_2_stride_2 (150 ms)
[----------] 2 tests from MaxUnpool2d (202 ms total)

[----------] 2 tests from Moments
[ RUN      ] Moments.shape_6_3_1_float_axes_0_1

[       OK ] Moments.shape_6_3_1_float_axes_0_1 (62 ms)
[ RUN      ] Moments.shape_3_6_1_float_axes_1_keepdims

[       OK ] Moments.shape_3_6_1_float_axes_1_keepdims (37 ms)
[----------] 2 tests from Moments (99 ms total)

[----------] 1 test from Equal
[ RUN      ] Equal.shape_1_uint8

[       OK ] Equal.shape_1_uint8 (523 ms)
[----------] 1 test from Equal (523 ms total)

[----------] 1 test from NotEqual
[ RUN      ] NotEqual.shape_5_fp32

[       OK ] NotEqual.shape_5_fp32 (64 ms)
[----------] 1 test from NotEqual (64 ms total)

[----------] 1 test from Less
[ RUN      ] Less.shape_5_1_fp32

[       OK ] Less.shape_5_1_fp32 (62 ms)
[----------] 1 test from Less (63 ms total)

[----------] 1 test from GreaterOrEqual
[ RUN      ] GreaterOrEqual.shape_5_2_1_fp32

[       OK ] GreaterOrEqual.shape_5_2_1_fp32 (62 ms)
[----------] 1 test from GreaterOrEqual (63 ms total)

[----------] 1 test from Greater
[ RUN      ] Greater.shape_5_2_1_1_fp32

[       OK ] Greater.shape_5_2_1_1_fp32 (62 ms)
[----------] 1 test from Greater (63 ms total)

[----------] 1 test from LessOrEqual
[ RUN      ] LessOrEqual.shape_1_5_2_1_1_fp32

[       OK ] LessOrEqual.shape_1_5_2_1_1_fp32 (62 ms)
[----------] 1 test from LessOrEqual (62 ms total)

[----------] 2 tests from Reorg
[ RUN      ] Reorg.shape_4_4_4_1_u8
[       OK ] Reorg.shape_4_4_4_1_u8 (6 ms)
[ RUN      ] Reorg.shape_4_4_4_1_fp32
[       OK ] Reorg.shape_4_4_4_1_fp32 (6 ms)
[----------] 2 tests from Reorg (12 ms total)

[----------] 3 tests from Resize1d
[ RUN      ] Resize1d.shape_4_2_1_float_nearest_whcn

[       OK ] Resize1d.shape_4_2_1_float_nearest_whcn (29 ms)
[ RUN      ] Resize1d.shape_4_2_1_uint8_nearest_whcn

[       OK ] Resize1d.shape_4_2_1_uint8_nearest_whcn (100 ms)
[ RUN      ] Resize1d.shape_5_1_1_float_bilinear_align_corners_whcn

[       OK ] Resize1d.shape_5_1_1_float_bilinear_align_corners_whcn (35 ms)
[----------] 3 tests from Resize1d (164 ms total)

[----------] 2 tests from ScatterND
[ RUN      ] ScatterND.shape_4_4_4

[       OK ] ScatterND.shape_4_4_4 (41 ms)
[ RUN      ] ScatterND.shape_9

[       OK ] ScatterND.shape_9 (74 ms)
[----------] 2 tests from ScatterND (115 ms total)

[----------] 1 test from Floor
[ RUN      ] Floor.shape_5_1_fp32
[       OK ] Floor.shape_5_1_fp32 (5 ms)
[----------] 1 test from Floor (5 ms total)

[----------] 1 test from Cast
[ RUN      ] Cast.shape_5_1_fp32_to_int32

[       OK ] Cast.shape_5_1_fp32_to_int32 (35 ms)
[----------] 1 test from Cast (35 ms total)

[----------] 1 test from SpatialTransformer
[ RUN      ] SpatialTransformer.shape_1_3_3_1_u8
[       OK ] SpatialTransformer.shape_1_3_3_1_u8 (138 ms)
[----------] 1 test from SpatialTransformer (139 ms total)

[----------] 2 tests from Tile
[ RUN      ] Tile.shape_3_2_float_multiples_2_1

[       OK ] Tile.shape_3_2_float_multiples_2_1 (45 ms)
[ RUN      ] Tile.shape_3_2_1_int8_multiples_2_2_1

[       OK ] Tile.shape_3_2_1_int8_multiples_2_2_1 (315 ms)
[----------] 2 tests from Tile (360 ms total)

[----------] 14 tests from TransposeConv2d
[ RUN      ] TransposeConv2d.shape_4_4_1_1_float32_SimpleTest
[       OK ] TransposeConv2d.shape_4_4_1_1_float32_SimpleTest (8 ms)
[ RUN      ] TransposeConv2d.shape_4_4_2_1_float32_SameTest
[       OK ] TransposeConv2d.shape_4_4_2_1_float32_SameTest (9 ms)
[ RUN      ] TransposeConv2d.shape_4_4_2_1_float32_ValidTest
[       OK ] TransposeConv2d.shape_4_4_2_1_float32_ValidTest (8 ms)
[ RUN      ] TransposeConv2d.shape_2_2_1_1_float32_StrideTest
[       OK ] TransposeConv2d.shape_2_2_1_1_float32_StrideTest (9 ms)
[ RUN      ] TransposeConv2d.shape_2_2_1_1_float32_ChannelTest
[       OK ] TransposeConv2d.shape_2_2_1_1_float32_ChannelTest (9 ms)
[ RUN      ] TransposeConv2d.shape_2_1_1_1_float32_AccuracyTest
[       OK ] TransposeConv2d.shape_2_1_1_1_float32_AccuracyTest (9 ms)
[ RUN      ] TransposeConv2d.shape_2_2_1_1_float32_BiasChannelTest
[       OK ] TransposeConv2d.shape_2_2_1_1_float32_BiasChannelTest (12 ms)
[ RUN      ] TransposeConv2d.shape_4_4_1_1_uint8_QuantizedTest
[       OK ] TransposeConv2d.shape_4_4_1_1_uint8_QuantizedTest (6 ms)
[ RUN      ] TransposeConv2d.shape_4_4_2_1_uint8_QuantizedTwoFiltersTest
[       OK ] TransposeConv2d.shape_4_4_2_1_uint8_QuantizedTwoFiltersTest (5 ms)
[ RUN      ] TransposeConv2d.shape_4_4_2_1_uint8_QuantizedValidTest
[       OK ] TransposeConv2d.shape_4_4_2_1_uint8_QuantizedValidTest (5 ms)
[ RUN      ] TransposeConv2d.shape_4_4_1_1_uint8_QuantizedBiasTest
[       OK ] TransposeConv2d.shape_4_4_1_1_uint8_QuantizedBiasTest (5 ms)
[ RUN      ] TransposeConv2d.shape_4_4_1_1_int8_QuantizedPerChannelOneTest
Segmentation fault

@leokuo725
Author

@thezha The Galcore version is 6.4.4.3.310723AAA. Is there any relation to the TIM-VX version?

@thezha
Contributor

thezha commented Oct 13, 2021

@leokuo725 I recommend that you get the latest driver SDK/galcore from here and push it to the device.

https://github.com/VeriSilicon/TIM-VX/releases/tag/v1.1.34.fix

https://github.com/VeriSilicon/TIM-VX/releases/download/v1.1.34.fix/aarch64_A311D_6.4.8.tgz

sunshinemyson self-assigned this Oct 13, 2021
@leokuo725
Author

leokuo725 commented Oct 13, 2021

@sunshinemyson I built TIM-VX v1.1.34.fix, but some errors occurred:


[ 94%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/activations_test.cc.o
/usr/bin/ld: ../../src/tim/vx/internal/libtim_internal.a(matrixmul_vx.c.o): in function `_matrixmulsetup':
matrixmul_vx.c:(.text+0x120): undefined reference to `vxBatchGemmNode'
collect2: error: ld returned 1 exit status
make[2]: *** [samples/benchmark_test/CMakeFiles/benchmark_test.dir/build.make:100: samples/benchmark_test/benchmark_test] Error 1
make[1]: *** [CMakeFiles/Makefile2:527: samples/benchmark_test/CMakeFiles/benchmark_test.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 94%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/addn_test.cc.o
[ 94%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/avg_pool_test.cc.o
[ 95%] Linking CXX executable lenet
[ 95%] Linking CXX executable multi_thread_test
/usr/bin/ld: ../../src/tim/vx/internal/libtim_internal.a(matrixmul_vx.c.o): in function `_matrixmulsetup':
matrixmul_vx.c:(.text+0x120): undefined reference to `vxBatchGemmNode'
collect2: error: ld returned 1 exit status
make[2]: *** [samples/lenet/CMakeFiles/lenet.dir/build.make:100: samples/lenet/lenet] Error 1
make[1]: *** [CMakeFiles/Makefile2:555: samples/lenet/CMakeFiles/lenet.dir/all] Error 2
[ 95%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/conv1d_test.cc.o
/usr/bin/ld: ../../src/tim/vx/internal/libtim_internal.a(matrixmul_vx.c.o): in function `_matrixmulsetup':
matrixmul_vx.c:(.text+0x120): undefined reference to `vxBatchGemmNode'
collect2: error: ld returned 1 exit status
make[2]: *** [samples/multi_thread_test/CMakeFiles/multi_thread_test.dir/build.make:100: samples/multi_thread_test/multi_thread_test] Error 1
make[1]: *** [CMakeFiles/Makefile2:583: samples/multi_thread_test/CMakeFiles/multi_thread_test.dir/all] Error 2
[ 95%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/conv2d_test.cc.o
[ 95%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/deconv1d_test.cc.o
[ 96%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/deconv2d_test.cc.o
[ 96%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/depthwiseConv_test.cc.o
[ 96%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/elementwise_test.cc.o
[ 96%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/groupedconv2d_test.cc.o
[ 96%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/instancenormalization_test.cc.o
[ 96%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/layernormalization_test.cc.o
[ 97%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/logsoftmax_test.cc.o
[ 97%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/matmul_test.cc.o
[ 97%] Linking CXX shared library libtim-vx.so
[ 97%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/maxpoolwithargmax_test.cc.o
[ 97%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/maxunpool2d_test.cc.o
[ 97%] Built target tim-vx
[ 97%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/moments_test.cc.o
[ 97%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/relational_operations_test.cc.o
[ 98%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/reorg_test.cc.o
[ 98%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/resize1d_test.cc.o
[ 98%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/scatternd_test.cc.o
[ 98%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/shuffle_channel_test.cc.o
[ 98%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/simple_operations_test.cc.o
[ 98%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/spatial_transformer_test.cc.o
[ 99%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/tile_test.cc.o
[ 99%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/transposeConv_test.cc.o
[ 99%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/unidirectional_sequence_lstm_test.cc.o
[ 99%] Building CXX object src/tim/CMakeFiles/unit_test.dir/vx/ops/unstack_test.cc.o
[ 99%] Building CXX object src/tim/CMakeFiles/unit_test.dir/transform/layout_inference_test.cc.o
[100%] Linking CXX executable unit_test
/usr/bin/ld: vx/internal/libtim_internal.a(matrixmul_vx.c.o): in function `_matrixmulsetup':
matrixmul_vx.c:(.text+0x120): undefined reference to `vxBatchGemmNode'
collect2: error: ld returned 1 exit status
make[2]: *** [src/tim/CMakeFiles/unit_test.dir/build.make:549: src/tim/unit_test] Error 1
make[1]: *** [CMakeFiles/Makefile2:418: src/tim/CMakeFiles/unit_test.dir/all] Error 2
make: *** [Makefile:130: all] Error 2
khadas@Khadas:~/TIM-VX-1.1.34.fix/build$ 

@leokuo725
Author

leokuo725 commented Oct 13, 2021

I am trying to cross-compile TIM-VX now, and it compiles successfully.
But I get a Segmentation fault (core dumped) error when I run test_operations.py.

@thezha
Contributor

thezha commented Oct 13, 2021

Is TIM-VX unit test running OK now?

@leokuo725
Author

leokuo725 commented Oct 13, 2021

Is TIM-VX unit test running OK now?

@thezha
When I cross-compile TIM-VX 1.1.34.fix on x86 (with -DCONFIG=A311D), there is no bin folder in the install folder.
But when I compile for x86 without -DCONFIG=A311D, I do get a bin folder in the install folder.
I ran the unit test on the x86 host and it reports some errors.

Read More

Running main() from /media/data/home/leokuo/TIM-VX-1.1.34.fix/build_x86/_deps/googletest-src/googletest/src/gtest_main.cc
[==========] Running 122 tests from 39 test suites.
[----------] Global test environment set-up.
[----------] 1 test from Context
[ RUN      ] Context.create
[       OK ] Context.create (121 ms)
[----------] 1 test from Context (121 ms total)

[----------] 2 tests from graph
[ RUN      ] graph.gen_binary_graph_with_empty_graph
E [_graph_optimization_convert_int8_to_uint8:810]CHECK STATUS(-1:A generic error code, used when no other describes the error.)
E [vsi_nn_OptimizeGraph:845]CHECK STATUS(-1:A generic error code, used when no other describes the error.)
[       OK ] graph.gen_binary_graph_with_empty_graph (140 ms)
[ RUN      ] graph.gen_binary_graph_with_simple_add
[       OK ] graph.gen_binary_graph_with_simple_add (294 ms)
[----------] 2 tests from graph (434 ms total)

[----------] 2 tests from Linear
[ RUN      ] Linear.shape_5_1_fp32
[       OK ] Linear.shape_5_1_fp32 (180 ms)
[ RUN      ] Linear.shape_5_1_fp32_omit_b
[       OK ] Linear.shape_5_1_fp32_omit_b (179 ms)
[----------] 2 tests from Linear (359 ms total)

[----------] 2 tests from Gelu
[ RUN      ] Gelu.shape_5_1_fp32_approximate
W [_setup:243]Call vxTensorTableLookupLayer fail.

[       OK ] Gelu.shape_5_1_fp32_approximate (160 ms)
[ RUN      ] Gelu.shape_5_1_uint8_Quantized
[       OK ] Gelu.shape_5_1_uint8_Quantized (128 ms)
[----------] 2 tests from Gelu (288 ms total)

[----------] 3 tests from AddN
[ RUN      ] AddN.shape_2_2_int32
[       OK ] AddN.shape_2_2_int32 (230 ms)
[ RUN      ] AddN.shape_3_1_float32
[       OK ] AddN.shape_3_1_float32 (230 ms)
[ RUN      ] AddN.shape_2_2_uint8_Quantized
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/test_utils.h:118: Failure
The difference between expected[i] and actual[i] is 4, which exceeds abs_error, where
expected[i] evaluates to 131,
actual[i] evaluates to 127, and
abs_error evaluates to 1.
at index:0
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/test_utils.h:118: Failure
The difference between expected[i] and actual[i] is 11, which exceeds abs_error, where
expected[i] evaluates to 138,
actual[i] evaluates to 127, and
abs_error evaluates to 1.
at index:1
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/test_utils.h:118: Failure
The difference between expected[i] and actual[i] is 6, which exceeds abs_error, where
expected[i] evaluates to 133,
actual[i] evaluates to 127, and
abs_error evaluates to 1.
at index:2
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/test_utils.h:118: Failure
The difference between expected[i] and actual[i] is 17, which exceeds abs_error, where
expected[i] evaluates to 144,
actual[i] evaluates to 127, and
abs_error evaluates to 1.
at index:3
[  FAILED  ] AddN.shape_2_2_uint8_Quantized (998 ms)
[----------] 3 tests from AddN (1458 ms total)

[----------] 4 tests from AVG
[ RUN      ] AVG.shape_3_3_1_2_fp32_kernel_2_stride_1
[       OK ] AVG.shape_3_3_1_2_fp32_kernel_2_stride_1 (1055 ms)
[ RUN      ] AVG.shape_3_3_1_1_fp32_kernel_2_stride_1
[       OK ] AVG.shape_3_3_1_1_fp32_kernel_2_stride_1 (1068 ms)
[ RUN      ] AVG.shape_3_3_1_1_uint8_kernel_2_stride_1
[       OK ] AVG.shape_3_3_1_1_uint8_kernel_2_stride_1 (127 ms)
[ RUN      ] AVG.shape_60_52_3_5_fp32_kernel_35_stride_5
[       OK ] AVG.shape_60_52_3_5_fp32_kernel_35_stride_5 (5096 ms)
[----------] 4 tests from AVG (7346 ms total)

[----------] 2 tests from AVG_ANDROID
[ RUN      ] AVG_ANDROID.shape_60_52_3_5_fp32_kernel_35_stride_5
[       OK ] AVG_ANDROID.shape_60_52_3_5_fp32_kernel_35_stride_5 (5113 ms)
[ RUN      ] AVG_ANDROID.shape_60_52_3_5_uint8_kernel_35_stride_5
Segmentation fault (core dumped)

If I execute the old version (1.1.32) unit test with the new SDK (6.4.8) and galcore version 6.4.6.2, I get the following:

Read More

khadas@Khadas:~/TIM-VX-1.1.32/install/bin$ ./unit_test 
Running main() from /home/khadas/TIM-VX-1.1.32/_deps/googletest-src/googletest/src/gtest_main.cc
[==========] Running 104 tests from 33 test suites.
[----------] Global test environment set-up.
[----------] 1 test from Context
[ RUN      ] Context.create
[       OK ] Context.create (25 ms)
[----------] 1 test from Context (25 ms total)

[----------] 2 tests from graph
[ RUN ] graph.gen_binary_graph_with_empty_graph
E [_graph_optimization_convert_int8_to_uint8:792]CHECK STATUS(-1:A generic error code, used when no other describes the error.)
E [vsi_nn_OptimizeGraph:827]CHECK STATUS(-1:A generic error code, used when no other describes the error.)
[ OK ] graph.gen_binary_graph_with_empty_graph (4 ms)
[ RUN ] graph.gen_binary_graph_with_simple_add
/home/khadas/TIM-VX-1.1.32/src/tim/vx/graph_test.cc:61: Failure
Value of: graph->CompileToBinary(nbg_buf.data(), &bin_size)
Actual: false
Expected: true
/home/khadas/TIM-VX-1.1.32/src/tim/vx/graph_test.cc:72: Failure
Expected equality of these values:
output
Which is: 0
expected_out
Which is: 2
E [compute_node:379]Create node[0] NBG fail
/home/khadas/TIM-VX-1.1.32/src/tim/vx/graph_test.cc:86: Failure
Value of: nbg_graph->Compile()
Actual: false
Expected: true
/home/khadas/TIM-VX-1.1.32/src/tim/vx/graph_test.cc:87: Failure
Value of: nbg_graph->Run()
Actual: false
Expected: true
/home/khadas/TIM-VX-1.1.32/src/tim/vx/graph_test.cc:91: Failure
Expected equality of these values:
output
Which is: 0
expected_out
Which is: 2
[ FAILED ] graph.gen_binary_graph_with_simple_add (7 ms)
[----------] 2 tests from graph (11 ms total)

[----------] 2 tests from Linear
[ RUN ] Linear.shape_5_1_fp32
/home/khadas/TIM-VX-1.1.32/src/tim/vx/ops/activations_test.cc:51: Failure
Value of: graph->Compile()
Actual: false
Expected: true
Segmentation fault


@thezha
Contributor

thezha commented Oct 13, 2021

Please run ldd on the unit_test binary for both x86 and Khadas, and supply the output here.

kainan@ubuntu:~/projects/opensource/TIM-VX/build_$ ldd src/tim/unit_test 
	linux-vdso.so.1 (0x00007fffca76e000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fa847db3000)
	libOpenVX.so.1 => /home/kainan/projects/opensource/TIM-VX/prebuilt-sdk/x86_64_linux/lib/libOpenVX.so.1 (0x00007fa8474b3000)
	libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fa8472d1000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fa847182000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fa847167000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa846f73000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fa84856a000)
	libVSC.so => /home/kainan/projects/opensource/TIM-VX/prebuilt-sdk/x86_64_linux/lib/libVSC.so (0x00007fa845d5f000)
	libGAL.so => /home/kainan/projects/opensource/TIM-VX/prebuilt-sdk/x86_64_linux/lib/libGAL.so (0x00007fa845928000)
	libArchModelSw.so => /home/kainan/projects/opensource/TIM-VX/prebuilt-sdk/x86_64_linux/lib/libArchModelSw.so (0x00007fa8456c7000)
	libNNArchPerf.so => /home/kainan/projects/opensource/TIM-VX/prebuilt-sdk/x86_64_linux/lib/libNNArchPerf.so (0x00007fa84545b000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fa845455000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fa845448000)
	libEmulator.so => /home/kainan/projects/opensource/TIM-VX/prebuilt-sdk/x86_64_linux/lib/libEmulator.so (0x00007fa844fe9000)
	libvdtproxy.so => /home/kainan/projects/opensource/TIM-VX/prebuilt-sdk/x86_64_linux/lib/libvdtproxy.so (0x00007fa844de6000)

@leokuo725
Author

leokuo725 commented Oct 13, 2021

@thezha
X86(TIM-VX 1.1.34.fix):

~/TIM-VX-1.1.34.fix/build_x86/src/tim$ ldd unit_test 
	linux-vdso.so.1 (0x00007ffcf6fbb000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fe03d4e0000)
	libOpenVX.so.1 => /usr/lib/x86_64-linux-gnu/libOpenVX.so.1 (0x00007fe03cbe0000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fe03c7d3000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fe03c435000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fe03c21d000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fe03be2c000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fe03e04b000)
	libVSC.so => /usr/lib/x86_64-linux-gnu/libVSC.so (0x00007fe03ac18000)
	libGAL.so => /usr/lib/x86_64-linux-gnu/libGAL.so (0x00007fe03a7e1000)
	libArchModelSw.so => /usr/lib/x86_64-linux-gnu/libArchModelSw.so (0x00007fe03a580000)
	libNNArchPerf.so => /usr/lib/x86_64-linux-gnu/libNNArchPerf.so (0x00007fe03a314000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fe03a110000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fe039f08000)
	libEmulator.so => /usr/lib/x86_64-linux-gnu/libEmulator.so (0x00007fe039aa9000)
	libvdtproxy.so => /usr/lib/x86_64-linux-gnu/libvdtproxy.so (0x00007fe0398a6000)

Khadas(TIM-VX 1.1.32):

~/TIM-VX-1.1.32/src/tim$ ldd unit_test 
	linux-vdso.so.1 (0x0000007f7a974000)
	libpthread.so.0 => /lib/aarch64-linux-gnu/libpthread.so.0 (0x0000007f7a204000)
	libOpenVX.so => /lib/libOpenVX.so (0x0000007f79fe3000)
	libstdc++.so.6 => /lib/aarch64-linux-gnu/libstdc++.so.6 (0x0000007f79dfe000)
	libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000007f79d51000)
	libgcc_s.so.1 => /lib/aarch64-linux-gnu/libgcc_s.so.1 (0x0000007f79d2d000)
	libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000007f79bba000)
	/lib/ld-linux-aarch64.so.1 (0x0000007f7a944000)
	libVSC.so => /lib/libVSC.so (0x0000007f78c01000)
	libGAL.so => /lib/libGAL.so (0x0000007f789ff000)
	libArchModelSw.so => /lib/libArchModelSw.so (0x0000007f789b0000)
	libNNArchPerf.so => /lib/libNNArchPerf.so (0x0000007f78943000)
	libdl.so.2 => /lib/aarch64-linux-gnu/libdl.so.2 (0x0000007f7892f000)
	librt.so.1 => /lib/aarch64-linux-gnu/librt.so.1 (0x0000007f78917000)

I cannot find unit_test in TIM-VX v1.1.34.fix (Khadas).

@thezha
Contributor

thezha commented Oct 13, 2021

unit_test is not enabled by default; it must be built with 'cmake -DTIM_VX_ENABLE_TEST=ON ..'.

From your ldd result, it seems that you copied the SDK libraries into the system library folders. This is not advised because they are not part of the system.

You should remove them from the system library path /usr/lib/x86_64-linux-gnu and use LD_LIBRARY_PATH instead, something like this:

export LD_LIBRARY_PATH=`pwd`/../../../prebuilt-sdk/x86_64_linux/lib:$LD_LIBRARY_PATH
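For instance, a typical sequence on the x86 host would be roughly the following (a sketch only, assuming a build directory directly under the TIM-VX checkout; adjust the relative paths to your layout):

```
# configure with the unit tests enabled, build, then point the loader at the prebuilt SDK
cmake -DTIM_VX_ENABLE_TEST=ON ..
make -j8
export LD_LIBRARY_PATH=$(pwd)/../prebuilt-sdk/x86_64_linux/lib:$LD_LIBRARY_PATH
./src/tim/unit_test
```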

@leokuo725
Author

leokuo725 commented Oct 13, 2021

unit_test is not enabled by default, it must be built with 'cmake -DTIM_VX_ENABLE_TEST=ON ..'

If I want to cross-compile for A311D, should I build with "cmake -DCONFIG=A311D -DTIM_VX_ENABLE_TEST=ON .."?

From your LDD result, it seems that you copied the SDK libraries to system library folders, this is not advised because they are not part of the system library.

You should remove them from system library path /usr/lib/x86_64-linux-gnu and use LD_LIBRARY_PATH instead. something like this.

export LD_LIBRARY_PATH=`pwd`/../../../prebuilt-sdk/x86_64_linux/lib:$LD_LIBRARY_PATH

On the Khadas side, may I copy TIM-VX/build/install/lib/* to /usr/lib?


@leokuo725
Author

If I want to cross-compile for A311D, should I build with "cmake -DCONFIG=A311D -DTIM_VX_ENABLE_TEST=ON .."?

I tried it. If I set both -DCONFIG=A311D and -DTIM_VX_ENABLE_TEST=ON, there is no unit_test in src/tim/.
Files in src/tim:

~/TIM-VX-1.1.34.fix/build2/src/tim$ ls
CMakeFiles           libtim-vx.so        Makefile  vx
cmake_install.cmake  libtim-vx-static.a  utils

@thezha
Contributor

thezha commented Oct 13, 2021

At Khadas side, May I copy from TIM-VX/build/install/lib/* to /usr/lib?

It is recommended to copy the entire aarch64_A311D_6.4.8/ folder somewhere onto the board and set LD_LIBRARY_PATH to point to it. Something like this:

export LD_LIBRARY_PATH=path_to_aarch64_A311D_6.4.8:$LD_LIBRARY_PATH

Also, inside aarch64_A311D_6.4.8/ folder there is a corresponding galcore.ko, and you should use that.
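In practice, that amounts to something like the following on the board (a sketch only: the copy destination and the exact location of galcore.ko inside the SDK folder are assumptions; the lib/ subdirectory is the one seen in the build logs above):

```
# copy aarch64_A311D_6.4.8/ from the host to the board first (e.g. to /home/khadas), then:
export LD_LIBRARY_PATH=/home/khadas/aarch64_A311D_6.4.8/lib:$LD_LIBRARY_PATH
# swap in the kernel module shipped with the same SDK so driver and library versions match
sudo rmmod galcore
sudo insmod /home/khadas/aarch64_A311D_6.4.8/galcore.ko
```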

@thezha
Contributor

thezha commented Oct 13, 2021

If I want to cross compile for A311D, should I build with "cmake -DCONFIG=A311D -DTIM_VX_ENABLE_TEST=ON .." ?

I tried it. If I set both -DCONFIG=A311D and -DTIM_VX_ENABLE_TEST=ON, there is no unit_test in the src/tim/ Files in src/tim

~/TIM-VX-1.1.34.fix/build2/src/tim$ ls
CMakeFiles           libtim-vx.so        Makefile  vx
cmake_install.cmake  libtim-vx-static.a  utils

@sunshinemyson Any idea?

@sunshinemyson
Contributor

This is an issue with CMake. Because we hard-reset the compiler configuration in A311D.cmake, CMake will reconfigure the project and TIM_VX_ENABLE_TEST will be reset.

To fix this, you need to comment out the following configuration in A311D.cmake and create a toolchain config locally:
set(TOOLCHAIN_DIR ${PROJECT_BINARY_DIR}/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu)
set(CMAKE_C_COMPILER ${TOOLCHAIN_DIR}/bin/aarch64-linux-gnu-gcc)
set(CMAKE_CXX_COMPILER ${TOOLCHAIN_DIR}/bin/aarch64-linux-gnu-g++)
set(CMAKE_AR ${TOOLCHAIN_DIR}/bin/aarch64-linux-gnu-gcc-ar)
set(CMAKE_AS ${TOOLCHAIN_DIR}/bin/aarch64-linux-gnu-gcc-as)
set(CMAKE_LD ${TOOLCHAIN_DIR}/bin/aarch64-linux-gnu-gcc-ld)

Here is my config for your reference
toolchain-vim3.cmake.txt
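For readers who cannot open that attachment, a minimal local toolchain file in the same spirit might look roughly like this (a sketch only; the install path for the Linaro toolchain is an assumption, and the attached toolchain-vim3.cmake.txt remains the reference):

```
set(CMAKE_SYSTEM_NAME Linux)
set(CMAKE_SYSTEM_PROCESSOR aarch64)
# assumed local install path for the Linaro toolchain; adjust to where you unpacked it
set(CROSS_COMPILE_ENV "/opt/toolchains/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu")
set(CMAKE_C_COMPILER   "${CROSS_COMPILE_ENV}/bin/aarch64-linux-gnu-gcc")
set(CMAKE_CXX_COMPILER "${CROSS_COMPILE_ENV}/bin/aarch64-linux-gnu-g++")
```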

@leokuo725
Author

@sunshinemyson

I followed the steps above: commented out those lines in cmake/A311D.cmake and put toolchain-vim3.cmake.txt into the cmake folder.
But I have another problem. The file format of libCLC.so is aarch64, yet linking fails with "File in wrong format". What should I do to enable toolchain-vim3.cmake?

[ 99%] Linking CXX shared library libtim-vx.so
../../aarch64_A311D_6.4.8/lib/libCLC.so: error adding symbols: File in wrong format
collect2: error: ld returned 1 exit status
src/tim/CMakeFiles/tim-vx.dir/build.make:1080: recipe for target 'src/tim/libtim-vx.so' failed
make[2]: *** [src/tim/libtim-vx.so] Error 1
CMakeFiles/Makefile2:221: recipe for target 'src/tim/CMakeFiles/tim-vx.dir/all' failed
make[1]: *** [src/tim/CMakeFiles/tim-vx.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 99%] Linking CXX executable benchmark_test
../../aarch64_A311D_6.4.8/lib/libCLC.so: error adding symbols: File in wrong format
collect2: error: ld returned 1 exit status
samples/benchmark_test/CMakeFiles/benchmark_test.dir/build.make:112: recipe for target 'samples/benchmark_test/benchmark_test' failed
make[2]: *** [samples/benchmark_test/benchmark_test] Error 1
CMakeFiles/Makefile2:300: recipe for target 'samples/benchmark_test/CMakeFiles/benchmark_test.dir/all' failed
make[1]: *** [samples/benchmark_test/CMakeFiles/benchmark_test.dir/all] Error 2
[ 99%] Linking CXX executable lenet
../../aarch64_A311D_6.4.8/lib/libCLC.so: error adding symbols: File in wrong format
collect2: error: ld returned 1 exit status
samples/lenet/CMakeFiles/lenet.dir/build.make:112: recipe for target 'samples/lenet/lenet' failed
make[2]: *** [samples/lenet/lenet] Error 1
CMakeFiles/Makefile2:327: recipe for target 'samples/lenet/CMakeFiles/lenet.dir/all' failed
make[1]: *** [samples/lenet/CMakeFiles/lenet.dir/all] Error 2
[100%] Linking CXX executable multi_thread_test
../../aarch64_A311D_6.4.8/lib/libCLC.so: error adding symbols: File in wrong format
collect2: error: ld returned 1 exit status
samples/multi_thread_test/CMakeFiles/multi_thread_test.dir/build.make:112: recipe for target 'samples/multi_thread_test/multi_thread_test' failed
make[2]: *** [samples/multi_thread_test/multi_thread_test] Error 1
CMakeFiles/Makefile2:354: recipe for target 'samples/multi_thread_test/CMakeFiles/multi_thread_test.dir/all' failed
make[1]: *** [samples/multi_thread_test/CMakeFiles/multi_thread_test.dir/all] Error 2
Makefile:135: recipe for target 'all' failed
make: *** [all] Error 2

@sunshinemyson
Contributor

It looks like you are linking the target .so with a host build. Did you set the toolchain with -DCMAKE_TOOLCHAIN_FILE?

@leokuo725
Author

leokuo725 commented Oct 14, 2021

@sunshinemyson

I used it, and got an error.

cmake -DCONFIG=A311D -DTIM_VX_ENABLE_TEST=ON -DCMAKE_TOOLCHAIN_FILE=TIM-VX-1.1.34.fix/cmake/toolchain-vim3.cmake  ..
-- The C compiler identification is unknown
-- The CXX compiler identification is unknown
CMake Error at CMakeLists.txt:2 (project):
  The CMAKE_C_COMPILER:

    /opt/test_hub/vosp/toolchain/vim3_A311D/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc

  is not a full path to an existing compiler tool.

  Tell CMake where to find the compiler by setting either the environment
  variable "CC" or the CMake cache entry CMAKE_C_COMPILER to the full path to
  the compiler, or to the compiler name if it is in the PATH.


CMake Error at CMakeLists.txt:2 (project):
  The CMAKE_CXX_COMPILER:

    /opt/test_hub/vosp/toolchain/vim3_A311D/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-g++

  is not a full path to an existing compiler tool.

  Tell CMake where to find the compiler by setting either the environment
  variable "CXX" or the CMake cache entry CMAKE_CXX_COMPILER to the full path
  to the compiler, or to the compiler name if it is in the PATH.


-- Configuring incomplete, errors occurred!

So I set CROSS_COMPILE_ENV to ${PROJECT_BINARY_DIR}/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu:

#Original
set(CROSS_COMPILE_ENV "/opt/test_hub/vosp/toolchain/vim3_A311D/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu")
#Modified
set(CROSS_COMPILE_ENV "${PROJECT_BINARY_DIR}/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu")

Then I got another error.

cmake -DCONFIG=A311D -DTIM_VX_ENABLE_TEST=ON -DCMAKE_TOOLCHAIN_FILE=/media/data/home/leokuo/TIM-VX-1.1.34.fix/cmake/toolchain-vim3.cmake  ..
-- The C compiler identification is GNU 7.3.1
-- The CXX compiler identification is GNU 7.3.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - failed
-- Check for working C compiler: /media/data/home/leokuo/TIM-VX-1.1.34.fix/build/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc
-- Check for working C compiler: /media/data/home/leokuo/TIM-VX-1.1.34.fix/build/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc - broken
CMake Error at /media/data/shared/cmake-3.20.0-rc2-linux-x86_64/share/cmake-3.20/Modules/CMakeTestCCompiler.cmake:66 (message):
  The C compiler

    "/media/data/home/leokuo/TIM-VX-1.1.34.fix/build/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc"

  is not able to compile a simple test program.

  It fails with the following output:

    Change Dir: /media/data/home/leokuo/TIM-VX-1.1.34.fix/build/CMakeFiles/CMakeTmp
    
    Run Build Command(s):/usr/bin/make -f Makefile cmTC_7fde6/fast && /usr/bin/make  -f CMakeFiles/cmTC_7fde6.dir/build.make CMakeFiles/cmTC_7fde6.dir/build
    make[1]: Entering directory '/media/data/home/leokuo/TIM-VX-1.1.34.fix/build/CMakeFiles/CMakeTmp'
    Building C object CMakeFiles/cmTC_7fde6.dir/testCCompiler.c.o
    /media/data/home/leokuo/TIM-VX-1.1.34.fix/build/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc --sysroot=/media/data/home/leokuo/TIM-VX-1.1.34.fix/build/CMakeFiles/CMakeTmp/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu/aarch64-linux-gnu/libc   -mtune=cortex-a53 -o CMakeFiles/cmTC_7fde6.dir/testCCompiler.c.o -c /media/data/home/leokuo/TIM-VX-1.1.34.fix/build/CMakeFiles/CMakeTmp/testCCompiler.c
    Linking C executable cmTC_7fde6
    /media/data/shared/cmake-3.20.0-rc2-linux-x86_64/bin/cmake -E cmake_link_script CMakeFiles/cmTC_7fde6.dir/link.txt --verbose=1
    /media/data/home/leokuo/TIM-VX-1.1.34.fix/build/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc --sysroot=/media/data/home/leokuo/TIM-VX-1.1.34.fix/build/CMakeFiles/CMakeTmp/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu/aarch64-linux-gnu/libc CMakeFiles/cmTC_7fde6.dir/testCCompiler.c.o -o cmTC_7fde6 
    /media/data/home/leokuo/TIM-VX-1.1.34.fix/build/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu/bin/../lib/gcc/aarch64-linux-gnu/7.3.1/../../../../aarch64-linux-gnu/bin/ld: cannot find crt1.o: No such file or directory
    /media/data/home/leokuo/TIM-VX-1.1.34.fix/build/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu/bin/../lib/gcc/aarch64-linux-gnu/7.3.1/../../../../aarch64-linux-gnu/bin/ld: cannot find crti.o: No such file or directory
    /media/data/home/leokuo/TIM-VX-1.1.34.fix/build/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu/bin/../lib/gcc/aarch64-linux-gnu/7.3.1/../../../../aarch64-linux-gnu/bin/ld: cannot find -lc
    /media/data/home/leokuo/TIM-VX-1.1.34.fix/build/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu/bin/../lib/gcc/aarch64-linux-gnu/7.3.1/../../../../aarch64-linux-gnu/bin/ld: cannot find crtn.o: No such file or directory
    collect2: error: ld returned 1 exit status
    CMakeFiles/cmTC_7fde6.dir/build.make:98: recipe for target 'cmTC_7fde6' failed
    make[1]: *** [cmTC_7fde6] Error 1
    make[1]: Leaving directory '/media/data/home/leokuo/TIM-VX-1.1.34.fix/build/CMakeFiles/CMakeTmp'
    Makefile:127: recipe for target 'cmTC_7fde6/fast' failed
    make: *** [cmTC_7fde6/fast] Error 2
    
    

  

  CMake will not be able to correctly generate this project.
Call Stack (most recent call first):
  CMakeLists.txt:2 (project)


-- Configuring incomplete, errors occurred!
See also "/media/data/home/leokuo/TIM-VX-1.1.34.fix/build/CMakeFiles/CMakeOutput.log".
See also "/media/data/home/leokuo/TIM-VX-1.1.34.fix/build/CMakeFiles/CMakeError.log".

Do I need to change anything in my environment, such as PATH or the CMake settings?

@sunshinemyson
Copy link
Contributor

You should download the toolchain from https://cnbj1.fds.api.xiaomi.com/mace/third-party/gcc-linaro/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu.tar.xz and change the toolchain configuration to point to your local install directory.

This is my local host directory; you need to change it:
/opt/test_hub/vosp/toolchain/vim3_A311D/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-g++
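Concretely, the host-side preparation might look like this (a sketch only; the install location /opt/toolchains is just an example):

```
# download and unpack the Linaro cross toolchain
wget https://cnbj1.fds.api.xiaomi.com/mace/third-party/gcc-linaro/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu.tar.xz
sudo mkdir -p /opt/toolchains
sudo tar xf gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu.tar.xz -C /opt/toolchains
# then edit CROSS_COMPILE_ENV in cmake/toolchain-vim3.cmake to point at
# /opt/toolchains/gcc-linaro-7.3.1-2018.05-x86_64_aarch64-linux-gnu
```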

@leokuo725
Author

@sunshinemyson
The following are my steps, but there is still no unit_test program in src/tim.
There were NO ERRORS from these commands.

cmake -DCONFIG=A311D -DTIM_VX_ENABLE_TEST=ON -DCMAKE_TOOLCHAIN_FILE=/media/data/home/leokuo/TIM-VX-1.1.34.fix/cmake/toolchain-vim3.cmake  ..
make -j32
make install
ls -al src/tim/
total 31868
drwxr-xr-x 5 leokuo leokuo     4096 Oct 14 13:48 .
drwxr-xr-x 3 leokuo leokuo     4096 Oct 14 11:51 ..
drwxr-xr-x 4 leokuo leokuo     4096 Oct 14 11:51 CMakeFiles
-rw-r--r-- 1 leokuo leokuo     5960 Oct 14 11:51 cmake_install.cmake
-rwxr-xr-x 1 leokuo leokuo 11956368 Oct 14 13:48 libtim-vx.so
-rw-r--r-- 1 leokuo leokuo 20526940 Oct 14 13:48 libtim-vx-static.a
-rw-r--r-- 1 leokuo leokuo   112260 Oct 14 11:51 Makefile
drwxr-xr-x 3 leokuo leokuo     4096 Oct 14 11:51 utils
drwxr-xr-x 3 leokuo leokuo     4096 Oct 14 11:51 vx

@leokuo725
Author

@sunshinemyson
I added "set(TIM_VX_ENABLE_TEST ON)" to CMakeList.txt:(Between if("${CONFIG}" STREQUAL "A311D") and include(cmake/A311D.cmake))
With that, I got unit_test on the VIM3. The following is the output, with errors.

khadas@Khadas:~/TIM-VX-1.1.34.fix/build/src/tim$ ./unit_test 
Running main() from /media/data/home/leokuo/TIM-VX-1.1.34.fix/build/_deps/googletest-src/googletest/src/gtest_main.cc
[==========] Running 122 tests from 39 test suites.
[----------] Global test environment set-up.
[----------] 1 test from Context
[ RUN      ] Context.create
[       OK ] Context.create (23 ms)
[----------] 1 test from Context (23 ms total)

[----------] 2 tests from graph
[ RUN      ] graph.gen_binary_graph_with_empty_graph
E [_graph_optimization_convert_int8_to_uint8:810]CHECK STATUS(-1:A generic error code, used when no other describes the error.)
E [vsi_nn_OptimizeGraph:845]CHECK STATUS(-1:A generic error code, used when no other describes the error.)
[       OK ] graph.gen_binary_graph_with_empty_graph (3 ms)
[ RUN      ] graph.gen_binary_graph_with_simple_add
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/graph_test.cc:61: Failure
Value of: graph->CompileToBinary(nbg_buf.data(), &bin_size)
  Actual: false
Expected: true
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/graph_test.cc:72: Failure
Expected equality of these values:
  output
    Which is: 0
  expected_out
    Which is: 2
E [compute_node:379]Create node[0] NBG fail
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/graph_test.cc:86: Failure
Value of: nbg_graph->Compile()
  Actual: false
Expected: true
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/graph_test.cc:87: Failure
Value of: nbg_graph->Run()
  Actual: false
Expected: true
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/graph_test.cc:91: Failure
Expected equality of these values:
  output
    Which is: 0
  expected_out
    Which is: 2
[  FAILED  ] graph.gen_binary_graph_with_simple_add (8 ms)
[----------] 2 tests from graph (11 ms total)

[----------] 2 tests from Linear
[ RUN      ] Linear.shape_5_1_fp32
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/ops/activations_test.cc:55: Failure
Value of: graph->Compile()
  Actual: false
Expected: true
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/ops/activations_test.cc:56: Failure
Value of: graph->Run()
  Actual: false
Expected: true
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/ops/activations_test.cc:59: Failure
Expected equality of these values:
  golden
    Which is: { -0.5, 1.9, 2, 2.55, inf }
  output
    Which is: { 0, 0, 0, 0, 0 }
[  FAILED  ] Linear.shape_5_1_fp32 (7 ms)
[ RUN      ] Linear.shape_5_1_fp32_omit_b
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/ops/activations_test.cc:86: Failure
Value of: graph->Compile()
  Actual: false
Expected: true
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/ops/activations_test.cc:87: Failure
Value of: graph->Run()
  Actual: false
Expected: true
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/ops/activations_test.cc:90: Failure
Expected equality of these values:
  golden
    Which is: { -5, -0.2, 0, 1.1, inf }
  output
    Which is: { 0, 0, 0, 0, 0 }
[  FAILED  ] Linear.shape_5_1_fp32_omit_b (7 ms)
[----------] 2 tests from Linear (14 ms total)

[----------] 2 tests from Gelu
[ RUN      ] Gelu.shape_5_1_fp32_approximate
W [_setup:243]Call vxTensorTableLookupLayer fail.

Segmentation fault

@sunshinemyson
Contributor

@leo,

Please try setting VIV_VX_DEBUG_LEVEL=1 and share the log again. It's interesting, because I get a full pass on my side, while on yours the graphs consistently fail to compile.

@leokuo725
Author

@sunshinemyson
Did you get a full pass on the VIM3 Pro, or on the x86 simulator?
The following is the output with VIV_VX_DEBUG_LEVEL=1:

khadas@Khadas:~/TIM-VX-1.1.34.fix/build/src/tim$ ./unit_test 
Running main() from /media/data/home/leokuo/TIM-VX-1.1.34.fix/build/_deps/googletest-src/googletest/src/gtest_main.cc
[==========] Running 122 tests from 39 test suites.
[----------] Global test environment set-up.
[----------] 1 test from Context
[ RUN      ] Context.create
#productname=VIPNano-QI, pid=0x88
#productname=VIPNano-QI, pid=0x88
Created VX Thread: 0x79fa81b0
Created VX Thread: 0x7ad621b0
Exit VX Thread: 0x79fa81b0
#productname=VIPNano-QI, pid=0x88
Created VX Thread: 0x79fa81b0
Exit VX Thread: 0x79fa81b0
Exit VX Thread: 0x7ad621b0
[       OK ] Context.create (30 ms)
[----------] 1 test from Context (30 ms total)

[----------] 2 tests from graph
[ RUN      ] graph.gen_binary_graph_with_empty_graph
#productname=VIPNano-QI, pid=0x88
Created VX Thread: 0x7ad621b0
E [_graph_optimization_convert_int8_to_uint8:810]CHECK STATUS(-1:A generic error code, used when no other describes the error.)
E [vsi_nn_OptimizeGraph:845]CHECK STATUS(-1:A generic error code, used when no other describes the error.)
Exit VX Thread: 0x7ad621b0
[       OK ] graph.gen_binary_graph_with_empty_graph (5 ms)
[ RUN      ] graph.gen_binary_graph_with_simple_add
Created VX Thread: 0x7ad621b0
#productname=VIPNano-QI, pid=0x88
prev_ptrs = 0x3cb77740
prev_ptrs = 0x3cbb07c0
prev_ptrs = 0x3cbb0fc0
---------------------------Begin VerifyTiling -------------------------
AXI-SRAM = 1048576 Bytes VIP-SRAM = 522240 Bytes SWTILING_PHASE_FEATURES[0, 1, 1]
  0 SH [(   1    1    1 1,        4, 0x0x3cb77b60(0x0x3cb77b60, 0x(nil)) ->    1    1    1 1,        4, 0x0x3cbb1280(0x0x3cbb1280, 0x(nil))) k(0 0    0,        0) pad(0 0) pool(0 0, 1 1)]

 id IN [ x  y  w   h ]   OUT  [ x  y  w  h ] (tx, ty, kpc) (ic, kc, kc/ks, ks/eks, kernel_type)
   0 SH DD 0x(nil) [   0    0        0        0] -> DD 0x(nil) [   0    0        0        0] (  0,   0,   0) (       0,        0, 0.000000%, 0.000000%, NONE)

PreLoadWeightBiases = 1048576  100.000000%
---------------------------End VerifyTiling -------------------------
KernelStreamSize: 0x0, statesSize: 0x380, shShareMemSize: 0x0, shIntrSize: 0x0, shParaSize: 0x0, swParaSize: 0x0, lcdTensorSize: 0x0, shaderStatesSize: 0x380, tensorStatic: 0x0
NBG: operationSize: 0x78, nnSize: 0x0, tpSize: 0x0, shSize: 0x4, swSize: 0x0, layerParamSize: 0x0, lcdtSize: 0x48, patchSize: 0x364, lcdSize 0x480
NBG: entranceSize: 0x1f0, nbIOSize: 0x15c, layeSize: 0x4c, sectionsSize: 0x450, inputoutput size: 0x0, InitCommands size: 0x540
NBG: lcdSize: 0x480, headerSize : 0x7e8
Calculate NBG size : 4776 bytes
generate NBG into memory start.
vxoBinaryGraph_SaveBinaryEntrance[14907]: collect input count=0, output count=0
vxoBinaryGraph_SaveBinaryEntrance[14982]: total operation count=1
generate NBG, device count=1, core count per-device: 1, 
 input table address: 0x44fd9740 0x44fd67c0 
 output table address: 0x44fd3fc0 
vxoBinaryGraph_SaveBinaryEntranceExt[14131]: graph input/output=2/1, refine input count=2, output count=1
NBG network name field : dummy_network_name
vxoBinaryGraph_SaveBinaryEntranceExt[14697]: header input count=2, output count=1
generate NGB, save initialize commands
generate NBG, map VIP-SRAM start address=0x400000
generate NBG, patch AXI-SRAM startAddress=0xff000000, endAddress=0xff100000
vxoBinaryGraph_SaveInitialOperation[10003]:fail to search AXI-SRAM address in init command buffer
Dump HEX data size 0x20
0801028A 00000011 08010E13 00000002 08010E21 00220000 3CF03630 00000000
vxoBinaryGraph_SaveBinaryEntrance[15553]: failed to save initial operation
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/graph_test.cc:61: Failure
Value of: graph->CompileToBinary(nbg_buf.data(), &bin_size)
  Actual: false
Expected: true
prev_ptrs = 0x3cb77740
prev_ptrs = 0x3cbb07c0
prev_ptrs = 0x3cbb0fc0
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/graph_test.cc:72: Failure
Expected equality of these values:
  output
    Which is: 0
  expected_out
    Which is: 2
prev_ptrs = 0x3cebda00
prev_ptrs = 0x3cebe2c0
prev_ptrs = 0x3cebea80
prev_ptrs = 0x3cebda00
prev_ptrs = 0x3cebe2c0
binary graph format version, 0x1000c
readBinDynamic[1861]: lcd size if 0, error
fail in read Binary Dynamic
fail to load binary from pointer to create graph
NBG error, please provide genereating NBG logs first
fail to import kernel from VPMN
                               , error code: -1
E [compute_node:379]Create node[0] NBG fail
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/graph_test.cc:86: Failure
Value of: nbg_graph->Compile()
  Actual: false
Expected: true
vxProcessGraph[15913]: Process Graph fail!
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/graph_test.cc:87: Failure
Value of: nbg_graph->Run()
  Actual: false
Expected: true
prev_ptrs = 0x3cebea80
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/graph_test.cc:91: Failure
Expected equality of these values:
  output
    Which is: 0
  expected_out
    Which is: 2
prev_ptrs = 0x3cebda00
prev_ptrs = 0x3cebe2c0
prev_ptrs = 0x3cebea80
prev_ptrs = 0x3cb77740
prev_ptrs = 0x3cbb07c0
prev_ptrs = 0x3cbb0fc0
Exit VX Thread: 0x7ad621b0
[  FAILED  ] graph.gen_binary_graph_with_simple_add (9 ms)
[----------] 2 tests from graph (14 ms total)

[----------] 2 tests from Linear
[ RUN      ] Linear.shape_5_1_fp32
Created VX Thread: 0x7ad621b0
#productname=VIPNano-QI, pid=0x88
prev_ptrs = 0x3cebfcc0
prev_ptrs = 0x3cbb2ec0
prev_ptrs = 0x3cebfcc0
Save binary graph for VIPLite. 
network binary graph file has been opened
---------------------------Begin VerifyTiling -------------------------
AXI-SRAM = 1048576 Bytes VIP-SRAM = 522240 Bytes SWTILING_PHASE_FEATURES[0, 1, 1]
  0 SH [(   5    1    1 1,       20, 0x0x3cbb1280(0x0x3cbb1280, 0x(nil)) ->    5    1    1 1,       20, 0x0x3cbb0a90(0x0x3cbb0a90, 0x(nil))) k(0 0    0,        0) pad(0 0) pool(0 0, 1 1)]

 id IN [ x  y  w   h ]   OUT  [ x  y  w  h ] (tx, ty, kpc) (ic, kc, kc/ks, ks/eks, kernel_type)
   0 SH DD 0x(nil) [   0    0        0        0] -> DD 0x(nil) [   0    0        0        0] (  0,   0,   0) (       0,        0, 0.000000%, 0.000000%, NONE)

PreLoadWeightBiases = 1048576  100.000000%
---------------------------End VerifyTiling -------------------------
KernelStreamSize: 0x0, statesSize: 0x340, shShareMemSize: 0x0, shIntrSize: 0x0, shParaSize: 0x100, swParaSize: 0x0, lcdTensorSize: 0x0, shaderStatesSize: 0x340, tensorStatic: 0x0
NBG: operationSize: 0x78, nnSize: 0x0, tpSize: 0x0, shSize: 0x4, swSize: 0x0, layerParamSize: 0x0, lcdtSize: 0x50, patchSize: 0x380, lcdSize 0x540
NBG: entranceSize: 0x1f0, nbIOSize: 0xe8, layeSize: 0x4c, sectionsSize: 0x474, inputoutput size: 0x0, InitCommands size: 0x540
NBG: lcdSize: 0x540, headerSize : 0x798
Calculate NBG size : 4888 bytes
vxoBinaryGraph_SaveBinaryEntrance[14907]: collect input count=1, output count=1
vxoBinaryGraph_SaveBinaryEntrance[14982]: total operation count=1
generate NBG, device count=1, core count per-device: 1, 
 input table address: 0x44fc7cc0 
 output table address: 0x44fc4ec0 
vxoBinaryGraph_SaveBinaryEntranceExt[14131]: graph input/output=1/1, refine input count=1, output count=1
NBG network name field : dummy_network_name
vxoBinaryGraph_SaveBinaryEntranceExt[14697]: header input count=1, output count=1
generate NGB, save initialize commands
generate NBG, map VIP-SRAM start address=0x400000
generate NBG, patch AXI-SRAM startAddress=0xff000000, endAddress=0xff100000
vxoBinaryGraph_SaveInitialOperation[10003]:fail to search AXI-SRAM address in init command buffer
Dump HEX data size 0x20
0801028A 00000011 08010E13 00000002 08010E21 00220000 3CECD780 00000000
vxoBinaryGraph_SaveErrorHandle[8965]: failed to save NBG file, remove it, name=network_binary_pid-166964_tid-2098951744.nb
vxoBinaryGraph_SaveBinaryEntrance[15553]: failed to save initial operation
vxoBinaryGraph_SaveErrorHandle[8965]: failed to save NBG file, remove it, name=network_binary_pid-166964_tid-2098951744.nb
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/ops/activations_test.cc:55: Failure
Value of: graph->Compile()
  Actual: false
Expected: true
---------------------------Begin VerifyTiling -------------------------
AXI-SRAM = 1048576 Bytes VIP-SRAM = 522240 Bytes SWTILING_PHASE_FEATURES[0, 1, 1]
  0 SH [(   5    1    1 1,       20, 0x0x3cbb1280(0x0x3cbb1280, 0x(nil)) ->    5    1    1 1,       20, 0x0x3cbb0a90(0x0x3cbb0a90, 0x(nil))) k(0 0    0,        0) pad(0 0) pool(0 0, 1 1)]

 id IN [ x  y  w   h ]   OUT  [ x  y  w  h ] (tx, ty, kpc) (ic, kc, kc/ks, ks/eks, kernel_type)
   0 SH DD 0x(nil) [   0    0        0        0] -> DD 0x(nil) [   0    0        0        0] (  0,   0,   0) (       0,        0, 0.000000%, 0.000000%, NONE)

PreLoadWeightBiases = 1048576  100.000000%
---------------------------End VerifyTiling -------------------------
KernelStreamSize: 0x0, statesSize: 0x340, shShareMemSize: 0x0, shIntrSize: 0x0, shParaSize: 0x100, swParaSize: 0x0, lcdTensorSize: 0x0, shaderStatesSize: 0x340, tensorStatic: 0x0
NBG: operationSize: 0x78, nnSize: 0x0, tpSize: 0x0, shSize: 0x4, swSize: 0x0, layerParamSize: 0x0, lcdtSize: 0x50, patchSize: 0x380, lcdSize 0x540
NBG: entranceSize: 0x1f0, nbIOSize: 0xe8, layeSize: 0x4c, sectionsSize: 0x474, inputoutput size: 0x0, InitCommands size: 0x540
NBG: lcdSize: 0x540, headerSize : 0x798
Calculate NBG size : 4888 bytes
vxoBinaryGraph_CollectInputAndOutput[13820]: input node param count is bigger than 1018224656 > 5
vxoBinaryGraph_SaveBinaryEntrance[14903]: failed to collect input and output of network
vxoBinaryGraph_SaveErrorHandle[8965]: failed to save NBG file, remove it, name=network_binary_pid-166964_tid-2098951744.nb
vxProcessGraph[15913]: Process Graph fail!
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/ops/activations_test.cc:56: Failure
Value of: graph->Run()
  Actual: false
Expected: true
prev_ptrs = 0x3cbb2ec0
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/ops/activations_test.cc:59: Failure
Expected equality of these values:
  golden
    Which is: { -0.5, 1.9, 2, 2.55, inf }
  output
    Which is: { 0, 0, 0, 0, 0 }
prev_ptrs = 0x3cebfcc0
prev_ptrs = 0x3cbb2ec0
Exit VX Thread: 0x7ad621b0
[  FAILED  ] Linear.shape_5_1_fp32 (7 ms)
[ RUN      ] Linear.shape_5_1_fp32_omit_b
Created VX Thread: 0x7ad621b0
#productname=VIPNano-QI, pid=0x88
prev_ptrs = 0x3cbb2ec0
prev_ptrs = 0x3cebfcc0
prev_ptrs = 0x3cbb2ec0
Save binary graph for VIPLite. 
network binary graph file has been opened
---------------------------Begin VerifyTiling -------------------------
AXI-SRAM = 1048576 Bytes VIP-SRAM = 522240 Bytes SWTILING_PHASE_FEATURES[0, 1, 1]
  0 SH [(   5    1    1 1,       20, 0x0x3cbb0a90(0x0x3cbb0a90, 0x(nil)) ->    5    1    1 1,       20, 0x0x3cbb1280(0x0x3cbb1280, 0x(nil))) k(0 0    0,        0) pad(0 0) pool(0 0, 1 1)]

 id IN [ x  y  w   h ]   OUT  [ x  y  w  h ] (tx, ty, kpc) (ic, kc, kc/ks, ks/eks, kernel_type)
   0 SH DD 0x(nil) [   0    0        0        0] -> DD 0x(nil) [   0    0        0        0] (  0,   0,   0) (       0,        0, 0.000000%, 0.000000%, NONE)

PreLoadWeightBiases = 1048576  100.000000%
---------------------------End VerifyTiling -------------------------
KernelStreamSize: 0x0, statesSize: 0x340, shShareMemSize: 0x0, shIntrSize: 0x0, shParaSize: 0x100, swParaSize: 0x0, lcdTensorSize: 0x0, shaderStatesSize: 0x340, tensorStatic: 0x0
NBG: operationSize: 0x78, nnSize: 0x0, tpSize: 0x0, shSize: 0x4, swSize: 0x0, layerParamSize: 0x0, lcdtSize: 0x50, patchSize: 0x380, lcdSize 0x540
NBG: entranceSize: 0x1f0, nbIOSize: 0xe8, layeSize: 0x4c, sectionsSize: 0x474, inputoutput size: 0x0, InitCommands size: 0x540
NBG: lcdSize: 0x540, headerSize : 0x798
Calculate NBG size : 4888 bytes
vxoBinaryGraph_SaveBinaryEntrance[14907]: collect input count=1, output count=1
vxoBinaryGraph_SaveBinaryEntrance[14982]: total operation count=1
generate NBG, device count=1, core count per-device: 1, 
 input table address: 0x44fc1ec0 
 output table address: 0x44fbecc0 
vxoBinaryGraph_SaveBinaryEntranceExt[14131]: graph input/output=1/1, refine input count=1, output count=1
NBG network name field : dummy_network_name
vxoBinaryGraph_SaveBinaryEntranceExt[14697]: header input count=1, output count=1
generate NGB, save initialize commands
generate NBG, map VIP-SRAM start address=0x400000
generate NBG, patch AXI-SRAM startAddress=0xff000000, endAddress=0xff100000
vxoBinaryGraph_SaveInitialOperation[10003]:fail to search AXI-SRAM address in init command buffer
Dump HEX data size 0x20
0801028A 00000011 08010E13 00000002 08010E21 00220000 3CBAAFB0 00000000
vxoBinaryGraph_SaveErrorHandle[8965]: failed to save NBG file, remove it, name=network_binary_pid-166964_tid-2098951744.nb
vxoBinaryGraph_SaveBinaryEntrance[15553]: failed to save initial operation
vxoBinaryGraph_SaveErrorHandle[8965]: failed to save NBG file, remove it, name=network_binary_pid-166964_tid-2098951744.nb
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/ops/activations_test.cc:86: Failure
Value of: graph->Compile()
  Actual: false
Expected: true
---------------------------Begin VerifyTiling -------------------------
AXI-SRAM = 1048576 Bytes VIP-SRAM = 522240 Bytes SWTILING_PHASE_FEATURES[0, 1, 1]
  0 SH [(   5    1    1 1,       20, 0x0x3cbb0a90(0x0x3cbb0a90, 0x(nil)) ->    5    1    1 1,       20, 0x0x3cbb1280(0x0x3cbb1280, 0x(nil))) k(0 0    0,        0) pad(0 0) pool(0 0, 1 1)]

 id IN [ x  y  w   h ]   OUT  [ x  y  w  h ] (tx, ty, kpc) (ic, kc, kc/ks, ks/eks, kernel_type)
   0 SH DD 0x(nil) [   0    0        0        0] -> DD 0x(nil) [   0    0        0        0] (  0,   0,   0) (       0,        0, 0.000000%, 0.000000%, NONE)

PreLoadWeightBiases = 1048576  100.000000%
---------------------------End VerifyTiling -------------------------
KernelStreamSize: 0x0, statesSize: 0x340, shShareMemSize: 0x0, shIntrSize: 0x0, shParaSize: 0x100, swParaSize: 0x0, lcdTensorSize: 0x0, shaderStatesSize: 0x340, tensorStatic: 0x0
NBG: operationSize: 0x78, nnSize: 0x0, tpSize: 0x0, shSize: 0x4, swSize: 0x0, layerParamSize: 0x0, lcdtSize: 0x50, patchSize: 0x380, lcdSize 0x540
NBG: entranceSize: 0x1f0, nbIOSize: 0xe8, layeSize: 0x4c, sectionsSize: 0x474, inputoutput size: 0x0, InitCommands size: 0x540
NBG: lcdSize: 0x540, headerSize : 0x798
Calculate NBG size : 4888 bytes
vxoBinaryGraph_CollectInputAndOutput[13820]: input node param count is bigger than 1018224656 > 5
vxoBinaryGraph_SaveBinaryEntrance[14903]: failed to collect input and output of network
vxoBinaryGraph_SaveErrorHandle[8965]: failed to save NBG file, remove it, name=network_binary_pid-166964_tid-2098951744.nb
vxProcessGraph[15913]: Process Graph fail!
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/ops/activations_test.cc:87: Failure
Value of: graph->Run()
  Actual: false
Expected: true
prev_ptrs = 0x3cebfcc0
/media/data/home/leokuo/TIM-VX-1.1.34.fix/src/tim/vx/ops/activations_test.cc:90: Failure
Expected equality of these values:
  golden
    Which is: { -5, -0.2, 0, 1.1, inf }
  output
    Which is: { 0, 0, 0, 0, 0 }
prev_ptrs = 0x3cbb2ec0
prev_ptrs = 0x3cebfcc0
Exit VX Thread: 0x7ad621b0
[  FAILED  ] Linear.shape_5_1_fp32_omit_b (9 ms)
[----------] 2 tests from Linear (16 ms total)

[----------] 2 tests from Gelu
[ RUN      ] Gelu.shape_5_1_fp32_approximate
Created VX Thread: 0x7ad621b0
#productname=VIPNano-QI, pid=0x88
prev_ptrs = 0x3cebfcc0
prev_ptrs = 0x3cbb2ec0
prev_ptrs = 0x3cebfcc0
CopyArrayRange from ptr 0x3cf4f7f0 to 0x7fe7fe6a50 from 0 to 1024
CopyArrayRange from ptr 0x3cf2f460 to 0x7fe7fe5a50 from 0 to 1024
hardware doesn't support
W [_setup:243]Call vxTensorTableLookupLayer fail.
Kernel "com.vivantecorp.extension.cl.hard_gelu_F32toF32_2D" does not exist
Segmentation fault

@sunshinemyson
Contributor

Did you set any other env variables, such as VIV_VX_ENABLE_SAVE_NETWORK_BINARY?

@leokuo725
Author

leokuo725 commented Oct 18, 2021

@sunshinemyson No.
I only added the following env variables. Should I add VIV_VX_ENABLE_SAVE_NETWORK_BINARY to .bashrc?

export PYTHONPATH=/home/khadas/VeriSilicon-tvm/python:$PYTHONPATH
export LD_LIBRARY_PATH=/home/khadas/TIM-VX-1.1.34.fix/build/install:/home/khadas/VeriSilicon-tvm/build:$LD_LIBRARY_PATH
export VIVANTE_SDK_DIR=/home/khadas/TIM-VX-1.1.34.fix/build/aarch64_A311D_6.4.8

@leokuo725
Author

leokuo725 commented Oct 19, 2021

@sunshinemyson Thanks. Now I can pass the unit_test.
According to the TVM VSI README, in the [start runtime on the target as a service] section (https://github.com/VeriSilicon/tvm/blob/vsi_npu/README.VSI.md#start-runtime-on-the-target-as-a-service), what path should I set for <path/to/versilicon/driver/sdk>?

@sunshinemyson
Contributor

The sdk is the root dir of our driver. It should have the following structure:
sdk/include
sdk/drivers (or lib)/libOpenVX.so
It can be downloaded from the release.
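For the Khadas release used in this thread, that would be the aarch64_A311D_6.4.8/ folder, so the layout looks roughly like this (illustrative only; the library names are taken from the ldd output earlier in this thread, and the bundled galcore.ko also lives inside this folder):

```
aarch64_A311D_6.4.8/          # pass this directory as <path/to/versilicon/driver/sdk>
├── include/
└── lib/                      # some releases call this drivers/
    ├── libOpenVX.so
    ├── libGAL.so
    ├── libVSC.so
    ├── libArchModelSw.so
    └── libNNArchPerf.so
```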

@leokuo725
Author

leokuo725 commented Oct 22, 2021

@sunshinemyson
I ran model inference through TVM and got empty output.
The following is the dmesg output.

[  135.267564] npu_version: 2
[  135.268371] galcore irq number is 36.
[  135.268382] Galcore version 6.4.6.2
[  627.735912] [galcore]: GPU[0] hang, automatic recovery.
[  627.748042] ====>>>>npu hardware reset end!
[  627.748196] [galcore]: recovery done
[  689.175159] [galcore]: GPU[0] hang, automatic recovery.
[  689.187353] ====>>>>npu hardware reset end!
[  689.187525] [galcore]: recovery done
[  750.615034] [galcore]: GPU[0] hang, automatic recovery.
[  750.627147] ====>>>>npu hardware reset end!
[  750.627311] [galcore]: recovery done
[  812.054627] [galcore]: GPU[0] hang, automatic recovery.
[  812.067103] ====>>>>npu hardware reset end!
[  812.067282] [galcore]: recovery done
[  873.493411] [galcore]: GPU[0] hang, automatic recovery.
[  873.514460] ====>>>>npu hardware reset end!
[  873.517882] [galcore]: recovery done
[  934.932529] [galcore]: GPU[0] hang, automatic recovery.
[  934.944641] ====>>>>npu hardware reset end!
[  934.944811] [galcore]: recovery done

After that, I cannot rmmod galcore until I reboot.

@sunshinemyson
Contributor

@leokuo725 Sorry that we could not give you a suggestion about this issue in time. Please let me know if it is still an issue.

@gdh1995
Contributor

gdh1995 commented Jan 19, 2022

I also ran into this bug.

  • I compiled Tengine-Lite (commit OAID/Tengine@1aea916) with TIM-VX (commit 68b5acb) using gcc-linaro-6.3.1-2017.05-i686-mingw32_aarch64-linux-gnu.
  • the board is Ubuntu 18.04.6 LTS Linux 4.9.241 on A311d
  • aml-npu is 6.4.3CB-3
  • the user guide I read is https://github.com/OAID/Tengine/blob/tengine-lite/doc/docs_zh/source_compile/compile_timvx.md
  • libraries like libOpenVX.so are put in ./3rdparty/tim-vx/lib/aarch64 and downloaded from https://github.com/VeriSilicon/TIM-VX/releases/download/v1.1.28/aarch64_A311D_D312513_A294074_R311680_T312233_O312045.tgz
  • when I ran VIV_VX_DEBUG_LEVEL=1 ./tm_benchmark (with context->device = "TIMVX"), it reported:
Tengine-lite library version: 1.5-dev
Created VX Thread: 0x8fbe0150
#productname=VIPNano-QI, pid=0x88
E [_graph_optimization_convert_int8_to_uint8:810]CHECK STATUS(-1:A generic error code, used when no other describes the error.)
E [vsi_nn_OptimizeGraph:845]CHECK STATUS(-1:A generic error code, used when no other describes the error.)
Tengine Fatal: Pre-run subgraph(0) on TIMVX failed.

So what can I do then? Tengine says a "kernel version" of galcore should be >=6.4.4, but I don't know how to read its version.

@leokuo725
Author

@gdh1995 You can get the galcore version from dmesg.
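For example (the version line shown is the one from the dmesg output earlier in this thread):

```
khadas@Khadas:~$ dmesg | grep -i "galcore version"
[  135.268382] Galcore version 6.4.6.2
```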

@leokuo725
Author

@sunshinemyson
I cannot run other models on this board; only mobilenet v2 uint8 works.

@gdh1995
Contributor

gdh1995 commented Jan 21, 2022

The old galcore was 6.4.3.p0.286725; I've updated it to 6.4.6.2 using rmmod and insmod, but tm_benchmark still reports "-1: A generic error code":

E [_graph_optimization_convert_int8_to_uint8:810]CHECK STATUS(-1:A generic error code, used when no other describes the error.)
E [vsi_nn_OptimizeGraph:845]CHECK STATUS(-1:A generic error code, used when no other describes the error.)

I've tried the SDKs from https://github.com/VeriSilicon/TIM-VX/releases/download/v1.1.37/aarch64_A311D_6.4.9.tgz and https://github.com/VeriSilicon/TIM-VX/releases/download/v1.1.37/aarch64_S905D3_6.4.9.tgz. The error messages did not change.

I also tried v6.4.8 (https://github.com/VeriSilicon/TIM-VX/releases/download/v1.1.34.fix/aarch64_A311D_6.4.8.tgz). The error message is:

E [query_hardware_caps:50]CHECK STATUS(-10:The supplied parameter information does not match the kernel contract.)
E [Init:194]Create tensor fail!

@gdh1995
Contributor

gdh1995 commented Jan 21, 2022

Sorry, it was a mistake of mine. I ran Tengine's benchmark tool with a yolov3_int8 model, but I didn't realize TIM-VX requires uint8. Now yolov3-tiny_uint8.tmfile works well (downloaded from https://github.com/OAID/Tengine/blob/tengine-lite/README_EN.md#model-zoo).

@AddSalt8227

@sunshinemyson I saw the same error "PLS isn't existed" on my VIM3:

python3 tests/python/contrib/test_vsi_npu/test_vsi_tflite_model_all.py
X86 Host
INFO:root:{'name': 'mobilenet_v1_1.0_224_quant.tflite', 'shape': (1, 224, 224, 3), 'input_tensor_name': 'input', 'dtype': 'uint8'}
/home/addsalt/data/work/VIM3/tvm_npu/tests/python/contrib/test_vsi_npu/model/mobilenet_v1_1.0_224_quant.tflite
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using softmax.x86 for nn.softmax based on highest priority (10)
INFO:compile_engine:Using injective.cpu for divide based on highest priority (10)
INFO:compile_engine:Using injective.cpu for round based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for clip based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for reshape based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for multiply based on highest priority (10)
WARNING:strategy:conv2d NHWC layout is not optimized for x86 with autotvm.
INFO:compile_engine:Using conv2d_nhwc.x86 for nn.conv2d based on highest priority (10)
INFO:compile_engine:Using injective.cpu for add based on highest priority (10)
INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10)
INFO:compile_engine:Using injective.cpu for add based on highest priority (10)
INFO:compile_engine:Using injective.cpu for clip based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using pool.cpu for nn.avg_pool2d based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
WARNING:strategy:conv2d NHWC layout is not optimized for x86 with autotvm.
INFO:compile_engine:Using conv2d_nhwc.x86 for nn.conv2d based on highest priority (10)
INFO:compile_engine:Using injective.cpu for add based on highest priority (10)
INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10)
INFO:compile_engine:Using injective.cpu for clip based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
WARNING:strategy:depthwise_conv2d NHWC layout is not optimized for x86 with autotvm.
INFO:compile_engine:Using depthwise_conv2d_nhwc.generic for nn.conv2d based on highest priority (10)
INFO:compile_engine:Using injective.cpu for add based on highest priority (10)
INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10)
INFO:compile_engine:Using injective.cpu for clip based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
WARNING:strategy:conv2d NHWC layout is not optimized for x86 with autotvm.
INFO:compile_engine:Using conv2d_nhwc.x86 for nn.conv2d based on highest priority (10)
INFO:compile_engine:Using injective.cpu for add based on highest priority (10)
INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10)
INFO:compile_engine:Using injective.cpu for clip based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
WARNING:strategy:depthwise_conv2d NHWC layout is not optimized for x86 with autotvm.
INFO:compile_engine:Using depthwise_conv2d_nhwc.generic for nn.conv2d based on highest priority (10)
INFO:compile_engine:Using injective.cpu for add based on highest priority (10)
INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10)
INFO:compile_engine:Using injective.cpu for clip based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
WARNING:strategy:conv2d NHWC layout is not optimized for x86 with autotvm.
INFO:compile_engine:Using conv2d_nhwc.x86 for nn.conv2d based on highest priority (10)
INFO:compile_engine:Using injective.cpu for add based on highest priority (10)
INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10)
INFO:compile_engine:Using injective.cpu for clip based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
WARNING:strategy:depthwise_conv2d NHWC layout is not optimized for x86 with autotvm.
INFO:compile_engine:Using depthwise_conv2d_nhwc.generic for nn.conv2d based on highest priority (10)
INFO:compile_engine:Using injective.cpu for add based on highest priority (10)
INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10)
INFO:compile_engine:Using injective.cpu for clip based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
WARNING:strategy:conv2d NHWC layout is not optimized for x86 with autotvm.
INFO:compile_engine:Using conv2d_nhwc.x86 for nn.conv2d based on highest priority (10)
INFO:compile_engine:Using injective.cpu for add based on highest priority (10)
INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10)
INFO:compile_engine:Using injective.cpu for clip based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
WARNING:strategy:depthwise_conv2d NHWC layout is not optimized for x86 with autotvm.
INFO:compile_engine:Using depthwise_conv2d_nhwc.generic for nn.conv2d based on highest priority (10)
INFO:compile_engine:Using injective.cpu for add based on highest priority (10)
INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10)
INFO:compile_engine:Using injective.cpu for clip based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
WARNING:strategy:conv2d NHWC layout is not optimized for x86 with autotvm.
INFO:compile_engine:Using conv2d_nhwc.x86 for nn.conv2d based on highest priority (10)
INFO:compile_engine:Using injective.cpu for add based on highest priority (10)
INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10)
INFO:compile_engine:Using injective.cpu for clip based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
WARNING:strategy:depthwise_conv2d NHWC layout is not optimized for x86 with autotvm.
INFO:compile_engine:Using depthwise_conv2d_nhwc.generic for nn.conv2d based on highest priority (10)
INFO:compile_engine:Using injective.cpu for add based on highest priority (10)
INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10)
INFO:compile_engine:Using injective.cpu for clip based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
WARNING:strategy:conv2d NHWC layout is not optimized for x86 with autotvm.
INFO:compile_engine:Using conv2d_nhwc.x86 for nn.conv2d based on highest priority (10)
INFO:compile_engine:Using injective.cpu for add based on highest priority (10)
INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10)
INFO:compile_engine:Using injective.cpu for clip based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
WARNING:strategy:depthwise_conv2d NHWC layout is not optimized for x86 with autotvm.
INFO:compile_engine:Using depthwise_conv2d_nhwc.generic for nn.conv2d based on highest priority (10)
INFO:compile_engine:Using injective.cpu for add based on highest priority (10)
INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10)
INFO:compile_engine:Using injective.cpu for clip based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
WARNING:strategy:conv2d NHWC layout is not optimized for x86 with autotvm.
INFO:compile_engine:Using conv2d_nhwc.x86 for nn.conv2d based on highest priority (10)
INFO:compile_engine:Using injective.cpu for add based on highest priority (10)
INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10)
INFO:compile_engine:Using injective.cpu for clip based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
WARNING:strategy:depthwise_conv2d NHWC layout is not optimized for x86 with autotvm.
INFO:compile_engine:Using depthwise_conv2d_nhwc.generic for nn.conv2d based on highest priority (10)
INFO:compile_engine:Using injective.cpu for add based on highest priority (10)
INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10)
INFO:compile_engine:Using injective.cpu for clip based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
WARNING:strategy:conv2d NHWC layout is not optimized for x86 with autotvm.
INFO:compile_engine:Using conv2d_nhwc.x86 for nn.conv2d based on highest priority (10)
INFO:compile_engine:Using injective.cpu for add based on highest priority (10)
INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10)
INFO:compile_engine:Using injective.cpu for clip based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
WARNING:strategy:depthwise_conv2d NHWC layout is not optimized for x86 with autotvm.
INFO:compile_engine:Using depthwise_conv2d_nhwc.generic for nn.conv2d based on highest priority (10)
INFO:compile_engine:Using injective.cpu for add based on highest priority (10)
INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10)
INFO:compile_engine:Using injective.cpu for clip based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
WARNING:strategy:conv2d NHWC layout is not optimized for x86 with autotvm.
INFO:compile_engine:Using conv2d_nhwc.x86 for nn.conv2d based on highest priority (10)
INFO:compile_engine:Using injective.cpu for add based on highest priority (10)
INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10)
INFO:compile_engine:Using injective.cpu for clip based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
WARNING:strategy:depthwise_conv2d NHWC layout is not optimized for x86 with autotvm.
INFO:compile_engine:Using depthwise_conv2d_nhwc.generic for nn.conv2d based on highest priority (10)
INFO:compile_engine:Using injective.cpu for add based on highest priority (10)
INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10)
INFO:compile_engine:Using injective.cpu for clip based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
WARNING:strategy:conv2d NHWC layout is not optimized for x86 with autotvm.
INFO:compile_engine:Using conv2d_nhwc.x86 for nn.conv2d based on highest priority (10)
INFO:compile_engine:Using injective.cpu for add based on highest priority (10)
INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10)
INFO:compile_engine:Using injective.cpu for clip based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
WARNING:strategy:depthwise_conv2d NHWC layout is not optimized for x86 with autotvm.
INFO:compile_engine:Using depthwise_conv2d_nhwc.generic for nn.conv2d based on highest priority (10)
INFO:compile_engine:Using injective.cpu for add based on highest priority (10)
INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10)
INFO:compile_engine:Using injective.cpu for clip based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
WARNING:strategy:conv2d NHWC layout is not optimized for x86 with autotvm.
INFO:compile_engine:Using conv2d_nhwc.x86 for nn.conv2d based on highest priority (10)
INFO:compile_engine:Using injective.cpu for add based on highest priority (10)
INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10)
INFO:compile_engine:Using injective.cpu for clip based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
WARNING:strategy:depthwise_conv2d NHWC layout is not optimized for x86 with autotvm.
INFO:compile_engine:Using depthwise_conv2d_nhwc.generic for nn.conv2d based on highest priority (10)
INFO:compile_engine:Using injective.cpu for add based on highest priority (10)
INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10)
INFO:compile_engine:Using injective.cpu for clip based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
WARNING:strategy:conv2d NHWC layout is not optimized for x86 with autotvm.
INFO:compile_engine:Using conv2d_nhwc.x86 for nn.conv2d based on highest priority (10)
INFO:compile_engine:Using injective.cpu for add based on highest priority (10)
INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10)
INFO:compile_engine:Using injective.cpu for clip based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
WARNING:strategy:depthwise_conv2d NHWC layout is not optimized for x86 with autotvm.
INFO:compile_engine:Using depthwise_conv2d_nhwc.generic for nn.conv2d based on highest priority (10)
INFO:compile_engine:Using injective.cpu for add based on highest priority (10)
INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10)
INFO:compile_engine:Using injective.cpu for clip based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
WARNING:strategy:conv2d NHWC layout is not optimized for x86 with autotvm.
INFO:compile_engine:Using conv2d_nhwc.x86 for nn.conv2d based on highest priority (10)
INFO:compile_engine:Using injective.cpu for add based on highest priority (10)
INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10)
INFO:compile_engine:Using injective.cpu for clip based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
WARNING:strategy:depthwise_conv2d NHWC layout is not optimized for x86 with autotvm.
INFO:compile_engine:Using depthwise_conv2d_nhwc.generic for nn.conv2d based on highest priority (10)
INFO:compile_engine:Using injective.cpu for add based on highest priority (10)
INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10)
INFO:compile_engine:Using injective.cpu for clip based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
WARNING:strategy:conv2d NHWC layout is not optimized for x86 with autotvm.
INFO:compile_engine:Using conv2d_nhwc.x86 for nn.conv2d based on highest priority (10)
INFO:compile_engine:Using injective.cpu for add based on highest priority (10)
INFO:compile_engine:Using injective.cpu for fixed_point_multiply based on highest priority (10)
INFO:compile_engine:Using injective.cpu for clip based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for cast based on highest priority (10)
INFO:compile_engine:Using injective.cpu for subtract based on highest priority (10)
INFO:root:#[version = "0.0.5"]
def @main(%input: Tensor[(1, 224, 224, 3), uint8], %v_param_1: Tensor[(3, 3, 3, 32), uint8], %v_param_2: Tensor[(32), int32], %v_param_3: Tensor[(3, 3, 32, 1), uint8], %v_param_4: Tensor[(32), int32], %v_param_5: Tensor[(1, 1, 32, 64), uint8], %v_param_6: Tensor[(64), int32], %v_param_7: Tensor[(3, 3, 64, 1), uint8], %v_param_8: Tensor[(64), int32], %v_param_9: Tensor[(1, 1, 64, 128), uint8], %v_param_10: Tensor[(128), int32], %v_param_11: Tensor[(3, 3, 128, 1), uint8], %v_param_12: Tensor[(128), int32], %v_param_13: Tensor[(1, 1, 128, 128), uint8], %v_param_14: Tensor[(128), int32], %v_param_15: Tensor[(3, 3, 128, 1), uint8], %v_param_16: Tensor[(128), int32], %v_param_17: Tensor[(1, 1, 128, 256), uint8], %v_param_18: Tensor[(256), int32], %v_param_19: Tensor[(3, 3, 256, 1), uint8], %v_param_20: Tensor[(256), int32], %v_param_21: Tensor[(1, 1, 256, 256), uint8], %v_param_22: Tensor[(256), int32], %v_param_23: Tensor[(3, 3, 256, 1), uint8], %v_param_24: Tensor[(256), int32], %v_param_25: Tensor[(1, 1, 256, 512), uint8], %v_param_26: Tensor[(512), int32], %v_param_27: Tensor[(3, 3, 512, 1), uint8], %v_param_28: Tensor[(512), int32], %v_param_29: Tensor[(1, 1, 512, 512), uint8], %v_param_30: Tensor[(512), int32], %v_param_31: Tensor[(3, 3, 512, 1), uint8], %v_param_32: Tensor[(512), int32], %v_param_33: Tensor[(1, 1, 512, 512), uint8], %v_param_34: Tensor[(512), int32], %v_param_35: Tensor[(3, 3, 512, 1), uint8], %v_param_36: Tensor[(512), int32], %v_param_37: Tensor[(1, 1, 512, 512), uint8], %v_param_38: Tensor[(512), int32], %v_param_39: Tensor[(3, 3, 512, 1), uint8], %v_param_40: Tensor[(512), int32], %v_param_41: Tensor[(1, 1, 512, 512), uint8], %v_param_42: Tensor[(512), int32], %v_param_43: Tensor[(3, 3, 512, 1), uint8], %v_param_44: Tensor[(512), int32], %v_param_45: Tensor[(1, 1, 512, 512), uint8], %v_param_46: Tensor[(512), int32], %v_param_47: Tensor[(3, 3, 512, 1), uint8], %v_param_48: Tensor[(512), int32], %v_param_49: Tensor[(1, 1, 512, 1024), uint8], %v_param_50: Tensor[(1024), int32], %v_param_51: Tensor[(3, 3, 1024, 1), uint8], %v_param_52: Tensor[(1024), int32], %v_param_53: Tensor[(1, 1, 1024, 1024), uint8], %v_param_54: Tensor[(1024), int32], %v_param_55: Tensor[(1, 1, 1024, 1001), uint8], %v_param_56: Tensor[(1001), int32]) {
  %0 = qnn.conv2d(%input, %v_param_1, 128, 151, 0.0078125f, 0.0218267f, strides=[2, 2], padding=[0, 0, 1, 1], channels=32, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32");
  %1 = nn.bias_add(%0, %v_param_2, axis=3);
  %2 = qnn.requantize(%1, 0.000170521f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8");
  %3 = qnn.conv2d(%2, %v_param_3, 0, 110, 0.0235285f, 0.292199f, padding=[1, 1, 1, 1], groups=32, channels=32, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWOI", out_dtype="int32");
  %4 = nn.bias_add(%3, %v_param_4, axis=3);
  %5 = qnn.requantize(%4, 0.006875f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8");
  %6 = qnn.conv2d(%5, %v_param_5, 0, 121, 0.0235285f, 0.0304209f, padding=[0, 0, 0, 0], channels=64, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32");
  %7 = nn.bias_add(%6, %v_param_6, axis=3);
  %8 = qnn.requantize(%7, 0.000715759f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8");
  %9 = qnn.conv2d(%8, %v_param_7, 0, 130, 0.0235285f, 0.402773f, strides=[2, 2], padding=[0, 0, 1, 1], groups=64, channels=64, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWOI", out_dtype="int32");
  %10 = nn.bias_add(%9, %v_param_8, axis=3);
  %11 = qnn.requantize(%10, 0.00947663f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8");
  %12 = qnn.conv2d(%11, %v_param_9, 0, 104, 0.0235285f, 0.0151482f, padding=[0, 0, 0, 0], channels=128, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32");
  %13 = nn.bias_add(%12, %v_param_10, axis=3);
  %14 = qnn.requantize(%13, 0.000356414f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8");
  %15 = qnn.conv2d(%14, %v_param_11, 0, 160, 0.0235285f, 0.0605373f, padding=[1, 1, 1, 1], groups=128, channels=128, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWOI", out_dtype="int32");
  %16 = nn.bias_add(%15, %v_param_12, axis=3);
  %17 = qnn.requantize(%16, 0.00142435f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8");
  %18 = qnn.conv2d(%17, %v_param_13, 0, 94, 0.0235285f, 0.0137555f, padding=[0, 0, 0, 0], channels=128, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32");
  %19 = nn.bias_add(%18, %v_param_14, axis=3);
  %20 = qnn.requantize(%19, 0.000323645f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8");
  %21 = qnn.conv2d(%20, %v_param_15, 0, 123, 0.0235285f, 0.0167581f, strides=[2, 2], padding=[0, 0, 1, 1], groups=128, channels=128, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWOI", out_dtype="int32");
  %22 = nn.bias_add(%21, %v_param_16, axis=3);
  %23 = qnn.requantize(%22, 0.000394292f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8");
  %24 = qnn.conv2d(%23, %v_param_17, 0, 151, 0.0235285f, 0.00760185f, padding=[0, 0, 0, 0], channels=256, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32");
  %25 = nn.bias_add(%24, %v_param_18, axis=3);
  %26 = qnn.requantize(%25, 0.00017886f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8");
  %27 = qnn.conv2d(%26, %v_param_19, 0, 129, 0.0235285f, 0.0410553f, padding=[1, 1, 1, 1], groups=256, channels=256, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWOI", out_dtype="int32");
  %28 = nn.bias_add(%27, %v_param_20, axis=3);
  %29 = qnn.requantize(%28, 0.000965968f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8");
  %30 = qnn.conv2d(%29, %v_param_21, 0, 122, 0.0235285f, 0.00643161f, padding=[0, 0, 0, 0], channels=256, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32");
  %31 = nn.bias_add(%30, %v_param_22, axis=3);
  %32 = qnn.requantize(%31, 0.000151326f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8");
  %33 = qnn.conv2d(%32, %v_param_23, 0, 122, 0.0235285f, 0.0134608f, strides=[2, 2], padding=[0, 0, 1, 1], groups=256, channels=256, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWOI", out_dtype="int32");
  %34 = nn.bias_add(%33, %v_param_24, axis=3);
  %35 = qnn.requantize(%34, 0.000316712f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8");
  %36 = qnn.conv2d(%35, %v_param_25, 0, 109, 0.0235285f, 0.00917122f, padding=[0, 0, 0, 0], channels=512, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32");
  %37 = nn.bias_add(%36, %v_param_26, axis=3);
  %38 = qnn.requantize(%37, 0.000215785f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8");
  %39 = qnn.conv2d(%38, %v_param_27, 0, 132, 0.0235285f, 0.0369348f, padding=[1, 1, 1, 1], groups=512, channels=512, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWOI", out_dtype="int32");
  %40 = nn.bias_add(%39, %v_param_28, axis=3);
  %41 = qnn.requantize(%40, 0.000869019f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8");
  %42 = qnn.conv2d(%41, %v_param_29, 0, 140, 0.0235285f, 0.00530005f, padding=[0, 0, 0, 0], channels=512, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32");
  %43 = nn.bias_add(%42, %v_param_30, axis=3);
  %44 = qnn.requantize(%43, 0.000124702f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8");
  %45 = qnn.conv2d(%44, %v_param_31, 0, 94, 0.0235285f, 0.0426099f, padding=[1, 1, 1, 1], groups=512, channels=512, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWOI", out_dtype="int32");
  %46 = nn.bias_add(%45, %v_param_32, axis=3);
  %47 = qnn.requantize(%46, 0.00100255f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8");
  %48 = qnn.conv2d(%47, %v_param_33, 0, 127, 0.0235285f, 0.00496329f, padding=[0, 0, 0, 0], channels=512, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32");
  %49 = nn.bias_add(%48, %v_param_34, axis=3);
  %50 = qnn.requantize(%49, 0.000116779f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8");
  %51 = qnn.conv2d(%50, %v_param_35, 0, 127, 0.0235285f, 0.0283589f, padding=[1, 1, 1, 1], groups=512, channels=512, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWOI", out_dtype="int32");
  %52 = nn.bias_add(%51, %v_param_36, axis=3);
  %53 = qnn.requantize(%52, 0.000667241f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8");
  %54 = qnn.conv2d(%53, %v_param_37, 0, 89, 0.0235285f, 0.0077709f, padding=[0, 0, 0, 0], channels=512, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32");
  %55 = nn.bias_add(%54, %v_param_38, axis=3);
  %56 = qnn.requantize(%55, 0.000182837f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8");
  %57 = qnn.conv2d(%56, %v_param_39, 0, 134, 0.0235285f, 0.0243294f, padding=[1, 1, 1, 1], groups=512, channels=512, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWOI", out_dtype="int32");
  %58 = nn.bias_add(%57, %v_param_40, axis=3);
  %59 = qnn.requantize(%58, 0.000572435f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8");
  %60 = qnn.conv2d(%59, %v_param_41, 0, 99, 0.0235285f, 0.00965865f, padding=[0, 0, 0, 0], channels=512, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32");
  %61 = nn.bias_add(%60, %v_param_42, axis=3);
  %62 = qnn.requantize(%61, 0.000227253f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8");
  %63 = qnn.conv2d(%62, %v_param_43, 0, 106, 0.0235285f, 0.0193668f, padding=[1, 1, 1, 1], groups=512, channels=512, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWOI", out_dtype="int32");
  %64 = nn.bias_add(%63, %v_param_44, axis=3);
  %65 = qnn.requantize(%64, 0.000455672f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8");
  %66 = qnn.conv2d(%65, %v_param_45, 0, 153, 0.0235285f, 0.00544699f, padding=[0, 0, 0, 0], channels=512, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32");
  %67 = nn.bias_add(%66, %v_param_46, axis=3);
  %68 = qnn.requantize(%67, 0.000128159f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8");
  %69 = qnn.conv2d(%68, %v_param_47, 0, 126, 0.0235285f, 0.00783559f, strides=[2, 2], padding=[0, 0, 1, 1], groups=512, channels=512, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWOI", out_dtype="int32");
  %70 = nn.bias_add(%69, %v_param_48, axis=3);
  %71 = qnn.requantize(%70, 0.00018436f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8");
  %72 = qnn.conv2d(%71, %v_param_49, 0, 130, 0.0235285f, 0.00817923f, padding=[0, 0, 0, 0], channels=1024, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32");
  %73 = nn.bias_add(%72, %v_param_50, axis=3);
  %74 = qnn.requantize(%73, 0.000192445f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8");
  %75 = qnn.conv2d(%74, %v_param_51, 0, 211, 0.0235285f, 0.126169f, padding=[1, 1, 1, 1], groups=1024, channels=1024, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWOI", out_dtype="int32");
  %76 = nn.bias_add(%75, %v_param_52, axis=3);
  %77 = qnn.requantize(%76, 0.00296857f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8");
  %78 = qnn.conv2d(%77, %v_param_53, 0, 95, 0.0235285f, 0.0180482f, padding=[0, 0, 0, 0], channels=1024, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32");
  %79 = nn.bias_add(%78, %v_param_54, axis=3);
  %80 = qnn.requantize(%79, 0.000424646f, 0, 0.0235285f, 0, axis=3, out_dtype="uint8");
  %81 = cast(%80, dtype="int32");
  %82 = nn.avg_pool2d(%81, pool_size=[7, 7], strides=[2, 2], padding=[0, 0, 0, 0], layout="NHWC");
  %83 = cast(%82, dtype="uint8");
  %84 = qnn.conv2d(%83, %v_param_55, 0, 74, 0.0235285f, 0.0049866f, padding=[0, 0, 0, 0], channels=1001, kernel_size=[1, 1], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32");
  %85 = nn.bias_add(%84, %v_param_56, axis=3);
  %86 = qnn.requantize(%85, 0.000117327f, 0, 0.166099f, 66, axis=3, out_dtype="uint8");
  %87 = reshape(%86, newshape=[1, 1001]);
  %88 = qnn.dequantize(%87, 0.166099f, 66);
  %89 = nn.softmax(%88, axis=1);
  qnn.quantize(%89, 0.00390625f, 0, out_dtype="uint8")
}

[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:414: name_node.value() == tvmgen_default_vsi_npu_main_0
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:287: Create
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_softmax
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:237: TensorMakerImpl::InferCall: reshape
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_avgpool2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:230: TensorMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_softmax
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:396: GraphMakerImpl::InferCall: reshape
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_avgpool2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d
[21:59:17] /home/addsalt/data/work/VIM3/tvm_npu/src/relay/backend/contrib/vsi_npu/codegen.cc:387: GraphMakerImpl::InferCall: vsi_npu.qnn_conv2d
W [HandleLayoutInfer:268]Op 162: default layout inference pass.
[21:59:46] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:58: GetFunctionget_symbol
[21:59:46] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:60: GetFunction return early
[21:59:46] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:58: GetFunctionget_const_vars
[21:59:46] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:60: GetFunction return early
[21:59:46] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:58: GetFunctionget_const_vars
[21:59:46] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:60: GetFunction return early
[21:59:46] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:186: SaveToBinary: nbg size = 5676288: input size = 1: output size = 1: output map size =1
[21:59:46] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:116: SerializeTensorSpec
[21:59:46] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:116: SerializeTensorSpec
INFO:root:top5 of ref output 0:
INFO:root:283 : 110
INFO:root:282 : 57
INFO:root:286 : 29
INFO:root:464 : 21
INFO:root:264 : 8
INFO:root:top5 of vsi output 0:
INFO:root:1000 : 0
INFO:root:335 : 0
INFO:root:333 : 0
INFO:root:331 : 0
INFO:root:334 : 0
Traceback (most recent call last):
  File "tests/python/contrib/test_vsi_npu/test_vsi_tflite_model_all.py", line 320, in <module>
    test_mobilenet_v1_224_quant()
  File "tests/python/contrib/test_vsi_npu/test_vsi_tflite_model_all.py", line 272, in test_mobilenet_v1_224_quant
    process(model)
  File "tests/python/contrib/test_vsi_npu/test_vsi_tflite_model_all.py", line 267, in process
    assert_allclose(vsi_output[i], ref_output[i], rtol=0, atol=tolerance)
  File "/home/addsalt/data/work/VIM3/tvm_npu/python/tvm/testing/utils.py", line 98, in assert_allclose
    np.testing.assert_allclose(actual, desired, rtol=rtol, atol=atol, verbose=True)
  File "/home/addsalt/anaconda3/envs/tvm-build/lib/python3.8/site-packages/numpy/testing/_private/utils.py", line 1527, in assert_allclose
    assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
  File "/home/addsalt/anaconda3/envs/tvm-build/lib/python3.8/site-packages/numpy/testing/_private/utils.py", line 840, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=0, atol=5

Mismatched elements: 7 / 1001 (0.699%)
Max absolute difference: 255
Max relative difference: 255.
 x: array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
 y: array([[  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,...
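
For reference, the check that fails above boils down to the following (a minimal, self-contained sketch — `ref_output`/`vsi_output` are the test script's variable names, and the random stand-ins here are only for illustration):

```python
import numpy as np

# Random stand-ins for the test's `ref_output` (CPU/llvm result) and `vsi_output`
# (NPU result returned over RPC); here they are identical so the assert passes.
rng = np.random.default_rng(0)
ref_output = rng.integers(0, 256, size=(1, 1001), dtype=np.uint8)
vsi_output = ref_output.copy()

def top5(scores):
    """Return the five highest-scoring class indices with their uint8 scores."""
    flat = scores.flatten()
    idx = np.argsort(flat)[::-1][:5]
    return [(int(i), int(flat[i])) for i in idx]

print("top5 of ref output 0:", top5(ref_output))
print("top5 of vsi output 0:", top5(vsi_output))

# Same tolerance as the failing check above: element-wise, rtol=0, atol=5.
np.testing.assert_allclose(vsi_output, ref_output, rtol=0, atol=5)
```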
VIM3 target log (with `export VIV_VX_DEBUG_LEVEL=1`)
INFO:root:If you are running ROCM/Metal, fork will cause compiler internal error. Try to launch with arg ```--no-fork```
INFO:RPCServer:bind to 0.0.0.0:9090
INFO:RPCServer:connection from ('xxx.xxx.x.xxx', 59296)
[13:59:51] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:220: LoadFromBinary: nbg size = 5676288: input size = 1: output size = 1: output_map size = 1
[13:59:51] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:148: DeSerializeTensorSpec
[13:59:51] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:148: DeSerializeTensorSpec
INFO:RPCServer:load_module /tmp/tmphslhdaoh/model.so
[13:59:51] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:58: GetFunction_lookup_linked_param
[13:59:51] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:60: GetFunction return early
[13:59:51] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:58: GetFunction_lookup_linked_param
[13:59:51] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:60: GetFunction return early
[13:59:51] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:58: GetFunction_lookup_linked_param
[13:59:51] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:60: GetFunction return early
[13:59:51] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:58: GetFunction_lookup_linked_param
[13:59:51] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:60: GetFunction return early
[13:59:51] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:58: GetFunctiontvmgen_default_vsi_npu_main_0
[     1] PLS isn't existed
#productname=VIPNano-QI, pid=0x88
graph gpuCount=1 interConnectRingCount=0
NN ring buffer is disabled
binary graph format version, 0x10014
readBinHeader[1489]: binary version: 0x10014, current version: 0x10011
fail to load binary from pointer to create graph
NBG error, please provide genereating NBG logs first
fail to import kernel from VPMN, error code: -10
E [/home/addsalt/data/work/VIM3/TIM-VX/src/tim/vx/internal/src/vsi_nn_graph.c:compute_node:380]Create node[0] NBG fail
vxProcessGraph[22814]: Process Graph fail!
[13:59:51] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:105: operator()0 ms or 19 us
[13:59:51] /home/addsalt/data/work/VIM3/tvm_npu/src/runtime/contrib/vsi_npu/vsi_npu_runtime.cc:107: operator()2
INFO:RPCServer:Finish serving ('xxx.xxx.x.xxx', 59296)
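
For context, the board-side messages above are what the device prints when the host connects over RPC. The host side of that exchange is roughly the following (a minimal sketch, not the exact test code — the board address, library path, and input name are placeholders, and the vsi_npu partitioning/compilation done by the test script is assumed to have already produced `model.so`):

```python
import numpy as np
import tvm
from tvm import rpc
from tvm.contrib import graph_executor

BOARD_IP, BOARD_PORT = "192.168.1.100", 9090   # placeholder address of the VIM3 RPC server
LIB_PATH = "model.so"                          # cross-compiled library from relay.build()

remote = rpc.connect(BOARD_IP, BOARD_PORT)     # -> "RPCServer: connection from ..."
remote.upload(LIB_PATH)                        # copy the library to the board's temp dir
rlib = remote.load_module("model.so")          # -> "RPCServer: load_module /tmp/.../model.so"

dev = remote.cpu(0)
module = graph_executor.GraphModule(rlib["default"](dev))
data = np.zeros((1, 224, 224, 3), dtype="uint8")   # dummy NHWC input matching %input above
module.set_input("input", tvm.nd.array(data, dev))
module.run()                                   # the offloaded NBG subgraph executes here
vsi_output = module.get_output(0).asnumpy()
```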

TIM-VX version: 1.1.37
VIM3 galcore version: 6.4.6.2
TVM: upstream/tvm_npu branch

From the target-side log, the NBG generated on the host reports binary graph format version 0x10014, while the board's runtime reports its current version as 0x10011, and the graph then fails to load ("fail to import kernel from VPMN, error code: -10"). This looks like a version mismatch between the host SDK and the board's NPU driver.

Could you help me with this?
If you need more debug messages, please let me know.
Thank you.
