[UPD]fix coreml error; support swish and groupnorm op (#1738) · ZaoZhe6666/TNN@3a0930f

[UPD]fix coreml error; support swish and groupnorm op (Tencent#1738)

* [UPD] fix some error status issues again

* [UPD]enable const folder to infer blob shapes for coreml; fix reshape shape size logic;

* [UPD]unify op system; check apple neural engine;

* [FIX] reset multi input in network forward to support image classifier demo

* [FIX] fix multi input in network forward

* [FIX] fix const op for weight shape (=1)

* [FIX] fix const op for weight shape (=1) again

* [UPD] update to support multi output forward

* [UPD] update to support split op

* [UPD]fix coreml multi output case; add cache logic;

* [UPD]fix multi output error

* [FIX] fix pool op padding

* [UPD] update to support pad op (only allowed for H and W dimensions)

* [UPD]remove blob manager of coreml network

* [UPD]rename coreml_executor to coremlmodel

* [UPD] remove InitCoreMLExecutor

* [FIX] fix to support different input data type (float32 & int32) in forward

* [UPD] update to support expand dims & reduce dims reshape by adding unsqueeze & squeeze

* [UPD]change internal device from metal to arm for device npu

* [FIX] fix conv op for group conv

* [FIX] fix deconv op for group deconv

* [UPD] update to support sub op

* [UPD] update to support clip op

* [UPD] update to support slice op

* [UPD] update to support upsample op

* [FIX] fix slice op end index

* [UPD] update to support constant padding, allowed for C, H and W dimensions

* [UPD]fix camera switch device

* [UPD]fix actual device display error

* [UPD]fix cache path

* [UPD] update to add sub & slice & clip to project

* [FIX] fix demo use NPU error

* [UPD]fix ocr error

* [FIX] fix upsample op align_corners handling

* [FIX] fix upsample op for fractional scales

* [BUG]fix coreml output nil error; fix upsample nn for fractional scale

* [FIX] fix upsample op scales order

* [UPD] update to support slice v2 op

* [UPD] update to support tanh v2 op

* [FIX] fix batchnorm op mean value

* [FIX] fix some annotation

* [BUG]fix upsample error; add shuffle channel coreml layer

* [FIX] fix innerproduct op input channels

* [UPD] move slicev2 into slice file

* [UPD] move tanhv2 into slice file

* [UPD] update reshape op for expand dims & reduce dims

* [UPD] update innerproduct op by adding squeeze to reduce dims (to match old TNN models)

* [UPD] update to support flatten to 2D op

* [UPD] update to support relu6 op

* [ADD]add cast coreml layer

* [ADD]add shape coreml layer

* [UPD] add flatten & relu6 & shuffle_channel to xcode project

* [ADD]add gather coreml layer

* [ADD]add gelu coreml layer

* [ADD]add layernorm coreml layer

* [BUG]support int32 for coreml const layer

* [BUG]support shape input for coreml reshape layer

* [BUG]support model check for TNN_APPLE_NPU_ENABLE using MLComputeUnitsCPUOnly

* [ADD]add mat_mul coreml layer;

* [UPD] update to support reshape layer when reshape_type = 1

* [UPD] update coreml model input & output to support int32 data type

* [FIX] fix reshape layer for reshapedynamic input & output

* [BUG]support mlmodel and mlmodelc for benchmark

* [UPD] update to support conv layer with fp16 data type

* [FIX] add 'APPLE_NPU' to model_check device_type_message

* [FIX] fix conv layer with fp16 data type (TNN fp16 -> CoreML fp32)

* [FIX] fix const layer with fp16 data type (TNN fp16 -> CoreML fp32)

* [UPD] update to support deconv layer with fp16 data type (TNN fp16 -> CoreML fp32)

* [UPD] update to support innerproduct layer with fp16 data type (TNN fp16 -> CoreML fp32)

* [UPD] update to support batchnorm layer with fp16 data type (TNN fp16 -> CoreML fp32)

* [UPD] update to support layernorm layer with fp16 data type (TNN fp16 -> CoreML fp32)

* [UPD] update to support prelu layer with fp16 data type (TNN fp16 -> CoreML fp32)

* [UPD] update to support matmul layer with fp16 data type (TNN fp16 -> CoreML fp32)

* [FIX] fix matmul layer with fp16 data type (TNN fp16 -> CoreML fp32)

* [UPD]support fusing mul+add into batchnorm

* [BUG]fix import error

* [BUG]fix reshape error

* [BUG] fix reshape layer when reshape_type=1 (input_shape_size = output_shape_size = 4)

* [UPD]support ssd

* [ADD]add ssdlite-mobilenetv2 from tf

* [UPD] update to support conv & deconv & const & innerproduct & batchnorm & layernorm & matmul & prelu layers with fp16 data type (TNN fp16 -> CoreML fp16)

* [UPD] update to support batchnorm layers with fp16 data type (TNN fp16 -> CoreML fp16)

* [FIX] set coreml layer default using full precision

* [UPD] update to support hardsigmoid layer

* [UPD] update to support hardswish layer

* [UPD] update to support reducesum layer

* [UPD] update to support reducemean layer

* [UPD] add some coreml layer files to xcode project

* [FIX] fix some annotation about hardswish

* [BUG]fix reshape for tensor with dims size=0

* [UPD]support landscapeleft ui; clear navbar left items

* [UPD]support landscapeleft ui; add stackview to support minor camera preview;

* [ADD]add monodepth demo

* [UPD] update to support unit_test

* [FIX] upload missing download_model.sh and download_model.bat

* [UPD] update concat & conv & shuffle unit_test files for APPLE_NPU

* [FIX] rename unit_test model

* [UPD] update to support softplus layer

* [UPD] update to support softsign layer

* [UPD] update to support div layer

* [UPD] update binary layer unit_test for APPLE_NPU

* [UPD] update to support reducemax layer

* [UPD] update to support reducemin layer

* [UPD]update project file

* [UPD]add log error

* [UPD] update hardswish layer unit_test for APPLE_NPU

* [UPD]add log error

* [UPD] update to skip stride_slice when APPLE_NPU

* [BUG]fix batchnorm unittest

* [BUG]fix prelu unittest

* [BUG] fix unsqueeze unittest

* [BUG] fix split unittest

* [BUG] fix reshape unittest

* [BUG]fix upsample unittest

* [BUG] fix reduce op (reducesum/reducemean/reducemax/reducemin) unittest

* [BUG]fix layernorm unittest

* [BUG] fix reduce op unittest again

* [BUG] fix deconv unittest

* [BUG] fix innerproduct unittest

* [BUG]fix ssd demo display error

* [BUG] fix matmul unittest

* [BUG]fix benchmark error to support multiple models in the same directory

* [BUG] add some explanation about reduce op unittest

* [BUG]fix benchmark error to support multiple models in the same directory

* [BUG] add some explanation about reduce op unittest again

* [BUG]fix batchnorm param error

* [BUG] fix reshape layer unittest

* [BUG]fix batchnorm param error

* [BUG]fix conv/deconv input/output channel error

* [UPD] update to support stride_slice & unittest

* [BUG] fix reshape layer unittest when reshape_type = 1

* [BUG] fix reshape layer unittest when reshape_type = 1 using reshapestatic

* [BUG] fix reshape layer unittest using reshapestatic

* [BUG] fix some annotation about reshape layer

* [BUG] fix reshape layer output permute when reshape_type = 1

* [BUG] fix reshape layer using reshapestatic when reshape_type = 1

* [BUG]fix broadcast layer error for input from constant map; fix bert demo error;

* [BUG]fix blob convert error for int32 mat

* [BUG]fix reshape name style

* [UPD]add tiny bert fixed length 256

* [BUG] fix add layer by binary op base class

* [BUG] fix div/mul/sub layer by binary op base class

* [BUG]fix batchnorm unittest

* [BUG]ensure clean up mlmodelc if error raises when compile

* [UPD]adjust demo list

* [BUG] fix conv layer with in-place activation

* [BUG] fix conv layer for relu6

* [BUG] fix cleanup func missing return value

* [BUG] remove repetitive line

* [BUG]fix batchnorm unittest

* [BUG] fix conv layer for in-place relu6

* [UPD]automatically use apple npu

* [UPD]add clean logic for coreml

* [BUG] fix hardswish layer with 2 inputs

* [UPD] update README.md & support.md about APPLE_NPU

* [UPD]unify rawbuffer2coremlweight

* [UPD]support coreml lstm

* [UPD]fix lstm error

* [UPD]support coreml lstm bidirection

* [UPD]support coreml constofshape

* [UPD]support slice at axis=0

* [UPD]ignore

* [UPD]fix reshape error

* [UPD]fix lstm error; replace squeeze with reshape because squeeze raises a runtime compile error in some cases for axis = {3, 4}

* [UPD]fix slice error

* [UPD]support multiple mlmodels in the same directory; add autorelease memory, because coreml may need a large amount of memory in the ocr demo

* ignore

* [UPD]add log msg

* [UPD]fix reshape and slice error

* [UPD]add auto release to model

* [UPD]unify conversion from rawbuffer to coreml weight param

* [FIX] fix matmul from rawbuffer to coreml weight param

* [UPD]fix innerproduct input channel error

* [BUG] fix matmul weight bug

* remove some annotation

* [BUG] fix matmul layer about fp16

* [FIX] fix sliceV2 op conflict with master

* merge master (Tencent#1721)

* Fix trt multistream logger (Tencent#1521)

* [FIX] fix trt logger

* [FIX] catch std::bad_alloc error for trt8 building

* [FIX] return null while shape_tensor size -1

* Update version.h

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* Update split_utils.cc (Tencent#1528)

Compiling with mingw32 reports an error, because the mingw32 compiler still requires the namespace qualifier:
[ 99%] Building CXX object CMakeFiles/TNN.dir/source/tnn/utils/split_utils.cc.obj
D:\TNN\source\tnn\utils\split_utils.cc: In static member function 'static tnn::Status tnn::SplitUtils::SplitStr(const char*, tnn::str_arr&, const char*, bool, bool, bool, bool, bool)':
D:\TNN\source\tnn\utils\split_utils.cc:163:23: error: 'min' was not declared in this scope
             int len = min((i - cursor), subs_length - 1);
I think this change is better: it works with mingw32 while remaining compatible with the previously supported compilers.

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
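The mingw32 error above comes from an unqualified `min` call: nothing brings `std::min` into scope there, so the name is not found. A minimal sketch of the portable form, assuming the fix qualifies the call as `std::min` and that both arguments are `int`; the helper name `ClampedLength` is hypothetical and not TNN's actual code:

```cpp
#include <algorithm>  // std::min
#include <cassert>

// Hypothetical helper mirroring the failing line in split_utils.cc:
//     int len = min((i - cursor), subs_length - 1);
// std::min deduces a single type T, so both arguments must have the same
// type; here both expressions are int.
int ClampedLength(int i, int cursor, int subs_length) {
    return std::min(i - cursor, subs_length - 1);
}
```

If `<windows.h>` is also included, its `min` macro can still interfere; writing `(std::min)(a, b)` or defining `NOMINMAX` before including it avoids that.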

* Update README.md (Tencent#1538)

Typos

* [UPD]update QQ group (Tencent#1552)

* [BUG]fix YouTu face alignment model

* [UPD]update mean pts file logic

* [UPD]draw face points green

* [UPD]unify example controller list

* [UPD]unify example controller list

* [UPD]move blaze anchor file to resource

* [METAL]update tnn project

* [UPD]update tool onnx2coreml

* [ADD]support ShareCommandQueue between instances

* [ADD]support ShareCommandQueue between instances

* [UPD]add log message

* [UPD]transfer file half.hpp

* [UPD]fix xcode compile error with fp16

* [UPD]fix xcode compile error with fp16

* [UPD]update model type error msg

* [FIX]fix logic error of constofshape

* [UPD]update debug message

* [FIX]support int32 for neg op

* [BUG]fix init error with nil command buffer

* [UPD]add mac build xcode project; fix ios mac build script;

* [UPD]add mac build xcode project; fix ios mac build script;

* [ADD]add QQ group 2 of TNN

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* [opencl][fix] try save program cache (Tencent#1557)

* Dev roi align (Tencent#1511)

* [ARM] fix int32 blob cvt to mat

* [ARM] support roi align

* [ARM] add roi align unit test

* [ARM] add to xcodeproj

Co-authored-by: lucasktian <lucasktian@tencent.com>
Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* Fix arm gather and constant blob (Tencent#1564)

* [ARM][BUG] fix gather error for index < 0

* [ARM][BUG] fix buffer to blob error without converting precision

* [ARM] update type convert in layer_norm fp16

Co-authored-by: quinnrong94 <67782915+quinnrong94@users.noreply.github.com>
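The gather fix for negative indices follows the ONNX convention that a negative index counts from the end of the gathered axis, so it has to be normalized to `index + dim` before it is used. A minimal 1-D sketch of that normalization; the function name `Gather1D` and the bounds-check behavior are assumptions for illustration, not TNN's ARM kernel:

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Gather along a single axis of length data.size(). Negative indices are
// wrapped ONNX-style: -1 refers to the last element.
std::vector<float> Gather1D(const std::vector<float>& data,
                            const std::vector<int32_t>& indices) {
    const int32_t dim = static_cast<int32_t>(data.size());
    std::vector<float> out;
    out.reserve(indices.size());
    for (int32_t idx : indices) {
        if (idx < 0) idx += dim;  // normalize negative index
        if (idx < 0 || idx >= dim) throw std::out_of_range("gather index out of range");
        out.push_back(data[idx]);
    }
    return out;
}
```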

* Dev add config layer (Tencent#1569)

* add config layer param to set arm conv algorithm for specific layer

Co-authored-by: powerpwang <powerpwang@outlook.com>
Co-authored-by: ealinli <ealinli@tencent.com>

* Fix onnx2tnn build failure caused by the protobuf version upgrade (Tencent#1571)

* [ONNX][BUG]1. fix compile bug;

* [ONNX2TNN][BUG]1. fix compile issue caused by the protobuf version upgrade;

* [ADD][TOOLS] add dynamic range quantization (Tencent#1572)

* [ADD][TOOLS] support fake quantization

* [UPD][FAKE_QUANT] fix bug

* [UPD][DOC] add fake quantization in doc

* [UPD] 1.rename fake quant to dynamic range quant 2.move dequant to net_optimizer

* [UPD] remove redundant comment

* [UPD] update comment for DynamicRangeDequant

* [DRQuant][UPD] fix namespace issue

* [DRQuant][UPD] Turn off TNN_SYMBOL_HIDE to fix ci

Co-authored-by: ealinli <ealinli@tencent.com>
Co-authored-by: Dandi Ding <bluaxe@users.noreply.github.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>
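Dynamic range quantization, as described in these commits, stores constant weights in a reduced-precision integer form and dequantizes them back to float (the `DynamicRangeDequant` step moved into `net_optimizer`). The sketch below shows one common variant, symmetric per-tensor int8 with `scale = max|w| / 127`; whether TNN uses per-tensor or per-channel scales is not stated here, so treat the scheme and the names `QuantizeSymmetric`/`Dequantize` as assumptions:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct QuantizedTensor {
    std::vector<int8_t> data;
    float scale;  // real_value ≈ data[i] * scale
};

// Symmetric per-tensor int8 quantization (assumed scheme, for illustration).
QuantizedTensor QuantizeSymmetric(const std::vector<float>& w) {
    float max_abs = 0.f;
    for (float v : w) max_abs = std::max(max_abs, std::fabs(v));
    QuantizedTensor q;
    q.scale = max_abs > 0.f ? max_abs / 127.f : 1.f;  // avoid divide-by-zero
    q.data.reserve(w.size());
    for (float v : w) {
        long r = std::lround(v / q.scale);
        r = std::min(127L, std::max(-127L, r));  // clamp to symmetric int8 range
        q.data.push_back(static_cast<int8_t>(r));
    }
    return q;
}

// Dequantize back to float, as an optimizer pass would before inference.
std::vector<float> Dequantize(const QuantizedTensor& q) {
    std::vector<float> out;
    out.reserve(q.data.size());
    for (int8_t v : q.data) out.push_back(v * q.scale);
    return out;
}
```

The round trip loses at most half a quantization step per element, which is why a distribution check (deciding whether a weight is worth quantizing) can matter for accuracy.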

* [UPD][OPENCL] opencl support using unoptimized conv (Tencent#1581)

Co-authored-by: ealinli <ealinli@tencent.com>

* [UPD][CONVERTER] lstm support sequence_lens (Tencent#1585)

Co-authored-by: ealinli <ealinli@tencent.com>

* [MODEL_CHECK][BUG]1. fix bug for dump layer(fp16); (Tencent#1567)

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* Bugfix from train branch (Tencent#1592)

* [BUG] fix get dims value bug when input is 1D or 2D in arm_reduce_layer_acc.cc.
* [BUG] fix Convert from NCHW to NHWC error when input is on arm device.
* [BUG] fix convert mat to blob bug when input is NC_INT32 on arm device.
* [BUG] fix tflite_converter bug when transforming an activation layer.
* add nchw format condition when copy int32 mat to blob
* rollback changes on tflite_op_converter.cc

Co-authored-by: sanerzheng <sanerzheng@tencent.com>

* [UPD][OPENCL] opencl support x86 mat (Tencent#1593)

Co-authored-by: ealinli <ealinli@tencent.com>

* [CONVERTER][BUG]1. fix issue 1595; (Tencent#1596)

* [UPD][OPENCL] add ocl version check (Tencent#1601)

* [UPD][OPENCL] add ocl version check

* [UPD][OPENCL] update message for version check

Co-authored-by: ealinli <ealinli@tencent.com>

* [UPD][OPENCL] solve the problem that matmul, tile have incorrect results on helio p65 (Tencent#1602)

Co-authored-by: ealinli <ealinli@tencent.com>

* [UPD][DYQ] fix dynamic range quant compile error on windows (Tencent#1604)

Co-authored-by: ealinli <ealinli@tencent.com>

* [DOC][UPD] modify image links in doc (Tencent#1617)

Co-authored-by: ealinli <ealinli@tencent.com>

* remove redundant test cases (Tencent#1614)

* Fix typos. (Tencent#1626)

* Fix typos.

* Update Readme.

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* Interpreter: change from std::map to safe_map; the latter offers a const operator[] function (Tencent#1618)

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>
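`std::map::operator[]` is non-const because it inserts a default-constructed value when the key is missing, so it cannot be called on a `const` map. A wrapper that offers a `const operator[]` has to look the key up without inserting. A minimal sketch of that idea, assuming a missing key yields a default-constructed value; the class name `SafeMap` and that fallback behavior are hypothetical and not necessarily what TNN's `safe_map` does:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical illustration. Inheriting from std::map keeps the sketch
// short; a production wrapper might prefer composition.
template <typename K, typename V>
class SafeMap : public std::map<K, V> {
public:
    // Keep std::map's inserting (non-const) operator[] overloads visible.
    using std::map<K, V>::operator[];

    // Const lookup: never inserts. Returns a copy of the stored value,
    // or a default-constructed V when the key is absent.
    V operator[](const K& key) const {
        auto it = this->find(key);
        return it != this->end() ? it->second : V();
    }
};
```

On a non-const object the inserting overload is still selected, so existing write paths keep working; only `const` access switches to the non-inserting lookup.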

* [UPD][OPENCL] get opencl version when GpuType is OTHER (Tencent#1636)

* [UPD][OPENCL] get opencl version when GpuType is OTHER

* [UPD][OPENCL] optimize nv gpu judgment logic

Co-authored-by: ealinli <ealinli@tencent.com>

* Patch x86 avx support (Tencent#1633)

* merge dev_vc14_m1_debug, support x86 avx

* add option to support x86 avx2 compile

* update win_x86_opencl building script

Co-authored-by: Dandiding <Dandiding@tencent.com>

* fix x86 avx2 options (Tencent#1638)

* fix typos in doc (Tencent#1634)

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* [X86][BUG] fix deconv layer build error (Tencent#1641)

* [OPENCL][FIX] fix conv and dwconv on some of the AMD GPUs

* [UPD][OPENCL] add coor check for conv and dwconv

* [OPENCL][FIX] fix compilation issues

* [OPENCL][UPD] optimize AMD GPU judgment logic

Co-authored-by: ealinli <ealinli@tencent.com>

* [OPENCL][UPD] fix deconv, avgpool on AMD GPU (Tencent#1646)

* [OPENCL][UPD] fix deconv and avgpool when read image

* [OPENCL][UPD] add header file for pooling

Co-authored-by: ealinli <ealinli@tencent.com>

* [OPENCL][UPD] opencl support cache on windows (Tencent#1645)

* [UPD][OPENCL] add coor check for conv and dwconv

* [OPENCL][FIX] fix compilation issues

* [OPENCL][UPD] optimize AMD GPU judgment logic

* [OPENCL][UPD] support cache on windows

* [OPENCL][UPD] fix load cache on windows

Co-authored-by: ealinli <ealinli@tencent.com>

* [DRQ][UPD] dynamic range quant model supports const folding (Tencent#1647)

* [DRQ][UPD] dynamic range quant model supports const folding

* [TOOLS][UPD] dynamic range quant updates usage

Co-authored-by: ealinli <ealinli@tencent.com>

* 1. make model_check support dynamic range quantized model; (Tencent#1653)

* [ADD][TUTORIAL] add mbv2-ssd conversion and deployment tutorial (Tencent#1640)

* [ADD][TUTORIAL] add mbv2-ssd conversion and deployment tutorial

* [TUTORIAL][UPD] update code link

* [TUTORIAL][UPD] fix typo

Co-authored-by: ealinli <ealinli@tencent.com>

* [X86][FIX] binary op support fp16 weights (Tencent#1655)

* [X86][FIX] binary op support fp16 weights

* [X86][FIX] matmul support fp16 weights

Co-authored-by: ealinli <ealinli@tencent.com>

* Feature dynamic quant fc (Tencent#1660)

* [DYNAMIC_QUANT][UPD]1. dynamic quant support inner_product layer;

* [ARM][UPD]1. arm gemm uses the Kahan summation algorithm in some cases to avoid fp16 accumulation error;
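Kahan (compensated) summation carries a running correction term that recovers the low-order bits lost when a small addend is added to a large partial sum, which is why it helps with fp16 accumulation. Standard C++ has no portable half-precision type, so the sketch below demonstrates the same effect in `float`; it illustrates the algorithm and is not TNN's ARM gemm kernel. Aggressive flags such as `-ffast-math` may let the compiler optimize the compensation away.

```cpp
#include <cassert>
#include <vector>

// Plain left-to-right float accumulation.
float NaiveSum(const std::vector<float>& v) {
    float sum = 0.f;
    for (float x : v) sum += x;
    return sum;
}

// Kahan summation: c holds the rounding error from the previous step.
float KahanSum(const std::vector<float>& v) {
    float sum = 0.f;
    float c = 0.f;
    for (float x : v) {
        float y = x - c;    // apply the stored correction
        float t = sum + y;  // low-order bits of y may be lost here
        c = (t - sum) - y;  // recover what was lost (algebraically zero)
        sum = t;
    }
    return sum;
}
```

For example, adding sixteen `1.0f` values to `16777216.0f` (2^24, where float spacing is 2) leaves the naive sum stuck at 2^24, while the compensated sum reaches the exact 2^24 + 16.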

* [FIX][CPU][TRT] Fix CPU Not OP bug, Fix TensorRT ShapeTensor Class Bug. (Tencent#1663)

* [FIX] Fix CPU Not Operator data type error.

* [FIX] Fix TensorRT ShapeTensor class ConvertTo1D() func bug

* fix _mm256_load_ps segmentation fault (Tencent#1682)

* fix _mm256_load_ps segmentation fault

* fix crash on mm256_load in innerproduct

* use loadu instead of stride-judgement

* remove unused code

Co-authored-by: fishdai <fishdai@tencent.com>

* x86_acc & blob_converter now will consider the BlobHandle.bytes_offset (Tencent#1684)

* Dev x86 layer adapter (Tencent#1683)

* [X86] add layer acc adapter

* [X86] NULL to nullptr

* [X86][OPENVINO] add openvino adapter layer builder, fallback to cpu naive impl if there is no normal ov layer builder

* [X86][OPENVINO] fix hard code of ov precision

Co-authored-by: anonymous <anonymous@mail.org>

* [ARM] fix arm cross compile error caused by float-abi (Tencent#1678)

* avoid nullptr in IsSupport (Tencent#1685)

* [UPD][TOOLS] 1.increase subs_length 2.align model support bool and int32 input 3. fix gather and onehot convert 4. gather_nd support indices_shape[-1] < r (Tencent#1686)

Co-authored-by: ealinli <ealinli@tencent.com>

* Dev metal ngray (Tencent#1693)

* [METAL] metal support ngray input mat

* [METAL]fix bytes_size

* [COREML] fix dynamic quantization model about coreml

Co-authored-by: jacinhu <jacinhu@tencent.com>
Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com>

* [UPD][DRQ] support quantizing matmul's const weight (Tencent#1698)

* [UPD][DRQ] support quantizing matmul's const weight

* [UPD][DRQ] add scale check in constant map

Co-authored-by: ealinli <ealinli@tencent.com>

* [FIX] fix compile macos framework (Tencent#1687)

Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com>

* Optimize dynamic range quantize (Tencent#1699)

* [DynamicRangeQuantize][UPD]1. add logic to decide whether to quantize based on the weight distribution;

* [DynamicQuantization][UPD]1. dynamic_range_quantization support TNN fp16 model;

* [DRQ][UPD]1. fix a bug in model_check_android.sh where the specified reference file was not used during inference; 2. optimize part of the code in dynamic_range_quantization;

* [DRQ][UPD]1.fix conflict with merge master code;

Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com>

* Fix windows x86 build (Tencent#1697)

* [FIX] remove nanodet for windows

* remove ninja compile due to a bug

* fix x86 mat type register macro name

* fix x86 matmul with 2 inputs

Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com>

* [METAL] fix stride slice crash when dims is 2 (Tencent#1701)

Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com>

* [mac] 1. FIX X86 and ARM conflict; 2. ADD ARM arch on intel cpu (You can use ARM if rosetta-X86 crash).  3. Use ios project build/profile M1-Mac. (Tencent#1700)

Co-authored-by: gennyxu <gennyxu@tencent.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>

* [iOS][UPD]1. add missing file for xcode project; (Tencent#1705)

* [BUG]fix coreml error of slicev2, padv2 and matmul; (Tencent#1703)

* [BUG]fix YouTu face alignment model

* [UPD]update mean pts file logic

* [UPD]draw face points green

* [UPD]unify example controller list

* [UPD]unify example controller list

* [UPD]move blaze anchor file to resource

* [METAL]update tnn project

* [UPD]update tool onnx2coreml

* [ADD]support ShareCommandQueue between instances

* [ADD]support ShareCommandQueue between instances

* [UPD]add log message

* [UPD]transfer file half.hpp

* [UPD]fix xcode compile error with fp16

* [UPD]fix xcode compile error with fp16

* [UPD]update model type error msg

* [FIX]fix logic error of constofshape

* [UPD]update debug message

* [FIX]support int32 for neg op

* [BUG]fix init error with nil command buffer

* [UPD]add mac build xcode project; fix ios mac build script;

* [UPD]add mac build xcode project; fix ios mac build script;

* [ADD]add QQ group 2 of TNN

* [BUG]fix dynamic dequant error; fix arm pad error;

* [BUG]support coreml padv2

* [BUG]fix coreml matmul error when it has const input blob

* [BUG]fix coreml slicev2

* [UPD]add convert logic of swish

* [BUG]fix cpu error for x86 mac

* [UPD]support fusion for gemm + bn

* [UPD]add convert logic of swish

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>
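The `gemm + bn` fusion listed above works because both ops are affine per output channel, so the BatchNorm can be folded into the GEMM weights and bias. With `s = gamma / sqrt(var + eps)`, the folded parameters are `W' = s * W` (row-wise) and `b' = s * (b - mean) + beta`. A minimal sketch of that folding, assuming a row-major `[out_channels x in_channels]` weight layout; the function and struct names are hypothetical, not TNN's optimizer pass:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct BatchNormParams {
    std::vector<float> gamma, beta, mean, var;  // one entry per output channel
    float eps;                                  // e.g. 1e-5f
};

// Folds BN into GEMM weights W [out x in] (row-major) and bias b [out], in place.
void FoldBatchNormIntoGemm(std::vector<float>& W, std::vector<float>& b,
                           std::size_t in_channels, const BatchNormParams& bn) {
    const std::size_t out_channels = b.size();
    for (std::size_t o = 0; o < out_channels; ++o) {
        const float s = bn.gamma[o] / std::sqrt(bn.var[o] + bn.eps);
        for (std::size_t i = 0; i < in_channels; ++i) W[o * in_channels + i] *= s;
        b[o] = s * (b[o] - bn.mean[o]) + bn.beta[o];
    }
}
```

After folding, a single GEMM reproduces `BN(W x + b)`, so the separate normalization layer can be dropped at inference time.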

* [UPD]update merge logic for swish groupnorm deconv (Tencent#1708)

* [BUG]fix YouTu face alignment model

* [UPD]update mean pts file logic

* [UPD]draw face points green

* [UPD]unify example controller list

* [UPD]unify example controller list

* [UPD]move blaze anchor file to resource

* [METAL]update tnn project

* [UPD]update tool onnx2coreml

* [ADD]support ShareCommandQueue between instances

* [ADD]support ShareCommandQueue between instances

* [UPD]add log message

* [UPD]transfer file half.hpp

* [UPD]fix xcode compile error with fp16

* [UPD]fix xcode compile error with fp16

* [UPD]update model type error msg

* [FIX]fix logic error of constofshape

* [UPD]update debug message

* [FIX]support int32 for neg op

* [BUG]fix init error with nil command buffer

* [UPD]add mac build xcode project; fix ios mac build script;

* [UPD]add mac build xcode project; fix ios mac build script;

* [ADD]add QQ group 2 of TNN

* [BUG]fix dynamic dequant error; fix arm pad error;

* [BUG]support coreml padv2

* [BUG]fix coreml matmul error when it has const input blob

* [BUG]fix coreml slicev2

* [UPD]add convert logic of swish

* [BUG]fix cpu error for x86 mac

* [UPD]support fusion for gemm + bn

* [UPD]add convert logic of swish

* [UPD]support fusion for deconv+add and deconv+add+bn

* [UPD]add aliyun disk link for tnn models

* [UPD]support fusion for group norm

* [UPD]support fusion for swish

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>
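For reference, the two ops named in this commit's title: Swish is `x * sigmoid(x)` (which is why a `Sigmoid` followed by a `Mul` with its own input can be fused into one op), and GroupNorm splits the C channels into G groups and normalizes each group over its channels and spatial positions before applying a per-channel scale and bias. A naive NCHW sketch, assuming N = 1, float data and `C % groups == 0`; the names and layout are illustrative and not TNN's Metal or CoreML implementation:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

float Swish(float x) { return x / (1.f + std::exp(-x)); }  // x * sigmoid(x)

// GroupNorm for a single CHW tensor (N = 1). gamma/beta have C entries.
std::vector<float> GroupNorm(const std::vector<float>& x, std::size_t C, std::size_t HW,
                             std::size_t groups, const std::vector<float>& gamma,
                             const std::vector<float>& beta, float eps) {
    std::vector<float> y(x.size());
    const std::size_t cpg = C / groups;  // channels per group
    const std::size_t group_size = cpg * HW;
    for (std::size_t g = 0; g < groups; ++g) {
        const std::size_t begin = g * group_size;
        double mean = 0.0, var = 0.0;
        for (std::size_t i = 0; i < group_size; ++i) mean += x[begin + i];
        mean /= group_size;
        for (std::size_t i = 0; i < group_size; ++i) {
            const double d = x[begin + i] - mean;
            var += d * d;
        }
        var /= group_size;  // biased variance, as in the usual GroupNorm definition
        const double inv_std = 1.0 / std::sqrt(var + eps);
        for (std::size_t c = g * cpg; c < (g + 1) * cpg; ++c)
            for (std::size_t p = 0; p < HW; ++p) {
                const std::size_t idx = c * HW + p;
                y[idx] = static_cast<float>((x[idx] - mean) * inv_std) * gamma[c] + beta[c];
            }
    }
    return y;
}
```

With `groups == C` this reduces to InstanceNorm, and with `groups == 1` it normalizes over all channels of the sample, which matches how the CoreML commits add both groupnorm and instancenorm support.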

* [DRQ][BUG]1. fix bug for max_values; (Tencent#1716)

* Hotfix m1 build (Tencent#1715)

* fix apple m1 clang 13.1 compile error

* fix unit test compile error

Co-authored-by: quinnrong <quinnrong@quinnrongs-MacBook-Pro.local>
Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com>

Co-authored-by: shenpenwang <41420892+Maosquerade@users.noreply.github.com>
Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
Co-authored-by: sxj731533730 <sxj731533730@gmail.com>
Co-authored-by: Yulv-git <34329208+Yulv-git@users.noreply.github.com>
Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com>
Co-authored-by: quinnrong94 <67782915+quinnrong94@users.noreply.github.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>
Co-authored-by: powerpwang <72859430+powerpwang@users.noreply.github.com>
Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com>
Co-authored-by: powerpwang <powerpwang@outlook.com>
Co-authored-by: ealinli <ealinli@tencent.com>
Co-authored-by: Dandi Ding <bluaxe@users.noreply.github.com>
Co-authored-by: saner zheng <zqawszqaws@126.com>
Co-authored-by: sanerzheng <sanerzheng@tencent.com>
Co-authored-by: Feng Shijie <j514681085@icloud.com>
Co-authored-by: Dandiding <Dandiding@tencent.com>
Co-authored-by: FeiGeChuanShu <774074168@qq.com>
Co-authored-by: seanxcwang <66675860+seanxcwang@users.noreply.github.com>
Co-authored-by: doxutx <92915535+doxutx@users.noreply.github.com>
Co-authored-by: kumbayaco <xyu.dai@gmail.com>
Co-authored-by: fishdai <fishdai@tencent.com>
Co-authored-by: anonymous <anonymous@mail.org>
Co-authored-by: jacinhu <jacinhu@tencent.com>
Co-authored-by: XDC <196890111@qq.com>
Co-authored-by: gennyxu <gennyxu@tencent.com>
Co-authored-by: quinnrong <quinnrong@quinnrongs-MacBook-Pro.local>

* [FIX] fix sliceV2 op conflict with master again

* [METAL][OP][FIX] 1.metal support groupnorm & swish op 2.fix metal blob converter & reformat bug when input dim is 1

* reset model

* [COREML] coreml support swish op

* [COREML] fix coreml batchnorm bug

* [COREML]coreml support groupnorm

* [COREML]coreml support instancenorm

* reset model

* solve conflict

* Dev groupnorm (Tencent#1726)

* Fix trt multistream logger (Tencent#1521)

* [FIX] fix trt logger

* [FIX] catch std::bad_alloc error for trt8 building

* [FIX] return null while shape_tensor size -1

* Update version.h

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* Update split_utils.cc (Tencent#1528)

Compiling with mingw32 reports an error, because the mingw32 compiler still requires the namespace qualifier:
[ 99%] Building CXX object CMakeFiles/TNN.dir/source/tnn/utils/split_utils.cc.obj
D:\TNN\source\tnn\utils\split_utils.cc: In static member function 'static tnn::Status tnn::SplitUtils::SplitStr(const char*, tnn::str_arr&, const char*, bool, bool, bool, bool, bool)':
D:\TNN\source\tnn\utils\split_utils.cc:163:23: error: 'min' was not declared in this scope
             int len = min((i - cursor), subs_length - 1);
I think this change is better: it works with mingw32 while remaining compatible with the previously supported compilers.

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* Update README.md (Tencent#1538)

Typos

* [UPD]update QQ group (Tencent#1552)

* [BUG]fix YouTu face alignment model

* [UPD]update mean pts file logic

* [UPD]draw face points green

* [UPD]unify example controller list

* [UPD]unify example controller list

* [UPD]move blaze anchor file to resource

* [METAL]update tnn project

* [UPD]update tool onnx2coreml

* [ADD]support ShareCommandQueue between instances

* [ADD]support ShareCommandQueue between instances

* [UPD]add log message

* [UPD]transfer file half.hpp

* [UPD]fix xcode compile error with fp16

* [UPD]fix xcode compile error with fp16

* [UPD]update model type error msg

* [FIX]fix logic error of constofshape

* [UPD]update debug message

* [FIX]support int32 for neg op

* [BUG]fix init error with nil command buffer

* [UPD]add mac build xcode project; fix ios mac build script;

* [UPD]add mac build xcode project; fix ios mac build script;

* [ADD]add QQ group 2 of TNN

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* [opencl][fix] try save program cache (Tencent#1557)

* Dev roi align (Tencent#1511)

* [ARM] fix int32 blob cvt to mat

* [ARM] support roi align

* [ARM] add roi align unit test

* [ARM] add to xcodeproj

Co-authored-by: lucasktian <lucasktian@tencent.com>
Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* Fix arm gather and constant blob (Tencent#1564)

* [ARM][BUG] fix gather error for index < 0

* [ARM][BUG] fix buffer to blob error without converting precision

* [ARM] update type convert in layer_norm fp16

Co-authored-by: quinnrong94 <67782915+quinnrong94@users.noreply.github.com>

* Dev add config layer (Tencent#1569)

* add config layer param to set arm conv algorithm for specific layer

Co-authored-by: powerpwang <powerpwang@outlook.com>
Co-authored-by: ealinli <ealinli@tencent.com>

* Fix onnx2tnn build failure caused by the protobuf version upgrade (Tencent#1571)

* [ONNX][BUG]1. fix compile bug;

* [ONNX2TNN][BUG]1. fix compile issue caused by the protobuf version upgrade;

* [ADD][TOOLS] add dynamic range quantization (Tencent#1572)

* [ADD][TOOLS] support fake quantization

* [UPD][FAKE_QUANT] fix bug

* [UPD][DOC] add fake quantization in doc

* [UPD] 1.rename fake quant to dynamic range quant 2.move dequant to net_optimizer

* [UPD] remove redundant comment

* [UPD] update comment for DynamicRangeDequant

* [DRQuant][UPD] fix namespace issue

* [DRQuant][UPD] Turn off TNN_SYMBOL_HIDE to fix ci

Co-authored-by: ealinli <ealinli@tencent.com>
Co-authored-by: Dandi Ding <bluaxe@users.noreply.github.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>

* [UPD][OPENCL] opencl support using unoptimized conv (Tencent#1581)

Co-authored-by: ealinli <ealinli@tencent.com>

* [UPD][CONVERTER] lstm support sequence_lens (Tencent#1585)

Co-authored-by: ealinli <ealinli@tencent.com>

* [MODEL_CHECK][BUG]1. fix bug for dump layer(fp16); (Tencent#1567)

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* Bugfix from train branch (Tencent#1592)

* [BUG] fix get dims value bug when input is 1D or 2D in arm_reduce_layer_acc.cc.
* [BUG] fix Convert from NCHW to NHWC error when input is on arm device.
* [BUG] fix convert mat to blob bug when input is NC_INT32 on arm device.
* [BUG] fix tflite_converter bug when transforming an activation layer.
* add nchw format condition when copy int32 mat to blob
* rollback changes on tflite_op_converter.cc

Co-authored-by: sanerzheng <sanerzheng@tencent.com>

* [UPD][OPENCL] opencl support x86 mat (Tencent#1593)

Co-authored-by: ealinli <ealinli@tencent.com>

* [CONVERTER][BUG]1. fix issue 1595; (Tencent#1596)

* [UPD][OPENCL] add ocl version check (Tencent#1601)

* [UPD][OPENCL] add ocl version check

* [UPD][OPENCL] update message for version check

Co-authored-by: ealinli <ealinli@tencent.com>

* [UPD][OPENCL] solve the problem that matmul, tile have incorrect results on helio p65 (Tencent#1602)

Co-authored-by: ealinli <ealinli@tencent.com>

* [UPD][DYQ] fix dynamic range quant compile error on windows (Tencent#1604)

Co-authored-by: ealinli <ealinli@tencent.com>

* [DOC][UPD] modify image links in doc (Tencent#1617)

Co-authored-by: ealinli <ealinli@tencent.com>

* remove redundant test cases (Tencent#1614)

* Fix typos. (Tencent#1626)

* Fix typos.

* Update Readme.

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* Interpreter: change from std::map to safe_map; the latter offers a const operator[] function (Tencent#1618)

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>

* [UPD][OPENCL] get opencl version when GpuType is OTHER (Tencent#1636)

* [UPD][OPENCL] get opencl version when GpuType is OTHER

* [UPD][OPENCL] optimize nv gpu judgment logic

Co-authored-by: ealinli <ealinli@tencent.com>

* Patch x86 avx support (Tencent#1633)

* merge dev_vc14_m1_debug, support x86 avx

* add option to support x86 avx2 compile

* update win_x86_opencl building script

Co-authored-by: Dandiding <Dandiding@tencent.com>

* fix x86 avx2 options (Tencent#1638)

* fix typos in doc (Tencent#1634)

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>

* [X86][BUG] fix deconv layer build error (Tencent#1641)

* [OPENCL][FIX] fix conv and dwconv on some of the AMD GPUs

* [UPD][OPENCL] add coor check for conv and dwconv

* [OPENCL][FIX] fix compilation issues

* [OPENCL][UPD] optimize AMD GPU judgment logic

Co-authored-by: ealinli <ealinli@tencent.com>

* [OPENCL][UPD] fix deconv, avgpool on AMD GPU (Tencent#1646)

* [OPENCL][UPD] fix deconv and avgpool when read image

* [OPENCL][UPD] add header file for pooling

Co-authored-by: ealinli <ealinli@tencent.com>

* [OPENCL][UPD] opencl support cache on windows (Tencent#1645)

* [UPD][OPENCL] add coor check for conv and dwconv

* [OPENCL][FIX] fix compilation issues

* [OPENCL][UPD] optimize AMD GPU judgment logic

* [OPENCL][UPD] support cache on windows

* [OPENCL][UPD] fix load cache on windows

Co-authored-by: ealinli <ealinli@tencent.com>

* [DRQ][UPD] dynamic range quant model support do const folder (Tencent#1647)

* [DRQ][UPD] dynamic range quant model support do const folder

* [TOOLS][UPD] dynamic range quant updates usage

Co-authored-by: ealinli <ealinli@tencent.com>

* 1. make model_check support dynamic range quantized model; (Tencent#1653)

* [ADD][TUTORIAL] add mbv2-ssd conversion and deployment tutorial (Tencent#1640)

* [ADD][TUTORIAL] add mbv2-ssd conversion and deployment tutorial

* [TUTORIAL][UPD] update code link

* [TUTORIAL][UPD] fix typo

Co-authored-by: ealinli <ealinli@tencent.com>

* [X86][FIX] binary op support fp16 weights (Tencent#1655)

* [X86][FIX] binary op support fp16 weights

* [X86][FIX] matmul support fp16 weights

Co-authored-by: ealinli <ealinli@tencent.com>

* Feature dynamic quant fc (Tencent#1660)

* [DYNAMIC_QUANT][UPD]1. dynamic quant support inner_product layer;

* [ARM][UPD]1. arm gemm uses the Kahan summation algorithm in some cases to avoid fp16 accumulation error;

* [FIX][CPU][TRT] Fix CPU Not OP bug, Fix TensorRT ShapeTensor Class Bug. (Tencent#1663)

* [FIX] Fix CPU Not Operator data type error.

* [FIX] Fix TensorRT ShapeTensor class ConvertTo1D() func bug

* fix _mm256_load_ps segmentation fault (Tencent#1682)

* fix _mm256_load_ps segmentation fault

* fix crash on mm256_load in innerproduct

* use loadu instead of stride-judgement

* remove unused code

Co-authored-by: fishdai <fishdai@tencent.com>

* x86_acc & blob_converter now will consider the BlobHandle.bytes_offset (Tencent#1684)

* Dev x86 layer adapter (Tencent#1683)

* [X86] add layer acc adapter

* [X86] NULL to nullptr

* [X86][OPENVINO] add openvino adapter layer builder, fallback to cpu naive impl if there is no normal ov layer builder

* [X86][OPENVINO] fix hard code of ov precision

Co-authored-by: anonymous <anonymous@mail.org>

* [ARM] fix arm cross compile error caused by float-abi (Tencent#1678)

* avoid nullptr in IsSupport (Tencent#1685)

* [UPD][TOOLS] 1. increase subs_length; 2. align model supports bool and int32 input; 3. fix gather and onehot conversion; 4. gather_nd supports indices_shape[-1] < r (Tencent#1686)

Co-authored-by: ealinli <ealinli@tencent.com>

* Dev metal ngray (Tencent#1693)

* [METAL] metal support ngray input mat

* [METAL]fix bytes_size

* [COREML] fix dynamic quantization model about coreml

Co-authored-by: jacinhu <jacinhu@tencent.com>
Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com>

* [UPD][DRQ] support quantizing matmul's const weight (Tencent#1698)

* [UPD][DRQ] support quantizing matmul's const weight

* [UPD][DRQ] add scale check in constant map

Co-authored-by: ealinli <ealinli@tencent.com>

* [FIX] fix compile macos framework (Tencent#1687)

Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com>

* Optimize dynamic range quantize (Tencent#1699)

* [DynamicRangeQuantize][UPD]1. add logic that decides whether to quantize based on the weight distribution;

* [DynamicQuantization][UPD]1. dynamic_range_quantization support TNN fp16 model;

* [DRQ][UPD]1. fix a bug in the model_check_android.sh script where a specified reference file was not used during inference; 2. optimize part of the dynamic_range_quantization code;

* [DRQ][UPD]1.fix conflict with merge master code;

Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com>

* Fix windows x86 build (Tencent#1697)

* [FIX] remove nanodet for windows

* remove ninja compile to work around a bug

* fix x86 mat type register macro name

* fix x86 matmul with 2 inputs

Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com>

* [METAL] fix stride slice crash when dims is 2 (Tencent#1701)

Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com>

* [mac] 1. FIX X86 and ARM conflict; 2. ADD ARM arch on Intel CPU (you can use ARM if Rosetta x86 crashes); 3. Use the iOS project to build/profile on M1 Mac. (Tencent#1700)

Co-authored-by: gennyxu <gennyxu@tencent.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>

* [iOS][UPD]1. add missing file for xcode project; (Tencent#1705)

* [BUG]fix coreml error of slicev2, padv2 and matmul; (Tencent#1703)

* [BUG]fix YouTu face alignment model

* [UPD]update mean pts file logic

* [UPD]draw face points green

* [UPD]unify example controller list

* [UPD]unify example controller list

* [UPD]move blaze anchor file to resource

* [METAL]update tnn project

* [UPD]update tool onnx2coreml

* [ADD]support ShareCommandQueue between instances

* [ADD]support ShareCommandQueue between instances

* [UPD]add log message

* [UPD]transfer file half.hpp

* [UPD]fix xcode compile error with fp16

* [UPD]fix xcode compile error with fp16

* [UPD]update model type error msg

* [FIX]fix logic error of constofshape

* [UPD]update debug message

* [FIX]support int32 for neg op

* [BUG]fix init error with nil commandbuffer

* [UPD]add mac build xcode project; fix ios mac build script;

* [UPD]add mac build xcode project; fix ios mac build script;

* [ADD]add QQ group 2 of TNN

* [BUG]fix dynamic dequant error; fix arm pad error;

* [BUG]support coreml padv2

* [BUG]fix coreml matmul error when it has const input blob

* [BUG]fix coreml slicev2

* [UPD]add convert logic of swish

* [BUG]fix cpu error for x86 mac

* [UPD]support fusion for gemm + bn

* [UPD]add convert logic of swish

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>

* [UPD]update merge logic for swish groupnorm deconv (Tencent#1708)

* [BUG]fix YouTu face alignment model

* [UPD]update mean pts file logic

* [UPD]draw face points green

* [UPD]unify example controller list

* [UPD]unify example controller list

* [UPD]move blaze anchor file to resource

* [METAL]update tnn project

* [UPD]update tool onnx2coreml

* [ADD]support ShareCommandQueue between instances

* [ADD]support ShareCommandQueue between instances

* [UPD]add log message

* [UPD]transfer file half.hpp

* [UPD]fix xcode compile error with fp16

* [UPD]fix xcode compile error with fp16

* [UPD]update model type error msg

* [FIX]fix logic error of constofshape

* [UPD]update debug message

* [FIX]support int32 for neg op

* [BUG]fix init error with nil commandbuffer

* [UPD]add mac build xcode project; fix ios mac build script;

* [UPD]add mac build xcode project; fix ios mac build script;

* [ADD]add QQ group 2 of TNN

* [BUG]fix dynamic dequant error; fix arm pad error;

* [BUG]support coreml padv2

* [BUG]fix coreml matmul error when it has const input blob

* [BUG]fix coreml slicev2

* [UPD]add convert logic of swish

* [BUG]fix cpu error for x86 mac

* [UPD]support fusion for gemm + bn

* [UPD]add convert logic of swish

* [UPD]support fusion for deconv+add and deconv+add+bn

* [UPD]add aliyun disk link for tnn models

* [UPD]support fusion for group norm

* [UPD]support fusion for swish

Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>

* [DRQ][BUG]1. fix bug for max_values; (Tencent#1716)

* Hotfix m1 build (Tencent#1715)

* fix apple m1 clang 13.1 compile error

* fix unit test compile error

Co-authored-by: quinnrong <quinnrong@quinnrongs-MacBook-Pro.local>
Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com>

* [ARM] support groupnorm

* [ARM] support swish

* add swish to conv-post-fuse

* [ADD][OPENCL] opencl add group norm and swish (Tencent#1722)

Co-authored-by: ealinli <ealinli@tencent.com>

* add x86 swish and groupnorm operators; explicitly enable sse4.2 for older compiler versions

Co-authored-by: shenpenwang <41420892+Maosquerade@users.noreply.github.com>
Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
Co-authored-by: sxj731533730 <sxj731533730@gmail.com>
Co-authored-by: Yulv-git <34329208+Yulv-git@users.noreply.github.com>
Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com>
Co-authored-by: quinnrong94 <67782915+quinnrong94@users.noreply.github.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>
Co-authored-by: powerpwang <72859430+powerpwang@users.noreply.github.com>
Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com>
Co-authored-by: powerpwang <powerpwang@outlook.com>
Co-authored-by: ealinli <ealinli@tencent.com>
Co-authored-by: Dandi Ding <bluaxe@users.noreply.github.com>
Co-authored-by: saner zheng <zqawszqaws@126.com>
Co-authored-by: sanerzheng <sanerzheng@tencent.com>
Co-authored-by: Feng Shijie <j514681085@icloud.com>
Co-authored-by: Dandiding <Dandiding@tencent.com>
Co-authored-by: FeiGeChuanShu <774074168@qq.com>
Co-authored-by: seanxcwang <66675860+seanxcwang@users.noreply.github.com>
Co-authored-by: doxutx <92915535+doxutx@users.noreply.github.com>
Co-authored-by: kumbayaco <xyu.dai@gmail.com>
Co-authored-by: fishdai <fishdai@tencent.com>
Co-authored-by: anonymous <anonymous@mail.org>
Co-authored-by: jacinhu <jacinhu@tencent.com>
Co-authored-by: XDC <196890111@qq.com>
Co-authored-by: gennyxu <gennyxu@tencent.com>
Co-authored-by: quinnrong <quinnrong@quinnrongs-MacBook-Pro.local>
Co-authored-by: quinnrong <quinnrong@tencent.com>
Co-authored-by: shenpenwang <565067453@qq.com>

* fix coreml groupnorm unit test

* [ADD]add exp op

* [BUG]fix deconv bias error

* [UPD]init cpu memory with 0 for bert model

* [BUG]fix reshape static error; reshape static layer cannot handle 0 or -1

* [UPD]support inst norm for coreml; update tnn project file;

* [BUG]fix error for layers without a layer resource: operator[] inserts one, which is not thread-safe

* [UPD]add param to batchnorm to support instancenorm

* [UPD]adjust groupnorm with batchnorm

* [UPD]support instancenorm with groupnorm by setting group==channels

* [UPD]update unit test of instancenorm

* [BUG]fix unit test error for layer batchnorm

* [UPD]update tnn project

* [BUG]fix unit test error for APPLE NPU

* [BUG]fix unit test crash for layer batchnorm

* [UPD]ignore cpu or gpu benchmark for mlmodel or mlmodelc

* [UPD]ignore

* [UPD]ignore pixelshuffle for apple npu

* [UPD]ignore matconvert for apple npu

* [UPD]ignore some unary op for apple npu

* [UPD]unify before and after coreml layer, simplify lstm layer

* [UPD]fix lstm error for ht and ct for biLSTM

* [UPD]fix const input load error

* [UPD]fix internal error

* [UPD]ignore

Co-authored-by: jacinhu <jacinhu@tencent.com>
Co-authored-by: teslawho <597645882@qq.com>
Co-authored-by: teslawho <71381575+teslawho@users.noreply.github.com>
Co-authored-by: shenpenwang <41420892+Maosquerade@users.noreply.github.com>
Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com>
Co-authored-by: sxj731533730 <sxj731533730@gmail.com>
Co-authored-by: Yulv-git <34329208+Yulv-git@users.noreply.github.com>
Co-authored-by: quinnrong94 <67782915+quinnrong94@users.noreply.github.com>
Co-authored-by: lucasktian <lucasktian@tencent.com>
Co-authored-by: powerpwang <72859430+powerpwang@users.noreply.github.com>
Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com>
Co-authored-by: powerpwang <powerpwang@outlook.com>
Co-authored-by: ealinli <ealinli@tencent.com>
Co-authored-by: Dandi Ding <bluaxe@users.noreply.github.com>
Co-authored-by: saner zheng <zqawszqaws@126.com>
Co-authored-by: sanerzheng <sanerzheng@tencent.com>
Co-authored-by: Feng Shijie <j514681085@icloud.com>
Co-authored-by: Dandiding <Dandiding@tencent.com>
Co-authored-by: FeiGeChuanShu <774074168@qq.com>
Co-authored-by: seanxcwang <66675860+seanxcwang@users.noreply.github.com>
Co-authored-by: doxutx <92915535+doxutx@users.noreply.github.com>
Co-authored-by: kumbayaco <xyu.dai@gmail.com>
Co-authored-by: fishdai <fishdai@tencent.com>
Co-authored-by: anonymous <anonymous@mail.org>
Co-authored-by: XDC <196890111@qq.com>
Co-authored-by: gennyxu <gennyxu@tencent.com>
Co-authored-by: quinnrong <quinnrong@quinnrongs-MacBook-Pro.local>
Co-authored-by: quinnrong <quinnrong@tencent.com>
Co-authored-by: shenpenwang <565067453@qq.com>
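The commit title adds swish and group norm support, and one squashed commit implements instance norm as group norm with `group == channels`. The C++ sketch below is a plain reference for what these ops compute, not TNN's layer code: `Swish` and `GroupNorm` are hypothetical helper names, and a contiguous NCHW float layout with per-channel `gamma`/`beta` is assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Swish activation: y = x * sigmoid(x) = x / (1 + exp(-x)).
inline float Swish(float x) { return x / (1.0f + std::exp(-x)); }

// Reference group normalization over a contiguous NCHW float tensor.
// Channels are split into `groups` consecutive groups; each (sample, group)
// pair is normalized with its own mean and variance, then every channel is
// scaled by gamma[ch] and shifted by beta[ch].
// groups == c gives instance normalization; groups == 1 normalizes all channels together.
std::vector<float> GroupNorm(const std::vector<float>& x, int n, int c, int h, int w, int groups,
                             const std::vector<float>& gamma, const std::vector<float>& beta,
                             float eps = 1e-5f) {
    assert(groups > 0 && c % groups == 0);
    assert(x.size() == static_cast<std::size_t>(n) * c * h * w);
    std::vector<float> y(x.size());
    const int channels_per_group = c / groups;
    const std::size_t spatial = static_cast<std::size_t>(h) * w;
    const std::size_t group_size = channels_per_group * spatial;
    for (int b = 0; b < n; ++b) {
        for (int g = 0; g < groups; ++g) {
            // Index of the first element of group g inside sample b.
            const std::size_t offset =
                (static_cast<std::size_t>(b) * c + static_cast<std::size_t>(g) * channels_per_group) * spatial;
            double sum = 0.0, sq_sum = 0.0;
            for (std::size_t i = 0; i < group_size; ++i) {
                const double v = x[offset + i];
                sum += v;
                sq_sum += v * v;
            }
            const double mean = sum / group_size;
            // Clamp tiny negative values caused by rounding.
            const double var = std::max(sq_sum / group_size - mean * mean, 0.0);
            const double inv_std = 1.0 / std::sqrt(var + eps);
            for (int k = 0; k < channels_per_group; ++k) {
                const int ch = g * channels_per_group + k;
                for (std::size_t s = 0; s < spatial; ++s) {
                    const std::size_t idx = offset + k * spatial + s;
                    y[idx] = static_cast<float>((x[idx] - mean) * inv_std * gamma[ch] + beta[ch]);
                }
            }
        }
    }
    return y;
}
```

Passing `groups == c` normalizes each channel of each sample on its own (the instance-norm case), while `groups == 1` normalizes all channels of a sample together. Accumulating the statistics in `double` is a choice made here for a simple reference, not something the commit specifies.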


30 people authored and zezhao(赵泽) committed Aug 2, 2022

1 parent aff0883 commit 3a0930f
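One squashed commit (#1660) notes that ARM gemm uses Kahan summation in some cases to avoid fp16 accumulation error. The sketch below shows the technique in plain C++ `float`, since standard C++ has no portable fp16 type; `NaiveSum` and `KahanSum` are hypothetical names, and this is a generic illustration rather than TNN's NEON kernel.

```cpp
#include <cassert>
#include <vector>

// Plain left-to-right accumulation in float.
float NaiveSum(const std::vector<float>& values) {
    float sum = 0.0f;
    for (float v : values) sum += v;
    return sum;
}

// Kahan (compensated) summation. `comp` holds the rounding error of the
// previous addition and is subtracted from the next term, so small terms are
// not silently absorbed by a large running sum.
// Note: -ffast-math or similar flags may let the compiler simplify
// (t - sum) - y to zero and silently turn this back into NaiveSum.
float KahanSum(const std::vector<float>& values) {
    float sum = 0.0f;
    float comp = 0.0f;
    for (float v : values) {
        const float y = v - comp;  // correct the next term by the last error
        const float t = sum + y;   // low-order bits of y may be lost here
        comp = (t - sum) - y;      // (what was actually added) - (what we meant to add)
        sum = t;
    }
    return sum;
}
```

For example, with a leading `1.0f` followed by a million `1e-8f` terms, every naive addition rounds back to `1.0f`, whereas the compensated sum ends up close to 1.01.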

benchmark/benchmark_ios/benchmark/BenchmarkController.mm

            
@@ -161,17 +161,22 @@ - (void)viewDidLoad {
             }
             NSArray<NSString *> *coremls = [[modelFiles filteredArrayUsingPredicate:predicateCoreML] sortedArrayUsingComparator:sort];
-            if (coremls.count > 0) {
+            for (NSString *iter in coremls) {
+                auto proto_prefix = [iter substringToIndex:iter.length - @".mlmodel".length];
+                model.name = proto_prefix.UTF8String;
                 model.tnn_proto_content = "";
                 model.tnn_model_content = "";
-                model.coreml = [modelDirPath stringByAppendingPathComponent:coremls[0]].UTF8String;
+                model.coreml = [modelDirPath stringByAppendingPathComponent:iter].UTF8String;
                 netmodels.push_back(model);
             }
             coremls = [modelFiles filteredArrayUsingPredicate:predicateCoreMLC];
-            if (coremls.count > 0) {
+            for (NSString *iter in coremls) {
+                auto proto_prefix = [iter substringToIndex:iter.length - @".mlmodelc".length];
+                model.name = proto_prefix.UTF8String;
                 model.tnn_proto_content = "";
                 model.tnn_model_content = "";
-                model.coreml = [modelDirPath stringByAppendingPathComponent:coremls[0]].UTF8String;
+                model.coreml = [modelDirPath stringByAppendingPathComponent:iter].UTF8String;
                 netmodels.push_back(model);
             }
         }

@@ -190,6 +195,7 @@ - (IBAction)onBtnBenchmark:(id)sender {
     option.warm_count = 5;
     option.forward_count = 10;
-    option.create_count = 1;
+    option.create_count = 2;
     //Get metallib path from app bundle
     //PS：A script(Build Phases -> Run Script) is added to copy the metallib file in tnn framework project to benchmark app

@@ -200,48 +206,54 @@ - (IBAction)onBtnBenchmark:(id)sender {
     NSString *allResult = [NSString string];
     for (auto model : allModels) {
-        NSLog(@"model: %s", model.name.c_str());
-        allResult = [allResult stringByAppendingFormat:@"model: %s\n", model.name.c_str()];
-        //benchmark on cpu
-        auto result_cpu = [self benchmarkWithProtoContent:model.tnn_proto_content
-                                                model:model.tnn_model_content
-                                               coreml:model.coreml
-                                              library:pathLibrary.UTF8String
-                                              netType:NETWORK_TYPE_DEFAULT
-                                              deviceType:DEVICE_ARM
-                                               option:option];
-        NSLog(@"cpu: \ntime: %s", result_cpu.description().c_str());
-        allResult = [allResult stringByAppendingFormat:@"cpu: \ntime: %s\n",
-                     result_cpu.description().c_str()];
-        //benchmark on gpu
-        auto result_gpu = [self benchmarkWithProtoContent:model.tnn_proto_content
-                                                model:model.tnn_model_content
-                                               coreml:model.coreml
-                                              library:pathLibrary.UTF8String
-                                              netType:NETWORK_TYPE_DEFAULT
-                                              deviceType:DEVICE_METAL
-                                               option:option];
-        NSLog(@"gpu: \ntime: %s", result_gpu.description().c_str());
-        allResult = [allResult stringByAppendingFormat:@"gpu: \ntime: %s\n",
-                     result_gpu.description().c_str()];
+        @autoreleasepool {
+            NSLog(@"model: %s", model.name.c_str());
+            allResult = [allResult stringByAppendingFormat:@"model: %s\n", model.name.c_str()];
+            //tnn proto and model
+            if (model.tnn_proto_content.length() > 0 && model.tnn_model_content.length() > 0) {
+                //benchmark on cpu
+                auto result_cpu = [self benchmarkWithProtoContent:model.tnn_proto_content
+                                                        model:model.tnn_model_content
+                                                       coreml:model.coreml
+                                                      library:pathLibrary.UTF8String
+                                                      netType:NETWORK_TYPE_DEFAULT
+                                                      deviceType:DEVICE_ARM
+                                                       option:option];
+                NSLog(@"cpu: \ntime: %s", result_cpu.description().c_str());
+                allResult = [allResult stringByAppendingFormat:@"cpu: \ntime: %s\n",
+                             result_cpu.description().c_str()];

-        //benchmark on npu
-        auto result_npu = [self benchmarkWithProtoContent:model.tnn_proto_content
-                                                model:model.tnn_model_content
-                                               coreml:model.coreml
-                                              library:pathLibrary.UTF8String
-                                              netType:NETWORK_TYPE_COREML
-                                              deviceType:DEVICE_APPLE_NPU
-                                               option:option];
-        NSLog(@"npu: \ntime: %s", result_npu.description().c_str());
-        allResult = [allResult stringByAppendingFormat:@"npu: \ntime: %s\n",
-                     result_npu.description().c_str()];
+                //benchmark on gpu
+                auto result_gpu = [self benchmarkWithProtoContent:model.tnn_proto_content
+                                                        model:model.tnn_model_content
+                                                       coreml:model.coreml
+                                                      library:pathLibrary.UTF8String
+                                                      netType:NETWORK_TYPE_DEFAULT
+                                                      deviceType:DEVICE_METAL
+                                                       option:option];
+                NSLog(@"gpu: \ntime: %s", result_gpu.description().c_str());
+                allResult = [allResult stringByAppendingFormat:@"gpu: \ntime: %s\n",
+                             result_gpu.description().c_str()];
+            }
+            //tnn proto and model pr coreml model
+            //benchmark on npu
+            auto result_npu = [self benchmarkWithProtoContent:model.tnn_proto_content
+                                                    model:model.tnn_model_content
+                                                   coreml:model.coreml
+                                                  library:pathLibrary.UTF8String
+                                                  netType:NETWORK_TYPE_COREML
+                                                  deviceType:DEVICE_APPLE_NPU
+                                                   option:option];
+            NSLog(@"npu: \ntime: %s", result_npu.description().c_str());
+            allResult = [allResult stringByAppendingFormat:@"npu: \ntime: %s\n",
+                         result_npu.description().c_str()];
+        }
+        self.textViewResult.text = allResult;
     }
-    self.textViewResult.text = allResult;
 }

 - (BenchResult)benchmarkWithProtoContent:(string)protoContent

@@ -293,10 +305,12 @@ - (BenchResult)benchmarkWithProtoContent:(string)protoContent
     //warm cpu, only used when benchmark
     for (int cc=0; cc<option.warm_count; cc++) {
-        result.status = instance->Forward();
-        if (result.status != TNN_OK) {
-            NSLog(@"instance.Forward Error: %s", result.status.description().c_str());
-            return result;
+        @autoreleasepool {
+            result.status = instance->Forward();
+            if (result.status != TNN_OK) {
+                NSLog(@"instance.Forward Error: %s", result.status.description().c_str());
+                return result;
+            }
         }
     }

@@ -309,14 +323,16 @@ - (BenchResult)benchmarkWithProtoContent:(string)protoContent
     }
 #endif
     for (int cc=0; cc<option.forward_count; cc++) {
-        timeval tv_begin, tv_end;
-        gettimeofday(&tv_begin, NULL);
-        result.status = instance->Forward();
-        gettimeofday(&tv_end, NULL);
-        double elapsed = (tv_end.tv_sec - tv_begin.tv_sec) * 1000.0 + (tv_end.tv_usec - tv_begin.tv_usec) / 1000.0;
-        result.addTime(elapsed);
+        @autoreleasepool {
+            timeval tv_begin, tv_end;
+            gettimeofday(&tv_begin, NULL);
+            result.status = instance->Forward();
+            gettimeofday(&tv_end, NULL);
+            double elapsed = (tv_end.tv_sec - tv_begin.tv_sec) * 1000.0 + (tv_end.tv_usec - tv_begin.tv_usec) / 1000.0;
+            result.addTime(elapsed);
+        }
     }
 #if TNN_PROFILE
     if (profile_layer_time) {
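The forward loop in this diff times each `instance->Forward()` call with `gettimeofday` and converts the two `timeval` samples to milliseconds. Below is a C++ sketch of that arithmetic plus an accumulator; `TimingStats` and `TimeLoop` are hypothetical names, since `BenchResult`'s real fields are not shown in this commit. The `@autoreleasepool` blocks the diff adds are Objective-C and are not reproduced here; they drain objects autoreleased during each iteration at the end of that iteration instead of letting them pile up across the whole loop.

```cpp
#include <sys/time.h>

#include <algorithm>
#include <cassert>
#include <limits>

// Milliseconds between two gettimeofday() samples; same arithmetic as the
// `elapsed` computation in the benchmark loop.
double ElapsedMs(const timeval& begin, const timeval& end) {
    return (end.tv_sec - begin.tv_sec) * 1000.0 + (end.tv_usec - begin.tv_usec) / 1000.0;
}

// Hypothetical stand-in for BenchResult::addTime; the real BenchResult
// fields are not part of this diff.
struct TimingStats {
    double min_ms = std::numeric_limits<double>::max();
    double max_ms = 0.0;
    double total_ms = 0.0;
    int count = 0;

    void AddTime(double ms) {
        min_ms = std::min(min_ms, ms);
        max_ms = std::max(max_ms, ms);
        total_ms += ms;
        ++count;
    }
    double AvgMs() const { return count > 0 ? total_ms / count : 0.0; }
};

// Runs `forward` `forward_count` times and records each call's duration,
// mirroring the structure of the forward loop in the diff.
template <typename Fn>
TimingStats TimeLoop(int forward_count, Fn forward) {
    assert(forward_count >= 0);
    TimingStats stats;
    for (int cc = 0; cc < forward_count; ++cc) {
        timeval tv_begin, tv_end;
        gettimeofday(&tv_begin, nullptr);
        forward();
        gettimeofday(&tv_end, nullptr);
        stats.AddTime(ElapsedMs(tv_begin, tv_end));
    }
    return stats;
}
```

For new timing code, `std::chrono::steady_clock` is usually preferable to `gettimeofday`, because it is monotonic and unaffected by wall-clock adjustments.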

include/tnn/core/macro.h

@@ -213,7 +213,8 @@
     #define CHECK_PARAM_NULL(param)                                                   \
         do {                                                                                                         \
             if (!param) {                                                                                        \
-                return Status(TNNERR_PARAM_ERR, "Error: param is nil");                                                    \
+                LOGE("Error: param is nil\n");                                                       \
+                return Status(TNNERR_PARAM_ERR, "Error: param is nil");       \
             }                                                                                                          \
         } while (0)
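The macro.h change makes `CHECK_PARAM_NULL` log the error before returning the `TNNERR_PARAM_ERR` status. The pattern can be exercised in isolation with stand-ins: `sketch::Status`, `SKETCH_LOGE`, `SKETCH_CHECK_PARAM_NULL`, and `UseParam` below are hypothetical substitutes for TNN's `Status`, `LOGE`, and the real macro, so the example compiles without the TNN headers.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <utility>

// Hypothetical stand-ins so the pattern compiles without TNN headers; TNN's
// real Status class, error codes and LOGE macro are defined elsewhere.
namespace sketch {
enum ErrorCode { OK = 0, PARAM_ERR = 1 };

struct Status {
    int code;
    std::string message;
    Status(int c = OK, std::string msg = "") : code(c), message(std::move(msg)) {}
    bool ok() const { return code == OK; }
};
}  // namespace sketch

#define SKETCH_LOGE(...) std::fprintf(stderr, __VA_ARGS__)

// Same shape as the updated CHECK_PARAM_NULL: log, then return an error Status.
// do { ... } while (0) turns the multi-statement body into a single statement,
// so the macro is safe after an unbraced if/else.
#define SKETCH_CHECK_PARAM_NULL(param)                                       \
    do {                                                                     \
        if (!(param)) {                                                      \
            SKETCH_LOGE("Error: param is nil\n");                            \
            return sketch::Status(sketch::PARAM_ERR, "Error: param is nil"); \
        }                                                                    \
    } while (0)

// Example caller: returns PARAM_ERR for a null pointer, OK otherwise.
sketch::Status UseParam(const int* param) {
    SKETCH_CHECK_PARAM_NULL(param);
    return sketch::Status();
}
```

Writing `!(param)` instead of the original `!param` is a small deviation: it keeps an expression argument such as `a && b` from expanding to `!a && b`.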
