Commit
[UPD]fix coreml error; support swish and groupnorm op (Tencent#1738)
* [UPD] fix some about error status again * [UPD]enable const folder to infer blobs shape for coreml; fix reshape shape size logic; * [UPD]unify op system;check apple neral engine; * [UPD]unify op system;check apple neral engine; * [FIX] reset multi input in network forward for support image classifier demo * [FIX] fix multi input in network forward * [FIX] fix const op about weight shape(=1) * [FIX] fix const op about weight shape(=1) again * [UPD] update to support multi output forward * [UPD] update to support split op * [UPD]fix coreml multi output case; add cache logic; * [UPD]fix coreml multi output case; add cache logic; * [UPD]fix coreml multi output case; add cache logic; * [UPD]fix multi output error * [FIX] fix pool op about pad * [UPD] update to support pad op (only allowed for H and W dimensions) * [UPD]remove blob manager of coreml network * [UPD]rename coreml_executor to coremlmodel * [UPD] remove InitCoreMLExecutor * [FIX] fix to support different input data type (float32 & int32) in forward * [UPD] update to support expand dims & reduce dims reshape by adding unsqueeze & squeeze * [UPD]change internal device from metal to arm for device npu * [FIX] fix conv op about group conv * [FIX] fix deconv op about group deconv * [UPD] update to support sub op * [UPD] update to support clip op * [UPD] update to support slice op * [UPD] update to support upsample op * [FIX] fix slice op about endindex * [UPD] update to support constant padding, allowed for C , H and W dimensions * [UPD]fix camera switch device * [UPD]fix actual device display error * [UPD]fix cache path * [UPD] upodate to add sub & slice & clip to project * [FIX] fix demo use NPU error * [UPD]fix ocr error * [FIX] fix upsample op about align_corners * [FIX] fix upsample op about Fractional scales * [BUG]fix coreml output nil error; fix upsample nn for fractional scale * [FIX] fix upsample op about scales order * [UPD] update to support slice v2 op * [UPD] update to support tanh v2 op * [FIX] 
fix batchnorm op about mean value * [FIX] fix some annotation * [BUG]fix upsample error; add shuffle channel coreml layer * [FIX] fix innerproduct op about inputchannels * [UPD] remove slicev2 to slice file * [UPD] remove tanhv2 to slice file * [UPD] update to reshape op about expand dims & reduce dims * [UPD] update to innerproduct op adout adding squeeze to reduce dims (in order to match old TNN model) * [UPD] update to support flatten to 2D op * [UPD] update to support relu6 op * [ADD]]add cast coreml layer * [ADD]]add shape coreml layer * [UPD] add flatten & relu6 & shuffle_channel to xcode project * [ADD]]add gather coreml layer * [ADD]]add gelu coreml layer * [ADD]]add layernorm coreml layer * [BUG]support int32 for coreml const layer * [BUG]support shape input for coreml reshape layer * [BUG]support model check for TNN_APPLE_NPU_ENABLE using MLComputeUnitsCPUOnly * [ADD]]add mat_mul coreml layer; * [UPD] update to support reshape layer when reshape_type = 1 * [UPD] update to coreml model input&output support int32 data tpye * [FIX] fix reshape layer about reshapedynamic input & output * [BUG]support mlmodel and mlmodelc for benchmark * [UPD] update to support conv layer with fp16 data type * [FIX] add 'APPLE_NPU' to model_check device_type_message * [FIX] fix some about conv layer with fp16 data type (TNN fp16 -> CoreML fp32) * [FIX] fix some about const layer with fp16 data type (TNN fp16 -> CoreML fp32) * [UPD] update to support deconv layer with fp16 data type (TNN fp16 -> CoreML fp32) * [UPD] update to support innerproduct layer with fp16 data type (TNN fp16 -> CoreML fp32) * [UPD] update to support batchnorm layer with fp16 data type (TNN fp16 -> CoreML fp32) * [UPD] update to support layernorm layer with fp16 data type (TNN fp16 -> CoreML fp32) * [UPD] update to support prelu layer with fp16 data type (TNN fp16 -> CoreML fp32) * [UPD] update to support matmul layer with fp16 data type (TNN fp16 -> CoreML fp32) * [FIX] fix some about matmul layer with 
fp16 data type (TNN fp16 -> CoreML fp32) * [UPD]support fuse form mul+add to batchnorm * [BUG]fix import error * [BUG]fix reshape error * [BUG] fix reshape layer when reshape_type=1 (input_shape_size = output_shape_size = 4) * [BUG] fix reshape layer when reshape_type=1 (input_shape_size = output_shape_size = 4) * [UPD]support ssd * [ADD]]ssdlite-mobilenetv2 from tf * [UPD] update to support conv & deconv & const & innerproduct & batchnorm & layernorm & matmul & prelu layers with fp16 data type (TNN fp16 -> CoreML fp16) * [UPD] update to support batchnorm layers with fp16 data type (TNN fp16 -> CoreML fp16) * [FIX] set coreml layer default using full precision * [UPD] update to support hardsigmoid layer * [UPD] update to support hardswish layer * [UPD] update to support reducesum layer * [UPD] update to support reducemean layer * [UPD] add some coreml layer files to xcode project * [FIX] fix some annotation about hardswish * [BUG]fix reshape for tensor with dims size=0 * [UPD]support landscapeleft ui; clear navbar left items * [UPD]support landscapeleft ui; add stackview to support minor camera preview; * [ADD]add monodepth demo * [UPD] update to support unit_test * [FIX] upload missing download_model.sh and download_model.bat * [UPD] update concat & conv & shuffle uint_test files for APPLE_NPU * [FIX] rename unit_test model * [UPD] update to support softplus layer * [UPD] update to support softsign layer * [UPD] update to support div layer * [UPD] update binary layer unit_test for APPLE_NPU * [UPD] update to support reducemax layer * [UPD] update to support reducemin layer * [UPD]update project file * [UPD]add log error * [UPD] update hardswish layer unit_test for APPLE_NPU * [UPD]add log error * [UPD] update to skip stride_slice when APPLE_NPU * [BUG]fix batchnorm unitest * [BUG]fix prelu unitest * [BUG]fix prelu unitest * [BUG]fix prelu unitest * [BUG] fix unsqueeze unittest * [BUG] fix split unittest * [BUG] fix reshape unittest * [BUG]fix updample unitest * 
[BUG] fix reduce op (reducesum/reducemean/reducemax/reducemin) unittest * [BUG]fix layernorm unitest * [BUG] fix reduce op unittest again * [BUG] fix deconv unittest * [BUG] fix innerproduct unittest * [BUG]fix ssd demo display error * [BUG] fix matmul unittest * [BUG]fix benchmark error to support multiple model in the same directory * [BUG] add some explanation about reduce op unittest * [BUG]fix benchmark error to support multiple model in the same directory * [BUG] add some explanation about reduce op unittest again * [BUG]fix batchnorm param error * [BUG] fix reshape layer unittest * [BUG]fix batchnorm param error * [BUG]fix conv/deconv input/output channel error * [UPD] update to support stride_slice & unittest * [BUG] fix reshape layer unittest when reshape_type = 1 * [BUG] fix reshape layer unittest when reshape_type = 1 using reshapestatic * [BUG] fix reshape layer unittest using reshapestatic * [BUG] fix some annotation about reshape layer * [BUG] fix reshape layer output permute when reshape_type = 1 * [BUG] fix reshape layer using reshapestatic whem reshape_type = 1 * [BUG]fix broadcast layer error for input form constant map; fix bert demo error; * [BUG]fix blob convert error for int32 mat * [BUG]fix reshape name style * [UPD]add tiny bert fixed length 256 * [BUG] fix add layer by binary op base class * [BUG] fix div/mul/sub layer by binary op base class * [BUG]fix batchnorm unitest * [BUG]ensure clean up mlmodelc if error raises when compile * [UPD]adjust demo list * [BUG] fix conv layer about activation inplace * [BUG] fix conv layer about relu6 * [BUG] fix cleanup func none of return * [BUG] remove repetitive line * [BUG]fix batchnorm unitest * [BUG] fix conv layer about relu6 inplace * [UPD]automatically use apple npu * [UPD]add clean logic for coreml * [BUG] fix hardswish layer with 2 inputs * [UPD] update README.md & support.md about APPLE_NPU * [UPD]unify rawbuffer2coremlweight * [UPD]support coreml lstm * [UPD]fix lstm error * [UPD]support 
coreml lstm bidirection * [UPD]support coreml constofshape * [UPD]support slice at axis=0 * [UPD]ignore * [UPD]fix reshape error * [UPD]fix lstm error; replace squeeze with reshape because in some cases squeeze raises a runtime compile error for axis = {3, 4} * [UPD]fix slice error * [UPD]support multiple mlmodel in the same directory; add autorelease memory, because coreml may need large memory in ocr demo * ignore * [UPD]add log msg * [UPD]fix reshape and slice error * [UPD]add auto release to model * [UPD]add auto release to model * [UPD]unify conversion from rawbuffer to coreml weight param * [FIX] fix matmul from rawbuffer to coreml weight param * [UPD]fix innerproduct input channel error * [BUG] fix matmul weight bug * remove some annotation * [BUG] fix matmul layer about fp16 * [FIX] fix sliceV2 op conflict with master * [FIX] fix sliceV2 op conflict with master * merge master (Tencent#1721) * Fix trt multistream logger (Tencent#1521) * [FIX] fix trt logger * [FIX] catch std::bad_alloc error for trt8 building * [FIX] return null while shape_tensor size -1 * Update version.h Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * Update split_utils.cc (Tencent#1528) Compiling with mingw32 reports an error, because the mingw32 compiler still requires the namespace qualifier: [ 99%] Building CXX object CMakeFiles/TNN.dir/source/tnn/utils/split_utils.cc.obj D:\TNN\source\tnn\utils\split_utils.cc: In static member function 'static tnn::Status tnn::SplitUtils::SplitStr(const char*, tnn::str_arr&, const char*, bool, bool, bool, bool, bool)': D:\TNN\source\tnn\utils\split_utils.cc:163:23: error: 'min' was not declared in this scope int len = min((i - cursor), subs_length - 1); I think this change is better: it works with mingw32 and stays compatible with the previously supported compilers. Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * Update README.md (Tencent#1538) Typos * [UPD]update QQ group (Tencent#1552) * [BUG]fix YouTu face alignment model * [UPD]update mean pts file logic * [UPD]draw face points green * [UPD]unify example controller list * [UPD]unify 
example controller list * [UPD]move blaze anchor file to resource * [METAL]update tnn project * [UPD]update tool onnx2coreml * [ADD]support ShareCommandQueue between instances * [ADD]support ShareCommandQueue between instances * [UPD]add log message * [UPD]transfer file half.hpp * [UPD]fix xcode compile error with fp16 * [UPD]fix xcode compile error with fp16 * [UPD]update model type error msg * [FIX]fix logic error of constofshape * [UPD]update debug message * [FIX]support int32 for neg op * [BUG]fix init error with nil command buffer * [UPD]add mac build xcode project; fix ios mac build script; * [UPD]add mac build xcode project; fix ios mac build script; * [ADD]add QQ group 2 of TNN Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * [opencl][fix] try save program cache (Tencent#1557) * Dev roi align (Tencent#1511) * [ARM] fix int32 blob cvt to mat * [ARM] support roi align * [ARM] add roi align unit test * [ARM] add to xcodeproj Co-authored-by: lucasktian <lucasktian@tencent.com> Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * Fix arm gather and constant blob (Tencent#1564) * [ARM][BUG] fix gather error for indice < 0 * [ARM][BUG] fix buffer to blob error without converting precision * [ARM] update type convert in layer_norm fp16 Co-authored-by: quinnrong94 <67782915+quinnrong94@users.noreply.github.com> * Dev add config layer (Tencent#1569) * add config layer param to set arm conv algorithm for specific layer Co-authored-by: powerpwang <powerpwang@outlook.com> Co-authored-by: ealinli <ealinli@tencent.com> * Fix onnx2tnn build failure caused by the protobuf version upgrade (Tencent#1571) * [ONNX][BUG]1. fix compile bug; * [ONNX2TNN][BUG]1. 
fix the build issue caused by the protobuf version upgrade; * [ADD][TOOLS] add dynamic range quantization (Tencent#1572) * [ADD][TOOLS] support fake quantization * [UPD][FAKE_QUANT] fix bug * [UPD][DOC] add fake quantization in doc * [UPD] 1.rename fake quant to dynamic range quant 2.move dequant to net_optimizer * [UPD] remove redundant comment * [UPD] update comment for DynamicRangeDequant * [DRQuant][UPD] fix namespace issue * [DRQuant][UPD] Turn off TNN_SYMBOL_HIDE to fix ci Co-authored-by: ealinli <ealinli@tencent.com> Co-authored-by: Dandi Ding <bluaxe@users.noreply.github.com> Co-authored-by: lucasktian <lucasktian@tencent.com> * [UPD][OPENCL] opencl support using unoptimized conv (Tencent#1581) Co-authored-by: ealinli <ealinli@tencent.com> * [UPD][CONVERTER] lstm support sequence_lens (Tencent#1585) Co-authored-by: ealinli <ealinli@tencent.com> * [MODEL_CHECK][BUG]1. fix bug for dump layer(fp16); (Tencent#1567) Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * Bugfix from train branch (Tencent#1592) * [BUG] fix get dims value bug when input is 1D or 2D in arm_reduce_layer_acc.cc. * [BUG] fix Convert from NCHW to NHWC error when input is on arm device. * [BUG] fix convert mat to blob bug when input is NC_INT32 on arm device. * [BUG] fix tflite_converter bug when transforming an activation layer. * add nchw format condition when copy int32 mat to blob * rollback changes on tflite_op_converter.cc Co-authored-by: sanerzheng <sanerzheng@tencent.com> * [UPD][OPENCL] opencl support x86 mat (Tencent#1593) Co-authored-by: ealinli <ealinli@tencent.com> * [CONVERTER][BUG]1. 
fix issue 1595; (Tencent#1596) * [UPD][OPENCL] add ocl version check (Tencent#1601) * [UPD][OPENCL] add ocl version check * [UPD][OPENCL] update message for vervion check Co-authored-by: ealinli <ealinli@tencent.com> * [UPD][OPENCL] solve the problem that matmul, tile have incorrect results on helio p65 (Tencent#1602) Co-authored-by: ealinli <ealinli@tencent.com> * [UPD][DYQ] fix dynamic range quant compile error on windows (Tencent#1604) Co-authored-by: ealinli <ealinli@tencent.com> * [DOC][UPD] modify image links in doc (Tencent#1617) Co-authored-by: ealinli <ealinli@tencent.com> * remove redundant test cases (Tencent#1614) * Fix typos. (Tencent#1626) * Fix typos. * Update Readme. Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * Interpreter change from std::map to safe_map, later one offers a const operator[] function (Tencent#1618) Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> Co-authored-by: lucasktian <lucasktian@tencent.com> * [UPD][OPENCL] get opencl version when GpuType is OTHER (Tencent#1636) * [UPD][OPENCL] get opencl version when GpuType is OTHER * [UPD][OPENCL] optimize nv gpu judgment logic Co-authored-by: ealinli <ealinli@tencent.com> * Patch x86 avx support (Tencent#1633) * merge dev_vc14_m1_debug, support x86 avx * add option to support x86 avx2 compile * update win_x86_opencl building script Co-authored-by: Dandiding <Dandiding@tencent.com> * fix x86 avx2 options (Tencent#1638) * fix typos in doc (Tencent#1634) Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * [X86][BUG] fix deconv layer build error (Tencent#1641) * [OPENCL][FIX] fix conv and dwconv on some of the AMD GPUs * [UPD][OPENCL] add coor check for conv and dwconv * [OPENCL][FIX] fix compilation issues * [OPENCL][UPD] optimize AMD GPU judgment logic Co-authored-by: ealinli <ealinli@tencent.com> * [OPENCL][UPD] fix deconv, avgpool on AMD GPU (Tencent#1646) * [OPENCL][UPD] fix deconv and avgpool when 
read image * [OPENCL][UPD] add header file for pooling Co-authored-by: ealinli <ealinli@tencent.com> * [OPENCL][UPD] opencl support cache on windows (Tencent#1645) * [UPD][OPENCL] add coor check for conv and dwconv * [OPENCL][FIX] fix compilation issues * [OPENCL][UPD] optimize AMD GPU judgment logic * [OPENCL][UPD] support cache on windows * [OPENCL][UPD] fix load cache on windows Co-authored-by: ealinli <ealinli@tencent.com> * [DRQ][UPD] dynamic range quant model support do const folder (Tencent#1647) * [DRQ][UPD] dynamic range quant model support do const folder * [TOOLS][UPD] dynamic range quant updates usage Co-authored-by: ealinli <ealinli@tencent.com> * 1. make model_check support dynamic range quantized model; (Tencent#1653) * [ADD][TUTORIAL] add mbv2-ssd conversion and deployment tutorial (Tencent#1640) * [ADD][TUTORIAL] add mbv2-ssd conversion and deployment tutorial * [TUTORIAL][UPD] update code link * [TUTORIAL][UPD] fix typo Co-authored-by: ealinli <ealinli@tencent.com> * [X86][FIX] binary op support fp16 weights (Tencent#1655) * [X86][FIX] binary op support fp16 weights * [X86][FIX] matmul support fp16 weights Co-authored-by: ealinli <ealinli@tencent.com> * Feature dynamic quant fc (Tencent#1660) * [DYNAMIC_QUANT][UPD]1. dynamic quant support inner_product layer; * [ARM][UPD]1. arm gemm uses the Kahan sum algorithm in some cases to avoid fp16 accumulation error; * [FIX][CPU][TRT] Fix CPU Not OP bug, Fix TensorRT ShapeTensor Class Bug. (Tencent#1663) * [FIX] Fix CPU Not Operator data type error. 
* [FIX] Fix TensorRT ShapeTensor class ConvertTo1D() func bug * fix _mm256_load_ps segmentation fault (Tencent#1682) * fix _mm256_load_ps segmentation fault * fix crash on mm256_load when innerproduct * use loadu instead of stride-judgement * remove unused code Co-authored-by: fishdai <fishdai@tencent.com> * x86_acc & blob_converter now will consider the BlobHandle.bytes_offset (Tencent#1684) * Dev x86 layer adapter (Tencent#1683) * [X86] add layer acc adapter * [X86] NULL to nullptr * [X86][OPENVINO] add openvino adapter layer builder, fallback to cpu naive impl if there is no normal ov layer builder * [X86][OPENVINO] fix hard code of ov precision Co-authored-by: anonymous <anonymous@mail.org> * [ARM] fix arm cross compile error caused by float-abi (Tencent#1678) * avoid nullptr in IsSupport (Tencent#1685) * [UPD][TOOLS] 1.increase subs_length 2.align model support bool and int32 input 3. fix gather and onehot convert 4. gather_nd support indices_shape[-1] < r (Tencent#1686) Co-authored-by: ealinli <ealinli@tencent.com> * Dev metal ngray (Tencent#1693) * [METAL] metal support ngray input mat * [METAL]fix bytes_size * [COREML] fix dynamic quantization model about coreml Co-authored-by: jacinhu <jacinhu@tencent.com> Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com> * [UPD][DRQ] support quantizing matmul's const weight (Tencent#1698) * [UPD][DRQ] support quantizing matmul's const weight * [UPD][DRQ] add scale check in constant map Co-authored-by: ealinli <ealinli@tencent.com> * [FIX] fix compile macos framework (Tencent#1687) Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com> * Optimize dynamic range quantize (Tencent#1699) * [DynamicRangeQuantize][UPD]1. add logic that decides whether to quantize based on the weight distribution; * [DynamicQuantization][UPD]1. dynamic_range_quantization support TNN fp16 model; * [DRQ][UPD]1. fix the bug where the reference file specified in the model_check_android.sh script was not used during inference; 2. 
optimize some of the code in dynamic_range_quantization; * [DRQ][UPD]1.fix conflict with merge master code; Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com> * Fix windows x86 build (Tencent#1697) * [FIX] remove nanodet for windows * remove ninja compile for some bug * fix x86 mat type register macro name * fix x86 matmul with 2 inputs Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com> * [METAL] fix stride slice crash when dims is 2 (Tencent#1701) Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com> * [mac] 1. FIX X86 and ARM conflict; 2. ADD ARM arch on intel cpu (You can use ARM if rosetta-X86 crash). 3. Use ios project build/profile M1-Mac. (Tencent#1700) Co-authored-by: gennyxu <gennyxu@tencent.com> Co-authored-by: lucasktian <lucasktian@tencent.com> * [iOS][UPD]1. add missing file for xcode project; (Tencent#1705) * [BUG]fix coreml error of slicev2, padv2 and matmul; (Tencent#1703) * [BUG]fix YouTu face alignment model * [UPD]update mean pts file logic * [UPD]draw face points green * [UPD]unify example controller list * [UPD]unify example controller list * [UPD]move blaze anchor file to resource * [METAL]update tnn project * [UPD]update tool onnx2coreml * [ADD]support ShareCommandQueue between instances * [ADD]support ShareCommandQueue between instances * [UPD]add log message * [UPD]transfer file half.hpp * [UPD]fix xcode compile error with fp16 * [UPD]fix xcode compile error with fp16 * [UPD]update model type error msg * [FIX]fix logic error of constofshape * [UPD]update debug message * [FIX]support int32 for neg op * [BUG]fix init error with nil command buffer * [UPD]add mac build xcode project; fix ios mac build script; * [UPD]add mac build xcode project; fix ios mac build script; * [ADD]add QQ group 2 of TNN * [BUG]fix dynamic dequant error; fix arm pad error; * [BUG]support coreml padv2 * [BUG]fix coreml matmul error when it has const input blob * [BUG]fix coreml slicev2 * [UPD]add convert logic of swish * 
[BUG]fix error cpu error for x86 mac * [UPD]support fusion for gemm + bn * [UPD]add convert logic of swish Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> Co-authored-by: lucasktian <lucasktian@tencent.com> * [UPD]update merge logic for swish groupnorm deconv (Tencent#1708) * [BUG]fix YouTu face alignment model * [UPD]update mean pts file logic * [UPD]draw face points green * [UPD]unify example controller list * [UPD]unify example controller list * [UPD]move blaze anchor file to resource * [METAL]update tnn project * [UPD]update tool onnx2coreml * [ADD]support ShareCommandQueue between instances * [ADD]support ShareCommandQueue between instances * [UPD]add log message * [UPD]transfer file half.hpp * [UPD]fix xcode compile error with fp16 * [UPD]fix xcode compile error with fp16 * [UPD]update model type erro msg * [FIX]fix logic error of constofshape * [UPD]update debug message * [FIX]fsupport int32 for neg op * [BUG]fix init error with nil commadbuffer * [UPD]add mac build xcode project; fix ios mac build script; * [UPD]add mac build xcode project; fix ios mac build script; * [ADD]add QQ group 2 of TNN * [BUG]fix dynamic dequant error; fix arm pad error; * [BUG]support coreml padv2 * [BUG]fix ccoreml matmul error when it has const input blob * [BUG]fix coreml slicev2 * [UPD]add convert logic of swish * [BUG]fix error cpu error for x86 mac * [UPD]support fusion for gemm + bn * [UPD]add convert logic of swish * [UPD]support fusion for deconv+add and deconv+add+bn * [UPD]add aliyun disk link for tnn models * [UPD]support fusion for group norm * [UPD]support fusion for swish Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> Co-authored-by: lucasktian <lucasktian@tencent.com> * [DRQ][BUG]1. 
fix bug for max_values; (Tencent#1716) * Hotfix m1 build (Tencent#1715) * fix apple m1 clang 13.1 compile error * fix unit test compile error Co-authored-by: quinnrong <quinnrong@quinnrongs-MacBook-Pro.local> Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com> Co-authored-by: shenpenwang <41420892+Maosquerade@users.noreply.github.com> Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> Co-authored-by: sxj731533730 <sxj731533730@gmail.com> Co-authored-by: Yulv-git <34329208+Yulv-git@users.noreply.github.com> Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com> Co-authored-by: quinnrong94 <67782915+quinnrong94@users.noreply.github.com> Co-authored-by: lucasktian <lucasktian@tencent.com> Co-authored-by: powerpwang <72859430+powerpwang@users.noreply.github.com> Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com> Co-authored-by: powerpwang <powerpwang@outlook.com> Co-authored-by: ealinli <ealinli@tencent.com> Co-authored-by: Dandi Ding <bluaxe@users.noreply.github.com> Co-authored-by: saner zheng <zqawszqaws@126.com> Co-authored-by: sanerzheng <sanerzheng@tencent.com> Co-authored-by: Feng Shijie <j514681085@icloud.com> Co-authored-by: Dandiding <Dandiding@tencent.com> Co-authored-by: FeiGeChuanShu <774074168@qq.com> Co-authored-by: seanxcwang <66675860+seanxcwang@users.noreply.github.com> Co-authored-by: doxutx <92915535+doxutx@users.noreply.github.com> Co-authored-by: kumbayaco <xyu.dai@gmail.com> Co-authored-by: fishdai <fishdai@tencent.com> Co-authored-by: anonymous <anonymous@mail.org> Co-authored-by: jacinhu <jacinhu@tencent.com> Co-authored-by: XDC <196890111@qq.com> Co-authored-by: gennyxu <gennyxu@tencent.com> Co-authored-by: quinnrong <quinnrong@quinnrongs-MacBook-Pro.local> * [FIX] fix sliceV2 op conflict with master again * [METAL][OP][FIX] 1.metal support groupnorm & swish op 2.fix metal blob conveter & reformat bug when input dim is 1 * reset model * [COREML] coreml 
support swish op * [COREML] fix coreml batchnorm bug * [COREML]coreml support groupnorm * [COREML]coreml support instancenorm * reset model * solve conflict * solve conflict * Dev groupnorm (Tencent#1726) * Fix trt multistream logger (Tencent#1521) * [FIX] fix trt logger * [FIX] catch std::bad_alloc error for trt8 building * [FIX] return null while shape_tensor size -1 * Update version.h Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * Update split_utils.cc (Tencent#1528) Compiling with mingw32 reports an error, because the mingw32 compiler still requires the namespace qualifier: [ 99%] Building CXX object CMakeFiles/TNN.dir/source/tnn/utils/split_utils.cc.obj D:\TNN\source\tnn\utils\split_utils.cc: In static member function 'static tnn::Status tnn::SplitUtils::SplitStr(const char*, tnn::str_arr&, const char*, bool, bool, bool, bool, bool)': D:\TNN\source\tnn\utils\split_utils.cc:163:23: error: 'min' was not declared in this scope int len = min((i - cursor), subs_length - 1); I think this change is better: it works with mingw32 and stays compatible with the previously supported compilers. Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * Update README.md (Tencent#1538) Typos * [UPD]update QQ group (Tencent#1552) * [BUG]fix YouTu face alignment model * [UPD]update mean pts file logic * [UPD]draw face points green * [UPD]unify example controller list * [UPD]unify example controller list * [UPD]move blaze anchor file to resource * [METAL]update tnn project * [UPD]update tool onnx2coreml * [ADD]support ShareCommandQueue between instances * [ADD]support ShareCommandQueue between instances * [UPD]add log message * [UPD]transfer file half.hpp * [UPD]fix xcode compile error with fp16 * [UPD]fix xcode compile error with fp16 * [UPD]update model type error msg * [FIX]fix logic error of constofshape * [UPD]update debug message * [FIX]support int32 for neg op * [BUG]fix init error with nil command buffer * [UPD]add mac build xcode project; fix ios mac build script; * [UPD]add mac build xcode project; fix ios mac build script; * [ADD]add QQ group 2 of TNN 
Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * [opencl][fix] try save program cache (Tencent#1557) * Dev roi align (Tencent#1511) * [ARM] fix int32 blob cvt to mat * [ARM] support roi align * [ARM] add roi align unit test * [ARM] add to xcodeproj Co-authored-by: lucasktian <lucasktian@tencent.com> Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * Fix arm gather and constant blob (Tencent#1564) * [ARM][BUG] fix gather error for indice < 0 * [ARM][BUG] fix buffer to blob error without converting precision * [ARM] update type convert in layer_norm fp16 Co-authored-by: quinnrong94 <67782915+quinnrong94@users.noreply.github.com> * Dev add config layer (Tencent#1569) * add config layer param to set arm conv algorithm for specific layer Co-authored-by: powerpwang <powerpwang@outlook.com> Co-authored-by: ealinli <ealinli@tencent.com> * Fix onnx2tnn build failure caused by the protobuf version upgrade (Tencent#1571) * [ONNX][BUG]1. fix compile bug; * [ONNX2TNN][BUG]1. fix the build issue caused by the protobuf version upgrade; * [ADD][TOOLS] add dynamic range quantization (Tencent#1572) * [ADD][TOOLS] support fake quantization * [UPD][FAKE_QUANT] fix bug * [UPD][DOC] add fake quantization in doc * [UPD] 1.rename fake quant to dynamic range quant 2.move dequant to net_optimizer * [UPD] remove redundant comment * [UPD] update comment for DynamicRangeDequant * [DRQuant][UPD] fix namespace issue * [DRQuant][UPD] Turn off TNN_SYMBOL_HIDE to fix ci Co-authored-by: ealinli <ealinli@tencent.com> Co-authored-by: Dandi Ding <bluaxe@users.noreply.github.com> Co-authored-by: lucasktian <lucasktian@tencent.com> * [UPD][OPENCL] opencl support using unoptimized conv (Tencent#1581) Co-authored-by: ealinli <ealinli@tencent.com> * [UPD][CONVERTER] lstm support sequence_lens (Tencent#1585) Co-authored-by: ealinli <ealinli@tencent.com> * [MODEL_CHECK][BUG]1. 
fix bug for dump layer(fp16); (Tencent#1567) Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * Bugfix from train branch (Tencent#1592) * [BUG] fix get dims value bug when input is 1D or 2D in arm_reduce_layer_acc.cc. * [BUG] fix Convert from NCHW to NHWC error when input is on arm device. * [BUG] fix convert mat to blob bug when input is NC_INT32 on arm device. * [BUG] fix tflite_converter bug when transform a activation layer. * add nchw format condition when copy int32 mat to blob * rollback changes on tflite_op_converter.cc Co-authored-by: sanerzheng <sanerzheng@tencent.com> * [UPD][OPENCL] opencl support x86 mat (Tencent#1593) Co-authored-by: ealinli <ealinli@tencent.com> * [CONVERTER][BUG]1. fix issue 1595; (Tencent#1596) * [UPD][OPENCL] add ocl version check (Tencent#1601) * [UPD][OPENCL] add ocl version check * [UPD][OPENCL] update message for vervion check Co-authored-by: ealinli <ealinli@tencent.com> * [UPD][OPENCL] solve the problem that matmul, tile have incorrect results on helio p65 (Tencent#1602) Co-authored-by: ealinli <ealinli@tencent.com> * [UPD][DYQ] fix dynamic range quant compile error on windows (Tencent#1604) Co-authored-by: ealinli <ealinli@tencent.com> * [DOC][UPD] modify image links in doc (Tencent#1617) Co-authored-by: ealinli <ealinli@tencent.com> * remove redundant test cases (Tencent#1614) * Fix typos. (Tencent#1626) * Fix typos. * Update Readme. 
Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * Interpreter change from std::map to safe_map, later one offers a const operator[] function (Tencent#1618) Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> Co-authored-by: lucasktian <lucasktian@tencent.com> * [UPD][OPENCL] get opencl version when GpuType is OTHER (Tencent#1636) * [UPD][OPENCL] get opencl version when GpuType is OTHER * [UPD][OPENCL] optimize nv gpu judgment logic Co-authored-by: ealinli <ealinli@tencent.com> * Patch x86 avx support (Tencent#1633) * merge dev_vc14_m1_debug, support x86 avx * add option to support x86 avx2 compile * update win_x86_opencl building script Co-authored-by: Dandiding <Dandiding@tencent.com> * fix x86 avx2 options (Tencent#1638) * fix typos in doc (Tencent#1634) Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> * [X86][BUG] fix deconv layer build error (Tencent#1641) * [OPENCL][FIX] fix conv and dwconv on some of the AMD GPUs * [UPD][OPENCL] add coor check for conv and dwconv * [OPENCL][FIX] fix compilation issues * [OPENCL][UPD] optimize AMD GPU judgment logic Co-authored-by: ealinli <ealinli@tencent.com> * [OPENCL][UPD] fix deconv, avgpool on AMD GPU (Tencent#1646) * [OPENCL][UPD] fix deconv and avgpool when read image * [OPENCL][UPD] add header file for pooling Co-authored-by: ealinli <ealinli@tencent.com> * [OPENCL][UPD] opencl support cache on windows (Tencent#1645) * [UPD][OPENCL] add coor check for conv and dwconv * [OPENCL][FIX] fix compilation issues * [OPENCL][UPD] optimize AMD GPU judgment logic * [OPENCL][UPD] support cache on windows * [OPENCL][UPD] fix load cache on windows Co-authored-by: ealinli <ealinli@tencent.com> * [DRQ][UPD] dynamic range quant model support do const folder (Tencent#1647) * [DRQ][UPD] dynamic range quant model support do const folder * [TOOLS][UPD] dynamic range quant updates usage Co-authored-by: ealinli <ealinli@tencent.com> * 1. 
make model_check support dynamic range quantized model; (Tencent#1653) * [ADD][TUTORIAL] add mbv2-ssd conversion and deployment tutorial (Tencent#1640) * [ADD][TUTORIAL] add mbv2-ssd conversion and deployment tutorial * [TUTORIAL][UPD] update code link * [TUTORIAL][UPD] fix typo Co-authored-by: ealinli <ealinli@tencent.com> * [X86][FIX] binary op support fp16 weights (Tencent#1655) * [X86][FIX] binary op support fp16 weights * [X86][FIX] matmul support fp16 weights Co-authored-by: ealinli <ealinli@tencent.com> * Feature dynamic quant fc (Tencent#1660) * [DYNAMIC_QUANT][UPD]1. dynamic quant support inner_product layer; * [ARM][UPD]1. arm gemm uses the Kahan sum algorithm in some cases to avoid fp16 accumulation error; * [FIX][CPU][TRT] Fix CPU Not OP bug, Fix TensorRT ShapeTensor Class Bug. (Tencent#1663) * [FIX] Fix CPU Not Operator data type error. * [FIX] Fix TensorRT ShapeTensor class ConvertTo1D() func bug * fix _mm256_load_ps segmentation fault (Tencent#1682) * fix _mm256_load_ps segmentation fault * fix crash on mm256_load when innerproduct * use loadu instead of stride-judgement * remove unused code Co-authored-by: fishdai <fishdai@tencent.com> * x86_acc & blob_converter now will consider the BlobHandle.bytes_offset (Tencent#1684) * Dev x86 layer adapter (Tencent#1683) * [X86] add layer acc adapter * [X86] NULL to nullptr * [X86][OPENVINO] add openvino adapter layer builder, fallback to cpu naive impl if there is no normal ov layer builder * [X86][OPENVINO] fix hard code of ov precision Co-authored-by: anonymous <anonymous@mail.org> * [ARM] fix arm cross compile error caused by float-abi (Tencent#1678) * avoid nullptr in IsSupport (Tencent#1685) * [UPD][TOOLS] 1.increase subs_length 2.align model support bool and int32 input 3. fix gather and onehot convert 4. 
gather_nd support indices_shape[-1] < r (Tencent#1686) Co-authored-by: ealinli <ealinli@tencent.com> * Dev metal ngray (Tencent#1693) * [METAL] metal support ngray input mat * [METAL]fix bytes_size * [COREML] fix dynamic quantization model about coreml Co-authored-by: jacinhu <jacinhu@tencent.com> Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com> * [UPD][DRQ] support quantizing matmul's const weight (Tencent#1698) * [UPD][DRQ] support quantizing matmul's const weight * [UPD][DRQ] add scale check in constant map Co-authored-by: ealinli <ealinli@tencent.com> * [FIX] fix compile macos framework (Tencent#1687) Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com> * Optimize dynamic range quantize (Tencent#1699) * [DynamicRangeQuantize][UPD]1. add logic that decides whether to quantize based on the weight distribution; * [DynamicQuantization][UPD]1. dynamic_range_quantization support TNN fp16 model; * [DRQ][UPD]1. fix a bug where the model_check_android.sh script accepted a reference file but inference did not use it; 2. optimize parts of the dynamic_range_quantization code; * [DRQ][UPD]1.fix conflict with merge master code; Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com> * Fix windows x86 build (Tencent#1697) * [FIX] remove nanodet for windows * remove ninja compile for some bug * fix x86 mat type register macro name * fix x86 matmul with 2 inputs Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com> * [METAL] fix stride slice crash when dims is 2 (Tencent#1701) Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com> * [mac] 1. FIX X86 and ARM conflict; 2. ADD ARM arch on intel cpu (You can use ARM if rosetta-X86 crash). 3. Use ios project build/profile M1-Mac. (Tencent#1700) Co-authored-by: gennyxu <gennyxu@tencent.com> Co-authored-by: lucasktian <lucasktian@tencent.com> * [iOS][UPD]1. 
add missing file for xcode project; (Tencent#1705) * [BUG]fix coreml error of slicev2, padv2 and matmul; (Tencent#1703) * [BUG]fix YouTu face alignment model * [UPD]update mean pts file logic * [UPD]draw face points green * [UPD]unify example controller list * [UPD]unify example controller list * [UPD]move blaze anchor file to resource * [METAL]update tnn project * [UPD]update tool onnx2coreml * [ADD]support ShareCommandQueue between instances * [ADD]support ShareCommandQueue between instances * [UPD]add log message * [UPD]transfer file half.hpp * [UPD]fix xcode compile error with fp16 * [UPD]fix xcode compile error with fp16 * [UPD]update model type error msg * [FIX]fix logic error of constofshape * [UPD]update debug message * [FIX]support int32 for neg op * [BUG]fix init error with nil commandbuffer * [UPD]add mac build xcode project; fix ios mac build script; * [UPD]add mac build xcode project; fix ios mac build script; * [ADD]add QQ group 2 of TNN * [BUG]fix dynamic dequant error; fix arm pad error; * [BUG]support coreml padv2 * [BUG]fix coreml matmul error when it has const input blob * [BUG]fix coreml slicev2 * [UPD]add convert logic of swish * [BUG]fix error cpu error for x86 mac * [UPD]support fusion for gemm + bn * [UPD]add convert logic of swish Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> Co-authored-by: lucasktian <lucasktian@tencent.com> * [UPD]update merge logic for swish groupnorm deconv (Tencent#1708) * [BUG]fix YouTu face alignment model * [UPD]update mean pts file logic * [UPD]draw face points green * [UPD]unify example controller list * [UPD]unify example controller list * [UPD]move blaze anchor file to resource * [METAL]update tnn project * [UPD]update tool onnx2coreml * [ADD]support ShareCommandQueue between instances * [ADD]support ShareCommandQueue between instances * [UPD]add log message * [UPD]transfer file half.hpp * [UPD]fix xcode compile error with fp16 * [UPD]fix xcode compile error with fp16 * 
[UPD]update model type error msg * [FIX]fix logic error of constofshape * [UPD]update debug message * [FIX]support int32 for neg op * [BUG]fix init error with nil commandbuffer * [UPD]add mac build xcode project; fix ios mac build script; * [UPD]add mac build xcode project; fix ios mac build script; * [ADD]add QQ group 2 of TNN * [BUG]fix dynamic dequant error; fix arm pad error; * [BUG]support coreml padv2 * [BUG]fix coreml matmul error when it has const input blob * [BUG]fix coreml slicev2 * [UPD]add convert logic of swish * [BUG]fix error cpu error for x86 mac * [UPD]support fusion for gemm + bn * [UPD]add convert logic of swish * [UPD]support fusion for deconv+add and deconv+add+bn * [UPD]add aliyun disk link for tnn models * [UPD]support fusion for group norm * [UPD]support fusion for swish Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> Co-authored-by: lucasktian <lucasktian@tencent.com> * [DRQ][BUG]1. fix bug for max_values; (Tencent#1716) * Hotfix m1 build (Tencent#1715) * fix apple m1 clang 13.1 compile error * fix unit test compile error Co-authored-by: quinnrong <quinnrong@quinnrongs-MacBook-Pro.local> Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com> * [ARM] support groupnorm * [ARM] support swish * add swish to conv-post-fuse * [ADD][OPENCL] opencl add group norm and swish (Tencent#1722) Co-authored-by: ealinli <ealinli@tencent.com> * add x86 swish and groupnorm operator; explicitly open sse4.2 with low version of compiler Co-authored-by: shenpenwang <41420892+Maosquerade@users.noreply.github.com> Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> Co-authored-by: sxj731533730 <sxj731533730@gmail.com> Co-authored-by: Yulv-git <34329208+Yulv-git@users.noreply.github.com> Co-authored-by: darrenyao87 <62542779+darrenyao87@users.noreply.github.com> Co-authored-by: quinnrong94 <67782915+quinnrong94@users.noreply.github.com> Co-authored-by: lucasktian <lucasktian@tencent.com> 
Co-authored-by: powerpwang <72859430+powerpwang@users.noreply.github.com> Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com> Co-authored-by: powerpwang <powerpwang@outlook.com> Co-authored-by: ealinli <ealinli@tencent.com> Co-authored-by: Dandi Ding <bluaxe@users.noreply.github.com> Co-authored-by: saner zheng <zqawszqaws@126.com> Co-authored-by: sanerzheng <sanerzheng@tencent.com> Co-authored-by: Feng Shijie <j514681085@icloud.com> Co-authored-by: Dandiding <Dandiding@tencent.com> Co-authored-by: FeiGeChuanShu <774074168@qq.com> Co-authored-by: seanxcwang <66675860+seanxcwang@users.noreply.github.com> Co-authored-by: doxutx <92915535+doxutx@users.noreply.github.com> Co-authored-by: kumbayaco <xyu.dai@gmail.com> Co-authored-by: fishdai <fishdai@tencent.com> Co-authored-by: anonymous <anonymous@mail.org> Co-authored-by: jacinhu <jacinhu@tencent.com> Co-authored-by: XDC <196890111@qq.com> Co-authored-by: gennyxu <gennyxu@tencent.com> Co-authored-by: quinnrong <quinnrong@quinnrongs-MacBook-Pro.local> Co-authored-by: quinnrong <quinnrong@tencent.com> Co-authored-by: shenpenwang <565067453@qq.com> * fix coreml groupnorm unit test * [ADD]add exp op * [BUG]fix deconv bias error * [UPD]init cpu memory with 0 for bert model * [BUG]fix reshape static error; reshape static layer cannot handle 0 or -1 * [UPD]support inst norm for coreml; update tnn project file; * [BUG]fix error for layer without layer resource, [] operator will add one, which is not thread safe * [UPD]add param to batchnorm to support instancenorm * [UPD]adjust groupnorm with batchnorm * [UPD]support instancenorm with groupnorm by setting group==channels * [UPD]update unit test of instancenorm * [BUG]fix unit test error for layer batchnorm * [UPD]update tnn project * [BUG]fix unit test error for APPLE NPU * [BUG]fix unit test crash for layer batchnorm * [UPD]ignore cpu or gpu benchmark for mlmodel or mlmodelc * [UPD]ignore * [UPD]ignore pixelshuffle for apple npu * [UPD]ignore matconvert 
for apple npu * [UPD]ignore some unary op for apple npu * [UPD]unify before and after coreml layer, simplify lstm layer * [UPD]fix lstm error for ht and ct for biLSTM * [UPD]fix const input load error * [UPD]fix internal error * [UPD]ignore Co-authored-by: jacinhu <jacinhu@tencent.com> Co-authored-by: teslawho <597645882@qq.com> Co-authored-by: teslawho <71381575+teslawho@users.noreply.github.com> Co-authored-by: shenpenwang <41420892+Maosquerade@users.noreply.github.com> Co-authored-by: neiltian <65950677+neiltian-tencent@users.noreply.github.com> Co-authored-by: sxj731533730 <sxj731533730@gmail.com> Co-authored-by: Yulv-git <34329208+Yulv-git@users.noreply.github.com> Co-authored-by: quinnrong94 <67782915+quinnrong94@users.noreply.github.com> Co-authored-by: lucasktian <lucasktian@tencent.com> Co-authored-by: powerpwang <72859430+powerpwang@users.noreply.github.com> Co-authored-by: ealinli <37806708+1627180283@users.noreply.github.com> Co-authored-by: powerpwang <powerpwang@outlook.com> Co-authored-by: ealinli <ealinli@tencent.com> Co-authored-by: Dandi Ding <bluaxe@users.noreply.github.com> Co-authored-by: saner zheng <zqawszqaws@126.com> Co-authored-by: sanerzheng <sanerzheng@tencent.com> Co-authored-by: Feng Shijie <j514681085@icloud.com> Co-authored-by: Dandiding <Dandiding@tencent.com> Co-authored-by: FeiGeChuanShu <774074168@qq.com> Co-authored-by: seanxcwang <66675860+seanxcwang@users.noreply.github.com> Co-authored-by: doxutx <92915535+doxutx@users.noreply.github.com> Co-authored-by: kumbayaco <xyu.dai@gmail.com> Co-authored-by: fishdai <fishdai@tencent.com> Co-authored-by: anonymous <anonymous@mail.org> Co-authored-by: XDC <196890111@qq.com> Co-authored-by: gennyxu <gennyxu@tencent.com> Co-authored-by: quinnrong <quinnrong@quinnrongs-MacBook-Pro.local> Co-authored-by: quinnrong <quinnrong@tencent.com> Co-authored-by: shenpenwang <565067453@qq.com>