RVV: use new interface for segment load/store & change word_type to size_t&add clang ci (part #4100) #4118

thelastlin · 2022-08-05T16:51:55Z

Tuple types in segment load/store operations have been removed from the latest compilers.

This PR uses the new interface for segment load/store and adds wrappers for old compilers that only support tuple types.

This PR also changes word_type to size_t and adds clang CI (riscv64-unknown-linux-gnu).

(Issue #4100)
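A minimal sketch of the wrapper idea described above. It uses stand-in types rather than the real RVV intrinsics, so every name here (`mock_vec`, `mock_vec_x2`, `mock_store_seg2_tuple`, `mock_store_seg2`) is hypothetical and only models the two call shapes; the wrappers in this PR may be written differently.

```cpp
#include <cstddef>

// Stand-in types for illustration only. Real code would use RVV types from
// the vector intrinsics header; these mocks just model the two call shapes.
const std::size_t kLanes = 4;
struct mock_vec { float lane[kLanes]; };
struct mock_vec_x2 { mock_vec v0; mock_vec v1; }; // "tuple" of two vectors

// Old-style call shape: both fields travel together in one tuple value.
inline void mock_store_seg2_tuple(float* base, mock_vec_x2 t, std::size_t vl)
{
    for (std::size_t i = 0; i < vl && i < kLanes; i++)
    {
        base[2 * i] = t.v0.lane[i];     // field 0 of segment i
        base[2 * i + 1] = t.v1.lane[i]; // field 1 of segment i
    }
}

// Wrapper: exposes the new-style call shape (one argument per field) and
// forwards to the tuple-based function that an old compiler would provide.
inline void mock_store_seg2(float* base, mock_vec v0, mock_vec v1, std::size_t vl)
{
    mock_vec_x2 t;
    t.v0 = v0;
    t.v1 = v1;
    mock_store_seg2_tuple(base, t, vl);
}
```

With a wrapper of this shape, call sites can be written once against the new interface, and only the wrapper needs to know which interface the compiler offers. Note that `vl` is a `size_t` here, matching the type change this PR makes.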

-----
search: vsseg([1-9])e(8|16|32)_v_(f|i|u)\2m(1|2|4|8)x\1\(([ -~]+), vcreate_\3\2m\4x\1\(([ -~]+)\), vl\);
substitute by: vsseg$1e$2_v_$3$2m$4($5, $6, vl);

---
search: vssseg([1-9])e(8|16|32)_v_f\2m1x\1\(([ -~]+), vcreate_f\2m1x\1\(([ -~]+)\), vl\);
substitute by: vssseg$1e$2_v_f$2m1($3, $4, vl);
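A sketch of the first (vsseg) substitution using C++ `std::regex`, assuming the search pattern escapes the literal call parentheses as `\(` and `\)`. The helper name `rewrite_vsseg_tuple_call` and the sample call in the usage note are illustrative and not taken from the ncnn sources.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of ncnn): applies the vsseg search/substitute
// pair to one line of source text. Groups: 1 = field count, 2 = element width,
// 3 = type letter, 4 = LMUL, 5 = pointer argument, 6 = arguments of vcreate.
std::string rewrite_vsseg_tuple_call(const std::string& line)
{
    static const std::regex pattern(
        R"(vsseg([1-9])e(8|16|32)_v_(f|i|u)\2m(1|2|4|8)x\1\(([ -~]+), vcreate_\3\2m\4x\1\(([ -~]+)\), vl\);)");

    // Drop the tuple suffix and the vcreate call, passing the fields directly.
    return std::regex_replace(line, pattern, "vsseg$1e$2_v_$3$2m$4($5, $6, vl);");
}
```

For example, `vsseg2e32_v_f32m1x2(outptr, vcreate_f32m1x2(_r0, _r1), vl);` is rewritten to `vsseg2e32_v_f32m1(outptr, _r0, _r1, vl);`. Lines that do not match the pattern are returned unchanged.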

codecov-commenter · 2022-08-05T16:58:36Z

Codecov Report

Merging #4118 (6612a44) into master (00c08d7) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #4118      +/-   ##
==========================================
+ Coverage   94.43%   94.44%   +0.01%     
==========================================
  Files         748      750       +2     
  Lines      179005   179375     +370     
==========================================
+ Hits       169047   169417     +370     
  Misses       9958     9958

Impacted Files	Coverage Δ
src/layer/riscv/padding_packn.h	`100.00% <ø> (ø)`
src/layer/riscv/riscv_activation.h	`100.00% <ø> (ø)`
src/layer/riscv/rvv_mathfun.h	`100.00% <ø> (ø)`
src/layer/riscv/rvv_mathfun_fp16s.h	`100.00% <ø> (ø)`
src/layer/riscv/absval_riscv.cpp	`100.00% <100.00%> (ø)`
src/layer/riscv/binaryop_riscv.cpp	`100.00% <100.00%> (ø)`
src/layer/riscv/cast_riscv.cpp	`95.58% <100.00%> (ø)`
src/layer/riscv/clip_riscv.cpp	`100.00% <100.00%> (ø)`
src/layer/riscv/concat_riscv.cpp	`95.58% <100.00%> (ø)`
src/layer/riscv/convolution1d_riscv.cpp	`99.00% <100.00%> (ø)`
... and 78 more


thelastlin · 2022-08-06T03:35:04Z

The following tests fail when built with riscv-gcc/riscv-gcc-rvv-next (edffbea) and binutils-2.39:

test_convolution
test_convolution1d
test_convolutiondepthwise
test_deconvolution
test_deconvolutiondepthwise
test_squeezenet

thelastlin · 2022-08-06T08:53:26Z

> Failed to pass tests when built with riscv-gcc/riscv-gcc-rvv-next (edffbea), binutils-2.39:

The build succeeds with clang-14, given the following requirements:

Append some type definitions in riscv-vector.h;
clang++ complains about VLAs; [83d7d50]
binutils-2.39 and the other GNU toolchain components are needed (--gcc-toolchain=); [113052b]
--ld-path= pointing to the ld from the RISC-V GNU toolchain may be required. [113052b]
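As an illustration of the VLA point above, here is a minimal sketch of replacing a variable-length array with a standard container. The function `sum_with_scratch` is hypothetical and unrelated to the sgemm code; commit 83d7d50 may have removed the VLAs in a different way.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: sums n floats through a scratch buffer whose size is
// only known at run time.
float sum_with_scratch(const float* src, std::size_t n)
{
    // A VLA such as `float tmp[n];` is not standard C++.
    // std::vector gives a run-time-sized buffer that every compiler accepts.
    std::vector<float> tmp(src, src + n);

    float sum = 0.f;
    for (std::size_t i = 0; i < tmp.size(); i++)
        sum += tmp[i];
    return sum;
}
```

When an upper bound on the size is known at compile time, a fixed-size array is another option and avoids the heap allocation.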

thelastlin · 2022-08-11T11:16:30Z

> Failed to pass tests when built with riscv-gcc/riscv-gcc-rvv-next (edffbea), binutils-2.39

Use riscv-gcc/riscv-gcc-rvv-next (32c7d7c) with binutils-2.39 instead.

zhongjuzhe · 2022-08-24T01:20:06Z

Hi, would you mind telling me whether there are still any bugs when running your algorithm with the latest riscv-gcc-rvv-next?

If you encounter any bugs or performance issues,
feel free to file an issue here: https://github.com/riscv-collab/riscv-gcc/issues

Recently, I have been working on pushing the RVV code to GCC upstream. Your feedback is important.

nihui · 2022-10-01T13:24:43Z

Thanks for your contribution!


thelastlin and others added 7 commits August 5, 2022 11:01

RVV: use size_t for vl

02c3378

RVV: replace vsseg.v tuple type by using regex

117a169

-----
search: vsseg([1-9])e(8|16|32)_v_(f|i|u)\2m(1|2|4|8)x\1\(([ -~]+), vcreate_\3\2m\4x\1\(([ -~]+)\), vl\);
substitute by: vsseg$1e$2_v_$3$2m$4($5, $6, vl);

RVV: replace vssseg.v tuple types by using regex

a49b775

---
search: vssseg([1-9])e(8|16|32)_v_f\2m1x\1\(([ -~]+), vcreate_f\2m1x\1\(([ -~]+)\), vl\);
substitute by: vssseg$1e$2_v_f$2m1($3, $4, vl);

RVV: replace vlseg.v tuple types in load/store

472a26e

RVV: replace vloxseg2ei32.v tuple types

8b0c721

RVV: add a wrapper for old compilers

aac89b7

apply code-format changes

9022042

RVV: add segment load/store wrapper in pakcing

d9fda93

thelastlin changed the title ~~RVV: use new interface for segment load/store; change word_type to size_t~~ [WIP]RVV: use new interface for segment load/store; change word_type to size_t Aug 6, 2022

RVV: fix cmake test

b09d8e9

thelastlin changed the title ~~[WIP]RVV: use new interface for segment load/store; change word_type to size_t~~ RVV: use new interface for segment load/store; change word_type to size_t Aug 6, 2022

thelastlin and others added 3 commits August 6, 2022 21:39

RVV: make clang happy by dropping VLAs in sgemm

83d7d50

RVV: add clang cmake toolchain configure

113052b

apply code-format changes

e3af501

thelastlin changed the title ~~RVV: use new interface for segment load/store; change word_type to size_t~~ RVV: use new interface for segment load/store; change word_type to size_t; (#4100) Aug 7, 2022

RVV: add clang ci, riscv64-unknown-linux-gnu

7903d78

thelastlin changed the title ~~RVV: use new interface for segment load/store; change word_type to size_t; (#4100)~~ RVV: use new interface for segment load/store & change word_type to size_t&add clang ci (part #4100) Aug 7, 2022

nihui added 4 commits October 1, 2022 15:24

update rvv toolchain

81de5cb

update clang toolchain

527c6f3

fix build

5ac75bf

drop global fp16 definition

6612a44

nihui merged commit e7eadca into Tencent:master Oct 1, 2022

thelastlin added a commit to thelastlin/ncnn that referenced this pull request Oct 1, 2022

RVV: replace word_type to size_t (Tencent#4100, Tencent#4118)

ef8afa5

thelastlin added a commit to thelastlin/ncnn that referenced this pull request Oct 1, 2022

RVV: replace word_type to size_t (Tencent#4100, Tencent#4118)

6948e83

thelastlin added a commit to thelastlin/ncnn that referenced this pull request Oct 1, 2022

RVV: replace word_type to size_t (Tencent#4100, Tencent#4118)

044dc34

thelastlin added a commit to thelastlin/ncnn that referenced this pull request Oct 1, 2022

RVV: replace word_type to size_t (Tencent#4100, Tencent#4118)

d80731a
