chgemm

chgemm is a symmetric int8 GEMM project. It differs from BLAS sgemm in several ways:

Inputs are int8_t matrices with values in [-127, +127]; the output is an int32_t matrix. Watch out for overflow.
It is designed with deep learning workloads in mind, so the packAB interface is exposed and can be adjusted.
It computes C = A * B rather than the usual C = alpha * A * B + beta * C, because alpha and beta have no practical use in deep learning inference.
It is row major, dropping the column-major layout inherited from the Fortran era.
Its symmetric int8 GEMM speed is no lower than that of other projects.
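As a point of comparison for the semantics listed above, here is a minimal reference sketch of a row-major C = A * B with int8_t inputs and an int32_t output. The names ref_gemm_s8s8s32 and max_safe_k are hypothetical and are not part of chgemm; the optimized kernels in this repository may accumulate differently and may have tighter limits than the plain int32 bound computed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference C = A * B, all matrices row major.
 * A is M x K, B is K x N, C is M x N.
 * Hypothetical helper for illustration; not chgemm's actual API. */
static void ref_gemm_s8s8s32(size_t M, size_t N, size_t K,
                             const int8_t *A, const int8_t *B, int32_t *C)
{
    assert(A != NULL && B != NULL && C != NULL);
    for (size_t i = 0; i < M; ++i) {
        for (size_t j = 0; j < N; ++j) {
            int32_t acc = 0;
            for (size_t k = 0; k < K; ++k) {
                /* Widen before multiplying so the product is exact. */
                acc += (int32_t)A[i * K + k] * (int32_t)B[k * N + j];
            }
            C[i * N + j] = acc;
        }
    }
}

/* Largest K for which a sum of K products of values in [-127, 127]
 * is guaranteed to fit in an int32_t accumulator. */
static size_t max_safe_k(void)
{
    return (size_t)(INT32_MAX / (127 * 127));
}
```

With a plain int32 accumulator, each product is at most 127 * 127 = 16129 in magnitude, so the bound above only becomes relevant for very large K.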

test result

Compiled with the -O3 flag and measured on a single RK3399 core. The current peak is 18.6 gflops; the orange line is the RK3399 single-core fp32 limit (14.3 gflops). On a single AWS A72 core the result is about 23 gflops, which is the limit of this implementation approach (100% of the attainable performance).
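For readers who want to reproduce figures like these, the sketch below shows one common way to turn a measured GEMM time into a gflops number. It assumes the usual count of 2 * M * N * K operations per multiplication (one multiply and one add per term); whether the benchmark in this repository uses exactly this formula is an assumption, and gemm_gflops is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Throughput in giga-operations per second for one M x N x K GEMM,
 * counting one multiply and one add per inner-product term.
 * Hypothetical helper for illustration only. */
static double gemm_gflops(size_t M, size_t N, size_t K, double seconds)
{
    assert(seconds > 0.0);
    return 2.0 * (double)M * (double)N * (double)K / (seconds * 1e9);
}
```

Benchmarks usually time several runs and report the best one, since a single run on a mobile SoC can be disturbed by frequency scaling.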

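Usage

Edit OLD and NEW in the makefile to choose which implementations to compare. On the first run, OLD and NEW must be the same.
Run make run to print the speed results.
Test parameters can be changed in parameters.h.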

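Integration

Follow MMult_4x8_21.c for how to call the matrix multiplication, and embed the code in your own project. Adapt it as needed to match your inference library's implementation.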
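One reason an exposed packing step is useful when integrating into an inference library is that weights are constant, so B can be packed once and reused across calls. The sketch below illustrates the general idea of packing a row-major B into column panels. The panel width, the layout, and the name pack_b_panels are all assumptions made for illustration; they are not the layout used by reorder_b.S or by MMult_4x8_21.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PANEL_N 8 /* hypothetical panel width, chosen only for illustration */

/* Copy row-major B (K x N) into consecutive column panels of width PANEL_N.
 * Inside a panel, the PANEL_N values of each row k are stored contiguously,
 * so a kernel can read them with sequential loads.
 * This sketch requires N to be a multiple of PANEL_N;
 * packed must have room for K * N bytes. */
static void pack_b_panels(size_t K, size_t N, const int8_t *B, int8_t *packed)
{
    assert(N % PANEL_N == 0);
    size_t out = 0;
    for (size_t j0 = 0; j0 < N; j0 += PANEL_N) {
        for (size_t k = 0; k < K; ++k) {
            for (size_t j = 0; j < PANEL_N; ++j) {
                packed[out++] = B[k * N + j0 + j];
            }
        }
    }
}
```

In an inference setting, a call like this would typically run once at model load time, with only the activation matrix handled per inference.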

application with chgemm inside

chgemm has been integrated into ncnn; see gemm_symm_int8.h.
