Add sse4/avx2 support for fast x86 int8 (vpmaddubsw/vpmaddwd/vpaddd) #8897

elvin-n · 2021-09-01T11:07:59Z

Extend the list of targets supported by x86 topi
Extend the conv2d x86 int8 tests to cover fast int8 x86 platforms

In theory, this change can give up to a 2x speedup for int8 models over fp32 models; the current speedup is slightly less.
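For readers unfamiliar with the instruction sequence named in the title, the following is a minimal NumPy sketch of what vpmaddubsw, vpmaddwd and vpaddd compute when chained into a uint8 x int8 dot product. It only emulates the arithmetic on one 256-bit vector; it is not the TVM intrinsic added by this PR, and the helper names (`maddubs`, `maddwd`, `dot_u8s8_accumulate`) are made up for illustration.

```python
import numpy as np


def maddubs(a_u8, b_s8):
    """Emulate vpmaddubsw: u8 * s8 products, adjacent pairs summed
    with signed saturation into int16 lanes."""
    prod = a_u8.astype(np.int32) * b_s8.astype(np.int32)
    pair_sum = prod[0::2] + prod[1::2]
    return np.clip(pair_sum, -32768, 32767).astype(np.int16)


def maddwd(a_s16, b_s16):
    """Emulate vpmaddwd: s16 * s16 products, adjacent pairs summed
    into int32 lanes."""
    prod = a_s16.astype(np.int32) * b_s16.astype(np.int32)
    return prod[0::2] + prod[1::2]


def dot_u8s8_accumulate(acc_s32, data_u8, kernel_s8):
    """Chain the three steps: 4 adjacent u8*s8 products end up summed
    into each int32 lane, which is then added to the accumulator."""
    pairs = maddubs(data_u8, kernel_s8)      # 32 bytes -> 16 x int16
    ones = np.ones_like(pairs)               # multiply by 1 to widen
    quads = maddwd(pairs, ones)              # 16 x int16 -> 8 x int32
    return acc_s32 + quads                   # vpaddd: int32 accumulate


# One AVX2-sized vector: 32 uint8 inputs and 32 int8 weights.
rng = np.random.default_rng(0)
data = np.arange(32, dtype=np.uint8)
kernel = rng.integers(-128, 128, size=32, dtype=np.int8)
acc = np.zeros(8, dtype=np.int32)

result = dot_u8s8_accumulate(acc, data, kernel)
reference = (data.astype(np.int32) * kernel.astype(np.int32)).reshape(8, 4).sum(axis=1)
```

Note the saturation in the first step: with large uint8 inputs and large weights the int16 pair sum can clip, which is a known limitation of this sequence compared with VNNI.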

ResNet-50 performance (FPS):

| | Core i7-1185G7 sse4 | Core i7-1185G7 avx2 | Core i7-1185G7 avx512 | Core i7-1185G7 VNNI | Core i7-8700B | Core i5-9400T |
|---|---|---|---|---|---|---|
| TVM FP32 | | 53 | 53 | 53 | 54 | 48 |
| TVM int32 | | 12 | | | 16 | |
| TVM int8 default | 34 | 61 | 92 | 142 | 78 | 62 |
| TVM int8 atvm | | 70 | | 134 | 95 | 79 |

python/tvm/topi/x86/utils.py

tests/python/relay/test_op_level2.py

jcf94

Thanks for your answer! @elvin-n

elvin-n · 2021-09-03T10:29:32Z

The change in get_fp32_len affected the ARM flow: it now blocks by 4 instead of the previous default of 8. This should not affect performance, since the NEON SIMD vector size is 64 or 128 bits, but it will affect the knowledge database of tuned kernels.

I will verify the performance aspect on ARM. Backward compatibility is still an open question. So far my impression is that we do not care about it that much.
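The arithmetic behind the block sizes mentioned above can be sketched as follows. This is an illustration only, not the actual body of get_fp32_len; the helper name `fp32_lanes` is hypothetical, and the sketch assumes the block size is simply the number of 32-bit lanes in one SIMD register.

```python
def fp32_lanes(vector_bits: int) -> int:
    """Number of fp32 values that fit in one SIMD register.

    Hypothetical helper for illustration; assumes the register width
    is a positive multiple of 32 bits.
    """
    if vector_bits <= 0 or vector_bits % 32 != 0:
        raise ValueError("vector width must be a positive multiple of 32 bits")
    return vector_bits // 32


# Register widths discussed in this thread.
neon_64 = fp32_lanes(64)     # 64-bit NEON
neon_128 = fp32_lanes(128)   # 128-bit NEON (also SSE4)
avx2_256 = fp32_lanes(256)   # AVX2
avx512 = fp32_lanes(512)     # AVX-512
```

Under this assumption, a 128-bit register holds four fp32 values, which is consistent with blocking by 4 not costing anything on NEON.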

elvin-n · 2021-09-03T14:37:52Z

I verified the ARM flow and confirm that it now uses 4 channel values instead of 8 for blocking, and that this did not affect performance (as I expected).

Add sse4/avx2 support for vpmaddubsw/vpmaddwd/vpaddd

ff61cc9

elvin-n requested review from anijain2305, areusch, comaniac, Huyuwei, jcf94, jroesch, junrushao, jwfromm, kevinthesun, Laurawly, masahi, mbrookhart, merrymercy, tqchen, vinx13, yzhliu and ZihengJiang as code owners September 1, 2021 11:07

elvin-n added 3 commits September 1, 2021 14:20

fix code style

7da5b4c

Change x86-64-v2 to nehalem in test to support llvm11

0742047

Change test target to get NCHW8c

051f9cc

vinx13 approved these changes Sep 2, 2021

jcf94 reviewed Sep 3, 2021

jcf94 approved these changes Sep 3, 2021

masahi merged commit 1bebd0a into apache:main Sep 9, 2021

junrushao mentioned this pull request Nov 1, 2021

Apache TVM v0.8 Release Note Candidate #9416

Closed
