
[MetaSchedule][Hexagon] Improve vectorization for standalone elementwise op #14408

Merged 1 commit into apache:main on Mar 28, 2023

Conversation

ibsidorenko
Contributor

Motivation:
For standalone elementwise operations (add, sub, etc.), MetaSchedule generates code with poor performance on some input tensor shapes due to the lack of vector code. The current implementation cannot vectorize when the innermost loop's extent is not a multiple of the vector length.

What was done:
Core changes: the pass checks the current loop nest; if all loops are "simple" (no annotations, no thread bindings, no reduce axes), it does the following:

  1. Fuse all loops into a single loop.
  2. Split the fused loop into two parts, inner and outer, where the split factor for the inner loop equals the `max_vectorize_extent` MetaSchedule parameter.
  3. Parallelize the outer loop and vectorize the inner loop.
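The steps above can be sketched in plain Python as a minimal illustration of the index math (this is not the actual TVM pass; the vector length `128` and the helper name `fuse_and_split` are example assumptions): after fusing, the single iteration space often divides evenly by the vector length even when the original innermost extent did not.

```python
import math

def fuse_and_split(shape, max_vectorize_extent):
    """Mimic the transform: fuse all 'simple' loops into one,
    then split the fused loop by the vectorization factor.
    NOTE: illustrative sketch, not the real TVM schedule pass."""
    # Step 1: fuse all loops into a single loop.
    fused_extent = math.prod(shape)
    # Step 2: split into (outer, inner); inner extent = max_vectorize_extent.
    outer = math.ceil(fused_extent / max_vectorize_extent)
    inner = max_vectorize_extent
    # Step 3 (not modeled here) would mark `outer` parallel and
    # `inner` vectorized.
    return outer, inner

# Example shape from the measurements table; 128 is a hypothetical
# vector length (the real value comes from the MetaSchedule target).
shape = (1, 8, 56, 56, 32)
outer, inner = fuse_and_split(shape, 128)
# The fused extent (802816) divides evenly by 128, so the inner loop
# can be fully vectorized even though the original innermost extent
# (32) is smaller than the vector length.
assert outer * inner == math.prod(shape)
```

When the fused extent is not divisible by the split factor, the outer extent is rounded up and the schedule relies on predication or a tail loop; the key point is that fusing first makes full-length vector iterations the common case.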

Performance measurement:
Measurements were taken on a Qualcomm Snapdragon 888. As expected, rows 1 and 2 show a significant performance boost, while rows 3 and 4 are unchanged.

| N | op      | Dtype | Shape            | Before fix, ms | After fix, ms | Speedup |
|---|---------|-------|------------------|----------------|---------------|---------|
| 1 | add     | uint8 | 1, 8, 56, 56, 32 | 1.264          | 0.167         | 7.5x    |
| 2 | qnn.add | uint8 | 1, 8, 56, 56, 32 | 2.213          | 0.336         | 6.6x    |
| 3 | add     | int32 | 1, 8, 56, 56, 32 | 0.161          | 0.150         | 1.07x   |
| 4 | seq*    | uint8 | 1, 64, 56, 56    | 2.634          | 2.679         | 0.98x   |

seq*: a test of the op sequence qnn.conv2d + bias_add + qnn.requantize, with weights shape [256, 64, 1, 1]


tvm-bot commented Mar 27, 2023

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

  • No users to tag found in teams: metaschedule, hexagon. See #10317 for details.

Generated by tvm-bot

@ibsidorenko
Contributor Author

@tvm-bot rerun

@masahi masahi merged commit 14ddb37 into apache:main Mar 28, 2023
@ibsidorenko ibsidorenko deleted the ms-vectorizer branch March 28, 2023 12:12