Add optimised 'Indirect BGEMM' binary convolution kernels. #516

Merged (1 commit, Sep 29, 2020)


AdamHillier (Contributor)

What do these changes do?

This PR adds a new type of binary convolution kernel based on an 'indirect' BGEMM (binary GEMM) algorithm, which does not require im2col. It is an adaptation of the algorithm introduced in the paper 'The Indirect Convolution Algorithm' and used extensively in the XNNPACK library.
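As a rough illustration of the idea (not the code in this PR), the sketch below builds an indirection buffer: instead of copying input patches into an im2col matrix, it stores one pointer per (output pixel, kernel tap) that points at that input pixel's bitpacked channels, or at a shared zero buffer for taps that fall into the padding. The convolution then reads its inputs only through those pointers. It assumes NHWC layout, 32 binary channels per `std::uint32_t` word with bit 0 encoding +1 and bit 1 encoding -1 (so padding reads as +1 here), and a channel count that is a multiple of 32. `TBitpacked`, `BuildIndirectionBuffer`, `IndirectBinaryConv` and `DemoCornerPixel` are hypothetical names.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using TBitpacked = std::uint32_t;  // hypothetical: 32 binary channels per word

// Builds one pointer per (output pixel, kernel tap). Each pointer addresses the
// packed channels of one input pixel, or a shared zero buffer for padding taps.
std::vector<const TBitpacked*> BuildIndirectionBuffer(
    const TBitpacked* input, const TBitpacked* zero_buffer, int in_h, int in_w,
    int packed_channels, int kernel_h, int kernel_w, int stride, int pad,
    int out_h, int out_w) {
  assert(stride > 0);
  std::vector<const TBitpacked*> buffer;
  buffer.reserve(static_cast<std::size_t>(out_h) * out_w * kernel_h * kernel_w);
  for (int oy = 0; oy < out_h; ++oy)
    for (int ox = 0; ox < out_w; ++ox)
      for (int ky = 0; ky < kernel_h; ++ky)
        for (int kx = 0; kx < kernel_w; ++kx) {
          const int iy = oy * stride - pad + ky;
          const int ix = ox * stride - pad + kx;
          const bool inside = iy >= 0 && iy < in_h && ix >= 0 && ix < in_w;
          buffer.push_back(inside ? input + (iy * in_w + ix) * packed_channels
                                  : zero_buffer);
        }
  return buffer;
}

// Binary convolution that reads inputs through the indirection buffer (no
// im2col copy). XOR-popcounts are turned into a +/-1 dot product:
// dot = num_bits - 2 * popcount.
std::vector<std::int32_t> IndirectBinaryConv(
    const std::vector<const TBitpacked*>& indirection, const TBitpacked* weights,
    int num_pixels, int taps, int packed_channels, int out_channels) {
  assert(indirection.size() == static_cast<std::size_t>(num_pixels) * taps);
  std::vector<std::int32_t> output(static_cast<std::size_t>(num_pixels) * out_channels);
  const int num_bits = taps * packed_channels * 32;
  for (int p = 0; p < num_pixels; ++p)
    for (int oc = 0; oc < out_channels; ++oc) {
      int popcount = 0;
      for (int t = 0; t < taps; ++t) {
        const TBitpacked* in = indirection[p * taps + t];
        const TBitpacked* w = weights + (oc * taps + t) * packed_channels;
        for (int c = 0; c < packed_channels; ++c)
          popcount += static_cast<int>(std::bitset<32>(in[c] ^ w[c]).count());
      }
      output[p * out_channels + oc] = num_bits - 2 * popcount;
    }
  return output;
}

// Example: 3x3 input, one packed word (32 channels) per pixel, 3x3 kernel,
// stride 1, padding 1, a single output channel with all-zero (+1) weights.
std::vector<std::int32_t> DemoCornerPixel() {
  const int in_h = 3, in_w = 3, packed = 1, k = 3, stride = 1, pad = 1;
  const int out_h = 3, out_w = 3, taps = k * k, out_channels = 1;
  std::vector<TBitpacked> input(in_h * in_w * packed, 0u);
  input[0] = 0xFFFFFFFFu;  // pixel (0, 0): all 32 channels are -1
  const std::vector<TBitpacked> zero(packed, 0u);  // padding reads as +1
  const std::vector<TBitpacked> weights(out_channels * taps * packed, 0u);
  const auto indirection = BuildIndirectionBuffer(
      input.data(), zero.data(), in_h, in_w, packed, k, k, stride, pad, out_h, out_w);
  return IndirectBinaryConv(indirection, weights.data(), out_h * out_w, taps,
                            packed, out_channels);
}
```

`DemoCornerPixel()` returns 224 for the four output pixels whose 3x3 window covers the all-ones input pixel (0, 0), and 288 (9 taps × 32 bits) for the other five. Because the pointers depend only on the shapes, convolution parameters and input address, not on the input values, such a buffer can in principle be built once and reused while the input buffer stays at the same address.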

Only one BGEMM micro-kernel is included: a portable 4x2 kernel written in C++. However, this PR lays the groundwork for adding more micro-kernels in the future, including hand-optimised and architecture-specific variants. The focus of this PR is therefore not performance: the new kernel is substantially slower than our existing, highly optimised im2col + BGEMM kernel, which remains the default.
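A micro-kernel computes one small output tile per call. The following is a hypothetical sketch of a portable 4x2 kernel, not the kernel added in this PR; it assumes (the PR text does not say) that the tile is 2 output pixels by 4 output channels. In this sketch the indirection pointers for a tile are grouped as [tap][pixel], and the weights are pre-packed as [tap][packed channel][4 output channels] so the inner loop reads contiguous words. `MicroKernel4x2` and both layouts are assumptions.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

using TBitpacked = std::uint32_t;  // hypothetical: 32 binary channels per word

// Hypothetical portable micro-kernel: XOR-popcount accumulation for a tile of
// 2 output pixels x 4 output channels.
//   indirection[2 * t + i]: pointer to pixel i's packed channels for tap t
//   packed_weights:         [tap][packed_channel][4 output channels]
//   popcounts[i][j]:        total popcount for pixel i, output channel j
void MicroKernel4x2(int taps, int packed_channels,
                    const TBitpacked* const* indirection,
                    const TBitpacked* packed_weights,
                    std::int32_t popcounts[2][4]) {
  assert(taps > 0 && packed_channels > 0);
  std::int32_t acc[2][4] = {};  // accumulators held for the whole reduction
  for (int t = 0; t < taps; ++t) {
    const TBitpacked* in0 = indirection[2 * t + 0];
    const TBitpacked* in1 = indirection[2 * t + 1];
    for (int c = 0; c < packed_channels; ++c) {
      const TBitpacked* w = packed_weights + (t * packed_channels + c) * 4;
      for (int j = 0; j < 4; ++j) {
        acc[0][j] += static_cast<std::int32_t>(std::bitset<32>(in0[c] ^ w[j]).count());
        acc[1][j] += static_cast<std::int32_t>(std::bitset<32>(in1[c] ^ w[j]).count());
      }
    }
  }
  for (int i = 0; i < 2; ++i)
    for (int j = 0; j < 4; ++j) popcounts[i][j] = acc[i][j];
}
```

A driver would loop over tiles, call the kernel and convert the popcounts into the requested output type; leftover pixels or channels that do not fill a whole tile, which a real kernel must handle, are omitted here. An architecture-specific variant could keep the same signature and replace the inner loops with SIMD XOR and popcount instructions.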

How Has This Been Tested?

Tested in CI. In addition, the 'big' kernel tests, which are not run in CI, pass locally on AArch64 and Arm32.

Benchmark Results

Benchmarks aren't directly relevant, because this PR does not change which optimised kernel runs by default. To give a rough idea, though, I ran QuickNet on my Raspberry Pi 4B with each of our three kernel types:

| Kernel | Average latency over 250 runs (ms) |
| --- | --- |
| Optimised im2col + BGEMM (hand-tuned assembly) | 30.0 |
| Optimised indirect BGEMM (C++), this PR | 128.8 |
| Reference (C++) | 269.5 |
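For reference, the relative latencies implied by these measurements are:

$$
\frac{128.8\ \text{ms}}{30.0\ \text{ms}} \approx 4.3 \quad (\text{indirect vs. im2col}), \qquad
\frac{269.5\ \text{ms}}{128.8\ \text{ms}} \approx 2.1 \quad (\text{reference vs. indirect})
$$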

Related issue number

N/A.

AdamHillier requested a review from a team on September 23, 2020 at 00:36.
Tombana (Collaborator) left a review comment:

🚀

lgeiger added the internal-improvement (Internal Improvements and Maintenance) label on Sep 24, 2020.
To start, add portable 4x2 C++ kernels for float, int8 and bitpacked
output. Make it easy to implement new indirect BGEMM kernels,
including architecture-specific variants.
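The commit adds variants for float, int8 and bitpacked output. One plausible way to structure this, sketched here under assumptions, is to compute the ±1 dot products once and then apply a per-output-type transform: a per-channel multiplier and bias for float (for example a fused batch normalisation), affine quantisation for int8, and sign binarisation for bitpacked output. `OutputTransform`, `ToFloat`, `ToInt8` and `ToBitpacked` are hypothetical, and this is not necessarily how the PR's kernels are written.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-output-channel parameters (e.g. a fused batch normalisation).
struct OutputTransform {
  std::vector<float> multiplier;
  std::vector<float> bias;
};

// +/-1 dot product -> float: y = multiplier * dot + bias.
std::vector<float> ToFloat(const std::vector<std::int32_t>& dots,
                           const OutputTransform& t) {
  assert(dots.size() == t.multiplier.size() && dots.size() == t.bias.size());
  std::vector<float> out(dots.size());
  for (std::size_t c = 0; c < dots.size(); ++c)
    out[c] = t.multiplier[c] * static_cast<float>(dots[c]) + t.bias[c];
  return out;
}

// float -> int8 with affine quantisation: q = clamp(round(y / scale) + zero_point).
std::vector<std::int8_t> ToInt8(const std::vector<float>& y, float scale,
                                int zero_point) {
  std::vector<std::int8_t> out(y.size());
  for (std::size_t c = 0; c < y.size(); ++c) {
    const long q = std::lround(y[c] / scale) + zero_point;
    out[c] = static_cast<std::int8_t>(std::max(-128L, std::min(127L, q)));
  }
  return out;
}

// float -> bitpacked: bit c is set when y[c] < 0, i.e. the value binarises to -1.
std::vector<std::uint32_t> ToBitpacked(const std::vector<float>& y) {
  std::vector<std::uint32_t> out((y.size() + 31) / 32, 0u);
  for (std::size_t c = 0; c < y.size(); ++c)
    if (y[c] < 0.0f) out[c / 32] |= (1u << (c % 32));
  return out;
}
```

With multipliers {0.5, 1, 2} and biases {0, 1, -1}, the dot products {10, -4, 0} map to the floats {5, -3, -1}, to the int8 values {10, -6, -2} at scale 0.5 and zero point 0, and to the bitpacked word 0b110. Keeping the accumulation separate from these transforms means a new micro-kernel only has to produce accumulators to support all three output types.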

3 participants