-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve AArch64 depthwise convolution through smlal/smlal2 intrinsic #6711
Conversation
I like this change. May I ask two questions?
|
Hi @FrozenGene
|
For ansor, we could make ansor support tensorize (like TensorCore we need tensorize too). However, if we could done it in the llvm / tir, we will make ansor support it easily. If we could lift it as generic pass, I think it will bring benifit to other hardware platform too. |
- Added an intrinsic to load a single int16x8 vector and produce two int32x4 output vectors through smlal/smlal2 instructions - Changed the NHWC depthwise schedule to accomodate the aforementioned intrinsic Change-Id: I347c3bf98fa8dd87057304dcda0d78e558424c57
4a7b7c1
to
efb8ef9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM now. I agree with @FrozenGene that this would be best implemented nearer the LLVM codegen for reusability, but I think this is a good start and demonstrates a worthwhile benefit to performance.
Hi @mbaret , thanks for the review! @FrozenGene , any update on this? |
…pache#6711) * Improve depthwise convolution through smlal/smlal2 intrinsic - Added an intrinsic to load a single int16x8 vector and produce two int32x4 output vectors through smlal/smlal2 instructions - Changed the NHWC depthwise schedule to accomodate the aforementioned intrinsic Change-Id: I347c3bf98fa8dd87057304dcda0d78e558424c57 * Address review comments * Rebasing - 2 * Rebasing - 3 * Rebasing - 3 * Fix linting
…pache#6711) * Improve depthwise convolution through smlal/smlal2 intrinsic - Added an intrinsic to load a single int16x8 vector and produce two int32x4 output vectors through smlal/smlal2 instructions - Changed the NHWC depthwise schedule to accomodate the aforementioned intrinsic Change-Id: I347c3bf98fa8dd87057304dcda0d78e558424c57 * Address review comments * Rebasing - 2 * Rebasing - 3 * Rebasing - 3 * Fix linting
…pache#6711) * Improve depthwise convolution through smlal/smlal2 intrinsic - Added an intrinsic to load a single int16x8 vector and produce two int32x4 output vectors through smlal/smlal2 instructions - Changed the NHWC depthwise schedule to accomodate the aforementioned intrinsic Change-Id: I347c3bf98fa8dd87057304dcda0d78e558424c57 * Address review comments * Rebasing - 2 * Rebasing - 3 * Rebasing - 3 * Fix linting
Added an intrinsic to load a single int16x8 vector and produce two
int32x4 output vectors through smlal/smlal2 instructions
Changed the NHWC depthwise schedule to accomodate the aforementioned
intrinsic
Change-Id: I347c3bf98fa8dd87057304dcda0d78e558424c57