Vectorize Scale16X16To8X8 #1517
Codecov Report
@@ Coverage Diff @@
## master #1517 +/- ##
==========================================
- Coverage 83.53% 83.50% -0.03%
==========================================
Files 742 742
Lines 32772 32801 +29
Branches 3669 3671 +2
==========================================
+ Hits 27375 27392 +17
- Misses 4680 4691 +11
- Partials 717 718 +1
Ah, fantastic @tkp1n! I had a feeling this might be a bottleneck... I wonder, do you have screenshots of your trace we could use to identify further points of interest?
Vector256<float> c = in2;
Vector256<float> d = Unsafe.Add(ref in2, 1);

Vector256<float> calc1 = Avx.Shuffle(a, c, 0b10_00_10_00);
I wish there was a way to produce constants via SimdUtils.Shuffle.MmShuffle() in a way that could be inlined.
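For readers less familiar with the shuffle control byte, here is a minimal sketch of what 0b10_00_10_00 selects. The class and member names are hypothetical and not part of ImageSharp or this PR; the code emulates a single 128-bit lane in scalar form, while Avx.Shuffle on Vector256<float> applies the same selection to each 128-bit lane independently. The constant is spelled as a compile-time expression with the same bit layout as the C macro _MM_SHUFFLE, which is one possible way to give the literal a name.

```csharp
using System;

// Illustrative helper only; hypothetical names, not ImageSharp code.
public static class ShuffleControlSketch
{
    // Same bit layout as _MM_SHUFFLE(2, 0, 2, 0): two bits per selected index.
    // All operands are constants, so the C# compiler folds this to 0b10_00_10_00.
    public const byte EvenElements = (2 << 6) | (0 << 4) | (2 << 2) | 0;

    // Scalar emulation of one 128-bit lane of SHUFPS:
    // the two low results come from 'left', the two high results from 'right'.
    public static float[] ShuffleLane(float[] left, float[] right, byte control)
    {
        return new[]
        {
            left[control & 3],
            left[(control >> 2) & 3],
            right[(control >> 4) & 3],
            right[(control >> 6) & 3],
        };
    }
}
```

With left = { 0, 1, 2, 3 } and right = { 4, 5, 6, 7 }, ShuffleLane(left, right, ShuffleControlSketch.EvenElements) picks the even-indexed elements of both inputs: { 0, 2, 4, 6 }.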
12X... Can't really ask for more than that. Fantastic stuff!
9% from such a little addition, very nice!
Description
In the context of #1476, I looked for low-hanging fruit on the hot path of the JPEG encoding pipeline.
Using a benchmark (encoding a 4K image) with the ETW Profiler attached, I noticed that Block8x8F.Scale16X16To8X8 is responsible for 9.8% of the total time taken. The method is a good candidate for vectorization (AVX2), so I went for it.
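As a rough illustration of the kind of work being vectorized, here is a scalar sketch of a 16x16 to 8x8 downscale. It assumes that each output value is the mean of a 2x2 neighbourhood and that the input is a single row-major float[256]; the real method operates on Block8x8F values, whose layout is not reproduced here. The class and method names are hypothetical.

```csharp
using System;

// Illustrative scalar reference only; hypothetical names, not ImageSharp code.
public static class DownscaleSketch
{
    // Assumption: 'source' is a row-major 16x16 grid and every output
    // element is the average of the matching 2x2 input neighbourhood.
    public static float[] Average2x2(float[] source)
    {
        if (source.Length != 16 * 16)
        {
            throw new ArgumentException("Expected 256 values (16x16).", nameof(source));
        }

        var destination = new float[8 * 8];
        for (int y = 0; y < 8; y++)
        {
            for (int x = 0; x < 8; x++)
            {
                int i = (2 * y * 16) + (2 * x); // top-left of the 2x2 cell
                float sum = source[i] + source[i + 1] + source[i + 16] + source[i + 17];
                destination[(y * 8) + x] = sum * 0.25f;
            }
        }

        return destination;
    }
}
```

Every output touches four neighbouring inputs with no dependency between outputs, which is the sort of pattern that maps well onto shuffles and packed adds.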
🚀Benchmarks