Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vectorize Scale16X16To8X8 #1517

Merged
merged 1 commit into from
Jan 21, 2021
Merged

Vectorize Scale16X16To8X8 #1517

merged 1 commit into from
Jan 21, 2021

Conversation

tkp1n
Copy link
Contributor

@tkp1n tkp1n commented Jan 21, 2021

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following matches the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

In the context of #1476 I looked for low hanging fruit on the hot path of the JPEG encoding pipeline.
Using a benchmark (encoding a 4K image) with the ETW Profiler attached, I've noticed that Block8x8F.Scale16X16To8X8 is responsible for 9.8% of the total time taken. I've noticed that the method is the optimal candidate for vectorization (AVX2) and went for it.

  • The benchmarks below show a 12x improvement in Block8x8F.Scale16X16To8X8 🚀
  • This reduces the total encoding time of my sample image by ~9% from 475.9 ms to 434.1 ms

Benchmarks

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-9750H CPU 2.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.102
  [Host] : .NET Core 3.1.11 (CoreCLR 4.700.20.56602, CoreFX 4.700.20.56604), X64 RyuJIT
  main   : .NET Core 3.1.11 (CoreCLR 4.700.20.56602, CoreFX 4.700.20.56604), X64 RyuJIT
  PR     : .NET Core 3.1.11 (CoreCLR 4.700.20.56602, CoreFX 4.700.20.56604), X64 RyuJIT

Runtime=.NET Core 3.1  
Method Job Mean Error StdDev Ratio Gen 0 Gen 1 Gen 2 Allocated Allocated native memory Native memory leak
Scale16X16To8X8 main 189.83 ns 2.875 ns 2.689 ns 1.00 - - - - - -
Scale16X16To8X8 PR 14.97 ns 0.179 ns 0.159 ns 0.08 - - - - - -

@codecov
Copy link

codecov bot commented Jan 21, 2021

Codecov Report

Merging #1517 (4fb1859) into master (7eb5cc0) will decrease coverage by 0.02%.
The diff coverage is 89.65%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1517      +/-   ##
==========================================
- Coverage   83.53%   83.50%   -0.03%     
==========================================
  Files         742      742              
  Lines       32772    32801      +29     
  Branches     3669     3671       +2     
==========================================
+ Hits        27375    27392      +17     
- Misses       4680     4691      +11     
- Partials      717      718       +1     
Flag Coverage Δ
unittests 83.50% <89.65%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...rc/ImageSharp/Formats/Jpeg/Components/Block8x8F.cs 90.16% <89.28%> (-2.27%) ⬇️
...mageSharp/Common/Helpers/SimdUtils.HwIntrinsics.cs 99.56% <100.00%> (+<0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7eb5cc0...4fb1859. Read the comment docs.

@JimBobSquarePants
Copy link
Member

Ah fantastic @tkp1n !!

I had a feeling this might be a bottle neck... I wonder, do you have screenshots of your trace we could use to identify further points of interest?

Vector256<float> c = in2;
Vector256<float> d = Unsafe.Add(ref in2, 1);

Vector256<float> calc1 = Avx.Shuffle(a, c, 0b10_00_10_00);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish there was a way to produce constants via SimdUtils.Shuffle.MmShuffle() in a way that could be inlined.

Copy link
Member

@JimBobSquarePants JimBobSquarePants left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

12X... Can't really ask for more than that. Fantastic stuff!

@JimBobSquarePants JimBobSquarePants merged commit 954d233 into SixLabors:master Jan 21, 2021
@antonfirsov
Copy link
Member

antonfirsov commented Jan 21, 2021

9% from such a little addition, very nice!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants