-
-
Notifications
You must be signed in to change notification settings - Fork 851
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Jpeg encoder optimization #1761
Jpeg encoder optimization #1761
Conversation
@br3aker Thanks for the excellent explanation. I forgot we were dealing with bits not bytes! Agreed re comments, please add more. |
src/ImageSharp/Formats/Jpeg/Components/FastFloatingPointDCT.Intrinsic.cs
Outdated
Show resolved
Hide resolved
src/ImageSharp/Formats/Jpeg/Components/FastFloatingPointDCT.Intrinsic.cs
Outdated
Show resolved
Hide resolved
Hope this passes all tests, fixed almost everything except adding comments so you can review Good news, new scalar transpose implementation is faster than the current one and does not rely on
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few nits left, otherwise this is good to merge:
- Address this point
- Add some comments.
- Figure out what to do with the underscores in the names.
/// Requires Avx support. | ||
/// </remarks> | ||
/// <param name="block">Input matrix.</param> | ||
public static void FDCT8x8_Avx(ref Block8x8F block) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JimBobSquarePants I don't want to be the bad cop blocking the PR on the underscore stuff in these names, because I find it more readable in situations like this, but I think some StyleCop analyzer fails to kick in here.
What are your recommendations to proceed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Underscores are ok in tests, lowercase, never.
I thought underscore and uppercase after was the answer? FDCT8x8Avx
is unreadable imo, this is internal stuff only used inside 'main' FDCT method so underscore may be a good separator for simd implementations. Anyways, underscores in DCT methods were long before this PR:
public static void IDCT8x4_RightPart(ref Block8x8F s, ref Block8x8F d) |
Removing 8x8 from the name for FdctAvx
is not an option because we already have 8x4 fdct for SSE and possible future JpegXL has variable size FDCT's.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no way I'm blocking this on naming. Happy to make an exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I think we can keep this open for a few more days, so @br3aker if you have the time and interest to address the remaining nits, good, if not we'll as is :)
@antonfirsov I will fix everything you pointed out, don't merge before it! :) |
I'm late to the party here, but I just want to say this is really great work @br3aker! |
if (Vector.IsHardwareAccelerated) | ||
{ | ||
ForwardTransform_Vector4(ref block); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: code coverage is worse than before because of this path, it simply can't check it vs sse call from remote executor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latest changes looking good. Anything else left or shall we merge?
I’m happy if you’re happy! |
I meant if @br3aker has something else in his mind. (Most likely no, but I want to go for sure.) |
I guess it's done. Thanks everyone for the contribution to this! |
@br3aker thanks again for the great work! |
@JimBobSquarePants I'm not a big fan of the following: Requires us to use admin rights to merge the PR, although there is nothing wrong with the test coverage in reality. |
Admin rights are fine IMO. It’s rare that coverage is an issue and forces us to sense check. We’re already really disciplined but I still feel that we should have rules in place. |
Fine then. Just don't want to get into position like Hungary's ruling party that changes the Constitution every time some little thing is in their way. |
I'm not sure coverage regression can be fixed, we can test different implementation explicitly in separate tests but main FDCT method with different hardware-dependent paths won't be covered. |
I tend to take the coverage report as a guide not an absolute since it at times seems wildly inaccurate. We should never chase exact coverage anyway just be aware of serious regression. The manual step helps this for me (since I get excited about perf) |
My main concern is that with this level of inaccuracy, we can get used to ignoring the check-in gates, which we should never do in case of real issues like test failures. |
Prerequisites
This PR has some performance tweaks & optimizations:
Quantization
Benchmark (it's included in the PR):
FDCT
Benchmark:
Huffman Encoding
&
Benchmark
It's really hard to test general image encoding/decoding thingy via BenchmarkDotNet so I wrote some custom code for a fixed amount of iterations (300 in following results) of encoding jpeg into MemoryStream: