Jpeg encoding code optimization #1632

Merged
106 commits
7229dbf
Block8x8F explicit layout & 256bit rows support
May 18, 2021
fbf0ff1
Block8x8F.MultiplyInPlace no longer use unsafe casts
May 18, 2021
20236b8
Block8x8F.TransposeInto no longer uses unsafe casts (partially)
May 18, 2021
e5188fe
Implemented FDCT8x8 using avx instruction set, added backward compati…
May 18, 2021
513e86a
Implemented IDCT algorithm with avx/fma, move IDCT code to a differen…
May 18, 2021
81c21e5
Fixed "constant" vectors naming
May 18, 2021
9bf9644
RgbToYCbCrConverterLut.Convert main loop routine now uses named const…
May 20, 2021
347ac36
LuminanceForwardConverter.Convert main loop routine now uses named co…
May 20, 2021
86a6d8b
WriteDefineHuffmanTables(...) no longer relies on external buffer for…
May 20, 2021
f001755
[WIP] Partially moved encoding logic to a separate class
May 20, 2021
d91fc40
Removed write buffer parameter injection
May 21, 2021
66b5a8d
[WIP] Moved SOS writing logic to separate class
May 21, 2021
0d7e4b1
Removed irrelevant code from JpegDecoderCore
May 21, 2021
d593479
Removed remaining irrelevant code from JpegEncoderCore
May 21, 2021
296ee10
Optimized jpeg encoder stream Write calls by a lot -> huge performan…
May 21, 2021
56822d1
Removed obsolete parameter config from various methods
May 21, 2021
690e80c
YCbCrEncoder now has builtin temporal 8x8F blocks for internal calcul…
May 21, 2021
b3a9938
Updated & fixed xml documentation
May 21, 2021
4e73471
Small QoL fixes
May 21, 2021
368f89e
Moved quantization table initialization logic to JpegEncoderCore
May 21, 2021
9d7adb6
Fixed comments
May 21, 2021
3380bdf
Renamed YCbCrEncoder to HuffmanScanEncoder as it is in decoding logic
May 21, 2021
7e0a317
Moved encode method choice to the JpegEncoderCore
May 21, 2021
1b1d136
Fixed unresolved reference this.colorType
May 21, 2021
5b05a0a
Added QoL throw helper method for jpeg w/h size check before encoding
May 21, 2021
84a143d
Moved end of image marker writing code to a separate method
May 21, 2021
d4fa8b2
Rolled back to initial JpegEncoderCore options implementation.
May 22, 2021
980f2d2
Revert "Block8x8F.MultiplyInPlace no longer use unsafe casts"
May 22, 2021
f1886ad
Revert "Block8x8F.TransposeInto no longer uses unsafe casts (partially)"
May 22, 2021
a8f717d
Made DCT code prettier with SimdUtils, added summary to 8x8 dct metho…
May 22, 2021
dfb181d
Combined FDCT and IDCT code into single file
May 22, 2021
0424d8d
Codestyle changes
May 22, 2021
855f109
Merge branch 'master' into jpeg-decode-encode-optimization
May 22, 2021
d12bb3e
Improved jpeg encoding benchmark, updated benchmark 'baseline' for cu…
May 24, 2021
ae85722
Simplified WriteDefineHuffmanTables method
May 24, 2021
a65e503
Added MultiplySubstract method to the HwIntrinsics
May 25, 2021
86abb73
Made FDCT8x8_Avx(...) method prettier with SimdUtils
May 25, 2021
0664f29
Replaced bit count lookup table to lzcnt implementation, Added Minimi…
May 26, 2021
f354c1c
Merge branch 'master' into jpeg-decode-encode-optimization
May 26, 2021
64371bd
Merge branch 'jpeg-decode-encode-optimization' of https://github.com/…
May 26, 2021
28ea2ad
Fixed comments, removed todo, updated benchmark results
May 26, 2021
d251003
Implemented fallback code for runtimes where BitOperations class is n…
May 26, 2021
ceb4fdf
Replaced unsafe Block8x8F/Vector4<float> -> Vector256<float> casts
May 27, 2021
70474c8
Removed redundant enum casting during huffman encoding
May 27, 2021
52e6036
Reimplemented Emit methods in HuffmanScanEncoder to get rid of unread…
May 27, 2021
7fb8fee
Fixed xml docs
May 27, 2021
d7fd947
Updated default quality settings in jpeg encoding benchmark
May 27, 2021
81979e0
Improved flush logic after main encode methods run
May 27, 2021
1684249
Brought back if check
May 27, 2021
9c0999e
Huffman lookup tables are now integers instead of unsigned integers
May 27, 2021
169e98b
Simplified Block8x8F.DivideRoundAll() method
May 28, 2021
6ac2b66
Added comments to vectorized rgb->ycbcr converter for further code ch…
May 29, 2021
a845c00
Simplified RgbToYCbCrConverterVectorized.Convert() method
May 29, 2021
2ad3ddb
[WIP] Introduced RgbToYCbCrConverterVectorized 420 sampling
May 29, 2021
201c534
Fixed HuffmanScanEncoder error
May 29, 2021
8a77496
Improved internal rgb -> ycbcr conversion api for 420 subsampling
May 29, 2021
052ebde
Replaced GenericBlock8x8 with Span in ycbcr converter
May 29, 2021
d50e255
[WIP] Implemented 16x8 420 subsampling conversion
May 29, 2021
5ed7e2d
Added quality params to the jpeg encoder benchmark
May 29, 2021
d6db6b6
Fixed compilation errors for non-intrinsic platforms
May 29, 2021
3956986
Added debug guard checks to LoadAndStretchEdges
May 30, 2021
0d94435
Simplified LoadAndStretchEdges call logic
May 30, 2021
13e7cf3
Divided YCbCr converters into 444/420 subsampling categories
May 30, 2021
12b4b83
444 converter fixes
May 30, 2021
953095f
420 converter fixes
May 30, 2021
5fc29a2
Introduced separate 420 converter
May 30, 2021
cb1acae
Finished 420 subsampling converter
May 30, 2021
672da45
Finished 444 subsampling converter
May 30, 2021
17fcc89
Merge branch 'master' into pr/1632
JimBobSquarePants Jun 2, 2021
1d54702
Update shared-infrastructure
JimBobSquarePants Jun 2, 2021
0b7f95f
Merge branch 'master' into pr/1632
JimBobSquarePants Jun 2, 2021
5ea8da6
Fix BitOperations
JimBobSquarePants Jun 2, 2021
de176b6
Initial 420 subsampling lut conversion implementation
Jun 3, 2021
7896e24
Improved non-simd ycbcr lut converter code
Jun 4, 2021
2e25a3e
Optimized non-simd ycbcr lut converter code
Jun 4, 2021
44bae0b
Made non-simd ycbcr lut converter code more readable
Jun 4, 2021
078703b
Added docs, renamed LuT converter for 444 and 420 subsampling methods…
Jun 4, 2021
da1b85b
Final cleanup of the non-simd 420 rgb -> ycbcr conversion code
Jun 4, 2021
05ea9c2
Merge branch 'jpeg-decode-encode-optimization' of https://github.com/…
Jun 4, 2021
25437ad
Merge branch 'convert420-performance' into jpeg-decode-encode-optimiz…
Jun 4, 2021
7135fc7
Renamed MinimumBitsToStore16 method as it only works with up to 16 bi…
Jun 5, 2021
743e34c
Fixed stream flush for jpeg encoder
Jun 5, 2021
01f44a8
Renamed vectorized rgb -> ycbcr converter for 444 subsampling
Jun 5, 2021
fcf202a
Added tests for 420 rgb -> ycbcr subsampling
Jun 5, 2021
ad333f6
Simplified Lut implementation
Jun 6, 2021
0e053f0
Optimized 420 converter with higher precision
Jun 7, 2021
2d54226
Both converters code cleanup
Jun 7, 2021
2949145
Fixed failing tests output
Jun 7, 2021
8f79eb9
Converters tests/code cleanup, added comments for padding property
Jun 7, 2021
b1a2126
Added docs
Jun 7, 2021
2edb1a8
Removed obsolete code
Jun 7, 2021
0aecbd0
Removed unused usings
Jun 7, 2021
a4222fd
Added DCT tests
Jun 7, 2021
8a61048
Fixed DCT tests
Jun 7, 2021
4d9cb82
Merge branch 'master' into jpeg-decode-encode-optimization
antonfirsov Jun 7, 2021
b9b853b
Added docs & stylecop fixes
Jun 7, 2021
8d321a5
Added DCT tests paths for nosimd/avx/avx+fma
Jun 7, 2021
0e07a8e
Removed obsolete code
Jun 7, 2021
0013c54
Optimized vector rgb pixel matrix scaling
Jun 9, 2021
35daf21
Added tests for vector rgb pixel matrix scaling
Jun 10, 2021
121d1fa
Fixed build error due to invalid using
Jun 10, 2021
20a0d84
Moved jpeg matrix scaler to jpeg converter
Jun 10, 2021
6d4e2ee
Moved jpeg converter scaler tests to jpeg converter tests
Jun 10, 2021
ce1d992
Fixed invalid curly braces, added debug Avx2 check
Jun 10, 2021
8bbcd65
Improved benchmark for jpeg encoder
Jun 10, 2021
ab8ed08
Updated benchmark results
Jun 10, 2021
45 changes: 44 additions & 1 deletion src/ImageSharp/Common/Helpers/Numerics.cs
@@ -23,6 +23,28 @@ internal static class Numerics
private const int ShuffleAlphaControl = 0b_11_11_11_11;
#endif

#if !SUPPORTS_BITOPERATIONS
/// <summary>
/// Gets the lookup table holding the number of bits needed to store each byte value.
/// </summary>
private static ReadOnlySpan<byte> BitCountLut => new byte[]
{
0, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8,
};
#endif

/// <summary>
/// Determine the Greatest CommonDivisor (GCD) of two numbers.
/// </summary>
@@ -756,7 +778,7 @@ public static float Lerp(float value1, float value2, float amount)
/// widening them to 32-bit integers and performing four additions.
/// </summary>
/// <remarks>
- /// <code>byte(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)</code>
+ /// <c>byte(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)</c>
/// is widened and added onto <paramref name="accumulator"/> as such:
/// <code>
/// accumulator += i32(1, 2, 3, 4);
@@ -825,5 +847,26 @@ public static int EvenReduceSum(Vector256<int> accumulator)
return Sse2.ConvertToInt32(vsum);
}
#endif

/// <summary>
/// Calculates the minimum number of bits needed to store the given value.
/// </summary>
/// <param name="number">Unsigned integer to store</param>
/// <returns>Minimum number of bits needed to store given value</returns>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static int MinimumBitsToStore16(uint number)
{
#if !SUPPORTS_BITOPERATIONS
if (number < 0x100)
{
return BitCountLut[(int)number];
}

return 8 + BitCountLut[(int)number >> 8];
#else
const int bitInUnsignedInteger = sizeof(uint) * 8;
return bitInUnsignedInteger - BitOperations.LeadingZeroCount(number);
#endif
}
}
}
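
Reviewer note on `MinimumBitsToStore16`: the two branches above should agree for every input up to 16 bits, which is the range the 256-entry lookup table can cover (hence the `16` suffix). The sketch below is not part of the PR. It is a minimal standalone check, assuming a runtime where `System.Numerics.BitOperations` is available, that compares the `lzcnt` formula against a naive shift loop. The class name `BitWidthSketch` and the helper `MinimumBitsNaive` are hypothetical names introduced for illustration only.

```csharp
using System;
using System.Numerics;

public static class BitWidthSketch
{
    // Hypothetical reference implementation: shift right until nothing is left,
    // counting how many shifts that took.
    public static int MinimumBitsNaive(uint number)
    {
        int bits = 0;
        while (number != 0)
        {
            number >>= 1;
            bits++;
        }

        return bits;
    }

    // Same formula as the SUPPORTS_BITOPERATIONS branch in the diff:
    // total bit width minus the count of leading zero bits.
    public static int MinimumBitsLzcnt(uint number)
        => (sizeof(uint) * 8) - BitOperations.LeadingZeroCount(number);

    public static void Main()
    {
        // Exhaustively compare both versions over the 16-bit range.
        for (uint value = 0; value <= ushort.MaxValue; value++)
        {
            if (MinimumBitsNaive(value) != MinimumBitsLzcnt(value))
            {
                throw new InvalidOperationException($"Mismatch at {value}");
            }
        }

        Console.WriteLine("All 16-bit inputs agree.");
    }
}
```

`BitOperations.LeadingZeroCount(0)` returns 32, so the formula yields 0 for an input of 0, matching the first entry of the lookup table.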
25 changes: 25 additions & 0 deletions src/ImageSharp/Common/Helpers/SimdUtils.HwIntrinsics.cs
@@ -532,6 +532,7 @@ private static void Shuffle4Slice3(
/// <summary>
/// Performs a multiplication and an addition of the <see cref="Vector256{T}"/>.
/// </summary>
/// <remarks>ret = (vm0 * vm1) + va</remarks>
/// <param name="va">The vector to add to the intermediate result.</param>
/// <param name="vm0">The first vector to multiply.</param>
/// <param name="vm1">The second vector to multiply.</param>
@@ -552,6 +553,30 @@ public static Vector256<float> MultiplyAdd(
}
}

/// <summary>
/// Performs a multiplication and a subtraction of the <see cref="Vector256{T}"/>.
/// </summary>
/// <remarks>ret = (vm0 * vm1) - vs</remarks>
/// <param name="vs">The vector to subtract from the intermediate result.</param>
/// <param name="vm0">The first vector to multiply.</param>
/// <param name="vm1">The second vector to multiply.</param>
/// <returns>The <see cref="Vector256{T}"/>.</returns>
[MethodImpl(InliningOptions.ShortMethod)]
public static Vector256<float> MultiplySubstract(
in Vector256<float> vs,
in Vector256<float> vm0,
in Vector256<float> vm1)
{
if (Fma.IsSupported)
{
return Fma.MultiplySubtract(vm1, vm0, vs);
}
else
{
return Avx.Subtract(Avx.Multiply(vm0, vm1), vs);
}
}

/// <summary>
/// <see cref="ByteToNormalizedFloat"/> as many elements as possible, slicing them down (keeping the remainder).
/// </summary>
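
Reviewer note on `MultiplySubstract`: the helper computes `(vm0 * vm1) - vs`, using one fused instruction when FMA is available and a separate multiply and subtract otherwise. The sketch below is not part of the PR. It is a minimal standalone illustration of the same dispatch pattern, assuming an x86 runtime that exposes `System.Runtime.Intrinsics.X86`. The class name `FmsSketch` is hypothetical, and the parameters are passed by value rather than by `in` to keep the example short.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class FmsSketch
{
    // Computes (vm0 * vm1) - vs lane by lane.
    public static Vector256<float> MultiplySubtract(
        Vector256<float> vs,
        Vector256<float> vm0,
        Vector256<float> vm1)
    {
        if (Fma.IsSupported)
        {
            // Fused path: Fma.MultiplySubtract(a, b, c) returns (a * b) - c.
            return Fma.MultiplySubtract(vm1, vm0, vs);
        }

        // Fallback path: separate multiply and subtract.
        return Avx.Subtract(Avx.Multiply(vm0, vm1), vs);
    }

    public static void Main()
    {
        if (!Avx.IsSupported)
        {
            Console.WriteLine("AVX is not available on this machine; skipping.");
            return;
        }

        Vector256<float> result = MultiplySubtract(
            Vector256.Create(1f),   // vs
            Vector256.Create(2f),   // vm0
            Vector256.Create(3f));  // vm1

        // Every lane holds (2 * 3) - 1 = 5.
        Console.WriteLine(result.GetElement(0));
    }
}
```

The fused path rounds once while the fallback rounds after the multiply and again after the subtract, so the two can differ in the last bit for inputs that are not exactly representable. Tests that compare the paths may need a small tolerance.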