JpegDecoder: post-process baseline spectral data per MCU-row #1694

br3aker · 2021-07-12T00:20:29Z

Prerequisites

I have written a descriptive pull-request title
I have verified that there are no overlapping pull-requests open
I have verified that I am following the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
I have provided test coverage for my change (where applicable)

Description

This also fixes #1692, since the affected tests were failing with the new architecture.

Performance

No change. Baseline images should be faster overall thanks to the lower memory footprint, but local benchmarks show no change. That is expected: the actual decoding code wasn't changed, only memory allocation and temporary buffer management.
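To illustrate the memory effect, here is a toy Python model rather than ImageSharp code. `decode_mcu_row` and `convert_row` are hypothetical stand-ins for the scan decoder and `SpectralConverter<TPixel>`, and spectral blocks are plain integers. The model contrasts keeping every MCU row's spectral data alive until a final post-processing pass with converting each MCU row to pixels as soon as it has been decoded. Both paths produce the same pixels; only the number of blocks held at once differs. Progressive JPEGs can't take the second path, because later scans refine coefficients decoded by earlier ones, so their spectral data has to stay resident until the last scan.

```python
from typing import List, Tuple

# Hypothetical stand-ins, not ImageSharp APIs: a spectral row is a list of
# integer "blocks", and conversion maps each block to one "pixel" value.
SpectralRow = List[int]
PixelRow = List[int]


def decode_mcu_row(row: int, blocks_per_row: int) -> SpectralRow:
    """Pretend to entropy-decode one MCU row into spectral blocks."""
    return [row * blocks_per_row + i for i in range(blocks_per_row)]


def convert_row(spectral: SpectralRow) -> PixelRow:
    """Pretend to dequantize, IDCT and color-convert one MCU row."""
    return [block * 2 for block in spectral]


def decode_whole_image(mcu_rows: int, blocks_per_row: int) -> Tuple[List[PixelRow], int]:
    """Keep every MCU row's blocks alive, then post-process them all at the end."""
    spectral = [decode_mcu_row(r, blocks_per_row) for r in range(mcu_rows)]
    peak_blocks = sum(len(row) for row in spectral)  # all rows resident at once
    return [convert_row(row) for row in spectral], peak_blocks


def decode_per_mcu_row(mcu_rows: int, blocks_per_row: int) -> Tuple[List[PixelRow], int]:
    """Convert each MCU row right after decoding it; only one row is resident."""
    pixels: List[PixelRow] = []
    peak_blocks = 0
    for r in range(mcu_rows):
        spectral = decode_mcu_row(r, blocks_per_row)
        peak_blocks = max(peak_blocks, len(spectral))
        pixels.append(convert_row(spectral))
    return pixels, peak_blocks


if __name__ == "__main__":
    # 38 MCU rows x 132 luma blocks, matching the luma buffers in the dumps.
    old_pixels, old_peak = decode_whole_image(38, 132)
    new_pixels, new_peak = decode_per_mcu_row(38, 132)
    print(old_pixels == new_pixels, old_peak, new_peak)  # True 5016 132
```

The decoding work is identical in both functions, which matches the observation that speed is unchanged while the resident spectral data shrinks to a single MCU row.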

Note: the PR benchmark is slightly faster than master, even though the PR adds a virtual call per image stride. I believe this is a false positive caused by the very small test image, most likely due to the smaller footprint of the spectral block allocation. At 4K resolutions I'd expect little to no difference, since the actual parsing/decoding code wasn't touched.

| Method | Branch | TestImage | Mean | Error | StdDev | Ratio |
|---|---|---|---|---|---|---|
| JpegDecoderCore.ParseStream | master | Jpg/b(...)e.jpg [21] | 3.680 ms | 0.0152 ms | 0.0143 ms | 1.00 |
| JpegDecoderCore.ParseStream | PR | Jpg/b(...)e.jpg [21] | 3.474 ms | 0.0196 ms | 0.0174 ms | 0.94 |

Unfortunately, I don't have access to the JetBrains memory profiler, so I used a much simpler check: a custom debug allocator that prints every allocation. Here are the dumps for the same image with the same subsampling, saved in two different modes (baseline and progressive):

All small byte allocations have been removed for clarity.

Progressive

Allocation of Rgba32[311400]
Allocation of Single[8448]
Allocation of Single[8448]
Allocation of Single[8448]
Allocation of Vector4[519]
Allocation of Block8x8[5016]
Allocation of Block8x8[1254]
Allocation of Block8x8[1254]

Baseline

Allocation of Rgba32[311400]
Allocation of Single[8448]
Allocation of Single[8448]
Allocation of Single[8448]
Allocation of Vector4[519]
Allocation of Block8x8[132] <- yay!
Allocation of Block8x8[33]  <- yay!
Allocation of Block8x8[33]  <- yay!
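The block counts in the two dumps can be reproduced with a little arithmetic. The sketch below assumes a 519×600 image (inferred from Rgba32[311400] = 519 × 600 and Vector4[519], not stated in the PR) with 4:2:0 subsampling, i.e. luma sampling factors 2×2 and chroma 1×1, so one MCU covers 16×16 pixels and block buffers are padded to whole MCUs. `component_blocks` and `divide_ceil` are hypothetical helpers, not ImageSharp APIs.

```python
def divide_ceil(value: int, divisor: int) -> int:
    """Integer division rounded up (value >= 0, divisor > 0)."""
    return (value + divisor - 1) // divisor


def component_blocks(width, height, h, v, h_max, v_max):
    """Return (blocks for the whole image, blocks per MCU row) for one component.

    h, v: the component's sampling factors; h_max, v_max: the frame maxima.
    Buffers are assumed to be padded to whole MCUs, as the dump sizes suggest.
    """
    mcus_per_line = divide_ceil(width, 8 * h_max)
    mcu_rows = divide_ceil(height, 8 * v_max)
    per_mcu_row = (mcus_per_line * h) * v  # h block columns, v block rows per MCU
    return per_mcu_row * mcu_rows, per_mcu_row


if __name__ == "__main__":
    print(519 * 600)                               # 311400
    print(component_blocks(519, 600, 2, 2, 2, 2))  # (5016, 132)  luma
    print(component_blocks(519, 600, 1, 1, 2, 2))  # (1254, 33)   Cb / Cr
```

Under the same assumptions, one MCU row of pixels is 33 × 16 = 528 samples wide and 16 tall, i.e. 8448 values, which lines up with the three Single[8448] buffers present in both dumps.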

P.S. If anybody has some free time and access to the JetBrains memory profiler, a full profile would be very welcome!

Tests & Legacy code

Everything is up and running, and the old pipeline has been removed.

TODO

Fix existing tests
Tests for SpectralConverter<TPixel>
Remove legacy post processing code

br3aker · 2021-07-13T10:59:35Z

@JimBobSquarePants I found an error in the JPEG decoder reference output files; it was even marked as a bug:

// BUG: The following image has a high difference compared to the expected output: 1.0096%
TestImages.Jpeg.Baseline.Jpeg420Small,

I re-saved the actual JPEG as a PNG using Photoshop and compared it with the existing reference PNG.

The differences are visible to the human eye when zoomed in Photoshop; it looks like the edges were anti-aliased for some reason.

With this fix, the image passes the tolerance test:

*** Jpg/baseline/jpeg420small.jpg ***
Difference: 0,2863%

I'm not sure how to update the reference file. Could you explain, please?

JimBobSquarePants · 2021-07-13T11:08:14Z

Awesome! It should just be a case of replacing the reference image in this folder with the correct output from your tests, as long as you have Git LFS installed (see the readme for instructions):

https://github.com/SixLabors/ImageSharp/tree/master/tests/Images/External/ReferenceOutput/JpegDecoderTests

br3aker · 2021-07-13T11:09:11Z

Thanks!


JimBobSquarePants

This is looking really great so far!

src/ImageSharp/Formats/Jpeg/Components/Decoder/SpectralConverter.cs

tests/ImageSharp.Benchmarks/Codecs/Jpeg/DecodeJpegParseStreamOnly.cs

tests/ImageSharp.Tests.ProfilingSandbox/Program.cs

br3aker · 2021-07-13T15:27:41Z

Update

Added integer ceiling division (tests included)
Fixed the parse-stream-only benchmark and updated the results for the PR implementation (see the top comment for the comparison)
Fixed the last broken test, which saves decoded spectral blocks
Added docs to the SpectralConverter class
Restored Sandbox code

JimBobSquarePants

Really well done! This is an excellent piece of refactoring!

I can't believe you managed to solve that longstanding issue also! 😍

antonfirsov · 2021-07-24T14:46:52Z

I did some WPA profiling on this using my 10-core i9-10900X, and the results are quite impressive, although for some reason we didn't get the 8x drop in memory footprint I expected. (Maybe I miscalculated the memory footprint of the blocks relative to the image data?)

1. `LoadResizeSaveParallelMemoryStress` with `MaxDegreeOfParallelism = 8` and `Filter = JpegKind.Baseline`

Before the PR

Peak around 6.4 GB, total processing time 15.03 seconds.

Current master

Peak around 3.2 GB, total processing time 13.73 seconds.

2. `LoadResizeSaveParallelMemoryStress` with `MaxDegreeOfParallelism = 20` and `Filter = JpegKind.Any`

Before the PR

Peak around 13 GB.

Current master

Peak around 4.4 GB.

Great job @br3aker, thanks for the contribution!
/cc @JimBobSquarePants
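A rough byte count of the allocations in br3aker's dumps may help explain why the drop was far from 8x. The sketch assumes `Block8x8` holds 64 16-bit coefficients (128 bytes), `Rgba32` and `Single` are 4 bytes, and `Vector4` is 16 bytes. It counts only the buffers listed in the dumps for that single test image, not anything allocated by resizing or saving in the stress test.

```python
# Element sizes in bytes. Assumption: Block8x8 stores 64 Int16 coefficients
# (128 bytes); Rgba32 and Single are 4 bytes, Vector4 is 16 bytes.
SIZES = {"Rgba32": 4, "Single": 4, "Vector4": 16, "Block8x8": 64 * 2}

# (type, element count) pairs copied from the allocation dumps in the PR.
COMMON = [("Rgba32", 311400), ("Single", 8448), ("Single", 8448),
          ("Single", 8448), ("Vector4", 519)]
PROGRESSIVE = COMMON + [("Block8x8", 5016), ("Block8x8", 1254), ("Block8x8", 1254)]
BASELINE = COMMON + [("Block8x8", 132), ("Block8x8", 33), ("Block8x8", 33)]


def total_bytes(allocations, kind=None):
    """Sum the byte size of the allocations, optionally for one element type only."""
    return sum(SIZES[k] * n for k, n in allocations if kind is None or k == kind)


if __name__ == "__main__":
    before, after = total_bytes(PROGRESSIVE), total_bytes(BASELINE)
    print(before, after)                                            # 2318352 1380624
    print(round(before / after, 2))                                 # 1.68
    print(round(total_bytes(PROGRESSIVE, "Block8x8") / before, 2))  # 0.42
    print(round(total_bytes(BASELINE, "Rgba32") / after, 2))        # 0.9
```

Under these assumptions the spectral blocks shrink 38x, but they were only about 42% of the listed bytes to begin with, and afterwards the full-resolution Rgba32 pixel buffer makes up about 90%. The stress-test peaks above also include resize and encode buffers, other images and other subsampling modes, so they won't match this per-image figure; the count only illustrates why the pixel buffer keeps the overall reduction well short of 8x.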

JimBobSquarePants · 2021-07-25T12:34:49Z

Those numbers 😘

Dmitry Pentin added 30 commits July 7, 2021 23:38

Renamed pixel dimensions for JpegFrame

6f6ee73

Added debug code to the sandbox

925b3ad

Injected progressive scan parameters

2f8d3c9

Injected scan selectors count

336c64a

Injected frame & reset interval

3b2d2d8

Injected huffman tables

5d44503

Scan decoder is not a persistent state of the decoder core

442af2c

Added comments for future refactoring

b7d54b1

Added extra comment

7044741

Jpeg frame is now injected to the scan decoder at the SOF marker

9b81724

Slight change to image post processor for better understanding

7e1bd59

Replaced hardcoded values with actual calculated ones in postprocessor

5ba8763

WIP spectral converter

22af241

Fixed iteration variables

dbe4c4e

Added todo(s)

887c0ba

Decoupled image processor from component processor

6b2f189

Fixed converter frame injection

dc4bc6e

Added getter which converts pixels to PixelBuffer property

d178c8c

Fixed sandbox code

c86d029

Wired up converter & scan decoder

1dbb16a

Added Buffer2D Image ctor, wired new post processor with decoder core

1c10ec6

Implemented disposable pattern for spectral converter

1348ecf

Implemented step-based iteration for spectralconverter

1d4dd08

Added separate step parameter for spectral data enumeration

fa0aaec

Added external way to mark convesion finished

c017357

Added initial support for the baseline interleaved stride conversion

3c59cd9

Sandbox changes

460b02c

Fixed invalid baseline jpeg decoding

e39adf8

Rolled back to counter enumeration for spectral converter

7afca19

Refactored scan converter

4b5f0f6

Fixed baseline image invalid reference output png image

2e5b0ad

Dmitry Pentin added 6 commits July 13, 2021 14:11

Removed post processor tests

005fff7

Removed post processor from jpeg decoder

c6a2c6b

Added new tolerance to the Jpeg420Small test image

865c706

Merge branch 'master' into jpeg-decoder-memory

4ff282e

Fixed docs

8b6ad9c

Merge branch 'jpeg-decoder-memory' of https://github.com/br3aker/Imag…

19b65ab


JimBobSquarePants reviewed Jul 13, 2021


src/ImageSharp/Formats/Jpeg/Components/Decoder/SpectralConverter.cs (resolved)

tests/ImageSharp.Benchmarks/Codecs/Jpeg/DecodeJpegParseStreamOnly.cs (outdated, resolved)

tests/ImageSharp.Tests.ProfilingSandbox/Program.cs (outdated, resolved)

Dmitry Pentin added 6 commits July 13, 2021 15:58

Restored memory stress test to the sandbox

84900dc

Restored decoder parse stream only benchmark

82e22c3

Updated StreamParseOnly benchmark

0c27adc

Added docs

2eaa2d5

Added DivideCeil

13c3a45

Fixed spectral data as image saving test

269c073

br3aker marked this pull request as ready for review July 13, 2021 15:27

Disabled image saving test

190964c

JimBobSquarePants approved these changes Jul 15, 2021


JimBobSquarePants merged commit 61b137d into SixLabors:master Jul 15, 2021

br3aker deleted the jpeg-decoder-memory branch July 15, 2021 14:23

antonfirsov mentioned this pull request Jul 24, 2021

More efficient MemoryAllocator #1596

Closed

br3aker mentioned this pull request Aug 22, 2021

BGRA Jpg incorrectly decoded. #1746

Closed


antonfirsov mentioned this pull request Oct 20, 2021

Post IDCT rounding needs to go - it induces noise & generational loss and takes extra time to compute #1782

Open

antonfirsov mentioned this pull request Nov 25, 2021

Unmanaged pooling MemoryAllocator #1730

Merged


br3aker mentioned this pull request Feb 5, 2022

Some old jpg images are saved with glitches (regression in alpha) #1978

Closed
