Add `Reader.read_row` for decoding into caller-provided buffer. #493

anforowicz · 2024-08-16T14:19:08Z

Add Reader.read_row for decoding into caller-provided buffer.

Reader.read_row is an alternative to Reader.next_row.

Reader.next_row is convenient when the caller doesn't want to manage
their own buffer, and doesn't mind having extended borrows on Reader
to accommodate Row data.
OTOH the new Reader.read_row avoids an extra copy in some scenarios,
which may lead to better runtime performance - see the benchmark
results below.

The PR adds a row-by-row/128x128-4k-idat benchmark (initially based on
Reader.next_row which requires an extra copy). Using the new
Reader.read_row results in the following performance gains in the
benchmark:

time: [-16.414% -16.196% -15.969%] (p = 0.00 < 0.05)
time: [-15.570% -15.218% -14.856%] (p = 0.00 < 0.05)
time: [-16.101% -15.864% -15.629%] (p = 0.00 < 0.05)
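
For readers unfamiliar with the two API shapes, the difference can be sketched roughly as below. This is a minimal illustration only: `MockReader`, `fill_via_next_row` and `fill_via_read_row` are hypothetical names invented for this sketch, the "decoding" is just a copy out of pre-made rows, and the signatures are simplified (no error handling, no interlace info), so they should not be read as the `png` crate's actual API.

```rust
// Hypothetical stand-in for a row-oriented decoder; NOT the `png` crate's Reader.
struct MockReader {
    rows: Vec<Vec<u8>>, // pretend these bytes are produced by decoding
    next: usize,
    scratch: Vec<u8>, // internal buffer that `next_row` lends out
}

impl MockReader {
    fn new(rows: Vec<Vec<u8>>) -> Self {
        MockReader { rows, next: 0, scratch: Vec::new() }
    }

    /// `next_row` style: decode into an internal buffer and lend it out.
    /// The returned slice keeps `self` mutably borrowed while it is alive.
    fn next_row(&mut self) -> Option<&[u8]> {
        let row = self.rows.get(self.next)?;
        self.scratch.clear();
        self.scratch.extend_from_slice(row);
        self.next += 1;
        Some(self.scratch.as_slice())
    }

    /// `read_row` style: decode straight into the caller-provided buffer.
    /// Assumes `out.len()` equals the row length (panics otherwise).
    fn read_row(&mut self, out: &mut [u8]) -> Option<()> {
        let row = self.rows.get(self.next)?;
        out.copy_from_slice(row);
        self.next += 1;
        Some(())
    }
}

/// Fills `frame` via `next_row`: decoder -> scratch -> frame (two hops per row).
fn fill_via_next_row(reader: &mut MockReader, frame: &mut [u8], stride: usize) {
    for dst in frame.chunks_exact_mut(stride) {
        match reader.next_row() {
            Some(row) => dst.copy_from_slice(row),
            None => break,
        }
    }
}

/// Fills `frame` via `read_row`: decoder -> frame (one hop per row).
fn fill_via_read_row(reader: &mut MockReader, frame: &mut [u8], stride: usize) {
    for dst in frame.chunks_exact_mut(stride) {
        if reader.read_row(dst).is_none() {
            break;
        }
    }
}
```

Both helpers produce the same frame contents; the second one simply skips the intermediate scratch buffer, which is the copy that the benchmark numbers above are measuring the removal of.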

anforowicz · 2024-08-16T14:19:49Z

This PR replaces #421. (This PR can be seen as a non-breaking-change alternative to the other, earlier PR.)

HeroicKatora

I like this approach much better as well.

src/decoder/mod.rs

Shnatsel · 2024-09-18T12:50:39Z

There doesn't seem to be any reason not to merge this. @anforowicz could you resolve the conflicts? I'll push the merge button as soon as that's done.

anforowicz · 2024-10-29T16:52:17Z

> @anforowicz could you resolve the conflicts?

Ack. I'll do that later today. My apologies for missing this message earlier.

> There doesn't seem to be any reason not to merge this.

Let me share some of the lingering doubts I have regarding this PR:

Cons of this PR are small, but non-zero:
- New public API (and public APIs are "forever"; or at least changing the APIs requires the friction of a new major version release and/or imposing some pain on crate users)
- A bit of an unclear story wrt 1) keeping next_row / next_interlaced_row and also read_row from this PR, vs 2) deprecating next_row / next_interlaced_row and only exposing read_row in some future major version release
Pros of this PR are unclear:
- Helps to avoid an extra copy to widen the scanline in SkPngRustCodec::expandDecodedInterlacedRow, but interlaced images are relatively rare + SkPngRustCodec has other inefficiencies around processing them.
- This PR definitely helps if the output of png doesn't need any further transformations before being usable by the crate client. But... I now think that maybe this is not the case for Skia/Chromium, which requires alpha-premultiplied output. Depending on png output:
  - Indexed: can't save the extra copy / palette expansion hop (this hop is done at the Skia/Chromium layer - see https://crbug.com/356882657)
  - Rgb, Gray or GrayAlpha: can't save the extra copy (need to transform into RGBA)
  - Rgba (~54% of images from top 500 websites):
    - Even with this PR there would still be a need for an extra transformation loop for alpha-premul (and sometimes for rgba=>bgra, or for gamma/iccp). (I have my brief notes about post-decoding transforms in a doc here.)
    - Maybe this PR can save a memcpy and then apply alpha-premul and/or rgba=>bgra transformation in-place. But this is tricky, because SkSwizzler doesn't support in-place transformations, so we'd have to switch to skcms_Transform in some cases.
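
To make the "in-place" idea from the last bullet concrete, here is a rough sketch of what an in-place alpha-premul plus rgba=>bgra pass over one decoded row could look like. The function name `premul_rgba_to_bgra_in_place` is hypothetical, the row is assumed to be 8-bit RGBA, and the rounding formula is just one common choice; this is plain Rust for illustration and is not how SkSwizzler or skcms_Transform are implemented.

```rust
/// Premultiplies one 8-bit RGBA row by alpha and reorders it to BGRA, in place.
/// Trailing bytes that do not form a full 4-byte pixel are left untouched.
fn premul_rgba_to_bgra_in_place(row: &mut [u8]) {
    for px in row.chunks_exact_mut(4) {
        let a = px[3] as u32;
        // Rounded division by 255: (c * a + 127) / 255.
        let mul = |c: u8| ((c as u32 * a + 127) / 255) as u8;
        let (r, g, b) = (mul(px[0]), mul(px[1]), mul(px[2]));
        // Write the channels back in BGRA order; alpha (px[3]) is unchanged.
        px[0] = b;
        px[1] = g;
        px[2] = r;
    }
}
```

With something along these lines, a client could decode a row directly into its destination buffer and then transform that same buffer, instead of decoding into a temporary row and transforming while copying.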

`Reader.read_row` is an alternative to `Reader.next_row`.

* `Reader.next_row` is convenient when the caller doesn't want to manage their own buffer, and doesn't mind having extended borrows on `Reader` to accommodate `Row` data.
* OTOH the new `Reader.read_row` avoids an extra copy in some scenarios, which may lead to better runtime performance - see the benchmark results below.

The commit results in the following performance gains seen in the recently introduced `row-by-row/128x128-4k-idat` benchmark:

time: [-16.414% -16.196% -15.969%] (p = 0.00 < 0.05)
time: [-15.570% -15.218% -14.856%] (p = 0.00 < 0.05)
time: [-16.101% -15.864% -15.629%] (p = 0.00 < 0.05)

anforowicz mentioned this pull request Aug 16, 2024

Remove Reader::scrach_buffer field + resulting breaking API changes. #421

Open

kornelski approved these changes Aug 16, 2024

HeroicKatora reviewed Aug 17, 2024

anforowicz force-pushed the read-row branch from d4e03c3 to 03f36ed Compare August 23, 2024 16:04

Shnatsel mentioned this pull request Sep 27, 2024

Tracking issue for semver-breaking v0.18 #508

Open

anforowicz added 2 commits October 29, 2024 16:54

Benchmark of next_row-based decoding.

58c71b4

anforowicz force-pushed the read-row branch from 03f36ed to 51a3d30 Compare October 29, 2024 17:13
