Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cap buffer sizes via ZlibStream::set_max_total_output. #429

Merged
merged 2 commits into from
Jan 3, 2024

Conversation

anforowicz
Copy link
Contributor

PTAL? This results in quite a significant performance improvement for the noncompressed-8x8.png benchmark (bigger improvement than #427 and #428 combined - I am surprised).

@anforowicz
Copy link
Contributor Author

Hmmm... I guess I need to look into the fuzzing failure. I suspect that IHDR can declare a smaller image size than the real image - I'll try to author a regression test that would do that.

@anforowicz anforowicz force-pushed the smaller-memset-in-zlib-stream branch from 59fdff1 to d81d6d7 Compare November 16, 2023 01:22
@anforowicz
Copy link
Contributor Author

Hmmm... I guess I need to look into the fuzzing failure. I suspect that IHDR can declare a smaller image size than the real image - I'll try to author a regression test that would do that.

This is done now. I am not sure if I need to do something special to kick-off the CI tests again?

@anforowicz anforowicz force-pushed the smaller-memset-in-zlib-stream branch from d81d6d7 to ed2a352 Compare November 16, 2023 02:51
@anforowicz anforowicz force-pushed the smaller-memset-in-zlib-stream branch from ed2a352 to fdd9d84 Compare November 27, 2023 15:49
@anforowicz
Copy link
Contributor Author

@fintelia, can you PTAL?

@fintelia
Copy link
Contributor

fintelia commented Dec 11, 2023

Haven't forgotten about this PR. But I've been wanting to finish image-rs/fdeflate#14 first so that we don't have to reserve extra space just to make fdeflate happy

Update: fdeflate:0.3.2 is now published. It removes the requirements for extra buffer space in the output buffer

src/decoder/zlib.rs Outdated Show resolved Hide resolved
@anforowicz anforowicz force-pushed the smaller-memset-in-zlib-stream branch 2 times, most recently from 63c4b67 to 0fb4f46 Compare January 3, 2024 00:05
Copy link
Contributor Author

@anforowicz anforowicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fintelia, can you PTAL again?

Thanks for the fdeflate changes. I've tweaked the PR-under-review to simplify things as allowed by the new fdeflate behavior.

FWIW, I've re-measured the impact on the non-compressed 8x8 image and still get 20+% improvement.

src/decoder/zlib.rs Outdated Show resolved Hide resolved
src/decoder/stream.rs Show resolved Hide resolved
src/decoder/zlib.rs Show resolved Hide resolved
src/decoder/zlib.rs Show resolved Hide resolved
write_rgba8_ihdr_with_width(&mut png, 8);

// Here we want to test an invalid image where the `IDAT` chunk contains more data
// (data for 8 image) than declared in the `IHDR` chunk (which only describes an 8x8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "8 image" -> "8x256 image"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooops. Done.

@fintelia
Copy link
Contributor

fintelia commented Jan 3, 2024

Left a small comment, but otherwise looks good to me. As a followup I plan to experiment with changing ZlibStream to directly write into a caller provided buffer instead of using its own out_buffer.

Before this commit, `ZlibStream::new` would always allocate and zero out
64kB of data via `out_buffer: vec![0; 2 * CHUNCK_BUFFER_SIZE]`.  After
this commit, `out_buffer` is capped by the actual size of the
decompressed image data.  `StreamingDecoder::parse_ihdr` estimates the
image size and notifies `ZlibStream` via the new `set_max_total_output`
method.

Impact on the runtime of the
`decode/generated-png:noncompressed-8x8.png` benchmark (3 measurements):

* [-24.028% -23.693% -23.460%] (p = 0.00 < 0.05)
* [-23.830% -23.480% -23.200%] (p = 0.00 < 0.05)
* [-21.259% -20.893% -20.618%] (p = 0.00 < 0.05)
@anforowicz anforowicz force-pushed the smaller-memset-in-zlib-stream branch from 0fb4f46 to baf5d26 Compare January 3, 2024 17:08
@anforowicz
Copy link
Contributor Author

Left a small comment, but otherwise looks good to me.

Ack. Thanks for reviewing!

As a followup I plan to experiment with changing ZlibStream to directly write into a caller provided buffer instead of using its own out_buffer.

Thanks for looking into this. Let me try to share a few notes, hoping that it helps (i.e. I am mostly trying to share knowledge and past experience, and definitely not trying to influence the changes one way or another - I don't claim to know which way is "right").

First of all - I note that there are interesting interdependencies between the data:

  • Final image doesn't have the "filter type" byte at the front of each row, but the raw decompressed data includes such byte.
  • Decompressor needs to retain 32kB for looking back.

My early attempts at a similar change can be found in anforowicz@6071020. (Although this is slightly different than what you are considering - here I am avoiding passing &mut Vec<u8> into ZlibStream's decompress and finish_compressed_chunks methods, but I still keep ZlibStream::out_buffer.) I've abandoned this attempt because:

  • I was using this as a prerequisite for an avoiding extra copies into prev row, but this copying has been taken care of with a different, simpler approach - Reduce copying and allocations #422 (thanks!)
  • I was getting mixed performance results (probably because I approached this wrong - I was mixing and measuring multiple related/dependent changes, which made it difficult to understand which changes had a positive/negative impact).

At the same time, removing or replacing the &mut Vec<u8> parameter from ZlibStream's methods can result in some code simplification (and therefore it may be desirable even if it is performance-neutral). For example, there are some callers of ZlibStream::decompress which pass &mut vec![] or Vec::new (read_header_info and finish_decoding in mod.rs).

@fintelia fintelia merged commit c4121b5 into image-rs:master Jan 3, 2024
19 checks passed
@anforowicz anforowicz deleted the smaller-memset-in-zlib-stream branch January 3, 2024 18:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants