Avoid 32kB decompression lag + compact less often. (#447)
Avoiding 32kB decompression lag
===============================

Before this commit, decompressed data would be accumulated in
`ZlibStream::out_buffer` and returned via `image_data` with 32kB lag
corresponding to `CHUNCK_BUFFER_SIZE`:

```
fn transfer_finished_data(&mut self, image_data: &mut Vec<u8>) -> usize {
    let safe = self.out_pos.saturating_sub(CHUNCK_BUFFER_SIZE);
    image_data.extend(self.out_buffer.drain(..safe));
    ...
```

32kB is a typical size of the L1 cache, so the lag would mean that the data
passed to `image_data.extend(...)` would already be cold and evicted from
the L1 cache.

This commit avoids the lag by always returning into `image_data` all the data
from `out_buffer` (i.e. data up to `out_pos`):

```
fn transfer_finished_data(&mut self, image_data: &mut Vec<u8>) -> usize {
    let transferred = &self.out_buffer[self.read_pos..self.out_pos];
    image_data.extend_from_slice(transferred);
    self.read_pos = self.out_pos;
    ...
```

Compacting less often
=====================

The changes above mean that `Vec::drain` no longer compacts `out_buffer`.
Therefore this commit also refactors how this compaction works.

Before this commit, not-yet-returned data would be shifted to the beginning of
`out_buffer` every time `transfer_finished_data` is called. This could
potentially mean that for 1 returned byte, N bytes have to be copied during
compaction.

After this commit, compaction is only done when the compaction cost is offset
by many read bytes - for 3 returned bytes 1 byte has to be copied during
compaction.

Performance impact
==================

The commit has a positive impact on performance, except for:

* `decode/Transparency.png` - a regression between 15% and 20% is reported in
  3-out-of-3 measurements.
* `decode/kodim17.png` - a regression of 2.1% has been reported in 1-out-of-3
  measurements (an improvement of 0.6% - 1.13% has been reported in the other
  2-out-of-3 measurements).
* `generated-noncompressed-64k-idat/128x128.png` - a regression of 25% has
  been reported in 1-out-of-3 measurements (an improvement of 21% - 29% has
  been reported in the other 2-out-of-3 measurements).

The results below have been gathered by running the `decoder` benchmark.
First a baseline was saved before this commit, and then a comparison was
done after the commit. This (the baseline + the comparison) was repeated a
total of 3 times. All results below are for the relative impact on the
runtime. All results are with p = 0.00 < 0.05.

* decode/kodim23.png:
    * [-2.9560% -2.7112% -2.4009%]
    * [-3.4876% -3.3406% -3.1928%]
    * [-3.0559% -2.9208% -2.7787%]
* decode/kodim07.png:
    * [-1.2527% -1.0110% -0.6780%]
    * [-1.7851% -1.6558% -1.5164%]
    * [-1.6576% -1.5216% -1.3856%]
* decode/kodim02.png:
    * [-0.5108% -0.2806% -0.0112%]
    * [-1.0885% -0.9493% -0.8118%]
    * [-0.5563% -0.4239% -0.2874%]
* decode/kodim17.png:
    * [+1.8649% +2.1138% +2.4169%] (**regression**)
    * [-1.2891% -1.1322% -0.9736%]
    * [-0.7753% -0.6276% -0.4866%]
* decode/Lohengrin_-_Illustrated_Sporting_and_Dramatic_News.png:
    * [-1.7165% -1.4968% -1.2650%]
    * [-1.7051% -1.4473% -1.2229%]
    * [-1.2544% -1.0457% -0.8375%]
* decode/Transparency.png:
    * [+19.329% +19.789% +20.199%] (**regression**)
    * [+15.337% +15.798% +16.294%] (**regression**)
    * [+18.694% +19.106% +19.518%] (**regression**)
* generated-noncompressed-4k-idat/8x8.png:
    * [-2.3295% -1.9940% -1.5912%]
    * [-6.1285% -5.8872% -5.6091%]
    * [-2.8814% -2.6787% -2.4820%]
* generated-noncompressed-4k-idat/128x128.png:
    * [-59.793% -59.599% -59.417%]
    * [-63.930% -63.846% -63.756%]
    * [-62.377% -62.248% -62.104%]
* generated-noncompressed-4k-idat/2048x2048.png:
    * [-67.678% -67.579% -67.480%]
    * [-65.616% -65.519% -65.429%]
    * [-65.824% -65.647% -65.413%]
* generated-noncompressed-4k-idat/12288x12288.png:
    * [-60.932% -60.774% -60.528%]
    * [-62.088% -62.016% -61.940%]
    * [-61.663% -61.604% -61.546%]
* generated-noncompressed-64k-idat/128x128.png:
    * [-22.237% -21.975% -21.701%]
    * [-29.656% -29.480% -29.311%]
    * [+24.812% +25.190% +25.571%] (**regression**)
* generated-noncompressed-64k-idat/2048x2048.png:
    * [-21.826% -21.499% -21.087%]
    * [-54.279% -54.049% -53.715%]
    * [-11.174% -10.828% -10.482%]
* generated-noncompressed-64k-idat/12288x12288.png:
    * [-40.421% -40.311% -40.180%]
    * [-39.496% -39.183% -38.871%]
    * [-41.443% -41.367% -41.295%]
* generated-noncompressed-2g-idat/2048x2048.png:
    * [-40.136% -40.010% -39.865%]
    * [-58.507% -58.333% -58.060%]
    * [-35.822% -35.457% -35.038%]
* generated-noncompressed-2g-idat/12288x12288.png:
    * [-37.196% -37.107% -37.014%]
    * [-36.125% -36.049% -35.970%]
    * [-35.636% -35.477% -35.350%]

Co-authored-by: Lukasz Anforowicz <lukasza@marcin-mx>
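
Illustrative sketch of the compaction policy
============================================

The "Compacting less often" policy can be illustrated with a small,
self-contained model. This is only a sketch under stated assumptions, not the
code from this commit: `OutBuffer`, `LOOKBACK`, `COMPACT_FACTOR`, `write`,
`compact_if_needed` and `run_demo` are hypothetical names, and the tiny
constants stand in for the 32kB window. It assumes that compaction is
triggered once the buffer exceeds a fixed multiple of the lookback window,
which yields roughly 1 copied byte per 3 returned bytes.

```rust
/// Hypothetical lookback window (the commit message talks about 32kB).
const LOOKBACK: usize = 4;
/// Assumed threshold factor: compact once `out_pos > LOOKBACK * COMPACT_FACTOR`.
const COMPACT_FACTOR: usize = 4;

/// Simplified stand-in for the decoder's output buffer.
struct OutBuffer {
    buf: Vec<u8>,        // invariant: buf.len() == out_pos
    read_pos: usize,     // everything before this was already returned
    out_pos: usize,      // everything before this is decompressed data
    bytes_copied: usize, // bookkeeping: bytes moved by compaction
}

impl OutBuffer {
    fn new() -> Self {
        OutBuffer { buf: Vec::new(), read_pos: 0, out_pos: 0, bytes_copied: 0 }
    }

    /// Stand-in for the decompressor appending freshly decoded bytes.
    fn write(&mut self, data: &[u8]) {
        self.buf.extend_from_slice(data);
        self.out_pos += data.len();
    }

    /// Returns all not-yet-returned data immediately (no lag).
    fn transfer_finished_data(&mut self, image_data: &mut Vec<u8>) -> usize {
        let transferred = &self.buf[self.read_pos..self.out_pos];
        image_data.extend_from_slice(transferred);
        let n = transferred.len();
        self.read_pos = self.out_pos;
        self.compact_if_needed();
        n
    }

    /// Keeps only the last `LOOKBACK` bytes, and only once the buffer is large
    /// enough that the copy is amortized over many returned bytes.
    /// Only called right after a transfer, so `read_pos == out_pos` here.
    fn compact_if_needed(&mut self) {
        if self.out_pos > LOOKBACK * COMPACT_FACTOR {
            let keep_from = self.out_pos - LOOKBACK;
            self.buf.copy_within(keep_from..self.out_pos, 0);
            self.bytes_copied += LOOKBACK;
            self.read_pos -= keep_from;
            self.out_pos -= keep_from;
            self.buf.truncate(self.out_pos);
        }
    }
}

/// Feeds `input` in chunks of `chunk` bytes; returns the output and the
/// total number of bytes copied by compaction.
fn run_demo(input: &[u8], chunk: usize) -> (Vec<u8>, usize) {
    let mut out = OutBuffer::new();
    let mut image_data = Vec::new();
    for piece in input.chunks(chunk) {
        out.write(piece);
        out.transfer_finished_data(&mut image_data);
    }
    (image_data, out.bytes_copied)
}
```

With these assumed constants, each compaction copies `LOOKBACK` bytes and the
next one cannot happen until more than `LOOKBACK * (COMPACT_FACTOR - 1)` new
bytes have been written, so the copy cost stays below one third of the
returned data regardless of how small the individual transfers are.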