
Split data in rectangle chunks for parallelizable processing #33

Closed
WubiCookie opened this issue Nov 27, 2021 · 15 comments

Comments

@WubiCookie

The idea is to add two more fields to the header: a chunk width and a chunk height (u8?) that define blocks of data which can be processed independently of each other (each with its own "64 last known pixels" cache). The number of chunks can easily be determined by dividing the image's dimensions by the chunk dimensions.
There are several advantages to this:

  • It will enable parallel processing on the CPU, resulting in much faster decoding and encoding, and allow for a more efficient GPU decoder implementation than is currently possible;
  • It will improve pixel locality, since the "64 last known pixels" cache only captures locality along the width, not the height.

The drawback is the added complexity, but I think the sacrifice would be worth it. I'm waiting for your feedback before I start implementing and benchmarking this.

Also, each chunk could be stored contiguously for better data (and not just pixel) locality. Maybe better, maybe too complex for this, I don't know...

What do you think?

@WubiCookie
Author

Possibly related to #29

@rmn20

rmn20 commented Nov 27, 2021

I'm going to try to implement this today for testing purposes

@rmn20

rmn20 commented Nov 27, 2021

Here's my attempt: github.com/rmn20/qoi

In order to implement parallel processing, we need to clear the array of 64 previously seen pixels, as well as the info about the previous pixel, at the start of every chunk.

I have tried several chunk sizes and here are my results:

| Image pack | Original QOI avg kb | 2*2 chunks avg kb | 8*8 chunks avg kb | 16*16 chunks avg kb |
|---|---|---|---|---|
| kodak | 771 | 1016 | 792 | 757 |
| misc | 398 | 1389 | 468 | 405 |
| screenshots | 2582 | 11457 | 3386 | 2830 |
| textures | 184 | 287 | 211 | 196 |
| wallpaper | 10674 | 17627 | 11079 | 10567 |

(theoretically 8*8 should be the best because the cache size is 8^2 = 64, but 16*16 performs best)
(weird, but my QOI results are worse than the results listed here phoboslab.org/files/qoibench)

I also tried encoding chunks, but without clearing. This would not allow parallel processing, but THEORETICALLY it should improve the compression ratio.

| Image pack | Original QOI avg kb | 2*2 chunks avg kb | 4*4 chunks avg kb | 8*8 chunks avg kb | 16*16 chunks avg kb |
|---|---|---|---|---|---|
| kodak | 771 | 785 | 768 | 752 | 745 |
| misc | 398 | 393 | 394 | 388 | 383 |
| screenshots | 2582 | 2627 | 2631 | 2604 | 2565 |
| textures | 184 | 197 | 194 | 191 | 188 |
| wallpaper | 10674 | 10596 | 10572 | 10347 | 10348 |

Honestly, pixel locality doesn't seem to improve compression much... Perhaps an improved QOI_INDEX could help here. (Maybe we can store 3 indices in one 16-bit tag?)

@WubiCookie
Author

> (weird, but my QOI results are worse than the results listed here phoboslab.org/files/qoibench/)

Is it due to the hardware not being the same?

@rmn20

rmn20 commented Nov 27, 2021

I don't think the hardware can affect this.

@rmn20

rmn20 commented Nov 27, 2021

I think this is related to 30f8a39

@rmn20

rmn20 commented Nov 27, 2021

I tried changing the order of pixels within chunks and combining chunk encoding with #38, and I think I got some interesting results.

@rmn20

rmn20 commented Nov 27, 2021

So I changed the pixel order in chunks from this

```
0  1  2  3
4  5  6  7
8  9  10 11
12 13 14 15
```

to this

```
0  1  2  3
7  6  5  4
8  9  10 11
15 14 13 12
```

And here are my results:
(I'm not clearing data between chunks in this test)

| Image pack | Original QOI avg kb | 16*16 chunks avg kb | kmar avg kb | 16*16 chunks + kmar work avg kb | 16*16 chunks, new pixel order + kmar work avg kb |
|---|---|---|---|---|---|
| kodak | 771 | 745 | 720 | 685 | 675 |
| misc | 398 | 383 | 365 | 347 | 330 |
| screenshots | 2582 | 2565 | 2493 | 2469 | 2326 |
| textures | 184 | 188 | 176 | 179 | 176 |
| wallpaper | 10674 | 10348 | 10225 | 9826 | 9579 |

I'm wondering at the moment how I can show this combined code on GitHub? I mean, I can't create a fork of a fork :P

@kmar

kmar commented Nov 27, 2021

feel free to adopt my changes into your code, your results are very interesting (and I'm a bit sad that the 7% on kodim drops to half on the other datasets) - maybe you could even try some fancier ordering, like the JPEG zigzag, but I think it won't work well since this is not DCT-based

@oscardssmith

The obvious things to try would be Z-order curves or Hilbert curves.

@rmn20

rmn20 commented Nov 27, 2021

Updated the code on my fork

@kmar

kmar commented Nov 27, 2021

> Updated the code on my fork

Amazing, this adds 7% more on top of what I had (kodim) - also, when deflated, it now seems to beat PNG. Pretty neat!

@kmar

kmar commented Nov 27, 2021

QOI-style encoding still has problems with images like this one: my scheme somewhat improves cases like that (alpha mode), but it's still far from optimal (compared to PNG).

I wonder if completely splitting off the alpha encoding (somehow) might help here, like QOI-style encoding of multiple pixels at once or something like that.

By the way - this image breaks when encoded with your chunk compressor; probably the mode switch command isn't handled properly or something like that? I see it now: at line 716 in the decoder, on mode change, there is

```c
px_pos -= channels;
```

but that's incorrect in the chunked version; the proper fix is to do `--x` instead, because a mode switch doesn't output a pixel.


@rmn20

rmn20 commented Nov 28, 2021

Yes, looks like I missed that.

@phoboslab
Owner

The file format for QOI will not change anymore. See #37 for more info.

Ideas for a successor to QOI should be discussed here: https://github.com/nigeltao/qoi2-bikeshed/issues
