
Split data in rectangle chunks for parallelizable processing #33

Closed
WubiCookie opened this issue Nov 27, 2021 · 15 comments

Comments

@WubiCookie

The idea is to add two more fields to the header: a chunk width and a chunk height (u8?) that define blocks of data which can be processed independently of each other (each with its own "64 last known pixels" cache). The number of chunks can easily be determined by dividing the image's dimensions by the chunk dimensions.
There are several advantages to this:

  • It will enable parallel processing on the CPU, resulting in much faster decoding and encoding, and allow for a more efficient GPU decoder implementation than is currently possible;
  • It will improve pixel locality, since the "64 last known pixels" cache only captures locality along the width, not the height.

The drawback is the added complexity, but I think the sacrifice would be worth it. I'm waiting for your feedback before I start implementing and benchmarking this.

Also, each chunk could be stored contiguously for better data (and not just pixel) locality. Maybe better, maybe too complex for this, I don't know...

What do you think?

@WubiCookie
Author

Possibly related to #29

@rmn20

rmn20 commented Nov 27, 2021

I'm going to try to implement this today for testing purposes

@rmn20

rmn20 commented Nov 27, 2021

Here's my attempt: github.com/rmn20/qoi

In order to implement parallel processing, we need to clear the array of 64 previously seen pixels, as well as the info about the previous pixel, at the start of every chunk.

I have tried several chunk sizes and here are my results:

| Image pack | Original QOI avg kb | 2*2 chunks avg kb | 8*8 chunks avg kb | 16*16 chunks avg kb |
|---|---|---|---|---|
| kodak | 771 | 1016 | 792 | 757 |
| misc | 398 | 1389 | 468 | 405 |
| screenshots | 2582 | 11457 | 3386 | 2830 |
| textures | 184 | 287 | 211 | 196 |
| wallpaper | 10674 | 17627 | 11079 | 10567 |

(theoretically 8*8 should be the best because the cache size is 8^2 = 64, but 16*16 performs best)
(weird, but my QOI results are worse than the results listed here phoboslab.org/files/qoibench)

I also tried encoding chunks, but without clearing. This would not allow parallel processing, but THEORETICALLY it should improve the compression ratio.

| Image pack | Original QOI avg kb | 2*2 chunks avg kb | 4*4 chunks avg kb | 8*8 chunks avg kb | 16*16 chunks avg kb |
|---|---|---|---|---|---|
| kodak | 771 | 785 | 768 | 752 | 745 |
| misc | 398 | 393 | 394 | 388 | 383 |
| screenshots | 2582 | 2627 | 2631 | 2604 | 2565 |
| textures | 184 | 197 | 194 | 191 | 188 |
| wallpaper | 10674 | 10596 | 10572 | 10347 | 10348 |

Honestly, pixel locality doesn't seem to improve compression much... Perhaps an improved QOI_INDEX could help here. (Maybe we can store 3 indices in one 16-bit tag?)

@WubiCookie
Author

> (weird, but my QOI results are worse than the results listed here phoboslab.org/files/qoibench/)

Is it due to the hardware not being the same?

@rmn20

rmn20 commented Nov 27, 2021

I don't think the hardware can affect this.

@rmn20

rmn20 commented Nov 27, 2021

I think this is related to 30f8a39

@rmn20

rmn20 commented Nov 27, 2021

I tried changing the order of pixels within chunks and combining chunk encoding with #38, and I think I got some interesting results.

@rmn20

rmn20 commented Nov 27, 2021

So I changed the pixel order in chunks from this

```
0  1  2  3
4  5  6  7
8  9  10 11
12 13 14 15
```

to this

```
0  1  2  3
7  6  5  4
8  9  10 11
15 14 13 12
```

And here are my results:
(I'm not clearing data between chunks in this test)

| Image pack | Original QOI avg kb | 16*16 chunks avg kb | kmar avg kb | 16*16 chunks + kmar work avg kb | 16*16 chunks, new pixel order + kmar work avg kb |
|---|---|---|---|---|---|
| kodak | 771 | 745 | 720 | 685 | 675 |
| misc | 398 | 383 | 365 | 347 | 330 |
| screenshots | 2582 | 2565 | 2493 | 2469 | 2326 |
| textures | 184 | 188 | 176 | 179 | 176 |
| wallpaper | 10674 | 10348 | 10225 | 9826 | 9579 |

I'm wondering at the moment how I can show this combined code on GitHub? I mean, I can't create a fork of a fork :P

@kmar

kmar commented Nov 27, 2021

feel free to adopt my changes into your code, your results are very interesting (and I'm a bit sad that the 7% on kodim drops to half on the other datasets) - maybe you could even try some fancier ordering, like the JPEG zigzag, but I think it won't work well since this is not DCT-based

@oscardssmith

The obvious things to try would be Z-order curves or Hilbert curves.

@rmn20

rmn20 commented Nov 27, 2021

Updated the code on my fork

@kmar

kmar commented Nov 27, 2021

> Updated the code on my fork

Amazing, this adds 7% more on top of what I had (kodim) - also, when deflated, it now seems to beat PNG. Pretty neat!

@kmar

kmar commented Nov 27, 2021

QOI-style encoding still has problems with images like this one: my scheme somewhat improves cases like that (alpha mode), but it's still far from optimal (compared to PNG).

I wonder if completely splitting off the alpha encoding (somehow) might help here, like QOI-style encoding of multiple pixels at once or something like that.

By the way - this image breaks when encoded with your chunk compressor; probably the mode switch command isn't handled properly or something like that? I see it now: at line 716 in the decoder, on mode change, there is

```c
px_pos -= channels;
```

but that's incorrect in the chunked version; the proper fix is to do `--x` instead, because a mode switch doesn't output a pixel.


@rmn20

rmn20 commented Nov 28, 2021

Yes, looks like I missed that.

@phoboslab
Owner

The file format for QOI will not change anymore. See #37 for more info.

Ideas for a successor to QOI should be discussed here: https://github.com/nigeltao/qoi2-bikeshed/issues
