Skip to content
This repository has been archived by the owner on Apr 24, 2023. It is now read-only.

Feature Request: Multithreading (rayon) #3

Closed
owenthewizard opened this issue Jun 17, 2022 · 6 comments · Fixed by #6
Closed

Feature Request: Multithreading (rayon) #3

owenthewizard opened this issue Jun 17, 2022 · 6 comments · Fixed by #6
Assignees
Labels
enhancement New feature or request

Comments

@owenthewizard
Copy link
Contributor

owenthewizard commented Jun 17, 2022

It's great to see a correct stackblur implementation in Rust! I switched out the algorithm I use in i3lockr with yours and it takes about ~100 ms whereas the multithreaded C version of stackblur I was using takes ~30 ms. However, the C version is naive and clobbers the gamma. The 100 ms is including sRGB to linear round trip. I expect with multithreading yours may be very close to 30 ms, if not faster.

@LoganDark
Copy link
Owner

LoganDark commented Jun 17, 2022

It's currently not easy to implement multithreading efficiently due to limitations in rayon. I have been looking into it though.

On the topic of optimization:

It's possible to perform more exotic optimizations like using SIMD to process 4 or 8 rows/columns at once per thread. I was also told that SIMD can make it possible to "unroll", or process more than one item per iteration (and that my implementation is "poorly written" for not doing so), but that's far outside my skillset.

There is an experimental generator branch that, without multithreading, sees what would happen if you blurred the entire image in one single pass rather than running a horizontal and then vertical blur. It's not pretty - the state machine manages to be about 2x slower than the current implementation, which is extremely surprising. And the blur function which does both axes at once is like 30% slower than running a horizontal and then vertical blur, even with the generator. So that was a funny failed experiment.

I do really like the generator API though (it's much more flexible than wrapping an iterator) and I'm sad that it can't reach an acceptable level of performance. Maybe I'll have to wait until Rust supports true coroutines. It's going to rot, because I'm not going to merge it in (it's too slow).

For the non-state-machine branch, it looks like LLVM is somehow optimizing the horizontal blur in a special way (it likely does this for your vertical blur as well, but not mine, which is how yours manages to be faster). Without true coroutines I can't seem to get the non-generator version to optimize correctly. Someone is probably gonna have to look into this and see if it's possible.

I have also tried multiple times (unsuccessfully) to reduce the radius * 2 + 2 scratch space requirement back down to radius * 2 + 1. I'm really not sure what's going on there but I'm starting to think my particular implementation just isn't friendly to it.

@LoganDark LoganDark self-assigned this Jun 17, 2022
@LoganDark LoganDark added the enhancement New feature or request label Jun 17, 2022
@LoganDark
Copy link
Owner

I now have the iterators I need for this (imgref-iter), so it's just waiting on rayon.

@LoganDark
Copy link
Owner

jk lol 7996893 try that

@owenthewizard
Copy link
Contributor Author

Wow, very impressive!
stackblur-iter now matches the C stackblur I was previously using in i3lockr for small radii, and easily beats it on larger radii.

@LoganDark
Copy link
Owner

Wow, very impressive! stackblur-iter now matches the C stackblur I was previously using in i3lockr for small radii, and easily beats it on larger radii.

Nice! :)

Glad to hear that it's starting to perform well. I'll open a PR for the rayon branch for now.

@LoganDark LoganDark mentioned this issue Jun 18, 2022
@LoganDark
Copy link
Owner

@owenthewizard The new implementation of the rayon branch that I just pushed (based on the latest master) is faster than the old one. This may beat your C++ implementation for small radii.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants