
Image resizing feature #149

Closed
kvark opened this issue Nov 18, 2023 · 4 comments
Comments

kvark commented Nov 18, 2023

Currently unimplemented in bilinear_impl

etemesi254 (Owner) commented:

Hi, I'd recommend using fast_image_resize, as I work on that. But even my implementation won't do what you want: it is only meant for single-channel (grayscale) images, since zune-image de-interleaves images into separate color channels. You seem to want to use it on interleaved pixels, and that won't work.

Shnatsel (Contributor) commented Jan 2, 2024

FYI the image crate recently landed an optimized bilinear scaling algorithm that works on de-interleaved pixels internally: image-rs/image#2078

It relies on autovectorization to achieve SIMD acceleration. On large images it is probably not as fast as anything using AVX2 explicitly, though.

etemesi254 (Owner) commented:

> FYI the image crate recently landed an optimized bilinear scaling algorithm that works on de-interleaved pixels internally: image-rs/image#2078

Interesting function, but it only tells you what pixel is supposed to be at position (x, y), not what is actually there, so it ends up doing a lot of redundant work when called inside an image resize. The better way to do it is a plain for loop that hoists the calculations that don't change per iteration: when resizing, you wouldn't repeat the bounds checks that return None in the inner loop, and you can lift the y-dependent work out of it entirely.

A more performant bilinear resize would look like this: https://godbolt.org/z/rYxf6qTh1. But again, it only works on one channel. If you are on four channels and your input/output is floats, it's much better, since you can actually vectorize it.
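To make the hoisting idea above concrete, here is a minimal sketch of a single-channel bilinear resize in the shape described: the row pair, vertical weight, and row slices are computed once per output row instead of once per pixel. This is an illustrative example, not the code behind the godbolt link or the zune-imageprocs API; all names are hypothetical.

```rust
// Hypothetical single-channel bilinear resize sketch.
// src is sw * sh pixels, row-major; returns dw * dh pixels.
fn bilinear_resize(src: &[f32], sw: usize, sh: usize, dw: usize, dh: usize) -> Vec<f32> {
    let mut dst = vec![0.0f32; dw * dh];
    // "Align corners" style mapping; .max(1) guards the 1-pixel edge cases.
    let x_ratio = (sw.max(1) - 1) as f32 / ((dw.max(1) - 1).max(1)) as f32;
    let y_ratio = (sh.max(1) - 1) as f32 / ((dh.max(1) - 1).max(1)) as f32;
    for dy in 0..dh {
        // Hoisted: the source row pair and vertical weight are constant
        // across the whole output row, so none of this repeats per pixel.
        let fy = dy as f32 * y_ratio;
        let y0 = fy as usize;
        let y1 = (y0 + 1).min(sh - 1);
        let wy = fy - y0 as f32;
        let row0 = &src[y0 * sw..y0 * sw + sw];
        let row1 = &src[y1 * sw..y1 * sw + sw];
        for dx in 0..dw {
            let fx = dx as f32 * x_ratio;
            let x0 = fx as usize;
            let x1 = (x0 + 1).min(sw - 1);
            let wx = fx - x0 as f32;
            // Lerp horizontally in each row, then vertically between rows.
            let top = row0[x0] + (row0[x1] - row0[x0]) * wx;
            let bot = row1[x0] + (row1[x1] - row1[x0]) * wx;
            dst[dy * dw + dx] = top + (bot - top) * wy;
        }
    }
    dst
}
```

Because x0, x1, and wx depend only on dx, a further step would be precomputing them into a lookup table once for all rows.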

Which leads me to:

> It relies on autovectorization to achieve SIMD acceleration. Probably not as fast on large images as anything using AVX2 explicitly though.

Sadly, there is no autovectorization happening here. Floating-point calculations on x86 use vector registers by default, with the instruction suffix telling us whether it is scalar (one calculation at a time) or packed (multiple calculations at once): ss means scalar single precision, ps means packed single precision; see mulss vs mulps.

So if you look closely below, we have mulss, which means multiply scalar single precision, i.e. multiply one floating-point number by another:

[screenshot: disassembly showing repeated mulss instructions]

And if you count how many times it is repeated, the loop is unrolled 4x, each copy handling a single iteration. The one reason I can think of for the compiler not autovectorizing this is floating-point non-associativity; see What Every Computer Scientist Should Know About Floating-Point Arithmetic.
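The non-associativity point is easy to demonstrate: because vectorizing a reduction reorders the additions, and reordering can change the result, the compiler must keep the scalar order unless told otherwise. A small example of the effect (illustrative, not from the thread's godbolt links):

```rust
// Floating-point addition is not associative: (a + b) + c can differ
// from a + (b + c). This is why compilers won't reorder (and therefore
// won't vectorize) float reductions without fast-math style permission.
fn fp_assoc_demo() -> (f32, f32) {
    let a = 1e20f32;
    let b = -1e20f32;
    let c = 1.0f32;
    let left = (a + b) + c;  // (1e20 + -1e20) + 1.0 = 0.0 + 1.0 = 1.0
    let right = a + (b + c); // -1e20 + 1.0 rounds back to -1e20, so the sum is 0.0
    (left, right)
}
```

Same three inputs, two different answers depending on grouping, so any transformation that regroups the sum changes observable behavior.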

I couldn't nudge the compiler into producing vectorized code no matter how hard I tried, so I had to use SIMD explicitly. Here is an actual bilinear resize for f32 floats that uses SIMD (via portable SIMD): https://godbolt.org/z/vzPEzGTs8

[screenshot: disassembly of the portable-SIMD version showing packed (ps suffix) instructions]

Notice how we now have ps suffixes and a shorter output? That means we finally got what we expected when we said SIMD.
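For readers without nightly Rust's std::simd, the scalar-vs-packed distinction can be sketched on stable with fixed-size arrays: a lane-wise lerp over four values at once is exactly the shape that lets the optimizer emit packed mulps/addps instead of four scalar mulss/addss. This is an illustrative sketch, not the code from the godbolt link; with portable SIMD the arrays would be f32x4 values.

```rust
// Lane-wise linear interpolation of four pixel pairs at once:
// out[i] = a[i] + (b[i] - a[i]) * w[i] for each of the 4 lanes.
// The fixed-width, branch-free loop body is trivially vectorizable.
fn lerp4(a: [f32; 4], b: [f32; 4], w: [f32; 4]) -> [f32; 4] {
    let mut out = [0.0f32; 4];
    for i in 0..4 {
        out[i] = a[i] + (b[i] - a[i]) * w[i];
    }
    out
}
```

Unlike a reduction, this has no cross-lane dependency, so the compiler is free to do all four multiplies in one packed instruction.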

etemesi254 (Owner) commented:

This is now present in zune-imageprocs as Resize, so I think this can be closed.
