Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fastest way to convert to vec/iterate elements #1339

Closed
RunDevelopment opened this issue Dec 8, 2023 · 10 comments
Closed

Fastest way to convert to vec/iterate elements #1339

RunDevelopment opened this issue Dec 8, 2023 · 10 comments

Comments

@RunDevelopment
Copy link

Hi! I wanted to know what's the fastest way to convert a PyReadonlyArrayDyn<f32> into a Vec<f32>.

I'm making python bindings for an image processing programs, and converting numpy arrays into a contiguous memory format is the first step. I'm already using ndarray.as_slice() which allows me to skip the conversion to vec. This issue is about indexed ndarrays. I'm currently using the following code for the conversion:

let flat: Vec<f32> = ndarray.as_array().iter().cloned().collect();

Compared to a slice.to_vec(), this is more than 10x slower. Running this code on a (4320, 8468, 4) ndarray takes 0.2sec if the array is a slice (so it runs slice.to_vec()) and 3sec when the array is indexed (I cropped off 1px on the left side).

This is a huge performance cliff for me, so I would like to know what I can do to fix this.

@nilgoyette
Copy link
Collaborator

By "indexed", do you mean "not the complete array, just a portion of it"?

In general, iter is slow in ndarray. You might want to test to_owned then to_vec and tell us about the performance. In all cases, 3s seems really high!

@RunDevelopment
Copy link
Author

By "indexed", do you mean "not the complete array, just a portion of it"?

Sorry, I meant a sliced array. E.g. the cropped image array from my "benchmark" was created in python like this: img[:, 1:, ...].

You might want to test to_owned then to_vec and tell us about the performance.

Could you please explain this in more detail? Changing my above code to:

let flat: Vec<f32> = ndarray.to_owned().to_vec().unwrap();

Gives me a NotContiguousError at runtime.

So I tried using the to_vec method from ArrayBase. I also had to reshape the array since I'm working with 2D/3D images and to_vec is only available for 1D arrays. The following code did work correctly:

let flat: Vec<f32> = ndarray
    .as_array()
    .to_owned()
    .into_shape(ndarray.shape().iter().product::<usize>())
    .unwrap()
    .to_vec();

This clocked in at 1.1sec, which is still 6x slower than slice.to_vec(). The break-down is that to_owned takes 0.95sec and to_vec takes ~0.15sec. So copying from contiguous memory into a vec is as fast as ever, but to_owned is very slow.

@RunDevelopment
Copy link
Author

RunDevelopment commented Dec 8, 2023

I did some further testing and the main reason to_owned is so slow seems to be that my array uses IxDyn. Using .into_dimensionality() to "cast" the array to use Ix3 speeds up to_owned from 0.95sec to 0.25sec (4x faster). Doing this "cast" also sped up my old .iter().copied().collect() to take 0.4sec, so iteration is 8-9x faster.

So arrayBase.iter() should probably do something similar as an internal optimization.

I also found the .into_raw_vec() method which gets rid of the copying .to_vec() does. With the into_dimensionality() optimization, this brings it down to 0.25sec total, which seems to be the best that is possible right now.

@adamreichold
Copy link
Collaborator

If you do assume a dimensionality of 3, you could also change your Python API to expect PyReadonlyArray3<f32> instead.

@RunDevelopment
Copy link
Author

If you do assume a dimensionality of 3

I don't. I just simplified my findings for the sake of brevity. In my code, I match on array.shape().len() to support 2D and 3D arrays (we use 2D arrays for grayscale images in the software I work with).

@adamreichold
Copy link
Collaborator

You might still be better of accepting an enum of PyReadonlyArray2<f32> or PyReadonlyArray3<f32>, c.f. https://github.com/PyO3/rust-numpy/blob/main/examples/simple/src/lib.rs#L113

@RunDevelopment
Copy link
Author

Since I figured out the currently fastest way to get a vec, I consider this issue closed now.

Should I make a new issue for the IxDyn performance problem?

@adamreichold
Copy link
Collaborator

Should I make a new issue for the IxDyn performance problem?

Personally, I am not sure I consider it a problem as the iterator is most likely not the only place where IxDyn is measurably faster than Ix? and I would like to avoid trying to cushion all of those places out with small-dimensionality fast paths. I think the main learning is to use the fixed-dimensionality types if that is at all possible.

@nilgoyette
Copy link
Collaborator

Should I make a new issue for the IxDyn performance problem?

If you do, please explain clearly and with at least one code example.

@RunDevelopment
Copy link
Author

I think the main learning is to use the fixed-dimensionality types if that is at all possible.

If you want to teach people how to use ndarrays most efficiently, I would suggest writing documentation or changing the API to make IxDyn less convenient to use. Right now, IxDyn is the closest thing to how ndarrays behave in python, so I would argue that people will naturally gravitate towards it. I certainly did.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants