Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Indexing points via parallel arrays #526

Closed
nyanpasu64 opened this issue Nov 1, 2018 · 7 comments
Closed

Indexing points via parallel arrays #526

nyanpasu64 opened this issue Nov 1, 2018 · 7 comments
Labels

Comments

@nyanpasu64
Copy link

Numpy python:

In [1]: import numpy as np

In [2]: x = np.zeros((10,10))

In [3]: x[[0,0,1,1], [0,1,2,3]] = 1

equivalently:
    points = ([0,0,1,1], [0,1,2,3])
    x[points] = 1

In [4]: x
Out[4]: 
array([[1., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
truncated...

Can I accomplish this in Rust ndarray? Should I loop over x and y indices instead? Will I lose performance using a loop?

Should I instead generate x and y indices in a loop, and assign to 1 array point at a time?

@jturner314
Copy link
Member

jturner314 commented Nov 1, 2018

ndarray doesn't support that particular type of "advanced indexing". You can use a loop instead, as you suggested:

extern crate ndarray;

use ndarray::prelude::*;

fn main() {
    let mut x = Array::zeros((10, 10));
    for (&i, &j) in [0, 0, 1, 1].iter().zip(&[0, 1, 2, 3]) {
        x[(i, j)] = 1.;
    }
    println!("{}", x);
}

Will I lose performance using a loop?

In principle, a loop shouldn't be worse than what NumPy has to do for "advanced indexing" like that. When using Python/NumPy, it's typically very advantageous to avoid Python loops and instead try to make NumPy do as much of the work as possible, since Python is slow but NumPy is largely written in C. In contrast, when using Rust/ndarray, both the user's code and ndarray are Rust, so trying to put all the work on ndarray is much less important.

It is worth noting, though, that indexing individual elements will generally be slower than using an iterator or a method like .map() or .fold(), but that's just because calculating the address of an element from an index is more expensive than moving a pointer in an iterator. [Edit: Indexing also performs bounds-checking, which iterators can generally avoid.] So, use an iterator or related method when possible, but indexing is an option otherwise.

@nyanpasu64
Copy link
Author

questions:

let dxs: Array or similar;
let dys: Array or similar;
Zip::from(&(dxs + xx[0])).and(&(dys + yy[0])).apply(|x, y| {}

What's the difference between Zip::from and .zip()?

Is this how I'm supposed to iterate over 2 arrays? It seems a bit clunky. And if I omit a single ampersand, omit the parens, or stack on extra & for the lulz, I get some weird obscure trait-related error that doesn't tell me to pass a reference:

error[E0277]: the trait bound `ndarray::ArrayBase<ndarray::OwnedRepr<isize>, ndarray::Dim<[usize; 1]>>: ndarray::NdProducer` is not satisfied
  --> src/main.rs:87:13
   |
87 |             Zip::from((dxs + xx[0])).and(&(dys + yy[0])).apply(|x, y| {
   |             ^^^^^^^^^ the trait `ndarray::NdProducer` is not implemented for `ndarray::ArrayBase<ndarray::OwnedRepr<isize>, ndarray::Dim<[usize; 1]>>`
   |
   = help: the following implementations were found:
             <ndarray::ArrayBase<ndarray::ViewRepr<&'a A>, D> as ndarray::NdProducer>
             <ndarray::ArrayBase<ndarray::ViewRepr<&'a mut A>, D> as ndarray::NdProducer>
   = note: required by `<ndarray::Zip<(P,), D>>::from`

It took me over 10 minutes to find out what went wrong, and to add & and parens.

This looks nearly as impenetrable as C++ template errors. Is it normal for Rust to have confusing errors? Does this error clearly tell an experienced Rust user to pass a reference instead?


image
    .subview_mut(Axis(0), *x as usize)
    .subview_mut(Axis(0), *y as usize)
    .fill(0xff);

This syntax seems quite ugly. Is there any better way to replicate Numpy's arr[x,y] or arr[x,y,:] = scalar or RGB ndarray?

Is looping i over image[*x][*y][i] = color[i] slower due to bounds checking?

@jturner314
Copy link
Member

jturner314 commented Nov 1, 2018

What's the difference between Zip::from and .zip()?

arr1.iter().zip(arr2.iter()) creates an iterator over each array and zips the iterators together. Zip operates on arrays through the NdProducer trait. Using Zip has at least three advantages:

  1. It makes sure that the shapes of the producers match. In contrast, using iterators, I might accidentally do something like this:

    extern crate ndarray;
    
    use ndarray::prelude::*;
    
    fn main() {
        let arr1 = Array2::<f64>::zeros((2, 3));
        let arr2 = Array2::<f64>::zeros((3, 2));
        for (a, b) in arr1.iter().zip(arr2.iter()) {
            // do something with a and b
        }
    }

    when I meant to transpose one of the arrays when zipping them together in order to match the elements in a sensible way.

  2. Iterators always iterate over arrays in logical (row-major) order, which is slow for arrays that have non-row-major memory layout. In contrast, Zip decides an efficient iteration order based on the memory layout of the producers.

  3. Zip provides support for efficiently operating over producers in parallel. (See the ndarray-parallel crate.)

See also this article introducing NdProducer and Zip.

Is this how I'm supposed to iterate over 2 arrays?

What are xx[0] and yy[0]? If they're scalars, I'd recommend adding them within the closure, like this:

#[macro_use(azip)]
extern crate ndarray;

azip!(dxs, dys in {
    let x = dxs + xx[0];
    let y = dxs + yy[0];
    // do something
});

This avoids creating temporary arrrays for the results of dxs + xx[0] and dys + yy[0].

If they're arrays, I'd recommend zipping over them too (broadcasting if necessary).

And if I omit a single ampersand, omit the parens, or stack on extra & for the lulz, I get some weird obscure trait-related error that doesn't tell me to pass a reference:

error[E0277]: the trait bound `ndarray::ArrayBase<ndarray::OwnedRepr<isize>, ndarray::Dim<[usize; 1]>>: ndarray::NdProducer` is not satisfied
  --> src/main.rs:87:13
   |
87 |             Zip::from((dxs + xx[0])).and(&(dys + yy[0])).apply(|x, y| {
   |             ^^^^^^^^^ the trait `ndarray::NdProducer` is not implemented for `ndarray::ArrayBase<ndarray::OwnedRepr<isize>, ndarray::Dim<[usize; 1]>>`
   |
   = help: the following implementations were found:
             <ndarray::ArrayBase<ndarray::ViewRepr<&'a A>, D> as ndarray::NdProducer>
             <ndarray::ArrayBase<ndarray::ViewRepr<&'a mut A>, D> as ndarray::NdProducer>
   = note: required by `<ndarray::Zip<(P,), D>>::from`

Is it normal for Rust to have confusing errors? Does this error clearly tell an experienced Rust user to pass a reference instead?

Rust is known for having good error messages. The compiler may not always be able to suggest exactly what change you need to make (because it doesn't necessary know your intent), but it generally does a good job indicating what the problem is. Let's break down this error message.

  • The first line tells us what the problem is: the trait bound `ndarray::ArrayBase<ndarray::OwnedRepr<isize>, ndarray::Dim<[usize; 1]>>: ndarray::NdProducer` is not satisfied. Using the type aliases provided by ndarray, this is equivalent to the trait bound `ndarray::Array1<isize>: ndarray::NdProducer` is not satisfied.

    This means that we're trying to use the NdProducer trait with an owned 1-D array with isize elements, but that type doesn't implement the NdProducer trait.

  • Next, the message shows the exact place where the problem is occurring (the Zip::from call), and it writes the error in another way: the trait `ndarray::NdProducer` is not implemented for `ndarray::ArrayBase<ndarray::OwnedRepr<isize>, ndarray::Dim<[usize; 1]>>` . Using the type aliases provided by ndarray, this is equivalent to the trait `ndarray::NdProducer` is not implemented for `ndarray::Array1<isize>` .

    This tells us that the argument we're passing to Zip::from has type Array1<isize>, but it doesn't work because Array1<isize> doesn't implement the NdProducer trait. (This is further reinforced by the note: required by `<ndarray::Zip<(P,), D>>::from` .)

  • Finally, the error message suggests that there are implementations of the NdProducer trait for <ndarray::ArrayBase<ndarray::ViewRepr<&'a A>, D> as ndarray::NdProducer> and <ndarray::ArrayBase<ndarray::ViewRepr<&'a mut A>, D> as ndarray::NdProducer>. More concisely, these are ArrayView<'a, A, D> and ArrayViewMut<'a, A, D>.

    This is trying to be helpful by suggesting that we may have intended to pass an ArrayView or ArrayViewMut instead of an Array.

So, the compiler identifies where the problem is occurring and what it is, and it even provides suggestions of types that work. Given this message, we know that we need to somehow transform (dxs + xx[0]) into a type that works.

In general, I would suggest first going to the docs for the function where the error is occurring (Zip::from) to get a better understanding of the requirements of the function. This shows that the parameter needs to implement IntoNdProducer. Clicking on IntoNdProducer jumps to its docs, which include a list of the types that implement it. We see that IntoNdProducer is implemented for &'a ArrayBase<S, D>, so we can try adding an ampersand so that we pass a reference to the array instead of the array itself. This works.

Alternatively, the compiler told us that it expected a type that implements NdProducer, and it told us that ArrayView does. So, we could try converting the array to a view with (dxs + xx[0]).view(), and that does in-fact work. We can see a list of the implementors of NdProducer in the docs if neither of the compiler's suggestions are what we want.

[Edit: The error message is somewhat misleading in that it indicates we need a type that implements NdProducer instead of one that just implements IntoNdProducer. I've submitted a bug report for this (rust-lang/rust#55591). Regardless, we can fix the problem if we provide a type that implements NdProducer (as the compiler indicated) or if we provide a type that implements IntoNdProducer (as indicated by the docs).]

The most confusing part is the long type names, which is a result of ndarray using a single ArrayBase type that's generic over ownership, element type, and dimension. ndarray provides type aliases to make using it more ergonomic, but the full types do show up in compiler error messages. See the list of type aliases and click on each of them to get an understanding of what they're aliases for.

Fwiw, the equivalent error in Python would be something like

>>> Zip(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in Zip
AttributeError: 'list' object has no attribute 'foo'

and it would occur at runtime (maybe after the program has already been running for hours), not compile time. This error message gives very little information on how to fix the problem, and it's especially problematic if the AttributeError is initially raised deep inside one of the dependencies. In my experience, these types of errors can be very difficult to debug because everything is duck-typed and library docs aren't always good about describing what argument types and values are acceptable.

image
    .subview_mut(Axis(0), *x as usize)
    .subview_mut(Axis(0), *y as usize)
    .fill(0xff);

This syntax seems quite ugly. Is there any better way to replicate Numpy's arr[x,y] or arr[x,y,:] = scalar or RGB ndarray?

See Slicing and the s![] macro. Assuming image is a 3-D array, this example can be written as

#[macro_use(s)]
extern crate ndarray;

image.slice_mut(s![*x, *y, ..]).fill(0xff);

Is looping i over image[*x][*y][i] = color[i] slower due to bounds checking?

Yes; avoid indexing where possible.

@nyanpasu64
Copy link
Author

nyanpasu64 commented Nov 1, 2018

Thanks for the help. I'm not sure if I'll continue learning Rust, seems I could learn Nim and https://github.com/mratsim/Arraymancer, it could be faster to work with and have less ownership quirks, while still building to a self-contained binary, and being fast at runtime.

For anything other than Python and numerically-focused languages, I may have to reimplement a lot of DSP-related things, but it should be fine since I can just reference the Scipy source code.

Is there a reason you didn't instead use more concise syntax like image.slice{_mut}!(*x, *y, ..) and image.slice{_mut}![*x, *y, ..] or image.s![0, .., ..]? (unsure which are valid syntax)

@jturner314
Copy link
Member

I'm not sure if I'll continue learning Rust, seems I could learn Nim and https://github.com/mratsim/Arraymancer, it could be faster to work with and have less ownership quirks, while still building to a self-contained binary, and being fast at runtime.

I don't know anything about Nim; it looks interesting. You might also be interested in Julia. I haven't used it myself but have heard good things about it.

In favor of Rust, I will say that the powerful type system and borrow checker really help me to write correct code, which is important to me for scientific research because I don't like having to fix something and restart my program after it's been running for a long time doing calculations. It also makes me more confident that my results are correct. Before Rust, I used Python for many years, and I found that large programs became difficult to manage due to the highly dynamic nature of the language. Learning Rust did take some investment (reading the book and working with the language for a while), but I've been really happy with the language once I became comfortable with it. The primary disadvantage right now is the lack of full-featured, established scientific libraries like NumPy/SciPy/Pandas/Matplotlib.

Is there a reason you didn't instead use more concise syntax like image.slice{_mut}!(*x, *y, ..) and image.slice{_mut}![*x, *y, ..] or image.s![0, .., ..]? (unsure which are valid syntax)

That syntax isn't supported right now. There's a proposal for that feature (postfix macros) in rust-lang/rfcs#2442.

@nyanpasu64
Copy link
Author

Julia

maybe i'll try that, and screw around with https://github.com/JuliaLang/PackageCompiler.jl to distribute binaries to users

powerful type system

honestly it's more like casting back and forth between i32, usize, and f32.

@jturner314
Copy link
Member

This issue seems to be resolved, and there haven't been any comments in a few months, so I'm closing it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants