A few extensions I've needed of ndarray #178

daniel-vainsencher opened this issue Apr 6, 2016 · 22 comments

@daniel-vainsencher
Contributor

Relevant to any dimension:

trait MyArr<S: Data<Elem=A>, A: NdFloat, D: Dimension>
{
    // Yes, trivial with from_elem, but useful enough.
    fn ones(dim: D) -> ArrayBase<S,D>;
    // Also very common stuff
    fn powf(&self, exp: A) -> ArrayBase<Vec<A>,D>;
    fn sqr(&self) -> ArrayBase<Vec<A>,D>;
    fn norm(&self) -> A;
    fn max(&self) -> A;
}
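As a rough illustration of the semantics these methods would have, here is a dependency-free sketch on plain `f64` slices. The free-function names mirror the trait above, but nothing here is ndarray API. Reading `norm` as the Euclidean (L2) norm and returning `None` from `max` on empty input are both assumptions.

```rust
/// Elementwise power, returning a new vector (mirrors `powf` above).
pub fn powf(xs: &[f64], exp: f64) -> Vec<f64> {
    xs.iter().map(|x| x.powf(exp)).collect()
}

/// Elementwise square, returning a new vector (mirrors `sqr` above).
pub fn sqr(xs: &[f64]) -> Vec<f64> {
    xs.iter().map(|x| x * x).collect()
}

/// Euclidean norm. Assumption: `norm` means the L2 norm.
pub fn norm(xs: &[f64]) -> f64 {
    xs.iter().map(|x| x * x).sum::<f64>().sqrt()
}

/// Largest element, or `None` for an empty slice.
/// `f64::max` ignores a NaN operand, so NaNs are skipped unless every value is NaN.
pub fn max(xs: &[f64]) -> Option<f64> {
    xs.iter().copied().reduce(f64::max)
}
```

For example, `norm(&[3.0, 4.0])` evaluates to `5.0` and `max(&[1.0, 7.5, -2.0])` to `Some(7.5)`.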

Relevant to matrices only:

trait Mat<D: NdFloat> {
    fn sum_sqr_cw(&self) -> VE; // column wise sum of squares

    // Copy out a submatrix selecting arbitrary rows and columns from self.
    fn submatrix(&self, rows: &Vec<Ix>, columns: &Vec<Ix>) -> MA;
}

BTW, I receive a warning that fold is deprecated because it forces a particular order. I am not sure what you mean by that, and what is a good replacement for it.

I can submit a PR for those if welcome.

@bluss
Member

bluss commented Apr 6, 2016

Thank you. powf and sqr should be served relatively well by mapv and mapv_inplace.

sqrt is simple: a.mapv(f32::sqrt) (returns a new OwnedArray); powf would maybe be a.mapv(|x| x.powf(exp)).
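To make the difference between the allocating and the in-place flavour concrete, here is a minimal sketch on plain slices. The two function names are borrowed from the comment above, but these are free functions written for illustration, not ndarray's methods.

```rust
/// Apply `f` to every element by value and collect into a new vector
/// (the allocating flavour, analogous to `mapv`).
pub fn mapv<A: Copy, B>(xs: &[A], f: impl FnMut(A) -> B) -> Vec<B> {
    xs.iter().copied().map(f).collect()
}

/// Apply `f` to every element by value and write the result back
/// (the non-allocating flavour, analogous to `mapv_inplace`).
pub fn mapv_inplace<A: Copy>(xs: &mut [A], mut f: impl FnMut(A) -> A) {
    for x in xs.iter_mut() {
        *x = f(*x);
    }
}
```

With `a = [4.0f32, 9.0]`, `mapv(&a[..], f32::sqrt)` yields `[2.0, 3.0]` and leaves `a` untouched, while `mapv_inplace` overwrites its argument.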

norm -- good idea. max -- I already use this, so I agree.

Please use the type alias OwnedArray<A, D> which is the same thing as ArrayBase<Vec<A>, D>. I'm thinking we may need to change the definition of OwnedArray in the future (to maybe have some flexibility for the allocator, to not use Vec), and it's then smoother for you (and simpler) to just type OwnedArray.
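The benefit of writing the alias can be sketched with stand-in types. The `ArrayBase` struct and the `from_vec` helper below are hypothetical and much simpler than ndarray's real definitions; only the shape of the alias matches what the comment describes.

```rust
use std::marker::PhantomData;

/// Stand-in for a storage-generic array type; not ndarray's real definition.
pub struct ArrayBase<S, D> {
    pub data: S,
    pub dim: PhantomData<D>,
}

/// The alias callers should write. If the backing storage changes later,
/// only this line has to change, not every signature that mentions it.
pub type OwnedArray<A, D> = ArrayBase<Vec<A>, D>;

/// Hypothetical helper written against the alias rather than `ArrayBase<Vec<A>, D>`.
pub fn from_vec<A, D>(data: Vec<A>) -> OwnedArray<A, D> {
    ArrayBase { data, dim: PhantomData }
}
```

Both spellings name the same type today, so code using the alias keeps compiling against the long form and vice versa.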

.fold() has the warning because it is documented to say the elements will be visited in a particular order. This is a mistake: prescribing a particular order is bad for performance, and ndarray wants the freedom to pick the best traversal order for the array. This is the reason it needs to be deprecated. We should add the replacement that doesn't have the order guarantee ASAP, I guess.
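One practical consequence for callers: a fold without an order guarantee only gives a reproducible answer when the closure does not care about order. Floating-point addition is a common case where it does, as this small standard-library sketch shows (the function names are made up for the example).

```rust
/// Sum the values front to back.
pub fn sum_forward(xs: &[f64]) -> f64 {
    xs.iter().fold(0.0, |acc, &x| acc + x)
}

/// Sum the same values back to front.
pub fn sum_reverse(xs: &[f64]) -> f64 {
    xs.iter().rev().fold(0.0, |acc, &x| acc + x)
}
```

For `[1e16, -1e16, 1.0]` the forward sum is `1.0`, but the reverse sum is `0.0`, because the `1.0` is rounded away when it is added to `-1e16` first.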

sum_sqr_cw, I wonder if there is a more general way to do this. Same with submatrix. Does submatrix select the intersection of those rows and columns?

@daniel-vainsencher
Contributor Author

I wasn't clear: I've already implemented all of these for myself, and the implementations are obvious. I think all of these are common enough operations that there should be trivial patterns for them in ndarray (or in an in-repo extension crate); the open questions are about designing a nice language.

I think all of these should create OwnedArrays; in-place variants are a low priority IMO.

OwnedArray: sure.

fold: I would just change the documentation to clearly not guarantee an ordering, and SHOUT it in the changelog. I do not think you have to worry about that level of compatibility and deprecation at this early stage, using the nicest names for important operations is much more important.

I am not sure what you mean by intersection. A submatrix, just like a slice, receives a spec of the rows to keep and a spec of the columns to keep. These just happen to be arbitrary, hence cannot be fulfilled by striding the existing implementation, therefore I propose to create a copy. A view version could be possible if we generalize views to allow for sufficiently general coordinate mappings, but these cannot be expected to be fast. When space matters more than speed such view might be a good idea (and I've used BTreeMaps to do something similar at least once), but I would not advocate them now. Anyway, this obviously generalizes to subarray.
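The copying behaviour described here can be sketched without ndarray, using a matrix stored as a list of rows. This is an illustration of the semantics only: entry `(i, j)` of the result is `m[rows[i]][cols[j]]`, so every selected row is paired with every selected column.

```rust
/// Copy out the entries `m[r][c]` for every `r` in `rows` and `c` in `cols`,
/// in the order given. Indices may repeat or be out of order.
/// Panics if an index is out of bounds.
pub fn submatrix(m: &[Vec<f64>], rows: &[usize], cols: &[usize]) -> Vec<Vec<f64>> {
    rows.iter()
        .map(|&r| cols.iter().map(|&c| m[r][c]).collect::<Vec<f64>>())
        .collect()
}
```

Selecting rows `[2, 0]` and columns `[0, 2]` from the 3x3 matrix `1..=9` (row-major) gives `[[7, 9], [1, 3]]`, which no strided view of the original could express.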

sum_sqr_cw: yes, this begs for a nice way to implement it as chained expressions, but the obvious a.sqr().sum(Axis(3)) allocates the squared array and then immediately consumes it to sum over a dimension, which is wasteful. This is a specialized map-reduce; there might be a convenient API for it.
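A fused version that avoids the temporary might look like the following plain-Rust sketch. It assumes a matrix stored as equal-length rows and is not ndarray code; the point is only that squaring and accumulating happen in the same pass.

```rust
/// Column-wise sum of squares for a matrix stored as a list of rows.
/// Only the output vector is allocated; no squared copy of `m` is built.
/// Assumes every row has the same length as the first.
pub fn sum_sqr_cw(m: &[Vec<f64>]) -> Vec<f64> {
    let ncols = m.first().map_or(0, |row| row.len());
    let mut out = vec![0.0; ncols];
    for row in m {
        for (acc, &x) in out.iter_mut().zip(row) {
            *acc += x * x;
        }
    }
    out
}
```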

BTW, in addition to numpy idioms for arrays, another rich source worthy of mining is R.

@bluss
Member

bluss commented Apr 7, 2016

Here's another one, a bit more of the general combinator flavour https://github.com/bluss/rust-ndarray/blob/master/ndarray-tests/tests/accuracy.rs#L254-L268 (fold axis can help computing min/max, ptp along an axis).
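The linked test code is not reproduced in this thread, so the following is only a guess at the flavour of such a combinator: a plain-Rust fold over each column, with peak-to-peak built from two folds. The names `fold_columns` and `ptp_columns` are made up for the example.

```rust
/// Fold each column of a matrix (stored as rows) into one value, starting
/// from `init`. A stand-in for folding along axis 0; assumes equal-length rows.
pub fn fold_columns<B: Clone>(
    m: &[Vec<f64>],
    init: B,
    mut f: impl FnMut(B, f64) -> B,
) -> Vec<B> {
    let ncols = m.first().map_or(0, |row| row.len());
    let mut acc = vec![init; ncols];
    for row in m {
        for (a, &x) in acc.iter_mut().zip(row) {
            *a = f(a.clone(), x);
        }
    }
    acc
}

/// Peak-to-peak (max minus min) of every column, built from two folds.
pub fn ptp_columns(m: &[Vec<f64>]) -> Vec<f64> {
    let maxs = fold_columns(m, f64::NEG_INFINITY, f64::max);
    let mins = fold_columns(m, f64::INFINITY, f64::min);
    maxs.iter().zip(&mins).map(|(hi, lo)| hi - lo).collect()
}
```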

@daniel-vainsencher
Contributor Author

I think we need a general-specific spectrum of languages around ndarray, so everything can be expressed, but common concepts are both readable and concise.

map_reduce(mapper, reducer) is very general and usable for many one-off calculations, fold_axis and fold are more specific and efficient, but (for example) by far the most common folds are going to be sums and products over rows and columns (or other axes in higher dimensions), so how about stuff like: a.map_sum_columns(|v| v.sqr())?

This starts to be almost as readable as a.square_sum_columns() and is much more general.

So, shall we move Utils out of tests and start building it out? Is there already a place/design for the matrix- and vector-specific traits?

@bluss
Member

bluss commented Apr 7, 2016

Good suggestion about .map_sum_columns. Or .map_sum(Axis, closure) if I'm allowed to stretch the generality of it.
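A dependency-free sketch of what a `map_sum(axis, closure)` could compute for the 2-D case is below. The signature is an assumption (the thread only names the method), and the axis convention follows NumPy: the given axis is the one that gets collapsed.

```rust
/// Map every element through `f`, then sum along `axis`, without building
/// the mapped matrix. `axis == 0` collapses the rows (one result per column),
/// `axis == 1` collapses the columns (one result per row).
/// Assumes equal-length rows; panics on any other axis.
pub fn map_sum(m: &[Vec<f64>], axis: usize, mut f: impl FnMut(f64) -> f64) -> Vec<f64> {
    match axis {
        0 => {
            let ncols = m.first().map_or(0, |row| row.len());
            let mut out = vec![0.0; ncols];
            for row in m {
                for (acc, &x) in out.iter_mut().zip(row) {
                    *acc += f(x);
                }
            }
            out
        }
        1 => m
            .iter()
            .map(|row| row.iter().map(|&x| f(x)).sum::<f64>())
            .collect(),
        _ => panic!("axis must be 0 or 1 for a 2-D matrix"),
    }
}
```

With `m = [[1, 2], [3, 4]]` and `f = |x| x * x`, axis 0 gives `[10, 20]` and axis 1 gives `[5, 25]`.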

So, Rust 1.8 is nearing. Great news -- all the iadd methods will be gone and += and so on will be enabled.

My idea was to split the crate into:

  • ndarray-core
    • This is most of what ndarray has, except the items below
  • ndarray-numeric
    • Numerical things are sum, mean, .square, .sqrt, norm, etc
  • ndarray-linalg
    • Dot, matrix multiplication, and everything linear algebra
  • ndarray-rand (already exists)
  • ndarray-rblas (already exists, specific crate integration, we will probably leave this path)
  • ndarray
    • Umbrella crate that reexports possibly all of core, numeric, linalg, rand.

@daniel-vainsencher
Contributor Author

That looks like a good architecture to me. Should I wait for that change
before starting to upstream-PR utils?

Daniel Vainsencher

@bluss
Member

bluss commented Apr 7, 2016

The idea is the wait need not be that long. So, I'll get the split started today or tomorrow. If not.. PR away.

@daniel-vainsencher
Contributor Author

If it's a plan for the coming week, I'll definitely wait and aim at the
right place from the start.


@bluss
Member

bluss commented Apr 8, 2016

So I have experimented with the split plan. I've finally found the pain that was foretold.

Reexporting ndarray-core into ndarray does a number on rustdoc and some things come out subtly worse. It uses ArrayBase<Vec<A>, _> instead of OwnedArray and other blemishes, and doc examples don't work out.

The core split plan is cancelled. I will have a second go at a super crate, without a core split now.

@daniel-vainsencher
Contributor Author

Hmm, I didn't suspect rustdoc as a source of pain, though it makes sense.
Big crates are probably a problem for others as well, maybe worth filing
some bugs on rustdoc etc?

On the other hand, incremental compilation may alleviate some of the pain
of a super crate.

Daniel


@bluss
Member

bluss commented Apr 8, 2016

Maybe it works out if we use ndarray as a strong independent core, and a new crate that includes it and extends it (without pretending to be a facade to it).

  • ndarray-numeric
    • depends on ndarray
    • optional ndarray-rand
    • optional ndarray-linalg

just needs a nicer name than ndarray-numeric. It might require dropping the ndarray "brand" from the name.. I have an idea.

@bluss
Member

bluss commented Apr 9, 2016

I've found a split that works, but I wonder what you think.

  • A new crate for ndarray+numerics
    • depends on ndarray (and reexports it)
    • ndnum does numerics and linear algebra
    • ndarray defines arithmetic (it must define all basic traits for ndarray)

Rustdoc for the draft: http://bluss.github.io/ndnum/master/ndnum/index.html

How I ported my projects to use it:

  1. Replace ndarray dep with ndnum.
  2. pub use ndnum::ndarray as ndarray; this at the top of the crate means the old use statements continue to work
  3. But you need pub use ndnum::prelude::*; or specifically importing the extension traits for the ndnum methods to work
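The porting steps above can be mimicked in a single file by letting modules stand in for crates. Everything below is hypothetical (the `Array1` type, the `Numeric` trait, and the module contents); it only shows why the re-export keeps old paths working and why the extension trait still has to be imported.

```rust
// Modules stand in for separate crates here; all names are illustrative.
pub mod ndarray {
    /// Minimal stand-in for the core array type.
    pub struct Array1(pub Vec<f64>);
}

pub mod ndnum {
    // Step 2 relies on the new crate re-exporting the core one.
    pub use super::ndarray;

    /// Extension trait carrying the numeric methods.
    pub trait Numeric {
        fn norm(&self) -> f64;
    }

    impl Numeric for ndarray::Array1 {
        fn norm(&self) -> f64 {
            self.0.iter().map(|x| x * x).sum::<f64>().sqrt()
        }
    }

    pub mod prelude {
        pub use super::Numeric;
    }
}

pub mod my_project {
    // Old paths keep resolving through the re-export...
    use super::ndnum::ndarray::Array1;
    // ...but extension methods are only callable once the trait is in scope.
    use super::ndnum::prelude::*;

    pub fn length_of_3_4() -> f64 {
        Array1(vec![3.0, 4.0]).norm()
    }
}
```

Removing the `prelude` import from `my_project` makes the `.norm()` call fail to compile, which is the situation step 3 warns about.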

@daniel-vainsencher
Contributor Author

Looks good to me. As usual, I think you should design for growth first,
new users second and porting ease last.

Change that order only around a 1.0 release, and don't hurry to do that release.

@bluss
Member

bluss commented Apr 9, 2016

Not sure how to design for growth first.

I think ndarray has accomplished its original goals (an efficient nd array).

Now we want to add more features, two categories, numerics and linalg.

@bluss
Member

bluss commented Apr 9, 2016

Ok after exploration of ndarray-core split, ndarray/ndnum split etc I don't want to split anymore. 😄 Exploration is good anyway.

It's hard to divorce ndarray from numerics. The trait impls (Add and everything else) need to be in the main crate. There's been some fruitful splitting of code anyway, for example matrixmultiply is an external crate. We can maybe continue like that.
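The constraint comes from Rust's coherence (orphan) rules, and a small sketch with a stand-in type may make it concrete. `Array1` and `SumExt` are made-up names; in the real setting the array type lives in ndarray and `Add` lives in the standard library.

```rust
use std::ops::Add;

/// Stand-in for the array type owned by the "main crate".
#[derive(Debug, Clone, PartialEq)]
pub struct Array1(pub Vec<f64>);

// Neither `Add` nor `Array1` would be local to a downstream crate, so this
// impl can only be written in the crate that defines `Array1`.
impl Add for Array1 {
    type Output = Array1;
    fn add(self, rhs: Array1) -> Array1 {
        assert_eq!(self.0.len(), rhs.0.len(), "length mismatch");
        Array1(self.0.iter().zip(&rhs.0).map(|(a, b)| a + b).collect())
    }
}

/// Named methods, by contrast, can be added from any crate through an
/// extension trait, because the trait itself is local to that crate.
pub trait SumExt {
    fn total(&self) -> f64;
}

impl SumExt for Array1 {
    fn total(&self) -> f64 {
        self.0.iter().sum()
    }
}
```

So operators tie numerics to the crate that owns the type, while method-style helpers could in principle live elsewhere.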

So. src/numeric/ will be a new directory. If we need any super specific types or functions, the module can even be public. Either way, new functionality is welcome there.

@daniel-vainsencher
Contributor Author

"For growth" means convenient for you first, and then for other contributors.

One big crate is good for that as long as compilation time stays reasonable.


@bluss
Member

bluss commented Apr 9, 2016

Thanks for expanding on that. I appreciate the feedback!

@bluss
Member

bluss commented Apr 10, 2016

When I write a map + fold (along an axis) combination, it seems to me it is equivalent to just .fold_axis()

    /// Combine an elementwise mapping with a fold along an axis
    pub fn map_fold<B, C, F, G>(&self, axis: Axis, mut map: F, init: C, mut fold: G)
        -> OwnedArray<C, D::Smaller>
        where D: RemoveAxis,
              F: FnMut(&A) -> B,
              G: FnMut(&C, B) -> C,
              B: Clone,
              C: Clone,
    {
        let n = self.shape().axis(axis);
        let mut res = OwnedArray::from_elem(self.dim().remove_axis(axis), init);
        for subview in self.axis_iter(axis) {
            res.zip_mut_with(&subview, |x, y| *x = fold(x, map(y)));
        }
        res
    }

is this the kind of map reduce you were thinking of?
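The equivalence mentioned above can be checked in a dependency-free sketch: a map followed by a fold along an axis is the same as a single fold whose closure applies the map first. The function names below are illustrative and operate on a matrix stored as equal-length rows, not on ndarray types.

```rust
/// Fold each column (axis 0) of a matrix stored as rows, starting from `init`.
pub fn fold_columns<C: Clone>(
    m: &[Vec<f64>],
    init: C,
    mut fold: impl FnMut(&C, f64) -> C,
) -> Vec<C> {
    let ncols = m.first().map_or(0, |row| row.len());
    let mut res = vec![init; ncols];
    for row in m {
        for (acc, &x) in res.iter_mut().zip(row) {
            let next = fold(acc, x);
            *acc = next;
        }
    }
    res
}

/// Map, then fold along axis 0: expressed as one fold with a composed closure.
pub fn map_fold_columns<B, C: Clone>(
    m: &[Vec<f64>],
    mut map: impl FnMut(f64) -> B,
    init: C,
    mut fold: impl FnMut(&C, B) -> C,
) -> Vec<C> {
    fold_columns(m, init, |acc, x| fold(acc, map(x)))
}
```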

@daniel-vainsencher
Contributor Author

By map reduce I meant the general operation, where the aggregation can be
according to any combination of the location and value. For example, group
elements by row+sum % 2 (checkerboard blacks and whites) and sum, but also
allow the value to be used in grouping.

The function you proposed above is included in fold_axis no?
When the fold is defined by the function (say, sums and products), the map
function is needed.
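One possible reading of this more general map-reduce, sketched with the standard library only: every element is assigned to a group by a key function that sees its position and value, and each group is reduced separately. The signature is an assumption, and the checkerboard key is taken to be `(row + column) % 2`.

```rust
use std::collections::BTreeMap;

/// Group every element by `key(row, col, value)` and combine each group's
/// values with `reduce`, starting from a clone of `init`.
pub fn map_reduce<K: Ord, V: Clone>(
    m: &[Vec<f64>],
    mut key: impl FnMut(usize, usize, f64) -> K,
    init: V,
    mut reduce: impl FnMut(V, f64) -> V,
) -> BTreeMap<K, V> {
    let mut groups: BTreeMap<K, V> = BTreeMap::new();
    for (r, row) in m.iter().enumerate() {
        for (c, &x) in row.iter().enumerate() {
            let k = key(r, c, x);
            // Take the group's accumulator out (or start a new one), update, put back.
            let acc = groups.remove(&k).unwrap_or_else(|| init.clone());
            groups.insert(k, reduce(acc, x));
        }
    }
    groups
}
```

For `[[1, 2], [3, 5]]` with the checkerboard key and a summing reducer, group 0 holds `1 + 5 = 6` and group 1 holds `2 + 3 = 5`. A fold along an axis is the special case where the key is just the column (or row) index.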

Daniel


@termoshtt
Member

I'm looking for max and the other basic operations described at the top of this issue.
Where can I find them, or should I implement them with fold myself?

@bluss
Member

bluss commented Apr 2, 2017

They still need to be implemented. We've prioritized general operations first before adding more specific ones.

@Jessime

Jessime commented May 25, 2017

I'd like to suggest adding std (for the standard deviation) as an additional operation to consider. It's obviously more specific, but figured I could put it on your radar.
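For reference, a standard deviation on a plain slice could look like the sketch below. The `ddof` parameter is borrowed from NumPy's convention and is an assumption here, since the suggestion does not say whether the population or the sample formula is meant.

```rust
/// Standard deviation of a slice, or `None` when there are too few values.
/// `ddof` is the "delta degrees of freedom": 0 for the population formula,
/// 1 for the sample formula.
pub fn std_dev(xs: &[f64], ddof: usize) -> Option<f64> {
    if xs.len() <= ddof {
        return None;
    }
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    let sum_sq: f64 = xs.iter().map(|x| (x - mean).powi(2)).sum();
    Some((sum_sq / (n - ddof as f64)).sqrt())
}
```

For `[2, 4, 4, 4, 5, 5, 7, 9]` the mean is 5, the squared deviations sum to 32, and the population standard deviation is `2.0`.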
