Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shape<D> and StrideShape<D> should be merged #367

Open
termoshtt opened this issue Oct 10, 2017 · 19 comments
Open

Shape<D> and StrideShape<D> should be merged #367

termoshtt opened this issue Oct 10, 2017 · 19 comments

Comments

@termoshtt
Copy link
Member

I am confusing around Shape<D>, StrideShape<D>, Dim, and &[Ix]. In my understanding, these can be grouped as follow:

  • Has stride information, i.e. these know how the values are aligned in the memory
    • Shape<D>
    • StrideShape<D>
  • Only the size of each dimensions
    • Dim
    • &[Ix]

And each of types in a group should be merged. I think StrideShape<D> and Dim are enough.
Since ArrayBase<S, D> already has stride: D, StrideShape<D> is appropriate, and it should be re-named Shape<D> instead. Associated to this, ShapeBuilder should be also simplified to generate StrideShape<D> (and renamed to IntoShape similar to IntoDimension).

What do you think?
If you agree, I will create PR :)

Thanks!

@bluss
Copy link
Member

bluss commented Oct 10, 2017

Note that Shape and StrideShape are different in this way:

  • Shape: either c- or f-contiguous
  • StrideShape: custom strides

I will look closer at this later -- but merging them means either adding or removing features.

@bluss
Copy link
Member

bluss commented Oct 10, 2017

Refreshing my memory: We allow custom strides (StrideShape) when the user provides the elements for us (Array::from_shape_vec).

Consider the question: How does Array::from_elem work with a Stride Shape. How do we handle "holes" in the storage, assuming we have a non-Copy element type (that must have its destructor run correctly for each element when the array drops)?

@termoshtt
Copy link
Member Author

Note that Shape and StrideShape are different in this way:

I think the what Shape can represent is also able to be represented using StrideShape.
For example, Shape { dim: [3, 4], is_c = True} corresponds to StrideShape { dim: [3, 4], stride: [1, 3], custom = False }. All shape: Shape<D> can be converted to StrideShape<D> like this, and a check of is_c is enough lightweight, stride[0] == 1 && stride[i] == dim[i-1] * ... * dim[0]. Hence we can replace Shape<D> by StrideShape<D> by introducing StrideShape::is_c(&self).

While Shape<D> has smaller memory than StrideShape<D>, but it cannot be a reason for splitting type for the shape of array since StrideShape<D> is also enough small. Single shape type makes the API design much simpler.

@bluss
Copy link
Member

bluss commented Oct 11, 2017

That's the simple part. See the question about custom strides in particular arrays that don't cover all elements in the base storage.

@bluss
Copy link
Member

bluss commented Oct 11, 2017

I think it seems good to force from_elem and zeros constructors to always make contiguous arrays. We can still improve the interface.

@termoshtt
Copy link
Member Author

How does Array::from_elem work with a Stride Shape.

We can simply raise error that the given StrideShape is not compatible like following:

fn from_elem<Sh: IntoShape<D>>(shape: Sh, elem: A) -> Result<Self, NonContiguousStrideError> {
  if shape.is_custom() {
    Err(NonContiguousStrideError::new())
  }
  ... (current implementation)
}

My concern is that the interface about shape is very complicated, e.g. there are Sh: ShapeBuilder<D> and Sh: Into<StrideShape<D>> or ArrayBase::shape() returns &[Ix] instead of Shape<D>, and so on. I understand there are some algorithms which assumes contiguous memory alignment, but these should be managed by an error handling like above.

@bluss
Copy link
Member

bluss commented Oct 12, 2017

Sorry that I don't have so much time for this question during the week.

I share your concern for the complexity of the API here, and it's a concern I keep in mind with ndarray all the time. We can do so much with the type system, but we need find solutions that don't make the library too hard to understand or use. We have to find a balance, and you can find traces of this balance conversation all over the issues in this repo!

I will not lose this issue and I would want to discuss many other solutions than the proposed one, for example other ways of unifying the shape API or better documentation. But first let's try to finish the from_elem or equivalently zeros discussion (they use the shape the same way).

Array::zeros

(Equivalently from_elem, default etc use the shape in the same way)

The current shape API for Array::zeros is very nice. Let's look at its positive sides:

  • It only accepts contiguous shapes
  • No shape it accepts is an error
    • Except for shapes whose element count wraps around type limits. That's a panic.

Ok, so far so good. A bit of infidelity there, a panic case, but it's supposedly in an uncommon case. It's pretty hard to run into.

Positive points:

  • The API guides the user. We can only enter c- and f-contiguous shapes to Array::zeros. Alternatives for the user are clear.
  • Ruling out error conditions at compile time. Can't use a custom stride shape with zeros, and it is enforced by the type system.

I think this question answers itself:

  • Should we change Array::zeros so that it can return an error? We know that the error it can return is one that the construction of the API rules out statically today.

@termoshtt
Copy link
Member Author

termoshtt commented Oct 13, 2017

The current shape API for Array::zeros is very nice.

I understand that you want to assure the contiguous memory layout by a type system using Shape<D>, and I agree that it is better.

Then, my concern changes: Does StrideShape<D> include Shape<D>?
As I said above, shape: Shape<D> can be always convertible to StrideShape<D>, and we can say that the StrideShape<D> is general shape type, and Shape<D> is special, contiguous shape type.
But your above comment "StrideShape: custom strides" means these are distinct types, i.e. a shape is Shape<D> or StrideShape<D>. Their naming also supports this view, but the implementation does not support it.

Sorry that I don't have so much time for this question during the week.

No problem, it is not a hurried issue. I thank your conscientious comments a lot.

@bluss
Copy link
Member

bluss commented Oct 22, 2017

Sure, StrideShape includes the shapes + strides that Shape can describe.

I've been wondering if we can make the API more clear without changing how this actually works. Maybe by for example using a more specific trait than something as indirect as the current Into<StrideShape<D>> as we use for Array::from_shape_vec.

@termoshtt
Copy link
Member Author

termoshtt commented Oct 25, 2017

My original concern is ArrayBase::shape, which returns &[Ix] instead of Shape<D>. This is due to the Shape<D> is not a static property of ArrayBase, i.e. we cannot obtain Shape<D> without checking its strides. In this sense the StrideShape<D> is a property of ArrayBase and we can get it without error check.

After discussion in this thread, I understood the roles of Shape<D> and StrideShape<D> are appropriate, but their names are misleading. Shape<D> is only used for generating a continuous array, and it describes the shape of ArrayBase only in a limited case. This is a special structure than StrideShape<D>, but its name is rather general.

My plan is here:

  • Rename structures
    • Shape<D> -> ContShape<D>
    • StrideShape<D> -> Shape<D>
    • impl Into<Shape<D>> for ContShape<D> { /* use C-oder */ } (fixed)
  • impl PartialEq<Shape<D>> for ContShape<D>
  • Change APIs of ArrayBase
    • ArrayBase::cont_shape(&self) -> Option<ContShape<D>>
    • ArrayBase::shape_as_cont(&self) -> ContShape<D> // use C-order if stride of array is custom
    • ArrayBase::shape(&self) -> Shape<D>
      • then a.shape() == a.shape_as_cont() is false if a has custom stride

For example, these three shape API is better in a following case:

Ruling out error conditions at compile time. Can't use a custom stride shape with zeros, and it is enforced by the type system.

In current API, ArrayBase::shape returns &[Ix], which drops the stride of the array.

fn some_func<A, S: Data<Elem=A>, D: Dimension>(x: &ArrayBase<S, D>) -> Array<A, D> {
  let mut a = Array::zeros(x.shape());
  // modify a ...
  a
}

In this case x and a may have different strides, but we cannot see it from this source.
Using my shape API:

fn some_func<A, S: Data<Elem=A>, D: Dimension>(x: &ArrayBase<S, D>) -> Array<A, D> {
  let mut a = Array::zeros(x.shape_as_cont());
  // modify a ...
  a
}

the stride is explicitly dropped with shape_as_cont.

@bluss
Copy link
Member

bluss commented Oct 25, 2017

OK, so the naming of methods dim and shape seem to be part of the problem.

@bluss
Copy link
Member

bluss commented Oct 25, 2017

Add raw_dim to the mix, if you also take these into account, what's the best solution?

@termoshtt
Copy link
Member Author

Add raw_dim to the mix

Can we remove dim()? At a glance, the Dim::Pattern can be replaced with impl Deref<(Ix, Ix)> for Ix2 { ... } and so on. If it is true, we should use ArrayBase::dim(&self) -> &D, and remove raw_dim

@bluss
Copy link
Member

bluss commented Oct 25, 2017

Deref can not do custom conversions like that. In general it can only give references to interior fields, there is no tuple inside an ix2.

@bluss
Copy link
Member

bluss commented Oct 25, 2017

I'll try to propose something later today.

@termoshtt
Copy link
Member Author

termoshtt commented Oct 25, 2017

Deref can not do custom conversions like that

Sure, you are correct.

A remaining issue is the type of stride as discussed in #283
ArrayBase::stride returns &[Ixs], but which elements are dynamically casted from Ix, and stride in ArrayBase has type D which is usually Dim<[Ix; n]>.
Though there is ShapeBuilder::Strides associated type, to couple Dim and Stride will be better using Stride trait corresponding Dimension trait:

trait Dimension {
   type Strd: Stride<Dim = Self>;
   // current impl
}

trait Stride {
  type Dim: Dimension<Strd = Self>;
  // and other common methods
}

type Ixs2 = Dim<[Ixs; 2]>; // or define Strd<I> instead of Dim

impl Stride for Ixs2 {
  type Dim = Ix2;
  ...
}

Then we can implement ArrayBase::stride(&self) -> D::Strd, and make the stride in ArrayBase signed like: ArrayBase { ..., stride: D::Strd }.

@bluss
Copy link
Member

bluss commented Oct 26, 2017

Let's leave the strongly typed strides out of this if we can, so that we have a problem small enough to solve.

@bluss
Copy link
Member

bluss commented Nov 1, 2017

I simply haven't had time to come back to this yet.

@zroug
Copy link

zroug commented Sep 1, 2018

I got confused by Dimension, Shape and StrideShape too and I also think that part of the problem is that the naming is not ideal. Especially I think that Shape sounds like it does mean what Dimension does. Thats because other popular libraries that I use, like Numpy or TensorFlow, use the name shape for what ndarray calls dimension.

My naming suggestions for StrideShape are either Layout as in memory layout or simply Strides. Shape could then be renamed to for example SimpleLayout/Strides or ContinuousLayout/Strides.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants