Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

slicedim and mapslices arguments inconsistent #18326

Closed
Mice7R opened this issue Sep 1, 2016 · 15 comments
Closed

slicedim and mapslices arguments inconsistent #18326

Mice7R opened this issue Sep 1, 2016 · 15 comments
Assignees
Labels
arrays [a, r, r, a, y, s] breaking This change will break code needs decision A decision on this change is needed
Milestone

Comments

@Mice7R
Copy link

Mice7R commented Sep 1, 2016

In slicedim() you tell which dimension equals a value:

a=rand(6,5)
slicedim(a, 1, n) # returns n row a[n, :]
slicedim(a, 2, n) # returns n column a[:, n]

On the other hand in mapslices you tell where the dots (:) are, so:
mapslice(f, a, 1) # a[:, n]
Would pass columns to function f.

@JeffBezanson JeffBezanson added the breaking This change will break code label May 2, 2017
@JeffBezanson JeffBezanson added this to the 1.0 milestone May 2, 2017
@JeffBezanson JeffBezanson added needs decision A decision on this change is needed arrays [a, r, r, a, y, s] labels May 2, 2017
@JeffBezanson
Copy link
Member

Yes, lately when I've looked at mapslices I've had the same thought, that the meaning of its dimension argument should perhaps be inverted. One way to think about it is that mapslices(f, a, 1) should mean "call f on parts of a where dimension 1 has a constant value", which is pretty intuitive to me.

I believe the function behaves the way it does currently to be more similar to functions like sum, i.e. sum(x, d) == mapslices(sum, x, d), so this is a bit of a tough call.

@BobPortmann
Copy link
Contributor

I think that it is slicedim which is backwards here. It should "slice the dim" instead of "slice the other dims" which I what it is doing now. In fact, I recent wrote a function to "slice the dim" and slicedim seemed like the obvious name but was taken. I ended up calling it arrayslice instead. I needed that in order to implement mapslices like behavior in cases where the f call is more complicated (as an aside, it would be awesome to have more sophisticated mapslices-like behavior built in).

In the mapslices case, I read this as "apply f to slices of dim". The beauty of mapslices is that the number of dims that one passes in is the same as the number of dimensions the function expects. It would become much more difficult to call if meaning of the dimension input was reversed. For example, compare the present behavior in the common case where f acts on a vector

a1 = rand(4,3,2)
a2 = rand(4,3,2,5,6)
f(x) = 2x
mapslices(f, a1, 2)
mapslices(f, a2, 2)

with the reverse

mapslices(f, a1, 1, 3)
mapslices(f, a2, 1, 3, 4, 5)

This change would make it hard to write generic code that works for different sized input arrays, which is the whole point of mapslices in my opinion.

@JeffBezanson
Copy link
Member

the number of dims that one passes in is the same as the number of dimensions the function expects

That's a good point.

it would be awesome to have more sophisticated mapslices-like behavior built in

Now I'm curious --- could you explain?

I think it would be useful to have standard terminology for both kinds of dimension arguments. I suspect both cases come up. For example you might want to "select" (in an sql-like sense) certain dimensions to keep, and have all the other dimensions reduced, or specify which dimensions to reduce (as in the current sum etc. functions).

@StefanKarpinski
Copy link
Member

Should we just call the correct version of slicedim just slice since that name is free?

@BobPortmann
Copy link
Contributor

Now I'm curious --- could you explain?

Sure, mapslices only works when the function only takes one argument, it would be great if it could take any number of arguments. This is easy to implement if all of the inputs have the same size but if they are all different then it gets much harder. Then it becomes a kind of broadcasting but for functions acting on slices instead of scalars. A nice syntax for dealing with all the possibilities as auto-magically as possible is what I was alluding to above. Using something like AxisArrays would probably be the easiest way to implement this (since you've told the system what each dimension represents) but is also more restricting because one must use AxisArrays. It would be nice to have a system that worked on combinations of regular arrays also.

@mbauman
Copy link
Member

mbauman commented Sep 15, 2017

Must these two agree? They do very different things. slicedim is basically a form of indexing that we can't really express with our current set of indices. That said, I'd be surprised if it's used all that often. Across the entire registered package ecosystem, I count 15 lines where Base's slicedim is used. Interestingly, one package (Match.jl) defines its own slicedim implementation that returns views — that's actually much more useful since it's most commonly found in loops. Another package effectively re-implements it in order to allow writing to the array instead of just extraction. This is all screaming for a way to express it as an index. Of those 15 lines, at least 6 could be expressed using an "ellipsis" notation like A[…, i] or A[i, …].

On the other hand, mapslices really needs to agree with sum and the other multidimensional reducers.

@mbauman
Copy link
Member

mbauman commented Sep 15, 2017

Perhaps the biggest problem here is that they both use the root slice. If we can figure out a way to spell slicedim as a index type then that'd be solved. It's actually really easy to construct an object that expands out to the indices (:,:,:,i,:,:) (edit: although it'd be unstable :-/), the hard part is what to name it (edit: and making it stable/fast). Perhaps something that refers to the dimension? alongdimension? ondimension?

For reference, the current slicedim help states:

Return all the data of A where the index for dimension d equals i. Equivalent to A[:,:,...,i,:,:,...] where i is in position d.

@timholy
Copy link
Member

timholy commented Sep 16, 2017

I think the unstable one is the right answer. I've argued elsewhere that we should support

mapslices(f, A, (:, *, :))

where here * means "fill this slot with an integer". The main reason for this is that this version of mapslices is inferrable, whereas mapslices(f, A, 2) is not. It's best to do all the non-inferrable stuff at the beginning and make the "main" algorithm inferrable, so that people who want the performance advantages can call the inferrable version directly. But some functions are sometimes more convenient in non-inferrable form, so we should support those too.

@JeffBezanson JeffBezanson added triage This should be discussed on a triage call and removed triage This should be discussed on a triage call labels Dec 31, 2017
@JeffBezanson
Copy link
Member

Should we just close this?

@mbauman
Copy link
Member

mbauman commented Jan 30, 2018

I'm still very tempted to deprecate slicedim without an immediate replacement.

Across the entire registered package ecosystem, I count 15 lines where Base's slicedim is used.

@StefanKarpinski
Copy link
Member

Do it! That sounds like a great deletion candidate.

@JeffBezanson
Copy link
Member

Then why not just leave it until we do have a replacement?

@mbauman
Copy link
Member

mbauman commented Jan 30, 2018

Because the name is confusing. If we can come up with a better name, I'd happily rename it.

@JeffBezanson
Copy link
Member

It's similar to a select --- basically "select from A where dim #i equals n".

@JeffBezanson
Copy link
Member

selectdim?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrays [a, r, r, a, y, s] breaking This change will break code needs decision A decision on this change is needed
Projects
None yet
Development

No branches or pull requests

7 participants