RFC: support iteration for `CartesianIndex` #48404

simeonschaub · 2023-01-25T18:28:47Z

I'm almost certain this came up before, but wasn't able to find the
discussion. I think now that the multidimensional indexing API has
matured a bit, the potential for confusion when allowing iteration is
excedingly low. With this change, it's possible to destructure
CartesianIndex objects without having to convert them to a tuple,
which I feel currently makes writing such code unnecessarily annoying.

This also makes CartesianIndex broadcast like a scalar which I felt
was consistent with the behavior for objects like Pair, but I am open
to other suggestions.

I'm almost certain this came up before, but wasn't able to find the discussion. I think now that the multidimensional indexing API has matured a bit, the potential for confusion when allowing iteration is excedingly low. With this change, it's possible to destructure `CartesianIndex` objects without having to convert them to a tuple, which I feel currently makes writing such code unnecessarily annoying. This also makes `CartesianIndex` broadcast like a scalar which I felt was consistent with the behavior for objects like `Pair`, but I am open to other suggestions.

jishnub · 2023-01-25T20:23:12Z

Related: #47044

mikmoore · 2023-01-25T23:49:44Z

I'm favorable toward broadcasting as a scalar.
I'm unfavorable toward iteration as a collection. Iteration as a scalar doesn't seem so unreasonable, however.

Are there any Base objects that currently have such radically different broadcast vs iteration behavior as is proposed here? That's my initial concern. I don't mind the current cast-to-tuple requirement, as it's clear and doesn't (shouldn't?) actually impose any runtime cost.

In my mind, a CartesianIndex is a single index. It's not a collection of coordinates, even if that happens to be its representation. To iterate on the coordinates seems like an extreme pun on its semantics.

simeonschaub · 2023-01-26T00:02:32Z

Are there any Base objects that currently have such radically different broadcast vs iteration behavior as is proposed here? That's my initial concern. I don't mind the current cast-to-tuple requirement, as it's clear and doesn't (shouldn't?) actually impose any runtime cost.

There's a bunch. I already mentioned Pair, but all AbstractStrings also behave this way.

uniment · 2023-01-26T05:42:48Z

Looks like a good idea. This discourse conversation discusses another potential benefit to making CartesianIndex iterable.

mkitti · 2023-01-26T09:04:07Z

I think we have been down this road before:

This is a duplicate of #23982 (RFC: make CartesianIndex iterable)

See conversation in #23719 . Specifically, can we now avoid the performance trap?

jishnub · 2023-01-26T09:13:40Z

If I understand correctly, the performance trap doesn't exist on v1.9

julia> @inline Base.iterate(index::CartesianIndex) = iterate(index.I)

julia> @inline Base.iterate(index::CartesianIndex, st...) = iterate(index.I, st...)

julia> function mysum1(A)
                 s = 0.0
                 @inbounds for i in CartesianIndices(A)
                     s += A[i]
                 end
                 s
             end
mysum1 (generic function with 1 method)

julia> function mysum2(A)
                  s = 0.0
                  @inbounds for i in CartesianIndices(A)
                      s += A[i...]
                  end
                  s
              end
mysum2 (generic function with 1 method)

julia> using BenchmarkTools

julia> A = rand(10,10,10,10,10,10);

julia> @btime mysum1($A);
  1.620 ms (0 allocations: 0 bytes)

julia> @btime mysum2($A);
  1.541 ms (0 allocations: 0 bytes)

julia> mysum1(A) == mysum2(A)
true

mkitti · 2023-01-26T09:53:49Z

Do we know what changed to close the performance gap? Is there some situation where we might not have closed it?

simeonschaub · 2023-01-26T11:03:46Z

Specifically, can we now avoid the performance trap?

I think calling this a performance trap might have been a bit overblown. By now, I think people know how to use CartesianIndex idiomatically, so sacrificing the convenience of being able to destructure indices just to prevent people from writing more complicated, unidiomatic code which is slower in a few cases seems a bit silly to me. Seems like this point is moot now though anyway.

mkitti · 2023-01-26T14:26:03Z

By now, I think people know how to use CartesianIndex idiomatically, so sacrificing the convenience of being able to destructure indices just to prevent people from writing more complicated, unidiomatic code which is slower in a few cases seems a bit silly to me.

The passing of time does not solve the issue for new users to Julia. However, given the current benchmarks of the original example, I concur that the original issues appears to be moot. Perhaps we could invite @timholy to present a new challenge?

timholy · 2023-01-26T15:37:37Z

I seem to recall that @mbauman had the clearest examples of the problems of making CartesianIndex iterable.

mbauman · 2023-01-26T17:01:49Z

This was back in #21148 — which I closed as we finalized the new broadcasting semantics that defaulted to iteration with very few exceptions. I think the only thing that was iterable but broadcasted like a scalar at that time was String... and I was very hesitant to add others like it (I was deep in the weeds of #18618.). Since then we've added Pair to that list. Are there any others?

So there are two questions here:

Do we make CartesianIndex broadcast like a scalar? This is a definitely yes in my book — we already do that in some cases. Cf. Make CartesianIndexs broadcast like scalars? #38432 and Broadcasted addition of one CarteseanIndex to an array of them emits cryptic error #41540. Oh! And this is already ready-to-go with news and tests at treat CartesianIndex as a broadcast scalar #47044. Since I removed the triage label there, I'm adding it here. :)
Do we also make CartesianIndex iterate like a container? This adds it to the list of things that broadcast like a scalar but iterate like a container. I'm obviously more ok with this than I was back then.

Now that my head is back in #18618-land, I'm also thinking that there's a third question here:

If we do make it iterable, should we add special handling or an explicit error for map? I had really wanted map(f, x) and f.(x) to have similar behaviors in the one-arg case, and how we treat strings is interesting¹ — should we similarly enforce that it reconstructs a CartesianIndex?
¹ map with strings ensures the output is a string and doesn't support the fallback
```
julia> map(uppercase, "αbγ")
"ΑBΓ"

julia> map(isascii, "αbγ")
ERROR: ArgumentError: map(f, s::AbstractString) requires f to return AbstractChar; try map(f, collect(s)) or a comprehension instead
Stacktrace:
 [1] map(f::typeof(isascii), s::String)
   @ Base ./strings/basic.jl:620
 [2] top-level scope
   @ REPL[29]:1
```

timholy · 2023-01-26T17:42:03Z

Do we also make CartesianIndex iterate like a container? This adds it to the list of things that broadcast like a scalar but iterate like a container.

This is my main concern too, and I confess a hint of uneasiness. String and Pair seem a little bit more special-purpose than an arbitrary-length numerical object like CartesianIndex. But I can be talked into it.

mkitti · 2023-01-26T18:13:28Z

If we do make it iterable, should we add special handling or an explicit error for map?

My default would be to throw an error for now and reconsider in another PR since there is uncertainty.

MasonProtter · 2023-01-26T18:36:36Z

IMO, being unable to destructure and splat a CartesianIndex is just so ugly and annoying that I'd be strongly in favour of having another weird object that is a broadcast scalar yet iterable.

timholy · 2023-01-26T19:44:40Z

@MasonProtter I'm sure you know about Tuple(I)---still too ugly?

mkitti · 2023-01-26T20:00:54Z

@MasonProtter I'm sure you know about Tuple(I)---still too ugly?

I go between Tuple(I) and I.I depending on how verbose I feel that day. I do remember finding this challenging as a beginner. I would appreciate CartesianIndex more if I did not have to explicitly convert it to a tuple so often.

simeonschaub · 2023-01-26T20:20:32Z

I'm sure you know about Tuple(I)---still too ugly?

The main reason I find this annoying is that I often find myself wanting to write code similar to the following:

for (i, j) in CartesianIndices(M)
    if abs(i) + abs(j) ≤ N
        # do something
    end
end

Currently, you would have to write this as:

for I in CartesianIndices(M)
    (i, j) = Tuple(I)
    if abs(i) + abs(j) ≤ N
        # do something
    end
end

It's not the end of the world of course, but I just found it quite unergonomical.

With the support for slurping added here, this is even more powerful, e.g. to contract an n-1-dimensional array with a vector you could simply write:

for (I..., j) in CartesianIndices(B)
    A[I] += B[I, j] * v[j]
end

With the current state of things, this would involve converting back and forth between CartesianIndex and Tuple, which results in less expressive code.

adkabo · 2023-01-26T20:40:39Z

I propose another solution: decouple unpacking from iteration. Base.unpack would work on CartesianIndex and iteration would not.

This solution doesn't proliferate the strange iterable-but-scalar objects but retains the syntactic convenience of a,b,c = CI.

MasonProtter · 2023-01-26T20:44:23Z

Great example @simeonschaub, that's exactly the sort of thing I've also encountered.

vtjnash · 2023-01-26T21:36:09Z

Base.unpack is called Base.indexed_iterate, though we do not necessarily guarantee that API, and generally do not expect it to be different than iterate

adkabo · 2023-01-26T22:28:33Z

though we do not necessarily guarantee that API, and generally do not expect it to be different than iterate

Julia can introduce unpack with a fallback to iterate, and document that it must agree with iterate if iterate exists but can be implemented on non-iterate types. That solves the syntax convenience without surprising semantic differences.

Introducing types where iterate disagrees with broadcast is a bigger decision: it requires downstream users to specifically handle them. For example, DataFrames.jl is providing a scalar= kwarg for end users to avoid accidentally unpacking String into a Vector{Char}. Users would now have to pass scalar=Union{AbstractString, CartesianIndex} instead, whenever they have CartesianIndex columns (an unusual situation, granted), if I'm not mistaken (cc @bkamins).

bkamins · 2023-01-27T08:15:03Z

Introducing types where iterate disagrees with broadcast is a bigger decision

There is a long list of such types, so I guess we have to live with this discrepancy.

@simeonschaub - my question would be in what cases do we need to broadcast CartesianIndex as a scalar? Maybe the "broadcasting" part of this PR can be left-out to avoid introducing a change if we do not have a clear use-case for?

It is easy enough to use Ref around CartesianIndex if needed I think.

The point is that It is not clear (at least for me) if it is better to make:

CartesianIndex(1,1) .+ 1 # currently you have to write: CartesianIndex(Tuple(CartesianIndex(1,1)) .+ 1)

or

getindex.([[1 2; 3 4]], CartesianIndex(1,1)) # currently you have to write getindex.([[1 2; 3 4]], Ref(CartesianIndex(1,1)))

work.

simeonschaub · 2023-01-27T08:44:14Z

my question would be in what cases do we need to broadcast CartesianIndex as a scalar? Maybe the "broadcasting" part of this PR can be left-out to avoid introducing a change if we do not have a clear use-case for?

Leaving out the broadcasting part means this will use the generic fallback and collect before broadcasting, so either way we have to make a decision now – we can't change this behavior later. If DataFrames wants to mimick broadcasting behavior, why doesn't it just call broadcastable under the hood?

bkamins · 2023-01-27T08:51:05Z

Leaving out the broadcasting part means this will use the generic fallback and collect before broadcasting, so either way we have to make a decision now – we can't change this behavior later.

I proposed to disallow broadcasting, just like broadcasting is disallowed for NamedTuple or Dict, although they are iterable.
In this way will be able to change this behavior later and not have to make the decision now (unless we are clear what we want).

If DataFrames wants to mimick broadcasting behavior, why doesn't it just call broadcastable under the hood?

DataFrames.jl does not want to mimick broadcasting. (my comment above is unrelated to DataFrames.jl; for those interested let me just comment that "scalar" is something different than size-1 broadcastable object in the context of DataFrames.jl; a similar thing is in linear algebra you allow multiplication by a scalar, but do not allow multiplication by 1-element vector)

timholy · 2023-01-27T09:41:02Z

CartesianIndex(1,1) .+ 1

This, at least, we have a reasonably nice idiom for: I + oneunit(I).

mikmoore · 2023-01-27T15:34:33Z

I don't think it can make sense to broadcast as anything other than a scalar. CartesianIndex's primary purpose should supersede any convenience punning as a collection.

a = [1; 2;; 3; 4]
b = [5; 6;; 7; 8]
c = getindex.((a,b), CartesianIndex(1,2))
# c == (1, 6) if CartesianIndex broadcasts as collection
# c == (3, 7) if CartesianIndex broadcasts as scalar
d = getindex.((a,b,b), CartesianIndex(1,2))
# error if CartesianIndex broadcasts as collection
# d == (3, 7, 7) if CartesianIndex broadcasts as scalar

mbauman · 2023-01-30T19:45:17Z

The ship has sailed on how we broadcast CartesianIndex — #47044 is now merged but prior to that we've long had #29890.

So the only question is iterability.

Introducing types where iterate disagrees with broadcast is a bigger decision

There is a long list of such types...

Folks keep saying this, but at least within the standard library there is only AbstractString and Pair (and in the PR for pair it was also partially justified by a "many types" bandwagon). Are there many packages that do this?

bkamins · 2023-01-30T19:52:40Z

there is only AbstractString and Pair

And these two are constantly problematic + add NamedTuple and AbstractDict to the list. Having said that, I am OK to make CartesianIndex iterable, but maybe with the PR we could introduce some sub-section in the Julia Manual, listing these five types as special?

BTW: in DataFrames.jl DataFrameRow is iterable but not broadcastable, but only because we want to keep consistency with Base.

vtjnash · 2023-01-30T20:02:58Z

Folks keep saying this, but at least within the standard library there is only AbstractString and Pair (and in the PR for pair it was also partially justified by a "many types" bandwagon)

To slightly digress, I think we could also partly justify it by noting that broadcast is primarily about indexing, if possible, and otherwise treating the argument as a scalar. It then also allow iteration, since it is often compatible with indexing (or at least unambiguous with a call to collect then indexing). But for cases where iteration and indexing disagree (esp. Strings, but also Dict), then we still opt not to treat it as iterable. This does not account for Pair (other than it being intended as something slightly different from a Tuple) nor NamedTuple (wasn't NT added because it is expected to represent kwargs, but iteration over it does not return the kwarg pairs, but rather only produces the values when indexed?)

oscardssmith · 2023-02-02T19:52:58Z

Triage agrees with @mbauman as to what the potential issues are, but thinks that this is still a relatively reasonable choice to make and leaves the final decision with him.

Seelengrab · 2023-02-02T19:53:21Z

The main reason I find this annoying is that I often find myself wanting to write code similar to the following:

The way I usually write code like that is either like

for I in CartesianIndices(M)
    if abs(I[1]) + abs(I[2]) ≤ N
        # do something
    end
end

or

for I in CartesianIndices(M)
    check(I) || continue
    # do something
end

check(I) = abs(I[1]) + abs(I[2]) > N

because it retains the ability to extend the code easily to higher dimensions. From my POV, CartesianIndex is a really nice analog to a vector in an N-dimensional integer vectorspace and indexing those by single dimension doesn't really make sense to me. I don't have particularly strong feeling about it though, just that I personally probably wouldn't use this.

uniment · 2023-02-08T09:13:12Z

This seems more readily extensible to higher dimensions:

for I in CartesianIndices(M)
    if sum(abs, I) ≤ N
        # do something
    end
end

but that would require CartesianIndex to be iterable.

Seelengrab · 2023-02-08T09:57:11Z

That's not the same function anymore; the original looked at a 2D cube (rectangle) and did some processing based on that, while yours looks at a N-D cube (which doesn't ignore singleton dimensions). May be the correct choice in some situations, but it doesn't generalize cleanly - the original version ignores the additional dimensions after all, selecting every index in a third extended dimension. That's why I prefer the explicitness.

uniment · 2023-02-08T21:34:44Z

The example wasn't clear in which manner extensibility would be desired, so I picked what I thought was meant.

In any case, @mkitti's idea I posted above was to allow functions to be array initializers with a CartesianIndex for an argument. With this PR, it could be written as:

Array(m, n, o) do (i, j, k)
    i + j + k
end

If CartesianIndex isn't iterable, then I'd prefer not to use it.

simeonschaub added iteration Involves iteration or the iteration protocol needs compat annotation Add !!! compat "Julia x.y" to the docstring needs news A NEWS entry is required for this change labels Jan 25, 2023

fix tests

7b51502

MasonProtter mentioned this pull request Jan 26, 2023

Broadcasted addition of one CarteseanIndex to an array of them emits cryptic error #41540

Closed

mbauman added the triage This should be discussed on a triage call label Jan 26, 2023

uniment mentioned this pull request Feb 10, 2023

Functions as "Array Initializers" #48624

Open

JeffBezanson removed the triage This should be discussed on a triage call label Feb 16, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: support iteration for `CartesianIndex` #48404

RFC: support iteration for `CartesianIndex` #48404

simeonschaub commented Jan 25, 2023

jishnub commented Jan 25, 2023

mikmoore commented Jan 25, 2023 •

edited

Loading

simeonschaub commented Jan 26, 2023

uniment commented Jan 26, 2023

mkitti commented Jan 26, 2023

jishnub commented Jan 26, 2023

mkitti commented Jan 26, 2023

simeonschaub commented Jan 26, 2023

mkitti commented Jan 26, 2023 •

edited

Loading

timholy commented Jan 26, 2023

mbauman commented Jan 26, 2023 •

edited

Loading

timholy commented Jan 26, 2023

mkitti commented Jan 26, 2023

MasonProtter commented Jan 26, 2023

timholy commented Jan 26, 2023

mkitti commented Jan 26, 2023

simeonschaub commented Jan 26, 2023 •

edited

Loading

adkabo commented Jan 26, 2023 •

edited

Loading

MasonProtter commented Jan 26, 2023

vtjnash commented Jan 26, 2023

adkabo commented Jan 26, 2023

bkamins commented Jan 27, 2023

simeonschaub commented Jan 27, 2023

bkamins commented Jan 27, 2023 •

edited

Loading

timholy commented Jan 27, 2023

mikmoore commented Jan 27, 2023

mbauman commented Jan 30, 2023 •

edited

Loading

bkamins commented Jan 30, 2023 •

edited

Loading

vtjnash commented Jan 30, 2023

oscardssmith commented Feb 2, 2023

Seelengrab commented Feb 2, 2023

uniment commented Feb 8, 2023

Seelengrab commented Feb 8, 2023

uniment commented Feb 8, 2023

RFC: support iteration for CartesianIndex #48404

Are you sure you want to change the base?

RFC: support iteration for CartesianIndex #48404

Conversation

simeonschaub commented Jan 25, 2023

jishnub commented Jan 25, 2023

mikmoore commented Jan 25, 2023 • edited Loading

simeonschaub commented Jan 26, 2023

uniment commented Jan 26, 2023

mkitti commented Jan 26, 2023

jishnub commented Jan 26, 2023

mkitti commented Jan 26, 2023

simeonschaub commented Jan 26, 2023

mkitti commented Jan 26, 2023 • edited Loading

timholy commented Jan 26, 2023

mbauman commented Jan 26, 2023 • edited Loading

timholy commented Jan 26, 2023

mkitti commented Jan 26, 2023

MasonProtter commented Jan 26, 2023

timholy commented Jan 26, 2023

mkitti commented Jan 26, 2023

simeonschaub commented Jan 26, 2023 • edited Loading

adkabo commented Jan 26, 2023 • edited Loading

MasonProtter commented Jan 26, 2023

vtjnash commented Jan 26, 2023

adkabo commented Jan 26, 2023

bkamins commented Jan 27, 2023

simeonschaub commented Jan 27, 2023

bkamins commented Jan 27, 2023 • edited Loading

timholy commented Jan 27, 2023

mikmoore commented Jan 27, 2023

mbauman commented Jan 30, 2023 • edited Loading

bkamins commented Jan 30, 2023 • edited Loading

vtjnash commented Jan 30, 2023

oscardssmith commented Feb 2, 2023

Seelengrab commented Feb 2, 2023

uniment commented Feb 8, 2023

Seelengrab commented Feb 8, 2023

uniment commented Feb 8, 2023

RFC: support iteration for `CartesianIndex` #48404

RFC: support iteration for `CartesianIndex` #48404

mikmoore commented Jan 25, 2023 •

edited

Loading

mkitti commented Jan 26, 2023 •

edited

Loading

mbauman commented Jan 26, 2023 •

edited

Loading

simeonschaub commented Jan 26, 2023 •

edited

Loading

adkabo commented Jan 26, 2023 •

edited

Loading

bkamins commented Jan 27, 2023 •

edited

Loading

mbauman commented Jan 30, 2023 •

edited

Loading

bkamins commented Jan 30, 2023 •

edited

Loading