RFC: Deprecate first() and last() on empty ranges #25385

nalimilan · 2018-01-04T11:21:21Z

This will allow throwing an error, for consistency with other AbstractArrays.
Introduce the rangestart and rangestop functions instead.

Fixes #22354.

This is RFC because two issues need to be discussed (apart from missing docs and FIXME):

1. It would be nice to use the new field overloading feature to write r.start and r.stop instead of adding rangestart and rangestop functions. I think this use for ranges was considered when the feature was discussed, but since this would be one of the first times it is used in Base, I'd rather check that people support it before adopting that approach.
2. It turns out that in a few cases it would be useful to have rangestart and rangestop accept integers and return them, just like first and last do. See the second commit for a list of the few places where it's needed. Unfortunately, that doesn't play well with field overloading, unless we are OK with things like 1.start returning 1.
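
As a minimal sketch of the inconsistency at stake, the snippet below contrasts an empty range with an empty array on a recent Julia 1.x release. The names `rangestart` and `rangestop` are hypothetical stand-ins for the functions this PR proposes; they are not part of Base, and the definitions here are an assumption made only for illustration.

```julia
# Hypothetical stand-ins for the functions proposed in this PR (not part of Base).
# They simply forward to the current behavior of first/last on ranges.
rangestart(r::AbstractRange) = first(r)
rangestop(r::AbstractRange) = last(r)

r = 1:0                          # an empty range
println(isempty(r))              # true
println(first(r), " ", last(r))  # the endpoints are still returned: 1 0

v = Int[]                        # an empty array
threw = try
    first(v)                     # no first element exists
    false
catch err
    err isa BoundsError
end
println(threw)                   # true: first on an empty Array throws

println(rangestart(r), " ", rangestop(r))
```

The point of the proposal is that `first(r)` would behave like `first(v)` on empty input, while the endpoint accessors would keep returning the bounds.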

JeffBezanson · 2018-01-04T16:52:57Z

A definition like first(a::AbstractArray) = a[rangestart(eachindex(a))] worries me. Here we know that eachindex(a) is iterable, but we don't know whether it's a range, so there's no way to know you need to call rangestart.

JeffBezanson · 2018-01-04T17:03:33Z

Addendum: in that particular case it should probably still call first, since there will be a BoundsError either way.

I still don't love this change though. It seems like we're saying, "if you call first on a range and get errors, call rangestart instead". Seems to me we should just do that for you.

StefanKarpinski · 2018-01-04T17:19:25Z

This might be a good place for properties: make r.start, r.stop and r.step work as expected for all range types and have r.start and r.stop be the way to get what used to be first(r) and last(r). After all, if you know you're working with a collection that can be empty but still have a notion of a starting and stopping point, that basically implies that you're working with a range.

In other words my proposal is very similar to this change but you write r.start instead of rangestart(r) and r.stop instead of rangestop(r); r.step doesn't have an analogue here, but it would be a more consistent way to spell step(r) after such a change.
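
A rough sketch of what the property-based spelling could look like is given below. To avoid redefining anything on Base's own range types, it uses a hypothetical wrapper type, `PropRange`, which is not in the PR or in Base; the mapping of `start`, `stop` and `step` onto `first`, `last` and `step` is likewise an assumption for illustration.

```julia
# Hypothetical wrapper type (not in Base), used so that no Base type is modified.
struct PropRange{R<:AbstractRange}
    parent::R
end

# Expose start/stop/step as properties, whatever fields the wrapped range has.
function Base.getproperty(r::PropRange, name::Symbol)
    p = getfield(r, :parent)
    if name === :start
        return first(p)
    elseif name === :stop
        return last(p)
    elseif name === :step
        return step(p)
    else
        return getfield(r, name)
    end
end

e = PropRange(Base.OneTo(0))     # empty, and OneTo has no start field
println((e.start, e.stop, e.step))

s = PropRange(10:-2:1)           # elements are 10, 8, 6, 4, 2
println((s.start, s.stop, s.step))
```

The wrapper shows that one spelling can cover range types with different internal layouts, which is the consistency argument made above.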

JeffBezanson · 2018-01-04T17:39:28Z

That was addressed in the OP.

StefanKarpinski · 2018-01-04T23:01:32Z

Question from triage: were any bugs revealed by this PR?

nalimilan · 2018-01-05T13:59:10Z

> Question from triage: were any bugs revealed by this PR?

@StefanKarpinski I haven't found any bugs. The kind of bug this change would uncover, though, is a call to first/last on an empty range that ends up treating the range as non-empty. The PR would only catch this if the tests pass empty ranges to functions which were not designed with ranges in mind.

However, the PR highlights that the special behavior of first/last on ranges extends to other similar types:

julia> first(CartesianIndices(1:0, 3:0))
CartesianIndex(1, 3)

julia> collect(CartesianIndices(1:0, 3:0))
0×0 Array{CartesianIndex{2},2}

See also below for a possible issue related to eachindex.

> A definition like first(a::AbstractArray) = a[rangestart(eachindex(a))] worries me. Here we know that eachindex(a) is iterable, but we don't know whether it's a range, so there's no way to know you need to call rangestart.
>
> Addendum: in that particular case it should probably still call first, since there will be a BoundsError either way.

@JeffBezanson Using first would throw an error about accessing the first index of OneTo(0) (for Array), which is much less user-friendly than printing an error about trying to access the array at index 1.

You're right that this pattern wouldn't work if eachindex returns something which doesn't implement rangestart. But currently it's not great either as a lot of code calls first/last on indices, and if these are not ranges then an error could be thrown when they are empty. The PR just makes this more visible.

> I still don't love this change though. It seems like we're saying, "if you call first on a range and get errors, call rangestart instead". Seems to me we should just do that for you.

I have to admit it's not the sexiest PR ever. OTOH it's really weird to break the AbstractArray interface for fundamental types like ranges. What the PR shows is that when you're interested in the starting and ending values of a range even if it's empty, you don't actually want the first and last elements of an array: you want either these elements if they exist, or a pair in which the start is greater than the end, so that comparison checks automatically skip all operations. So maybe we could define rangestart/rangestop on empty arrays to return 1 and 0 respectively: that would make code more generic than it currently is.
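
The "start greater than end" idea can be sketched as follows. The function `sum_between` is a hypothetical example, not code from the PR; it relies on the current behavior where `first` and `last` return the endpoints of an empty range.

```julia
# Hypothetical helper: sum the elements of v at the indices in r.
function sum_between(v::AbstractVector, r::AbstractUnitRange)
    lo, hi = first(r), last(r)   # for an empty range such as 1:0, lo > hi
    s = zero(eltype(v))
    i = lo
    while i <= hi                # the comparison skips the body when r is empty
        s += v[i]
        i += 1
    end
    return s
end

v = [1, 2, 3]
println(sum_between(v, 2:3))     # 5
println(sum_between(v, 1:0))     # 0, with no emptiness check needed
```

No explicit `isempty` test appears in the loop: the endpoint comparison alone makes the empty case a no-op.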

JeffBezanson · 2018-01-05T15:50:43Z

> You're right that this pattern wouldn't work if eachindex returns something which doesn't implement rangestart. But currently it's not great either as a lot of code calls first/last on indices, and if these are not ranges then an error could be thrown when they are empty. The PR just makes this more visible.

These aren't equivalent situations. Currently, the issue is that first on an empty array might throw an error from two different places (either first(eachindex(a)) or getindex). That is an extremely minor issue. But by switching to rangestart, we are asserting that array index objects must be ranges, and/or spreading confusion about when to call rangestart instead of first. If rangestart had fallback definitions to make it work with more types, it would be even less clear.

JeffBezanson · 2018-01-05T15:53:00Z

base/abstractarray.jl

@@ -682,7 +682,7 @@ copyto!(dest::AbstractArray, src::AbstractArray) =

 function copyto!(::IndexStyle, dest::AbstractArray, ::IndexStyle, src::AbstractArray)
    destinds, srcinds = linearindices(dest), linearindices(src)
-    isempty(srcinds) || (first(srcinds) ∈ destinds && last(srcinds) ∈ destinds) ||
+    isempty(srcinds) || (rangestart(srcinds) ∈ destinds && rangestop(srcinds) ∈ destinds) ||


Since there is an isempty check, I think this case can be left alone.
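
A small sketch of why the guard is enough: `||` short-circuits, so `first` and `last` are never evaluated when the index collection is empty. The function name `indices_fit` is hypothetical and only mirrors the shape of the condition in the diff.

```julia
# Hypothetical helper mirroring the guarded condition above.
# When srcinds is empty, the right-hand side of || is never evaluated.
function indices_fit(srcinds, destinds)
    return isempty(srcinds) || (first(srcinds) ∈ destinds && last(srcinds) ∈ destinds)
end

println(indices_fit(Int[], 1:3))   # true: first(Int[]) is never called
println(indices_fit(1:2, 1:3))     # true
println(indices_fit(1:4, 1:3))     # false: 4 is outside 1:3
```

Because the empty case returns early, this works even for index collections on which `first` would throw when empty.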

JeffBezanson · 2018-01-05T16:00:41Z

Looking over the changes again, I think I'd like this a lot better if rangestart were only called in cases where the argument is known to be a range. In other cases we should still call first and let the error happen.

On the triage call @mbauman suggested changing the functions with f(::Union{Int,Range}) to

f(i::Int) = f(i:i)
f(::Range) = # old code

That would also be an improvement.
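
A runnable sketch of that dispatch pattern is shown below, assuming a hypothetical function `endpoints` standing in for `f`. It uses `AbstractRange`, the name that replaced `Range` in Julia 0.7.

```julia
# Hypothetical function, for illustration only.
# Integer method: wrap the scalar in a one-element range and reuse the range method.
endpoints(i::Integer) = endpoints(i:i)

# Range method: the "old code", now written for ranges only.
endpoints(r::AbstractRange) = (first(r), last(r), length(r))

println(endpoints(3))      # (3, 3, 1)
println(endpoints(2:5))    # (2, 5, 4)
```

With this split, the range method never needs to accept integers, so it can use range-specific accessors freely.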

One interesting case is getting a pointer to the "first element" of an empty array. Using first in the pointer function would give an error, which might be either useful or annoying depending on your perspective. Maybe it should return C_NULL in that case? That might slightly qualify as a bug found by this PR.

nalimilan · 2018-01-05T16:21:44Z

> Looking over the changes again, I think I'd like this a lot better if rangestart were only called in cases where the argument is known to be a range. In other cases we should still call first and let the error happen.

The problem with that is that the majority of the uses are with linearindices or eachindex, which as you noted are not guaranteed to return ranges right now. And in many cases the code is designed so that even if the range is empty, it works because the comparisons between start and stop allow skipping the parts which would throw errors. It would be OK if we decided that objects returned by linearindices and eachindex have to implement rangestart/rangestop.

> On the triage call @mbauman suggested changing the functions with f(::Union{Int,Range}) to
>
> f(i::Int) = f(i:i)
> f(::Range) = # old code
>
> That would also be an improvement.

I've tried that, but it doesn't always work. For example, in splice! the returned value for integer arguments is a scalar, but for range arguments it's an array. I can try to apply this strategy to more places though (in particular the FIXMEs need to be addressed in one way or another.)
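
The splice! difference can be seen directly with Base's `splice!`; the variable names below are just for illustration.

```julia
a = [10, 20, 30, 40]
x = splice!(a, 2)        # integer index: returns the removed element itself
println(x)               # 20

b = [10, 20, 30, 40]
y = splice!(b, 2:2)      # one-element range: returns a vector of removed elements
println(y)               # [20]

println(a == b)          # true: both arrays are now [10, 30, 40]
```

Forwarding the integer method to `i:i` would therefore change the return type from a scalar to a one-element array.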

vtjnash · 2023-10-27T21:44:38Z

#22354 is closed

nalimilan added 2 commits January 4, 2018 12:24

Deprecate first() and last() on empty ranges

cd6b21d

This will allow throwing an error, for consistency with other AbstractArrays. Introduce the rangestart() and rangestop() functions instead.

Remove rangestart(x) = first(x) and rangestop(x) = last(x) fallbacks

2c29e08

nalimilan force-pushed the nl/firstlast branch from de4d005 to 2c29e08 Compare January 4, 2018 11:24

nalimilan mentioned this pull request Jan 4, 2018

Range first and last can be misleading #22354

Closed

JeffBezanson reviewed Jan 5, 2018

nalimilan mentioned this pull request Jan 15, 2018

Change findfirst/findlast/findnext/findprev to return the same index type as keys() #25577

Merged

nalimilan mentioned this pull request Jan 25, 2018

RFC: Add Stateful iterator wrapper #25731

Merged

goretkin mentioned this pull request Aug 6, 2020

split Base.lastindex semantics, avoid sentinel value #34697

Closed

vtjnash closed this Oct 27, 2023

vtjnash deleted the nl/firstlast branch October 27, 2023 21:44
