Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Deprecate first() and last() on empty ranges #25385

Closed
wants to merge 2 commits into from
Closed

Conversation

nalimilan
Copy link
Member

This will allow throwing an error, for consistency with other AbstractArrays.
Introduce the rangestart and rangestop functions instead.

Fixes #22354.

This is RFC because two issues need to be discussed (apart from missing docs and FIXME):

  • It would be nice to use the new field overloading feature to write r.start and r.stop instead of adding rangestart and rangestop functions. I think this use for ranges has been considered when discussing the feature, but since that would be one of the first times it would be used in Base, I'd rather check that people support it before adopting that approach.
  • It turns out that in a few cases it would be useful to have rangestart and rangestop accept integers and return them, just like first and last do. See the second commit to have a list of the few places where it's needed. Unfortunately, that doesn't play well with using field overloading, except if we are OK with things like 1.start returning 1.

This will allow throwing an error, for consistency with other AbstractArrays.
Introduce the rangestart() and rangestop() functions instead.
@JeffBezanson
Copy link
Member

A definition like first(a::AbstractArray) = a[rangestart(eachindex(a))] worries me. Here we know that eachindex(a) is iterable, but we don't know whether it's a range, so there's no way to know you need to call rangestart.

@JeffBezanson
Copy link
Member

Addendum: in that particular case it should probably still call first, since there will be a BoundsError either way.

I still don't love this change though. It seems like we're saying, "if you call first on a range and get errors, call rangestart instead". Seems to me we should just do that for you.

@StefanKarpinski
Copy link
Member

StefanKarpinski commented Jan 4, 2018

This might be a good place for properties: make r.start, r.stop and r.step work as expected for all range types and have r.start and r.stop be the way to get what used to be first(r) and last(r). After all, if you know you're working with a collection that can be empty but still have a notion of a starting and stopping point, that basically implies that you're working with a range.

In other words my proposal is very similar to this change but you write r.start instead of rangestart(r) and r.stop instead of rangestop(r); r.step doesn't have an analogue here, but it would be a more consistent way to spell step(r) after such a change.

@JeffBezanson
Copy link
Member

That was addressed in the OP.

@StefanKarpinski
Copy link
Member

Question from triage: were any bugs revealed by this PR?

@nalimilan
Copy link
Member Author

Question from triage: were any bugs revealed by this PR?

@StefanKarpinski I haven't found bugs. Though the kind of bugs this change would uncover are cases where first/last is called on an empty range, resulting in treating it as non-empty. This would only be caught by the PR if tests do pass empty ranges to functions which were not designed with ranges in mind.

However, the PR highlights that the special behavior of first/last on ranges extends to other similar types:

julia> first(CartesianIndices(1:0, 3:0))
CartesianIndex(1, 3)

julia> collect(CartesianIndices(1:0, 3:0))
0×0 Array{CartesianIndex{2},2}

See also below for a possible issue related to eachindex.

A definition like first(a::AbstractArray) = a[rangestart(eachindex(a))] worries me. Here we know that eachindex(a) is iterable, but we don't know whether it's a range, so there's no way to know you need to call rangestart.
Addendum: in that particular case it should probably still call first, since there will be a BoundsError either way.

@JeffBezanson Using first would throw an error about accessing the first index of OneTo(0) (for Array), which is much less user-friendly than printing an error about trying to access the array at index 1.

You're right that this pattern wouldn't work if eachindex returns something which doesn't implement rangestart. But currently it's not great either as a lot of code calls first/last on indices, and if these are not ranges then an error could be thrown when they are empty. The PR just makes this more visible.

I still don't love this change though. It seems like we're saying, "if you call first on a range and get errors, call rangestart instead". Seems to me we should just do that for you.

I have to admit it's not the sexiest PR ever. OTOH it's really weird to break the AbstractArray interface for fundamental types like ranges. What the PR shows is that when you're interested in the starting and ending values of a range even if it's empty, you don't actually want the first and last elements of an array: you want either these elements if they exist, or a pair in which the start is greater than the end, so that comparison checks automatically skip all operations. So maybe we could define rangestart/rangestop on empty arrays to return respectively 1 and 0: that would make code more generic than currently.

@JeffBezanson
Copy link
Member

You're right that this pattern wouldn't work if eachindex returns something which doesn't implement rangestart. But currently it's not great either as a lot of code calls first/last on indices, and if these are not ranges then an error could be thrown when they are empty. The PR just makes this more visible.

These aren't equivalent situations. Currently, the issue is that first on an empty array might throw an error from two different places (either first(eachindex(a)) or getindex). That is an extremely minor issue. But by switching to rangestart, we are asserting that array index objects must be ranges, and/or spreading confusion about when to call rangestart instead of first. If rangestart had fallback definitions to make it work with more types, it would be even less clear.

@@ -682,7 +682,7 @@ copyto!(dest::AbstractArray, src::AbstractArray) =

function copyto!(::IndexStyle, dest::AbstractArray, ::IndexStyle, src::AbstractArray)
destinds, srcinds = linearindices(dest), linearindices(src)
isempty(srcinds) || (first(srcinds) ∈ destinds && last(srcinds) ∈ destinds) ||
isempty(srcinds) || (rangestart(srcinds) ∈ destinds && rangestop(srcinds) ∈ destinds) ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there is an isempty check, I think this case can be left alone.

@JeffBezanson
Copy link
Member

Looking over the changes again, I think I'd like this a lot better if rangestart were only called in cases where the argument is known to be a range. In other cases we should still call first and let the error happen.

On the triage call @mbauman suggested changing the functions with f(::Union{Int,Range}) to

f(i::Int) = f(i:i)
f(::Range) = # old code

That would also be an improvement.

One interesting case is getting a pointer to the "first element" of an empty array. Using first in the pointer function would give an error, which might be either useful or annoying depending on your perspective. Maybe it should return C_NULL in that case? That might slightly qualify as a bug found by this PR.

@nalimilan
Copy link
Member Author

Looking over the changes again, I think I'd like this a lot better if rangestart were only called in cases where the argument is known to be a range. In other cases we should still call first and let the error happen.

The problem with that is that the majority of the uses are with linearindices or eachindex, which as you noted are not guaranteed to return ranges right now. And in many cases the code is designed so that even if the range is empty, it works because the comparisons between start and stop allow skipping the parts which would throw errors. It would be OK if we decided that objects returned by linearindices and eachindex have to implement rangestart/rangestop.

On the triage call @mbauman suggested changing the functions with f(::Union{Int,Range}) to

f(i::Int) = f(i:i)
f(::Range) = # old code

That would also be an improvement.

I've tried that, but it doesn't always work. For example, in splice! the returned value for integer arguments is a scalar, but for range arguments it's an array. I can try to apply this strategy to more places though (in particular the FIXMEs need to be addressed in one way or another.)

@vtjnash
Copy link
Member

vtjnash commented Oct 27, 2023

#22354 is closed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Range first and last can be misleading
4 participants