Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Indexing a range by a range #7420

Closed
Jutho opened this issue Jun 26, 2014 · 29 comments
Closed

Indexing a range by a range #7420

Jutho opened this issue Jun 26, 2014 · 29 comments
Labels
bug Indicates an unexpected problem or unintended behavior regression Regression in behavior compared to a previous version
Milestone

Comments

@Jutho
Copy link
Contributor

Jutho commented Jun 26, 2014

I don't think this is expected behaviour?

r=0:0.1:1
a=collect(r)
a[1:5] # returns Float64[0.0, 0.1, 0.2, 0.3, 0.4]
r[1:5] # returns 0.0:0.01:0.04 

From doing more tests, the behaviour is very weird. It looks as if for all r with step size <= 0.25, the starting point and step size are first being multiplied by the step size before the indexing happens; in particular the new step size is the square of the old one: e.g.

r=1:0.002:20
r[2:4] # returns 0.002004:4e-6:0.002012
@elextr
Copy link

elextr commented Jun 26, 2014

The first example is what I would have expected.

On Julia 0.2.1 the second example gives 1.002:0.002:1.006 as I would have expected.

Which version are you using?

@Jutho
Copy link
Contributor Author

Jutho commented Jun 26, 2014

In the first example, look carefully and compare a[1:5] with r[1:5], or better yet, collect(r[1:5]). They are not the same.

I am using one of the latest nightly releases, currently 0.3.0-prerelease+3882

@elextr
Copy link

elextr commented Jun 26, 2014

Oops, note to self, read carefully :)

Example 1 r[1:5] is however also correct on 0.2.1 so its a regression.

ivarne added a commit to ivarne/julia that referenced this issue Jun 26, 2014
When indexing a FloatRange with a OrdinalRange, you have to be careful to use the actual r.step, instead of step(r) = r.step/r.divisor.
@ivarne
Copy link
Member

ivarne commented Jun 26, 2014

I think this was an easy fix (see: #7421)

@simonster simonster added this to the 0.3 milestone Jun 26, 2014
@ivarne
Copy link
Member

ivarne commented Jul 1, 2014

@StefanKarpinski There are more occurrences of similar errors:

julia> a = 0.1:0.1:1
julia> [reverse(a)] == reverse([a])
false

Culprint in the use of last in https://github.com/JuliaLang/julia/blob/master/base/range.jl#L498

ivarne added a commit to ivarne/julia that referenced this issue Jul 1, 2014
ivarne added a commit to ivarne/julia that referenced this issue Jul 1, 2014
Related to JuliaLang#7420

Added test for reverse(r::FloatRange)
@ivarne ivarne reopened this Jul 2, 2014
@ivarne
Copy link
Member

ivarne commented Jul 2, 2014

I'm using my newly acquired skills to Reopen this issue with reference to
https://groups.google.com/forum/#!topic/julia-users/efewThIZqB4 where it is noted that the new algorithm for getindex(r::FloatRange{T<:FloatingPoint},s::OrdinalRange{T,S}) could be implemented better if it tried to make first and last equal, instead of first and step.

x = linrange(1,10,20)
(x[2:end][end] == 10.0) == false

Thanks to @simonp0420 for noting my typo.

@ivarne
Copy link
Member

ivarne commented Jul 2, 2014

I would have hoped that linrange, could do its magic to hit the end points, but unfortunately it fails tests because it misses the middle values in some cases.

function getindex(r::FloatRange, s::OrdinalRange)
    if isempty(s)
        return FloatRange(1.0, 1.0, 0.0, 1.0)
    end
    1 <= first(s) <= length(r) &&
                  1 <=  last(s) <= length(r) || throw(BoundsError())
    linrange(r[first(s)], r[last(s)], length(s))
end

Maybe indexing of a FloatRange should just return an array?

@simonp0420
Copy link
Contributor

Would it make any sense for FloatRange objects to store the endpoints and length, rather than start, step, and length?

@ivarne
Copy link
Member

ivarne commented Jul 2, 2014

No
How would you calculate element 5 from start, end and length efficiently without caching the value of step? It would also not solve the problem that you would hit different intermediary points when you take a slice.

@simonp0420
Copy link
Contributor

Agreed, that would be a very bad idea. Nevertheless, it seems that Julia is giving the initial value of a FloatRange a very distinguished role, since its value is preserved exactly. It seems to me that the final value should be equally distinguished, with the understanding that the intermediate values might be approximated due to floating point issues. Would it make any sense to add the final point to the list of values stored in a FloatRange and to somehow provide this stored value whenever the final range value is requested? I suppose this would drastically degrade efficiency...

@pao
Copy link
Member

pao commented Jul 2, 2014

Background reading for FloatRange which explains some of the difficulty in getting it to behave the way you want it to: #2333

@StefanKarpinski
Copy link
Member

I'll take a look at this when I can.

@simonp0420
Copy link
Contributor

Thanks, @pao. A lot of thought and work has gone into ranges. I guess that introducing the new linrange function has added yet another wrinkle to consider.

@ivarne
Copy link
Member

ivarne commented Jul 2, 2014

Thinking more about this, we only three four real solutions to the slicing of a FloatRange problem.

  1. Add a field skip::Int to FloatRange that defaults to 1 so that we can change (i-1) * r.step to (i-r.skip) * r.step in getindex(r::FloatRange, i::Int). (EDIT: we will need two fields skip/offset and stride)
  2. Return an array from getindex(r::FloatRange, o::OrdinalRange)
  3. Create a wrapper type for getindex(r::FloatRange, o::OrdinalRange), to do a linear transformation on indexes.

Edited to add
4. Deprecate slicing of FloatRange, and suggest that people convert to array manually.

@Jutho
Copy link
Contributor Author

Jutho commented Jul 2, 2014

I am asking just out of interest; my apologies if this is not the right place. I know very little about the issues related to FloatingPoint precision. Why is it not possible to efficiently compute intermediate points from start, length and end?

is (start*(length-i) + (i-1)*end)/(length-1) that much more inefficient than (r.start + (i-1)*r.step)/r.divisor?

I guess one could also add two offset variables so that taking slices would just change the offset from the beginning and end so as to produce exactly the same values.

As stated at the beginning, these are not actual suggestions, since I am unexperienced with all the complications involved. Just asking out of curiosity...

On 02 Jul 2014, at 18:44, Ivar Nesje notifications@github.com wrote:

No
How would you calculate element 5 from start, end and length efficiently without caching the value of step? It would also not solve the problem that you would hit different intermediary points when you take a slice.


Reply to this email directly or view it on GitHub.

@pao
Copy link
Member

pao commented Jul 2, 2014

@Jutho I highly recommend reading the discussion in #2333, which I think answers your questions.

@simonster
Copy link
Member

It seems important that indexing a FloatRange by an OrdinalRange gives exactly the same values as indexing by the corresponding Vector{Int}, but returning a Vector seems a little unpleasant. Maybe we could return a SubArray? We could even define step for the returned type.

@ivarne
Copy link
Member

ivarne commented Jul 2, 2014

What would make SubArray better than an array?

@simonster
Copy link
Member

It wouldn't allocate memory, which is one of the main advantages of the FloatRange type. (It is basically the wrapper type that performs linear transformations on indices that you suggest in [3] above.) In principle it could also be faster to use. I'm not sure if that's true at present, but hopefully we'll get faster array views in 0.4.

@nalimilan
Copy link
Member

So that would be more a SubRange than a SubArray. Maybe a good solution.

@Jutho
Copy link
Contributor Author

Jutho commented Jul 3, 2014

Rather than making a new type for this, one could also add fields offset and stride to the FloatRange definition. A FloatRange constructed from the colon operator would have trivial values of offset=0 and stride=1. I don't know if there is a performance hit in having to add zero and multiply by one?

@ivarne
Copy link
Member

ivarne commented Jul 3, 2014

@Jutho That was what I intended to suggest in [1], but i thought we could just store step = stride*old_step. Because floating point multiplication is not associative, (i*stride)*step is not always the same as i*(stride*step). We need the first form to get the exact values from the original Range, thus we need to store stride as well.

@quinnj
Copy link
Member

quinnj commented Jul 3, 2014

Is there anything really 0.3 blocking here? I think the enhancements to FloatRange can be further explored in 0.4.

@ivarne
Copy link
Member

ivarne commented Jul 3, 2014

I don't think it is blocking 0.3-rc1 at least.

If consensus / benchmarks takes time, I think that we should just return an array for now, and explore further in 0.4

@quinnj quinnj modified the milestones: 0.4, 0.3 Jul 3, 2014
@StefanKarpinski
Copy link
Member

The issue here is that linrange doesn't construct floating point ranges correctly. This is the correct construction for linrange(1,10,20): FloatRange{Float64}(19,9,20,19). With this construction, everything goes through:

julia> r = FloatRange{Float64}(19,9,20,19)
1.0:0.47368421052631576:10.0

julia> r[end]
10.0

julia> r[2:end]
1.4736842105263157:0.47368421052631576:10.0

julia> r[2:end][end]
10.0

Now I just need to work out how to make sure that's the construction that happens.

@JeffBezanson
Copy link
Member

Stefan and I discussed that linrange really ought to construct a range that uses the expected linear interpolation formula start*(n-i)/n + stop*(i/n). Probably nothing else can work.

@simonp0420
Copy link
Contributor

That type of formula has a compelling symmetry. But isn't it start*((n-i)/(n-1)) + stop*((i-1)/(n-1)) for i = 1:n for a linrange(start, stop, n)?

@mbauman
Copy link
Member

mbauman commented Jun 12, 2015

All the examples in this issue seem to be working now, due to #9666 and its follow-on work. Is there anything left to do here?

@JeffBezanson
Copy link
Member

Bump. Yes, looks like everything here is working now. Plz reopen if not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Indicates an unexpected problem or unintended behavior regression Regression in behavior compared to a previous version
Projects
None yet
Development

No branches or pull requests