Sequence generation linspace vs linrange #9637

bkamins · 2015-01-06T15:32:52Z

I think that Julia has excellent handling of sequences of floats (see the my blog for comparison with R and Python).

The only small issue is the following code:

println(linspace(0.1, 1.1, 6) .== [linrange(0.1, 1.1, 6)])

produces:

Bool[true,true,true,true,false,true]

So they are not equal and I would expect an identical result (the root cause is the difference of the results between colon notation and linspace).

It should be decided if this should be guaranteed.

The text was updated successfully, but these errors were encountered:

simonster · 2015-01-06T16:10:15Z

Some previous discussion in #7420, particularly @StefanKarpinski and @JeffBezanson's comments at the end.

bkamins · 2015-01-06T17:17:25Z

Just realized that Julia allows for testing with rational numbers. Here is what I got:

ref = convert(Vector{Float64},
              [linrange(convert(Rational{BigInt}, 0.1),
                        convert(Rational{BigInt}, 1.1), 6)])
print(convert(Vector{Float64}, [linrange(1//10, 11//10, 6)]) .== [0.1:0.2:1.1])
print(ref - [0.1:0.2:1.1])
print(ref - linspace(0.1, 1.1, 6))

gives

Bool[true,true,true,true,true,true]
[0.0,5.551115123125783e-17,0.0,1.1102230246251565e-16,0.0,0.0]
[0.0,0.0,0.0,0.0,-1.1102230246251565e-16,0.0]

This allows us to see what happens under exact calculations (and second and third example take into account that actually 0.2 and 1.1 are only approximated in Float64)

bkamins · 2015-01-06T21:20:02Z

Here is the proposal how linspace could be changed (I omit constraining argument types in the example):

function linspace2(start, stop, n)
    by = (stop - start) / (n - 1)
    r = start:by:stop
    return stop == last(r) ? [r] : linspace(start, stop, n)
end

Now following the linspace definition agreed in #2575: "Construct a vector of n linearly-spaced elements from start to stop." I compared linspace and linspace2:

srand(1)

function diffrange(x)
    dmin, dmax = extrema(diff(x))
    dmax - dmin
end

scale1 = 100.0
scale2 = 1000.0
scale3 = 10000
store = Array(Float64, 0)
for i = 1:10000
    a = scale1*(rand() - 0.5)
    b = scale2*(rand() - 0.5)
    n = 3 + int64(rand() * scale3)
    l1 = linspace(a, b, n)
    l2 = linspace2(a, b, n)
    @assert l1[end] == l2[end]
    res = diffrange(l2) - diffrange(l1)
    push!(store, res)
end
println(mean(store))
println(maximum(store))
println(minimum(store))
println(sum(store .< 0))
println(sum(store .== 0))
println(sum(store .> 0))

which produces:

-4.466507164124778e-14
4.263256414560601e-14
-1.7053025658242404e-13
7448
2498
54

which simply says that actually colon is able to produce more evenly spaced points than linspace. The result is the same if we change definition of diffrange to compute variance of differences or maximum absolute difference and seems robust to changing scale1, scale2 and scale3.

So linspace2 seems superior to linspace in terms of the objective (evenly spaced points) while additionally it has the benefit of guaranteeing that colon and linspace produce the same result when it is reasonable to expect so.

Additionally a question (I did not find it discussed):

Would it not be useful to make linspace produce an iterator not a vector? Often we do not need a vector and you can always convert an iterator to a vector if needed.

StefanKarpinski · 2015-01-06T22:06:39Z

The linspace function produces the straightforward linear intermediate points between its start and stop values. Since that's a pretty obvious, well-defined behavior – although, as it turns out, not always what you want – I'm hesitant to change it to anything else. The fact that our colon is better at producing what you want is due to "lifting" ranges by integer multiples that match the start, step and stop.

JeffBezanson · 2015-01-06T22:38:09Z

Yes, my understanding is that the purpose of linspace is to give the values generated by the formula start*(n-i)/n + stop*i/n. I don't know how often people want that, but if not often I'd just as soon remove the function.

@bkamins thanks for the nice writeup on your blog :)

bkamins · 2015-01-06T23:13:13Z

I am aware about "lifitng" and that is why I have proposed the change. But I agree that it is a minor one.

However, my understanding is that most often people need to generate sequences from start to stop that have n points as equally spaced as possible and do not think much how they are produced.

In fact after the discussion I see that the iterator thing is more important than identity between colon and linspace. It is often the case that you would want to:

for x = linspace(a, b, n)
    # do something with x in a loop
end

and in such cases you actually need only an iterator, and it is often easier to specify required size of the collection than step size.

I assume that linspace was borrowed from Python where it produces a vector because there vectorization is the only efficient way to perform computations. As you know best :) in Julia it is often better to devectorize for memory and speed reasons.

Just a simple example:

const n = 10^8+1
function test1()
    s = 0.0
    for x in 0.0:(1/(n-1)):1.0
        s += x
    end
end

function test2()
    s = 0.0
    for x in linspace(0.0, 1.0, n)
        s += x
    end
end

@time test1()
@time test1()

@time test2()
@time test2()

on my computer gives

elapsed time: 0.031019712 seconds (131112 bytes allocated)
elapsed time: 0.028756213 seconds (160 bytes allocated)
elapsed time: 1.055671265 seconds (800569544 bytes allocated)
elapsed time: 1.086587485 seconds (800000144 bytes allocated)

nalimilan · 2015-01-07T09:18:09Z

Maybe at least add a warning to the linspace docs stating that you'll generally prefer to use ranges?

StefanKarpinski · 2015-01-07T18:24:25Z

I took a crack at making linspace as good as our ranges. I don't actually think this will ever break anyone's expectations. It lifts the construction the same kind of way when possible, and otherwise falls back on the current "straightforward" construction.

StefanKarpinski · 2015-01-07T18:40:27Z

With this change, we would have

linspace(0.1, 1.1, 6) == [0.1:0.2:1.1]
linspace(0.1, 1.1, 6) != [linrange(0.1, 1.1, 6)]

So now linrange is the odd one out, giving

[0.1,0.30000000000000004,0.5,0.7000000000000001,0.9,1.1]

instead of [0.1,0.3,0.5,0.7,0.9,1.1]. Progress in any case.

bkamins · 2015-01-07T19:23:31Z

Well this is a possible approach to modify linrange:

function linrange2(a::Real, b::Real, len::Integer)
    if  len >= 2
        by = (b - a) / (len - 1)
        r = a:by:b
        return b == last(r) && r.len == len ? r : range(a, by, len)
    end
    len == 1 && a == b ? range(a, zero((b-a)/(len-1)), 1) :
                         error("invalid range length")
end

as it takes the advantage of "lifting" in standard colon notation (and we know the endpoint).
We have to check b == last(r) && r.len == len as sometimes it is impossible to choose any step value with given 'a' and 'b' to make colon notation generate sequence that has exactly 'len' elements and ends with 'b', e.g. 0.0:1.0:nextfloat(100.0) last is too small and 0.0:nextfloat(1.0):nextfloat(100.0) last is also too small and the sequence is too short.

Probably it is enough to check b == last(r) as it should imply r.len == len but it is safer to check and it is cheap.

ivarne · 2015-01-07T19:24:57Z

Looks like an improvement to me.

It's hard to find good test cases, but to suggest a good metric for the "quality" of linspace, I'd want var(diff(linspace(a,b,n))) to be as small as possible. This actually makes it smaller in one test case:

julia> var(diff(linspace2(0.1,1.1,11)))
2.139922160430262e-33

julia> var(diff(linspace(0.1,1.1,11)))
4.879022525780997e-33

bkamins · 2015-01-07T19:28:16Z

I have checked var on my random test and it is similar to difference between extrema in my proposed code.

StefanKarpinski · 2015-01-07T20:16:25Z

Minimizing var(diff(linspace(a,b,n))) is a nice side effect but I suspect you can contrive cases where it does not have that effect.

StefanKarpinski · 2015-01-07T20:18:16Z

Note that I added tests for this that generate and check a lot of cases, all of which pass. I didn't add a list of specific hard cases since my experience with the range work was that the generated tests are strictly harder to pass than the hand-crafted hard cases. Before merging this, I should probably test those too though.

StefanKarpinski · 2015-01-07T20:33:00Z

Ok, added the "handcrafted" test cases too. They all pass, which is unsurprising.

bkamins · 2015-01-08T07:46:26Z

Just an example of a handcrafted test case (I used unmodified definition of linspace for testing). I am not sure if it should be fixed but I give it for the reference:

m1 = nextfloat(-Inf)
m2 = prevfloat(Inf)    # m1 == -m2
linspace(m1, m2, 3)    # OK
[linrange(m1, m2, 3)]  # wrong
[m1:m2:m2]             # error

simonbyrne · 2015-01-08T16:48:32Z

Some of these issues did arise when I wrote linrange, e.g. see this comment (I ended up going with option 1).

Furthermore, in this case linrange is technically "more correct", in that it gives the correctly rounded interpolated values:

julia> for i in linspace(big(0.1),big(1.1),6)
       println(float64(i))
       end
0.1
0.30000000000000004
0.5
0.7000000000000001
0.9
1.1

julia> for i in linspace(0.1,1.1,6)
       println(i)
       end
0.1
0.30000000000000004
0.5
0.7000000000000001
0.9000000000000001
1.1

julia> for i in linrange(0.1,1.1,6)
       println(i)
       end
0.1
0.30000000000000004
0.5
0.7000000000000001
0.9
1.1

StefanKarpinski · 2015-01-08T17:51:17Z

@bkamins, man, that's a diabolical example, which we should, of course, still handle correctly.

StefanKarpinski · 2015-01-08T18:11:02Z

@simonbyrne, the new proposed linspace gives this result:

julia> for i in linspace(0.1,1.1,6)
       println(i)
       end
0.1
0.3
0.5
0.7
0.9
1.1

simonbyrne · 2015-01-08T18:36:32Z

Sorry, I should have clarified what I meant: since the Float64 0.1 is actually 0.1000000000000000055511151231257827021181583404541015625, and 1.1 is actually 1.100000000000000088817841970012523233890533447265625, then the point one-fifth of the way between 0.1 and 1.1 is actually 0.30000000000000002220446049250313080847263336181640625, for which the nearest Float64 is 0.30000000000000004, not 0.3.

So for this example, the current linrange behaviour is correct for all values, the old linspace is incorrect for 1 value (0.9), and the proposed linspace is incorrect for 2 values (0.3 and 0.7).

StefanKarpinski · 2015-01-08T19:26:12Z

Yes, that's true, but our current range behavior – which everyone seems to like – is also "wrong" by that standard.

bkamins · 2015-01-08T19:29:04Z

@StefanKarpinski, you asked for a handcrafted example so I gave one :). It seems that replacing (a-b)/n by a/n-b/n in range and colon should give exactly the same results in normal cases and solve the problem.

@simonbyrne, This is exactly how I understood it. That is why in my second post in this issue I used Rational{BigInt}. Let me give a simple example showing a more serious problem introduced by "lifting" in colon (in fact I assume that for this reason R introduces safety buffer of relative 1e-10; I do not like the R solution though):

println([0.1:0.1:0.3])
# [0.1,0.2,0.3]
println(map(float64,[big(0.1):big(0.1):big(0.3)]))
# [0.1,0.2]

StefanKarpinski · 2015-01-08T19:33:15Z

@bkamins, it's a tradeoff. I don't think you can have intuitive float range behavior and not have cases where it also depends on the precision of float type you're working with. Personally, I think that example is fairly acceptable. The real issue is actually that big(0.1) is a problem and should possibly be disallowed. If you use BigFloat("0.1") instead, it works as expected:

julia> println(map(float64,[BigFloat("0.1"):BigFloat("0.1"):BigFloat("0.3")]))
[0.1,0.2,0.3]

bkamins · 2015-01-08T19:44:38Z

@StefanKarpinski, Yes BigFloat("0.1") works as expected. But in general even if we disallow big(0.1) it is simply possible that the value of 0.1 is input not as a literal but as some variable. But this tradeoff is also in my opinion acceptable (I just wanted to give another example) as normally you have such numbers as 0.1 etc. provided as literals and expect the behavior Julia currently delivers.

And as for replacement of (a-b)/n by a/n-b/n I was wrong as for n>2 the first is more exact. So we there also have a tradeoff between good behavior in normal situation and ability to handle special cases (and I prefer better precision for normal cases).

StefanKarpinski · 2015-01-08T19:52:07Z

And as for replacement of (a-b)/n by a/n-b/n I was wrong as for n>2 the first is more exact.

That was my suspicion, although I hadn't gotten a chance to check it. We may be able to work around this somehow and get correct behavior in both cases.

StefanKarpinski · 2015-02-05T15:02:38Z

I think the way to go here is to create a LinSpace type that is the generator form and have linspace return an object of that type. Then we can delete linrange since producing a LinSpace object for this kind of thing is strictly better than producing a FloatRange object.

bkamins · 2015-02-05T16:04:07Z

How do you see that LinSpace type should be specified? How would it differ from FloatRange?

In general, as I mentioned earlier, I agree that returning generators should be preferable to returning arrays, as you can always run collect on a generator if needed.

StefanKarpinski · 2015-02-05T16:23:23Z

It would work by implementing the non-eager version of this:

https://github.com/JuliaLang/julia/pull/9666/files#diff-8cc03187983013adb308460f8365e1d0R241

When I've got some time, I'll implement it in that PR. It different in the non-exact case:

https://github.com/JuliaLang/julia/pull/9666/files#diff-8cc03187983013adb308460f8365e1d0R245

StefanKarpinski · 2015-03-25T17:40:24Z

I've updated my pull request and I could use some opinions. I actually think there may be a way to generally avoid overflow, which I can implement a bit later (basically scaling down and then back up by powers of 2 when values are too large).

bkamins · 2015-04-03T22:00:49Z

I have written the following simple test procedure

srand(1)
simsize = 10000
maxrange = 10
ref = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
counts = zeros(Int, maxrange)
for d = 1:maxrange, i = 1:simsize
    x = round(rand(), d)
    if linspace(x, round(x+1, d), 11) != round(x + ref, d)
        counts[d] += 1
    end
end
print(counts / simsize * 100)

The old linspace produces the following output:

[95.14,98.42,98.4,98.3,98.47,98.32,98.56,98.5,98.55000000000001,98.49]

And with the new linspace we get:

[0.0,0.0,0.0,0.0,0.0,0.0,12.68,82.59,95.73,98.22]

And I like the improvement 👍.

StefanKarpinski · 2015-04-06T01:37:17Z

I'm not entirely sure what that code is measuring. Is this some statistical count of accuracy?

bkamins · 2015-04-06T06:03:03Z

Assume we wanted to generate an 11 element sequence starting from some value x and ending with the value 1+x.
Assume that x is a hand picked value from the interval [0,1[ entered as a literal by the user and that this literal has d digits of precision. In line x = round(rand(), d) we pick it randomly. So x is now exactly what we would obtain if user would input a literal. Similarly round(x+1,d) is exactly what user would input as a literal for a stop of the sequence.

User can generate this 11 element sequence in two ways:

using linspace(x, round(x+1, d), 11)
by manually inputing all 11 literals, which is expressed by round(x + ref, d)

What I do is a comparison what is the probability that those two sequences are different when we change d. The printed value is this probability for d ranging from 1 to 10.

You can see that when d<7 new linspace is always exactly identical to the result obtained by manual input of 11 literals. The old linspace is almost never exact.

linspace: try to "lift" linspace the float ranges are [close #9637]

StefanKarpinski added a commit that referenced this issue Jan 7, 2015

linspace: try to "lift" linspace the float ranges are [close #9637]

8ab455f

StefanKarpinski added a commit that referenced this issue Jan 7, 2015

linspace: try to "lift" linspace the float ranges are [close #9637]

0c19ae2

StefanKarpinski added a commit that referenced this issue Jan 7, 2015

linspace: "lift" linspace the way float ranges are [close #9637]

480f552

JeffBezanson added this to the 0.4 milestone Feb 19, 2015

StefanKarpinski added a commit that referenced this issue Mar 25, 2015

linspace: "lift" linspace the way float ranges are [close #9637]

ef72da9

StefanKarpinski added a commit that referenced this issue Apr 3, 2015

linspace: "lift" linspace the way float ranges are [close #9637]

beeb5f4

StefanKarpinski added a commit that referenced this issue Apr 6, 2015

linspace: "lift" linspace the way float ranges are [close #9637]

372e8a6

StefanKarpinski closed this as completed in d6866be Apr 13, 2015

StefanKarpinski added a commit that referenced this issue Apr 13, 2015

Merge pull request #9666 from JuliaLang/sk/linspace

22a73f9

linspace: try to "lift" linspace the float ranges are [close #9637]

timholy mentioned this issue May 16, 2015

Missing operations for LinSpace. Fix #11049 #11233

Merged

c42f mentioned this issue Jan 2, 2016

linspace(...)[1] can differ from linspace start value #14420

Closed

timholy mentioned this issue Oct 3, 2016

Make LinSpace generic #18777

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sequence generation linspace vs linrange #9637

Sequence generation linspace vs linrange #9637

bkamins commented Jan 6, 2015

simonster commented Jan 6, 2015

bkamins commented Jan 6, 2015

bkamins commented Jan 6, 2015

StefanKarpinski commented Jan 6, 2015

JeffBezanson commented Jan 6, 2015

bkamins commented Jan 6, 2015

nalimilan commented Jan 7, 2015

StefanKarpinski commented Jan 7, 2015

StefanKarpinski commented Jan 7, 2015

bkamins commented Jan 7, 2015

ivarne commented Jan 7, 2015

bkamins commented Jan 7, 2015

StefanKarpinski commented Jan 7, 2015

StefanKarpinski commented Jan 7, 2015

StefanKarpinski commented Jan 7, 2015

bkamins commented Jan 8, 2015

simonbyrne commented Jan 8, 2015

StefanKarpinski commented Jan 8, 2015

StefanKarpinski commented Jan 8, 2015

simonbyrne commented Jan 8, 2015

StefanKarpinski commented Jan 8, 2015

bkamins commented Jan 8, 2015

StefanKarpinski commented Jan 8, 2015

bkamins commented Jan 8, 2015

StefanKarpinski commented Jan 8, 2015

StefanKarpinski commented Feb 5, 2015

bkamins commented Feb 5, 2015

StefanKarpinski commented Feb 5, 2015

StefanKarpinski commented Mar 25, 2015

bkamins commented Apr 3, 2015

StefanKarpinski commented Apr 6, 2015

bkamins commented Apr 6, 2015

Sequence generation linspace vs linrange #9637

Sequence generation linspace vs linrange #9637

Comments

bkamins commented Jan 6, 2015

simonster commented Jan 6, 2015

bkamins commented Jan 6, 2015

bkamins commented Jan 6, 2015

StefanKarpinski commented Jan 6, 2015

JeffBezanson commented Jan 6, 2015

bkamins commented Jan 6, 2015

nalimilan commented Jan 7, 2015

StefanKarpinski commented Jan 7, 2015

StefanKarpinski commented Jan 7, 2015

bkamins commented Jan 7, 2015

ivarne commented Jan 7, 2015

bkamins commented Jan 7, 2015

StefanKarpinski commented Jan 7, 2015

StefanKarpinski commented Jan 7, 2015

StefanKarpinski commented Jan 7, 2015

bkamins commented Jan 8, 2015

simonbyrne commented Jan 8, 2015

StefanKarpinski commented Jan 8, 2015

StefanKarpinski commented Jan 8, 2015

simonbyrne commented Jan 8, 2015

StefanKarpinski commented Jan 8, 2015

bkamins commented Jan 8, 2015

StefanKarpinski commented Jan 8, 2015

bkamins commented Jan 8, 2015

StefanKarpinski commented Jan 8, 2015

StefanKarpinski commented Feb 5, 2015

bkamins commented Feb 5, 2015

StefanKarpinski commented Feb 5, 2015

StefanKarpinski commented Mar 25, 2015

bkamins commented Apr 3, 2015

StefanKarpinski commented Apr 6, 2015

bkamins commented Apr 6, 2015