the rangepocalypse #5585
Comments
what a wide range of problems. |
We have a totally coherent story: ranges are a more compact representation of a certain class of dense vectors. The only real problem is that the size of a dense vector is obviously limited by The tricky part is factoring and naming the types. We can probably keep |
What they represent has never been an issue. How they are represented and implemented, however, is clearly a problem, or there wouldn't be ten billion issues related to ranges. I think the way this needs to work is that we separate out "ordinal ranges" from "linspace ranges". The "ordinal ranges" include integer, bigint, character, etc. ranges, and store the start, stop and step for those. Since using a step of 1 is so common, that can be factored out as its own case. The types should probably look like this:

    abstract Ranges{T}
    abstract OrdinalRange{T,S<:Integer} <: Ranges{T}
    immutable UnitRange{T} <: OrdinalRange{T,Int}
        start::T
        stop::T
    end
    immutable StepRange{T,S<:Integer} <: OrdinalRange{T,S}
        start::T
        stop::T
        step::S
    end

When you write

The "linspace ranges" need a different internal representation: start, stop and length. Each value is computed by taking a linear combination of the start and the stop. For the greatest accuracy, we might even want to allow downscaling: i.e. start, stop, length, and divisor, and ref is computed something like this:

    function ref(r::LinearRange, k::Int)
        k -= 1; (0 <= k <= r.length) || error(BoundsError)
        ((r.length-k)*r.start + k*r.stop)/r.divisor
    end

The type should probably look like this:

    immutable LinearRange{T,S<:Integer} <: Ranges{T}
        start::T
        stop::T
        length::S
        divisor::T
    end

I'm not sure about the divisor part since that's only necessary to improve a few very nasty corner cases. It would be nice to reclaim |
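To make the divisor idea concrete, here is a minimal sketch of the proposed indexing, written in current Julia syntax (`struct` and `getindex` rather than the thread's `immutable` and `ref`). It assumes, as the bounds check in the snippet above suggests, that `length` counts intervals rather than elements, so valid indices run from 1 to `length + 1`. The name `LinearRangeSketch` and the example values are illustrative only, not part of the proposal.

```julia
# Hypothetical stand-in for the proposed LinearRange; not a Base type.
struct LinearRangeSketch{T<:AbstractFloat}
    start::T
    stop::T
    length::Int   # assumed to count intervals, so there are length + 1 values
    divisor::T
end

# Each value is a linear combination of start and stop, scaled down by divisor.
function Base.getindex(r::LinearRangeSketch, k::Int)
    k -= 1
    (0 <= k <= r.length) || throw(BoundsError(r, k + 1))
    ((r.length - k) * r.start + k * r.stop) / r.divisor
end

# 0.3:0.1:1.1 stored with integer-valued endpoints: start = 3, stop = 11,
# 8 intervals, and a divisor of 8 * 10, so r[1] is 24/80 and r[9] is 88/80.
r = LinearRangeSketch(3.0, 11.0, 8, 80.0)
vals = [r[k] for k in 1:9]
```

Because the numerator is an exactly representable integer at every index, each element comes from a single correctly rounded division, so `vals` should match the literals `0.3, 0.4, ..., 1.1` exactly.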
Can I vote against setting the |
Yes, that's entirely fair, and making sure this works well with dates and times is a crucial design consideration. What kinds of constructions do you need / want for dates and times? Can you list them with examples here? |
I generally agree with @StefanKarpinski's proposal. Just have to emphasize that |
The type of the |
Will the set of UnitRange start and stop values be restricted to those that have an (Int) length? |
My current idea there would be to allow any start and stop, but compute the length with overflow checking. |
@mschauer, no, that's the whole point – the length representation is lousy for ordinal range types. This way you don't need to pick a length type since it's not needed for iterating the object. Computing the length and indexing into it need some thought. For length, I think that we could just do this: length(r::OrdinalRange) = r.stop - r.start + 1 |
Computing the length with overflow checking is a good idea. |
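A minimal sketch of what overflow-checked length could look like, in current Julia syntax. The helper name `checked_length` is hypothetical; `Base.checked_sub` and `Base.checked_add` are the existing checked-arithmetic functions, which throw `OverflowError` rather than wrapping.

```julia
# Hypothetical helper: length of the closed integer range start:stop,
# throwing OverflowError instead of silently wrapping around.
function checked_length(start::T, stop::T) where {T<:Integer}
    stop < start && return zero(T)
    Base.checked_add(Base.checked_sub(stop, start), one(T))
end

checked_length(1, 10)              # 10
checked_length(Int8(5), Int8(4))   # 0, an empty range

# The unchecked formula wraps: typemax(Int) - typemin(Int) + 1 == 0.
overflowed = try
    checked_length(typemin(Int), typemax(Int))
    false
catch err
    err isa OverflowError
end
```

With this approach the range itself can still be constructed and iterated; only asking for a length that does not fit in the element type raises an error.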
The divisor representation has the problem that the intermediate result |
It's been a while since I've attempted interacting directly with BLAS, but I was under the impression that for some functions, at least, one could provide a stride for both |
BLAS level 2 & 3 functions only work with matrices that are contiguous along the first dimension, i.e. each column needs to be contiguous. This is not related to the discussion here, though. |
I don't think there's much that's particularly fancy about date ranges. Something along the following lines:

    # Creates Range{Date} with default step of Day(1)
    daterange = Date(2013,1,1):Date(2014,1,1)
    # Date range with step of Week(1) from 2013-01-01 to 2014-01-01
    daterange = Date(2013,1,1):Week(1):Date(2014,1,1)
    # Datetime works similarly, but can also have TimePeriod steps
    # Datetime range for every second of the day 2014-01-28
    datetimerange = Datetime(2014,1,28):Second(1):Datetime(2014,1,29)

I think it's pretty simple and straightforward and provides the expected functionality. I originally thought of trying to incorporate the @milktrader may have more ideas on anything else that we may want to consider with date ranges specifically. |
I don't think they're crazy exotic or anything, but they are just exotic enough that they make a great example that we need to make sure we can support. It would be nice not to have to create a new, separate set of range types every time someone wants to make something rangelike with their own custom types. |
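As a point of reference, here is how the requested constructions look with the `Dates` standard library in current Julia, where date ranges are ordinary `StepRange` values rather than a separate range type. This is a sketch of present-day behaviour, not of the code under discussion at the time. The spelling `DateTime` is the current one, and the variable names are illustrative.

```julia
using Dates

# Both ranges are plain StepRange values parameterised by the element and step types.
weekly = Date(2013, 1, 1):Week(1):Date(2014, 1, 1)
secs   = DateTime(2014, 1, 28):Second(1):DateTime(2014, 1, 29)

weekly isa StepRange{Date,Week}
first(weekly), last(weekly)   # the last element lands on a whole number of weeks from the start
length(weekly), length(secs)
```

The generic range type only needs ordering, `start + step`, and a way to count steps, which is the kind of reuse this comment is asking for.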
Are we already supporting multidimensional Ranges? A Range in cartesian space could be useful, representing e.g. AABBs. At the least, it should be easy to adapt a type to behave like a Range. |
@EyeOfPython There is an NDRange example included with Julia, although it's broken and could be improved considerably with the new language features introduced since it was last updated 2 years ago. |
NDRange could be a great idea, but best discussed in a separate thread. The pressing problem here is to resolve the issues related to current 1D range systems as listed above. |
I started prototyping the candidate new
That is kind of crazy, but amazingly enough LLVM is able to optimize everything away and we get loops that are sometimes even faster than before. |
This seems to be CPU-dependent. On some machines that implementation is indeed significantly slower. |
Man, that's a real bummer. I considered starting the state at the value before the first one:
Of course, that considers |
This is a problem for 32-bit ranges, but on a 64-bit system I think you'd be dead before you could finish iterating over |
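The wrap-around at issue is easy to reproduce on a small type. The sketch below uses two hypothetical helper functions, not the iterator protocol being prototyped in this thread. It contrasts a loop that increments and then compares against `stop` with one that checks for the last element before incrementing. The `cap` argument exists only to keep the failing case finite.

```julia
# Naive closed-range loop: compare against stop, then increment.
# The cap on the iteration count keeps the demonstration finite.
function naive_count(start::T, stop::T; cap::Int = 1_000) where {T<:Integer}
    n = 0
    i = start
    while i <= stop && n < cap
        n += 1
        i += one(T)      # wraps from typemax(T) back to typemin(T)
    end
    n
end

# Checking for the last element before incrementing never computes stop + 1.
function safe_count(start::T, stop::T) where {T<:Integer}
    stop < start && return 0
    n = 1
    i = start
    while i != stop
        i += one(T)
        n += 1
    end
    n
end

naive_count(0x00, 0xff)   # runs until the cap (1000) instead of stopping at 256
safe_count(0x00, 0xff)    # 256
```

For ranges that stop short of `typemax(T)` the two loops agree; they differ only when `stop + 1` is not representable.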
What if we simply reject (i.e. throw an error with a clear message) upon the construction of Why should we need to support a use case that almost never comes up in real practice when supporting it requires a disproportionate amount of effort? |
I would second @karbarcca that Datetime ranges are sufficient as currently implemented. I haven't run into any issues yet.

    julia> dates[1:2]
    2-element Array{Date{ISOCalendar},1}:
     1980-01-03
     1980-01-04
    julia> length(dates)
    505
    julia> tuesdays = x->dayofweek(x)==Tuesday
    (anonymous function)
    julia> tue = dates[1]:tuesdays:dates[505]
    1980-01-08:(anonymous function):1981-12-29
    julia> typeof(tue)
    DateRange1{ISOCalendar} (constructor with 1 method)
    julia> tue.step
    (anonymous function)
    julia> length(tue)
    104
    julia> everythirdweek = [tue][1]:weeks(3):[tue][104]
    1980-01-08:3 weeks:1981-12-22
    julia> length(everythirdweek)
    35
    julia> [everythirdweek][1:2]
    2-element Array{Date{ISOCalendar},1}:
     1980-01-08
     1980-01-29

added note: the

    julia> dates = [date(1980,1,1):days(1):date(1981,12,31)]; |
That's not actually what I interpreted @karbarcca as saying. The date/time range functionality is there, but it shares almost no code with Base's range implementations. You shouldn't need to rewrite all the tricky functionality and logic for ranges because you have a new type of thing that you want to make ranges of. That's a complete failure of generic programming and composability. |
Aha, I figured I was missing some important point. |
See #5636 – a mind-reading float-range implementation. |
Take a look at my latest attempt. Not done yet, but the basic type definitions can be examined and debated. |
Yeah, I've been following that – looks pretty good. How do you feel about merging soon? |
I'd like to as soon as we can get the rest of ranges.jl implemented. Division is a minor wrinkle --- datetimes and periods support |
A big question is what to do with Also maybe |
I rather like |
I looked at a few other libraries and languages, and people tend to either accept a length argument, or accept a value one beyond the last actual value. This cleverly sidesteps the |
An interesting option would be to use the lifted |
So, I tried this:

    start(r::Range1{Int,Int}) = uint(0)
    next(r::Range1{Int,Int}, i) = (int(r.start+i), box(Uint64,add_int_nuw(unbox(Uint64,i),unbox(Uint64,1))))
    done(r::Range1{Int,Int}, i) = (r.sentinel-1 < r.start) | (i > (uint(r.sentinel)-1)-r.start)

where |
Unfortunately this only seems to partly work with the loop vectorizer. The first function I define in a session gets vectorized, but not subsequent functions. There is no problem with the old iterator. |
This is quite interesting. That vectorizer behavior sounds like a bug; I don't see why it should be stateful like that. |
Another idea I want to discuss: in some cases it is much more convenient to construct a range by length. We could add a function |
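A rough sketch of what a length-based constructor for integer ranges could look like, in current Julia syntax. The function name `range_by_length` and its argument order are purely illustrative and are not the name proposed in this comment.

```julia
# Hypothetical helper; the name and argument order are illustrative only.
function range_by_length(start::Integer, step::Integer, len::Integer)
    len >= 0 || throw(ArgumentError("length must be non-negative"))
    step != 0 || throw(ArgumentError("step must be nonzero"))
    # An empty range is encoded as a stop one step short of start.
    stop = start + (len - 1) * step
    start:step:stop
end

r = range_by_length(1, 2, 5)   # same elements as 1:2:9
collect(r)
```

For ordinal ranges the conversion is a single multiply and add, so the stored representation can stay start/step/stop while still offering the length-based spelling.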
I like that idea a lot. Those are the two standard ways to do this. One wrinkle is that while

    Base.showcompact(io::IO, x::Float64) = show(io,x)
    julia> [0.3:0.1:1.1]
    9-element Array{Float64,1}:
     0.3
     0.4
     0.5
     0.6
     0.7
     0.8
     0.9
     1.0
     1.1

while

    julia> linspace(0.3,1.1,9)
    9-element Array{Float64,1}:
     0.3
     0.4
     0.5
     0.6000000000000001
     0.7000000000000001
     0.8
     0.9
     1.0000000000000002
     1.1

This is probably an argument for |
I realized I was benchmarking with a function where I wasn't actually using the iteration variable. If I use the iteration variable, then I get an extra add in the loop (two extra adds for summing the yielded values) and this is much slower, so it's not really viable. |
Ah, that makes sense. Tricksy compilers. |
I know @JeffBezanson hates umbrella issues, but there are so many range-related problems cropping up that I wanted to have a single place to consolidate the various issues. Here are some of the relevant issues:

- rand(typemin(Int32):typemax(Int32)) – Problem with rand(typemin(Int32):typemax(Int32)) #5550 and Fix overflow in #5550 #5555

Less immediate, but related:

- Uint8 ranges – make [a, b] not concatenate #3737 (edit: more about cat than ranges)
- last(t::Range) – Fix for last{T}(t::Range{T}) #2734 (closed)
- Uint8 range – Infinite loop for Uint8 range #5483 (closed)

We need a more coherent story for ranges. I suspect that ranges of ordinal types, including integers and chars, need to be handled rather differently from floating-point ranges and such.