Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cumsum fixes (fixes #18363 and #18336) #18364

Merged
merged 10 commits into from
Sep 8, 2016
Merged

Conversation

stevengj
Copy link
Member

@stevengj stevengj commented Sep 5, 2016

This fixes #18336 and #18363 — there were an embarrassing number of problems with cumsum and cumprod for corner cases. (My fault, I think, since I wrote this routine back in #4039.)

The breaking change is that cumsum(v) now uses similar(v, rcum_promote_type(+, eltype(v))) rather than the broken _cumsum_type(v) function, where rcum_promote_type is a new function that calls promote_op for numbers but which leaves most other types alone. I don't think this should change any working cases for cumsum of numeric types, but it might break some unusual user-defined types in which + produces a different type.

(The user can still control the result type manually by passing an array to cumsum!, however.)

@stevengj
Copy link
Member Author

stevengj commented Sep 5, 2016

Also fixes #13244.

@stevengj stevengj mentioned this pull request Sep 5, 2016
@stevengj
Copy link
Member Author

stevengj commented Sep 5, 2016

(I thought about using r_promote or similar to widen the cumsum result. However, widening makes more sense for a reduce operation, where you are only widening a single scalar result, than in a function like this where you are allocating a whole array. Also, the user can always widen manually if desired by passing the desired array type to cumsum!, unlike reduce.)

@stevengj
Copy link
Member Author

stevengj commented Sep 5, 2016

As @andreasnoack pointed out, this does not produce the expected result for Bool arrays, so maybe we should use something like r_promote after all.

I'm conflicted on whether we should widen a sum of, say UInt16, though.

@stevengj stevengj added the breaking This change will break code label Sep 5, 2016
@stevengj
Copy link
Member Author

stevengj commented Sep 5, 2016

Okay, updated it to use the same type as sum. It just seems easier to understand if we are consistent.

@@ -472,12 +469,16 @@ for (f, f!, fp, op) = ((:cumsum, :cumsum!, :cumsum_pairwise!, :+),
@eval function ($f!)(result::AbstractVector, v::AbstractVector)
n = length(v)
if n == 0; return result; end
($fp)(v, result, $(op==:+ ? :(zero(first(v))) : :(one(first(v)))), first(indices(v,1)), n)
li = linearindices(v)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Best would be to have n = length(li) since that's guaranteed to work even if v has non-1 indices.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't length(v) return the number of elements in the array regardless? Or is length(v) equivalent to last(linearindices(v))? The latter would seem odd to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see that the length documentation says it returns "For ordered, indexable collections, the maximum index i for which getindex(collection, i) is valid."

Okay, changed it to length(li) as you suggest. It still seems strange to me.

(Note also that this cumsum method is only for AbstractVector, and the length documentation says that it "Returns the number of elements in A." for any AbstractArray.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

http://docs.julialang.org/en/latest/devdocs/offset-arrays/#background

Though if this isn't to be backported, then perhaps you could just stick with the original, since the situation with length is just for 0.5. However, packages probably haven't started preparing for 0.6 yet, so they may not support length yet for unconventional arrays.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the definition/documentation for length should be revisited. It makes more sense to me for it to always return an element count, and have nothing to do with indices. The docs should say "returns the number of elements generated by an iterator".

@kshyatt kshyatt added the maths Mathematical functions label Sep 5, 2016
…onsistent with non-empty case for size mismatches
@StefanKarpinski
Copy link
Member

Slightly tangential but I really think we should probably make Bool
non-numeric like Char.

On Monday, September 5, 2016, Steven G. Johnson notifications@github.com
wrote:

As @andreasnoack https://github.com/andreasnoack pointed out, this does
not produce the expected result for Bool arrays, so maybe we should use
something like r_promote after all.

I'm conflicted on whether we should widen a sum of, say UInt16, though.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#18364 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAJX_Pb7xDAqkRfWioqfIiN4dBg8RDGpks5qnGgbgaJpZM4J1QuX
.

@TotalVerb
Copy link
Contributor

TotalVerb commented Sep 5, 2016

I agree with @StefanKarpinski. There really ought to be a UInt1 type to model the integers mod 2 (a field that's small enough to promote upward to any other type) to replace the current Bool hackery.

I opened #18367 for that discussion.

@stevengj
Copy link
Member Author

stevengj commented Sep 6, 2016

Seems like a random Travis timeout on OSX. I'll try restarting CI. (Is there a way to just restart Travis?)

@stevengj stevengj closed this Sep 6, 2016
@stevengj stevengj reopened this Sep 6, 2016
@stevengj
Copy link
Member Author

stevengj commented Sep 6, 2016

Darn it, now Travis is succeeding but one of the AppVeyor builds is timing out. These random CI failures are really frustrating.

@stevengj stevengj closed this Sep 6, 2016
@stevengj stevengj reopened this Sep 6, 2016
@tkelman
Copy link
Contributor

tkelman commented Sep 6, 2016

You can restart specific CI services through their UI. I also request that you please make a backup of travis failure logs to a gist before restarting, as restarted builds there overwrite previous logs.

@tkelman
Copy link
Contributor

tkelman commented Sep 6, 2016

I'm conflicted on whether we should widen a sum of, say UInt16, though.

I'd much rather not widen UInt16 or UInt8 or Float16 arrays in cumsum, if I'm using an array of that type I am probably doing so because I want to reduce the memory usage. For reductions to a smaller output size this is less important (and the overflow prevention more worth the tradeoff) than something that returns an array of the same size as its input.

@stevengj
Copy link
Member Author

stevengj commented Sep 6, 2016

@tkelman, okay, so it should just use typeof(zero(T)+zero(T)) for T<:Number?

@tkelman
Copy link
Contributor

tkelman commented Sep 6, 2016

I think that sounds sane. I'm not sure which corner cases the current typeof(+zero(T)) for Number and
typeof(v[1]+v[1]) otherwise are intended to address. There might be non-Number types where addition results in a different type, but those could be tricky to deal with.

@andreasnoack
Copy link
Member

typeof(zero(T)+zero(T)) is not correct. See #14237. Why not promote_op(+,T,S)?

@stevengj
Copy link
Member Author

stevengj commented Sep 6, 2016

@tkelman, the main current corner case seems to be Bool, which gets promoted to Int for addition.

@andreasnoack, promote_op(+,T,S) seems fine, although promote_op seems like it might go away at some point.

@stevengj
Copy link
Member Author

stevengj commented Sep 6, 2016

Hmm, no, it looks like promote_op is too conservative about falling back to Any. e.g. Base.promote_op(+, Vector{Int}, Vector{Int}) returns Any. I guess I could only use promote_op for T<:Number.

@stevengj
Copy link
Member Author

stevengj commented Sep 6, 2016

Hmm, also need to handle cumsum of Vector{Vector{Bool}} and similar, grr.

@stevengj
Copy link
Member Author

stevengj commented Sep 6, 2016

Okay, the new behavior should be more consistent with the old cumsum in not widening numeric types (except Bool).

@stevengj
Copy link
Member Author

stevengj commented Sep 7, 2016

Better?

@@ -470,14 +476,18 @@ for (f, f!, fp, op) = ((:cumsum, :cumsum!, :cumsum_pairwise!, :+),
end

@eval function ($f!)(result::AbstractVector, v::AbstractVector)
n = length(v)
li = linearindices(v)
li != linearindices(result) && throw(BoundsError())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be an DimensionMismatch and include a helpful message about the nature of the problem?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. Test needs to change too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoops, right.

@timholy
Copy link
Member

timholy commented Sep 7, 2016

Aside from one small comment, LGTM.


# handle sums of Vector{Bool} and similar. it would be nice to handle
# any AbstractArray here, but it's not clear how that would be possible
rcum_promote_type{T,N}(op, ::Type{Array{T,N}}) = Array{rcum_promote_type(op,T), N}
Copy link
Contributor

@pabloferz pabloferz Sep 8, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can do

rcum_promote_type{T}(op, ::Type{T}) = promote_eltype_op(op, T)
rcum_promote_type{T}(op, ::Type{Array{T,N}) = Array{rcum_promote_type(op, T), N}

without specializing for <:Number

Copy link
Member Author

@stevengj stevengj Sep 8, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

promote_eltype_op gives the wrong answers for non-Number arguments too. e.g. it gives Base.promote_eltype_op(+, Range{Int}) --> Int when I want Range{Int}.

Copy link
Contributor

@pabloferz pabloferz Sep 8, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's why I left the second definition. Maybe some day, if we get triangular dispatch or something like it, that will be easier.

EDIT: I see that for Range that won't work unless we also have something like triangular dispatch.

@timholy timholy merged commit c7a4897 into JuliaLang:master Sep 8, 2016
@stevengj stevengj deleted the cumsum_fixes branch September 8, 2016 18:02
@StefanKarpinski
Copy link
Member

This is a (minor) breaking change, but the previous behavior could be argued to be buggy. Should we consider backporting this fix?

@stevengj
Copy link
Member Author

stevengj commented Sep 8, 2016

I would say the odds of this breaking someone's code is pretty low. About the only cases that should actually break would be cumsum of an array of non-Array collections of Bool, or cumsum of an array of some other weird non-Number type for which + produces a different type. I doubt there are any actual examples of such code in the wild.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking This change will break code maths Mathematical functions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cumsum loses the sign of zero imaginary part
9 participants