Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added accumulate, accumulate! #18931

Merged
merged 7 commits into from
Nov 23, 2016
Merged

Added accumulate, accumulate! #18931

merged 7 commits into from
Nov 23, 2016

Conversation

jw3126
Copy link
Contributor

@jw3126 jw3126 commented Oct 14, 2016

Added accumulate, accumulate! see #14730.

@StefanKarpinski
Copy link
Sponsor Member

If anything, I feel like we should deprecate cumin and cumax in favor of scan(min, ...) and scan(max, ...) now that the latter has no performance down side. cumsum and cumprod are a bit different since sum and prod don't do exactly what a plain reduce would do and it may make sense for cumsum and cumprod to mirror what their non-cumulative counterparts do.

@jw3126
Copy link
Contributor Author

jw3126 commented Oct 14, 2016

I would keep cummin and cummax. I find that scan is a terribly unintuitive name (albeit standard) and think cummin and cummax are easier to discover.

What do you mean, when you say that sum and prod are not exactly a reduce?

@nalimilan
Copy link
Member

Indeed, where does the name choice come from? I did a bit a googling, and I couldn't easily find another example of scan used for cumulative operations. In many statistical packages, scan is used to parse a string.

@StefanKarpinski
Copy link
Sponsor Member

It's the standard name for the operation: https://en.wikipedia.org/wiki/Prefix_sum.

@StefanKarpinski
Copy link
Sponsor Member

Although I agree that it's a bit unfortunate. Perhaps we can pick a better name?

@@ -144,6 +144,8 @@ end
@deprecate chol(A::Number, ::Type{Val{:L}}) ctranspose(chol(A))
@deprecate chol(A::AbstractMatrix, ::Type{Val{:L}}) ctranspose(chol(A))

@deprecate cumop! scan!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cumop! wasn't exported, was it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, is the policy that only exported functions get deprecated? Should I remove this line?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worth checking whether it's widely used in packages. if not, then maybe we can skip the deprecation. mainly the issue is the @deprecate macro exports the old deprecated name, so isn't quite the right tool for deprecating something that was not originally exported

@nalimilan
Copy link
Member

Thanks for the link, it's funny it didn't show up. Personally, anything sounding like "cumulative" would be clearer for me, including cumop. But maybe we should have a look at what other languages do.

@StefanKarpinski
Copy link
Sponsor Member

I would be happy with cumulative and cumulative!. It's crystal clear.

@ararslan
Copy link
Member

ararslan commented Oct 15, 2016

💯 for cumulative. When I first read the PR I thought scan was supposed to be like R's scan, which gets input from STDIN. It's a pretty unintuitive name, and I think cumulative is the perfect replacement.

Edit: accumulate (see below) is even better

@Sacha0
Copy link
Member

Sacha0 commented Oct 15, 2016

accumulate (verb) might be more consistent with the names of similar higher-order functions (e.g. reduce)?

@jw3126
Copy link
Contributor Author

jw3126 commented Oct 15, 2016

Using a verb is a great idea. I think I will go with cumulate, which is slightly shorter then accumulate and closer to the word cumulative.

@ararslan
Copy link
Member

I actually didn't know that "cumulate" was a word until I looked it up just now. It might be nice to stick with a more common word, at least IMHO.

@KristofferC
Copy link
Sponsor Member

As another data point. For me, a non native speaker, accumulate is a lot clearer.

@jw3126
Copy link
Contributor Author

jw3126 commented Oct 15, 2016

I must confess that I was also not 100% sure whether cumulate was actually a word and looked it up to be safe 😄 . So maybe accumulate is indeed better.

@Sacha0
Copy link
Member

Sacha0 commented Oct 15, 2016

ngramaccumulate

:)

@KristofferC
Copy link
Sponsor Member

KristofferC commented Oct 15, 2016

If it was 1680 we might have had to bike shed this for a while :P

@timholy
Copy link
Sponsor Member

timholy commented Oct 15, 2016

I really like accumulate.

Speaking of renaming and functions that carry out this type of task, do we want to rename the internal function r_promote as something a little more...memorable? reusable? How about accumulate_eltype? I'm not sure we need to export it, but the choice of an element type for reductions/accumulations is a pretty common and fundamental task. I would be surprised if there aren't at least a half dozen semi-incompatible implementations of similar logic in packages. Perhaps it makes sense to have one in Base that can be exploited.

@jw3126
Copy link
Contributor Author

jw3126 commented Oct 15, 2016

I wonder should we have a left and a right accumulate_eltype?

@jw3126 jw3126 changed the title Added scan, scan! Added accumulate, accumulate! Oct 16, 2016
@StefanKarpinski
Copy link
Sponsor Member

+1 on accumulate – it's better than cumulative.

"""
accumulate!(op, B, A, dim=1)

Cumulative operation `op` on A along a dimension, storing the result in B. The dimension defaults to 1.
Copy link
Contributor

@tkelman tkelman Oct 17, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we usually code highlight the input names A, B etc ?

edit: looks like the cumprod! docstring does

@jw3126
Copy link
Contributor Author

jw3126 commented Oct 21, 2016

@timholy Turns out that there are already two such semi incompatible implementations in Base. Namely r_promote and rcum_promote_type. The issue is, that widening of numeric types gives precision, but costs memory. Now in case of reductions it does not matter, whether your result is 2, 4 or 8 bytes of memory, so small numeric types will be widened.
But in the case of accumulate this can make a difference and it was argued in #18364 that you probably don't want to widen automatically in that case.

I would still tend to merge r_promote and rcum_promote_type into a single accumulate_eltype, which does widening like r_promote. This way behaviour of reduce and accumulate would be more coherent. If you don't want widening however would have to use more verbose things like
cumsum!(similar(A), A)
cc @stevengj @tkelman

@tkelman
Copy link
Contributor

tkelman commented Oct 21, 2016

We don't widen Float32 to Float64 in reductions, do we? We widen integer types which gives more range but not more precision. I think for an operation that returns an array, widening should be something that the user has to ask for.

@jw3126
Copy link
Contributor Author

jw3126 commented Oct 22, 2016

Right it is more about overflow, though Float16 is also widened. If reduce and accumulate behave differently I guess it makes not sense to have accumulate_eltype?

@timholy
Copy link
Sponsor Member

timholy commented Oct 22, 2016

Good point, it's more an issue for reductions than for accumulations---manual specification makes sense for the latter.

($fmod)(res, A, R1, R2, axis)
end
function accumulate_pairwise!(op, result::AbstractVector, v::AbstractVector)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove blank line here

@@ -1905,4 +1908,8 @@ end
end
end

# exotic intexing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

intexing -> indexing

return s_
end

function accumulate_pairwise!(op, result::AbstractVector, v::AbstractVector)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need function accumulate_pairwise!{F}(op::F, result::AbstractVector, v::AbstractVector) Ref: #18207

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I will test if there is a performance difference. Since each final call covers 128 elements of v, it might be okay.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you get any results? Same should apply to the function below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had only one allocation even without op::F, not sure why? I added the type annotation to _accumulate_pairwise! anyway however.

@stevengj
Copy link
Member

NumPy also calls a similar operation accumulate, so +1 for that name.

Regarding widening, my inclination is also to not automatically widen. The only corner case is for cumsum of Bool, which we widen to Int, and we opted to be conservative and preserve this behavior in #18364.

@@ -1868,3 +1868,61 @@ end
@test size(a) == size(b)
end
end

isdefined(:TestHelpers) || eval(Main, :(include(joinpath(dirname(@__FILE__), "TestHelpers.jl"))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted, the isdefined check is wrong.

@jw3126
Copy link
Contributor Author

jw3126 commented Nov 19, 2016

done

@stevengj
Copy link
Member

stevengj commented Nov 21, 2016

Looks good to merge. Actually, there should probably be a NEWS item for the accumulate function first.

@StefanKarpinski
Copy link
Sponsor Member

The last hurdle. Then we dine in... er, I mean then we merge this PR!

@stevengj stevengj removed the kind:potential benchmark Could make a good benchmark in BaseBenchmarks label Nov 21, 2016
@@ -76,6 +78,7 @@ Deprecated or removed

* `Dates.recur` has been deprecated in favor of `filter` ([#19288])

* `cummin` and `cummax` have been deprecated in favor of `accumulate`.
Julia v0.5.0 Release Notes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing a blank line

@@ -60,6 +60,8 @@ Library improvements
you can now do e.g. `[A I]` and it will concatenate an appropriately sized
identity matrix ([#19305]).

* New `accumulate` and `accumulate!` functions, which generalize `cumsum` and `cumprod`. Also known as [scan](https://en.wikipedia.org/wiki/Prefix_sum) ([#18931]).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

, also known as a scan operation

@stevengj
Copy link
Member

git rebase --whitespace=fix HEAD~1 to fix the trailing-whitespace appveyor failure.

@jw3126
Copy link
Contributor Author

jw3126 commented Nov 23, 2016

Anything left that prevents this PR from merge?

@StefanKarpinski StefanKarpinski merged commit 3daa183 into JuliaLang:master Nov 23, 2016
rcum_promote_type{T,N}(op, ::Type{Array{T,N}}) = Array{rcum_promote_type(op,T), N}

# accumulate_pairwise slightly slower then accumulate, but more numerically
# stable in certain situtations (e.g. sums).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo here "situtations" has one too many t's

@@ -145,6 +145,10 @@ end
@deprecate chol(A::Number, ::Type{Val{:L}}) ctranspose(chol(A))
@deprecate chol(A::AbstractMatrix, ::Type{Val{:L}}) ctranspose(chol(A))

@deprecate cummin(A, dim=1) accumulate(min, A, dim=1)
@deprecate cummax(A, dim=1) accumulate(max, A, dim=1)
Copy link
Contributor

@tkelman tkelman Nov 25, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these need to be at the end of the file where 0.6 deprecations go, not in the middle of the 0.5 deprecations

edit: got it in #19415

@coveralls
Copy link

Coverage Status

Coverage increased (+26.1%) to 73.121% when pulling 08fcdf0 on jw3126:scan into 7232a9a on JuliaLang:master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.