RFC: Make `any` and `all` short-circuiting #11774

fcard · 2015-06-20T01:28:10Z

~~Doesn't apply for n < 16, but that shouldn't be a big problem, right? There are apparently optimizations for small n, so I decided not to mess with that.~~

Keno · 2015-06-20T01:41:19Z

I don't think this is correct. Wouldn't this make all mapreduce invocations short-circuit?

fcard · 2015-06-20T01:51:15Z

Only the ones where the reduce function is AndFun or OrFun. I suppose I could make it more specific.

Keno · 2015-06-20T01:54:45Z

Well, at the very least that change deserved additional discussion.

fcard · 2015-06-20T01:59:03Z

I see, sorry. It just seemed that this is what the function was supposed to be in the first place, but I guess I shouldn't jump to conclusions so hastily.

Keno · 2015-06-20T02:01:03Z

I think we need to determine what the semantics of mapreduce need to be. It is quite possible that the original function is incorrect in short circuiting.

fcard · 2015-06-20T02:16:56Z

Yes. If one wants to have side-effects with mapreduce and | or &, this wouldn't let them. I suppose this is incorrect. It's easy to have any and all separated from mapreduce, but I wish they could still benefit from it somehow. I could create new functors AndFunShort and OrFunShort that are specific to any and all. I could also try the isreduced approach, but from the silence I got I guess nobody is interested :P Maybe short-circuiting is just conceptually incompatible with mapreduce?

fcard · 2015-06-20T05:14:43Z

Ok, last try with "cleverness", if people don't like this I am just going with the easy route. Either way, after this I am taking a break to reassess my life.

fcard · 2015-06-20T15:46:08Z

I realized that the Bool parameter was because reduce(|, _) can be used for numbers and that shouldn't short-circuit ~~ever~~*. I could've made a AbstractArray{Number} specialization but the more changes I made the more I realized any/all should've just been its own thing. I also noticed that there are some other functions in reduce.jl that are separated from mapreduce and that made me feel more confident in making the simple version.

So here, straightforward version, as it should have been from the beginning. I will see if there are any other changes to be made and then squash. Thanks for your patience.

*actually, it can short-circuit sometimes, still it's good to have them separated. ...?

^{_{darn you lack of impulse control}}

sbromberger · 2015-06-20T16:35:23Z

@fcard

Doesn't apply for n < 16, but that shouldn't be a big problem, right?

This is a big problem. Consider a very expensive function (which is exactly the use case I describe in #11750 (comment)). I think the expectation would be "short circuit under every circumstance" or "short circuit under no circumstance", but not some hybrid that's dependent on the length of the itr.

fcard · 2015-06-20T16:38:39Z

@sbromberger Fixed it already when I separated it from mapreduce. I suppose that's another reason to have done that.

sbromberger · 2015-06-20T16:39:46Z

@fcard Ah, sorry. I read your initial comment without having reviewed the change. Thanks!

PS: I'm totally in favor of separating any/all from mapreduce.

fcard · 2015-06-20T16:45:16Z

@sbromberger No big deal, I realized later that it might have been important to keep it consistent, when n is small but the predicate is expensive, as per your example, or in the case that someone is depending in the short-circuit behavior for side-effects (like counting up to the first true or something).

sbromberger · 2015-06-20T17:05:20Z

I don't understand why the AppVeyor build is failing. Timeout issues?

fcard · 2015-06-20T17:22:36Z

I guess so. From here:

Are there any build time restrictions?

All plans have maximum build job execution time of 40 minutes.

tkelman · 2015-06-21T01:04:48Z

The worker that started on linalg1 just froze, ref #7942, happens occasionally. I restarted the build.

sbromberger · 2015-06-21T02:37:48Z

w00t - looks like tests passed. Thanks, @fcard - this looks awesome. :)

fcard · 2015-06-21T21:08:51Z

I was testing the speed of the new any and all and noticed that it was slower in a few cases, so I bought back some of the optimizations of the old version. Here are the times I got (gists):

Summarized (average times)
Expanded (max, min, result, averages)

Here is the code I used to get the results. It's a bit messy.
I also removed a bit of redundancy from the tests, leftovers from a previous version.

hayd · 2015-06-21T21:18:03Z

base/reduce.jl

+    i   = 1
+    len = length(itr)
+    while i <= len
+       @inbounds x = itr[i]


Is doing this as a while loop faster than for x in itr ??

Apparently. reduce.jl is full of tricks like these, and the tests I did seem to confirm this: https://gist.github.com/fcard/aca7b07d24ff24b01d4d#file-sc-any-all-perf-summary-jl-L43-L49

Only for vectors, though. Didn't do much for ranges or tuples.

That's crazy/seems really bad. :( To clarify: the only difference between V1 and V2 in this particular signature is the while/for ? Edit: Is this #11787?

Yeah, that's the main difference, the others were made mostly to accommodate it. About it being #11787, maybe? Maybe it's making it worse, but these kinds of optimizations have been on reduce.jl for a while, right?

I'd think @inbounds for x in itr should be equally fast. @inbounds is necessary for optimal performance due to #11350, but whether we should actually use it for non-Arrays is another question, since we'll segfault on user-defined types if done is incorrectly defined.

I will try changing to this and see if it works faster, although if #11787 is at effect it should obfuscate the results somewhat. Still wondering about the other optimizations in reduce.jl though, were they made before the compiler could generate better code for array for loops?

Yep, @inbounds makes it fast. I will keep it in the array definitions only, that segfault problem sounds scary (doesn't seem to make any difference for the other types anyway).

simonster · 2015-06-21T22:32:25Z

IMO mapreduce should short-circuit but mapfoldl/mapfoldr should not. mapreduce already leaves many things up to the implementation. We don't define the associativity and don't guarantee that the map function is called for each element of the array if they are non-unique to avoid having to invoke f on all the zeros in a sparse matrix. Perhaps we should just say that mapreduce assumes that f is "pure" modulo allocation, i.e., that invoking f on the same input produces outputs that are == and of the same type, and invoking f does not modify observable global state.

fcard · 2015-06-22T01:35:32Z

I addressed the feedback I received so far. There are some things being debated, so I am going to keep this as a WIP for now and wait until conclusions are made.

Another thing that I need to do is solve the problem I reached when trying to optimize any/all when using the identity function. It's not really necessary for this, so if it becomes too much trouble I will leave it for later, it's just that I find unintuitive that any(identity, itr) is 10x slower than any(itr).

nalimilan · 2015-06-22T09:27:19Z

+1 to @simonster's proposal. Also, I agree any and all should only return Bool.

ScottPJones · 2015-06-22T10:44:02Z

Ditto

fcard · 2015-06-23T22:17:34Z

Hello, I am mostly done implementing this, but I reached a stumbling block.
Is this a valid usage of mapreduce?

julia> mapreduce(x->(x == 1? true : x), |, 1:10)
15

Or using f::T -> Union{Bool, SomethingElse} in general. If we reach a true, how do we know if there are any nonboolean results later that might modify the final result? In the case of any or all, it's a safe assumption to make that the predicate should return a Bool, but with mapreduce, I'd expect the transformer function to be allowed to return any value that can be used with the operator, and |/& can use booleans, integers and arrays, all in the same operation.

(The current implementation actually sometimes assumes that predicates return booleans so this happens:

julia> mapreduce(Int, |, trues(15))
1

julia> mapreduce(Int, |, trues(16))
ERROR: TypeError: non-boolean (Int64) used in boolean context
 in mapreduce_impl at reduce.jl:327
 in _mapreduce at reduce.jl:151
 in mapreduce at reduce.jl:158

)

nalimilan · 2015-06-24T12:32:33Z

Calling | or & on a mix of booleans and other types doesn't make any sense to me. The fact that these operators have quite different meanings strikes again... Cf. #5187

So I'd say the question is: is it OK to fail that way, or should a better error be reported? Maybe that's not an issue as long as the docs clearly state that with | and &, mapreduce will short-circuit as soon as some values are encountered (giving a clear definition of them).

fcard · 2015-06-24T14:20:08Z

Iunno, there are more than 30 methods in both | and & and half of them involve two potentially different types. In my opinion, if these mixups are unwanted, the methods themselves should be removed rather than the support dropped in mapreduce. I mean, unless there are no other solutions.

The way I found to deal with this is the following:

In mapreduce(f, op, A::Iterable{T}), infer what are the possible return types of f(T). If the only return type is a type of which we have a single value to short-circuit from with op, then mapreduce short-circuits on that value.

For example, in mapreduce(f=identity, op=|, A=[false, true, false]), A is an iterable of Bool, and identity(Bool) -> Bool, and since we can exit on true with |, then we short-circuit as soon as we reach the second element of A.

In the case of f(x) = x == 1? true : x, mapreduce(f, |, 1:10) can't short-circuit because there is no single value to exit from since f(Int) -> Union{Int, Bool}.

Both any and all create a special Predicate functor that always assumes its return value is Bool, so these two functions always try to short-circuit. (and error out if the predicate returns a non-boolean type)

Unfortunately this made the code much more complex than I wish it were (although surprising not slower in most cases, in fact it's faster in some), so alternate solutions are welcome.
(It's also not done yet, I am still deciding on a few things)

fcard · 2015-07-28T14:24:08Z

Rebased!

@lindahua What you're saying applies to mapreduce, but there shouldn't be any problem making any and all work on all callables, since they would get wrapped in a Predicate anyway. I don't mind making this change, if people want it.

About having separate functions, this was discussed at the beginning of #11750, but the conclusion I think was that people shouldn't be relying on non-short-circuiting behaviour for side-effects while using predicates. One could still use mapfoldl(f, |\&, A) if they don't want short-circuiting.

lindahua · 2015-07-28T15:50:26Z

base/functors.jl

+
+type Predicate <: Func{1}
+    f::Function
+end


With such definition, we are not able to encapsulate a callable functor (not function) into an instance of Predicate.

For example:

immutable IsPosFun end call(::IsPosFun, x) = (x > zero(x)) # The following statement would cause an error Predicate(IsPosFun())

Use of functors is quite important, especially in cases where performance is critical.

lindahua · 2015-07-28T15:53:19Z

Here is an example:

immutable IsPosFun end
call(::IsPosFun, x) = (x > zero(x))

# people might want to have short-circuit behavior on the following
# but with current implementation, it does not seem to be the case
all(IsPosFun(), x)  

# also, `Predicate(IsPosFun())` does not work

lindahua · 2015-07-28T15:59:09Z

I think the Predicate type should be defined in a way to allow it to encapsulate a functor, as follows:

immutable Predicate{F}
    f::F
end
Predicate{F}(f::F) = Predicate{F}(f)

call(pred::Predicate, x) = call(pred.f, x)

fcard · 2015-07-28T17:09:23Z

Yeah, that's what I thought I had to do, make both the predicate and any/all capable of receiving types other than functions. It's a pretty simple to do, I am just waiting to see if everyone agrees with the change.

tkelman · 2015-07-29T02:10:31Z

+1 to @lindahua's suggestion (on functor type generality, I don't agree with making a separate all_sc or any_sc just yet)

Deprecate nonboolean `any` and `all`.

fcard · 2015-07-29T16:33:18Z

Done. Since this commit is not related to short circuiting, should I keep it separate, or should I squash it regardless?

tkelman · 2015-07-29T16:38:31Z

I think separate commits are fine for this change, as long as the tests would pass if someone were to happen upon any one of the intermediate commits here in the process of doing git bisect on something else.

sbromberger · 2015-07-29T16:47:25Z

I'm coming back to this after a while away, but I confess to being completely lost. It seems that we might have diverged a bit from my original "make any() and all() short-circuiting" request and I'd just like to confirm that it's implemented in this PR. :)

Thanks,

Seth.

fcard · 2015-07-29T16:53:50Z

@tkelman The tests pass on all commits, but IIRC the first one triggered a lot of deprecations; would that cause any trouble?

@sbromberger That's still there, don't worry. This functor change is a "might as well do this too while we're here" kinda change, short-circuiting remains the main objective.

RFC: Make `any` and `all` short-circuiting

tkelman · 2015-07-29T17:05:39Z

You've rebased this more than enough times by now. Thanks for all the hard work here! 🎊

[ci skip]

fcard · 2015-07-29T17:45:04Z

Thank you very much! I wouldn't mind doing some more work, but I am happy to see this merged. Hopefully it will be useful :)

Thanks again everybody for the kind messages, I didn't mention the new ones before because I didn't want to derail this further, but I am really grateful. Thanks also for your time and patience.

If there is a next time, let's hope it will go a lot smoother. Till then!

sbromberger · 2015-07-29T17:46:21Z

This. Is. Awesome. Thank you all, especially @fcard, for getting this changed.

tkelman · 2015-07-29T18:29:38Z

I certainly hope there will be a next time, this was really well done. But only if you're up for it, of course.

fcard · 2015-07-29T19:51:02Z

I am! And I am feeling a lot better, but I want to make sure I am 100% before I make another contribution. I will give myself a cool down, work on a few personal projects, and then I will resume, starting with that deprecations testing I said I would do.

Thanks again, contributing to Julia was a very positive experience. I learned a lot, and the community was very welcoming. I hope to return soon :)

StefanKarpinski · 2015-07-29T19:58:37Z

We look forward to it – thanks for contributing!

So these will now give MethodError for non boolean input

fcard mentioned this pull request Jun 20, 2015

any(pred, itr) and all(pred, itr) do not short-circuit. #11750

Closed

fcard force-pushed the patch-3 branch from c826025 to e63b50b Compare June 20, 2015 19:42

hayd reviewed Jun 21, 2015
View reviewed changes

fcard changed the title ~~Make any and all short-circuiting~~ WIP: Make any and all short-circuiting Jun 22, 2015

lindahua reviewed Jul 28, 2015
View reviewed changes

fcard added 2 commits July 29, 2015 10:22

Make any and all short-circuiting

7a6d657

Deprecate nonboolean `any` and `all`.

Remove maps/comprehensions from uses of any/all

abd70e0

fcard force-pushed the patch-3 branch from ad1a0b9 to ff8c751 Compare July 29, 2015 13:22

Make any/all capable of receiving functors.

8cdb199

fcard force-pushed the patch-3 branch from ff8c751 to 8cdb199 Compare July 29, 2015 13:58

tkelman added a commit that referenced this pull request Jul 29, 2015

Merge pull request #11774 from fcard/patch-3

d51ebdd

RFC: Make `any` and `all` short-circuiting

tkelman merged commit d51ebdd into JuliaLang:master Jul 29, 2015

tkelman referenced this pull request Jul 29, 2015

Fix doctests for the Interfaces chapter

6f2ce0f

[ci skip]

fcard deleted the patch-3 branch July 29, 2015 18:07

fcard mentioned this pull request Jul 29, 2015

any and all are bitwise? #11540

Closed

sbromberger mentioned this pull request Jul 30, 2015

take advantage of short circuiting any()/all() in is_bipartite() sbromberger/LightGraphs.jl#106

Closed

tkelman added a commit that referenced this pull request May 27, 2016

Move or_bool_only, and_bool_only to base/reduce.jl, ref #11774

710e82e

So these will now give MethodError for non boolean input

tkelman added a commit that referenced this pull request May 27, 2016

Move or_bool_only, and_bool_only to base/reduce.jl, ref #11774

0f7c89f

So these will now give MethodError for non boolean input

cstjean mentioned this pull request Oct 28, 2016

Short-circuiting of all on generators #19151

Closed

RFC: Make any and all short-circuiting #11774

RFC: Make any and all short-circuiting #11774

Conversation

fcard commented Jun 20, 2015

Keno commented Jun 20, 2015

fcard commented Jun 20, 2015

Keno commented Jun 20, 2015

fcard commented Jun 20, 2015

Keno commented Jun 20, 2015

fcard commented Jun 20, 2015

fcard commented Jun 20, 2015

fcard commented Jun 20, 2015

sbromberger commented Jun 20, 2015

fcard commented Jun 20, 2015

sbromberger commented Jun 20, 2015

fcard commented Jun 20, 2015

sbromberger commented Jun 20, 2015

fcard commented Jun 20, 2015

tkelman commented Jun 21, 2015

sbromberger commented Jun 21, 2015

fcard commented Jun 21, 2015 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

simonster commented Jun 21, 2015

fcard commented Jun 22, 2015

nalimilan commented Jun 22, 2015

ScottPJones commented Jun 22, 2015

fcard commented Jun 23, 2015

nalimilan commented Jun 24, 2015

fcard commented Jun 24, 2015

fcard commented Jul 28, 2015

Choose a reason for hiding this comment

lindahua commented Jul 28, 2015

lindahua commented Jul 28, 2015

fcard commented Jul 28, 2015

tkelman commented Jul 29, 2015

fcard commented Jul 29, 2015

tkelman commented Jul 29, 2015

sbromberger commented Jul 29, 2015

fcard commented Jul 29, 2015

tkelman commented Jul 29, 2015

fcard commented Jul 29, 2015

sbromberger commented Jul 29, 2015

tkelman commented Jul 29, 2015

fcard commented Jul 29, 2015

StefanKarpinski commented Jul 29, 2015

RFC: Make `any` and `all` short-circuiting #11774

RFC: Make `any` and `all` short-circuiting #11774

fcard commented Jun 21, 2015 •

edited

Loading