Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Make any and all short-circuiting #11774

Merged
merged 3 commits into from
Jul 29, 2015
Merged

Conversation

fcard
Copy link
Contributor

@fcard fcard commented Jun 20, 2015

Doesn't apply for n < 16, but that shouldn't be a big problem, right? There are apparently optimizations for small n, so I decided not to mess with that.

@Keno
Copy link
Member

Keno commented Jun 20, 2015

I don't think this is correct. Wouldn't this make all mapreduce invocations short-circuit?

@fcard
Copy link
Contributor Author

fcard commented Jun 20, 2015

Only the ones where the reduce function is AndFun or OrFun. I suppose I could make it more specific.

@Keno
Copy link
Member

Keno commented Jun 20, 2015

Well, at the very least that change deserved additional discussion.

@fcard
Copy link
Contributor Author

fcard commented Jun 20, 2015

I see, sorry. It just seemed that this is what the function was supposed to be in the first place, but I guess I shouldn't jump to conclusions so hastily.

@Keno
Copy link
Member

Keno commented Jun 20, 2015

I think we need to determine what the semantics of mapreduce need to be. It is quite possible that the original function is incorrect in short circuiting.

@fcard
Copy link
Contributor Author

fcard commented Jun 20, 2015

Yes. If one wants to have side-effects with mapreduce and | or &, this wouldn't let them. I suppose this is incorrect. It's easy to have any and all separated from mapreduce, but I wish they could still benefit from it somehow. I could create new functors AndFunShort and OrFunShort that are specific to any and all. I could also try the isreduced approach, but from the silence I got I guess nobody is interested :P Maybe short-circuiting is just conceptually incompatible with mapreduce?

@fcard
Copy link
Contributor Author

fcard commented Jun 20, 2015

Ok, last try with "cleverness", if people don't like this I am just going with the easy route. Either way, after this I am taking a break to reassess my life.

@fcard
Copy link
Contributor Author

fcard commented Jun 20, 2015

I realized that the Bool parameter was because reduce(|, _) can be used for numbers and that shouldn't short-circuit ever*. I could've made a AbstractArray{Number} specialization but the more changes I made the more I realized any/all should've just been its own thing. I also noticed that there are some other functions in reduce.jl that are separated from mapreduce and that made me feel more confident in making the simple version.

So here, straightforward version, as it should have been from the beginning. I will see if there are any other changes to be made and then squash. Thanks for your patience.

*actually, it can short-circuit sometimes, still it's good to have them separated. ...?

darn you lack of impulse control

@sbromberger
Copy link
Contributor

@fcard

Doesn't apply for n < 16, but that shouldn't be a big problem, right?

This is a big problem. Consider a very expensive function (which is exactly the use case I describe in #11750 (comment)). I think the expectation would be "short circuit under every circumstance" or "short circuit under no circumstance", but not some hybrid that's dependent on the length of the itr.

@fcard
Copy link
Contributor Author

fcard commented Jun 20, 2015

@sbromberger Fixed it already when I separated it from mapreduce. I suppose that's another reason to have done that.

@sbromberger
Copy link
Contributor

@fcard Ah, sorry. I read your initial comment without having reviewed the change. Thanks!

PS: I'm totally in favor of separating any/all from mapreduce.

@fcard
Copy link
Contributor Author

fcard commented Jun 20, 2015

@sbromberger No big deal, I realized later that it might have been important to keep it consistent, when n is small but the predicate is expensive, as per your example, or in the case that someone is depending in the short-circuit behavior for side-effects (like counting up to the first true or something).

@sbromberger
Copy link
Contributor

I don't understand why the AppVeyor build is failing. Timeout issues?

@fcard
Copy link
Contributor Author

fcard commented Jun 20, 2015

I guess so. From here:

Are there any build time restrictions?

All plans have maximum build job execution time of 40 minutes.

@tkelman
Copy link
Contributor

tkelman commented Jun 21, 2015

The worker that started on linalg1 just froze, ref #7942, happens occasionally. I restarted the build.

@sbromberger
Copy link
Contributor

w00t - looks like tests passed. Thanks, @fcard - this looks awesome. :)

@fcard
Copy link
Contributor Author

fcard commented Jun 21, 2015

I was testing the speed of the new any and all and noticed that it was slower in a few cases, so I bought back some of the optimizations of the old version. Here are the times I got (gists):

Summarized (average times)
Expanded (max, min, result, averages)

Here is the code I used to get the results. It's a bit messy.
I also removed a bit of redundancy from the tests, leftovers from a previous version.

i = 1
len = length(itr)
while i <= len
@inbounds x = itr[i]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is doing this as a while loop faster than for x in itr ??

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently. reduce.jl is full of tricks like these, and the tests I did seem to confirm this: https://gist.github.com/fcard/aca7b07d24ff24b01d4d#file-sc-any-all-perf-summary-jl-L43-L49

Only for vectors, though. Didn't do much for ranges or tuples.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's crazy/seems really bad. :( To clarify: the only difference between V1 and V2 in this particular signature is the while/for ? Edit: Is this #11787?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's the main difference, the others were made mostly to accommodate it. About it being #11787, maybe? Maybe it's making it worse, but these kinds of optimizations have been on reduce.jl for a while, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd think @inbounds for x in itr should be equally fast. @inbounds is necessary for optimal performance due to #11350, but whether we should actually use it for non-Arrays is another question, since we'll segfault on user-defined types if done is incorrectly defined.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try changing to this and see if it works faster, although if #11787 is at effect it should obfuscate the results somewhat. Still wondering about the other optimizations in reduce.jl though, were they made before the compiler could generate better code for array for loops?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, @inbounds makes it fast. I will keep it in the array definitions only, that segfault problem sounds scary (doesn't seem to make any difference for the other types anyway).

@simonster
Copy link
Member

IMO mapreduce should short-circuit but mapfoldl/mapfoldr should not. mapreduce already leaves many things up to the implementation. We don't define the associativity and don't guarantee that the map function is called for each element of the array if they are non-unique to avoid having to invoke f on all the zeros in a sparse matrix. Perhaps we should just say that mapreduce assumes that f is "pure" modulo allocation, i.e., that invoking f on the same input produces outputs that are == and of the same type, and invoking f does not modify observable global state.

@fcard fcard changed the title Make any and all short-circuiting WIP: Make any and all short-circuiting Jun 22, 2015
@fcard
Copy link
Contributor Author

fcard commented Jun 22, 2015

I addressed the feedback I received so far. There are some things being debated, so I am going to keep this as a WIP for now and wait until conclusions are made.

Another thing that I need to do is solve the problem I reached when trying to optimize any/all when using the identity function. It's not really necessary for this, so if it becomes too much trouble I will leave it for later, it's just that I find unintuitive that any(identity, itr) is 10x slower than any(itr).

@nalimilan
Copy link
Member

+1 to @simonster's proposal. Also, I agree any and all should only return Bool.

@ScottPJones
Copy link
Contributor

Ditto

@fcard
Copy link
Contributor Author

fcard commented Jun 23, 2015

Hello, I am mostly done implementing this, but I reached a stumbling block.
Is this a valid usage of mapreduce?

julia> mapreduce(x->(x == 1? true : x), |, 1:10)
15

Or using f::T -> Union{Bool, SomethingElse} in general. If we reach a true, how do we know if there are any nonboolean results later that might modify the final result? In the case of any or all, it's a safe assumption to make that the predicate should return a Bool, but with mapreduce, I'd expect the transformer function to be allowed to return any value that can be used with the operator, and |/& can use booleans, integers and arrays, all in the same operation.

(The current implementation actually sometimes assumes that predicates return booleans so this happens:

julia> mapreduce(Int, |, trues(15))
1

julia> mapreduce(Int, |, trues(16))
ERROR: TypeError: non-boolean (Int64) used in boolean context
 in mapreduce_impl at reduce.jl:327
 in _mapreduce at reduce.jl:151
 in mapreduce at reduce.jl:158

)

@nalimilan
Copy link
Member

Calling | or & on a mix of booleans and other types doesn't make any sense to me. The fact that these operators have quite different meanings strikes again... Cf. #5187

So I'd say the question is: is it OK to fail that way, or should a better error be reported? Maybe that's not an issue as long as the docs clearly state that with | and &, mapreduce will short-circuit as soon as some values are encountered (giving a clear definition of them).

@fcard
Copy link
Contributor Author

fcard commented Jun 24, 2015

Iunno, there are more than 30 methods in both | and & and half of them involve two potentially different types. In my opinion, if these mixups are unwanted, the methods themselves should be removed rather than the support dropped in mapreduce. I mean, unless there are no other solutions.

The way I found to deal with this is the following:

In mapreduce(f, op, A::Iterable{T}), infer what are the possible return types of f(T). If the only return type is a type of which we have a single value to short-circuit from with op, then mapreduce short-circuits on that value.

For example, in mapreduce(f=identity, op=|, A=[false, true, false]), A is an iterable of Bool, and identity(Bool) -> Bool, and since we can exit on true with |, then we short-circuit as soon as we reach the second element of A.

In the case of f(x) = x == 1? true : x, mapreduce(f, |, 1:10) can't short-circuit because there is no single value to exit from since f(Int) -> Union{Int, Bool}.

Both any and all create a special Predicate functor that always assumes its return value is Bool, so these two functions always try to short-circuit. (and error out if the predicate returns a non-boolean type)

Unfortunately this made the code much more complex than I wish it were (although surprising not slower in most cases, in fact it's faster in some), so alternate solutions are welcome.
(It's also not done yet, I am still deciding on a few things)

@fcard
Copy link
Contributor Author

fcard commented Jul 28, 2015

Rebased!

@lindahua What you're saying applies to mapreduce, but there shouldn't be any problem making any and all work on all callables, since they would get wrapped in a Predicate anyway. I don't mind making this change, if people want it.

About having separate functions, this was discussed at the beginning of #11750, but the conclusion I think was that people shouldn't be relying on non-short-circuiting behaviour for side-effects while using predicates. One could still use mapfoldl(f, |\&, A) if they don't want short-circuiting.


type Predicate <: Func{1}
f::Function
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With such definition, we are not able to encapsulate a callable functor (not function) into an instance of Predicate.

For example:

immutable IsPosFun end
call(::IsPosFun, x) = (x > zero(x))

# The following statement would cause an error
Predicate(IsPosFun())

Use of functors is quite important, especially in cases where performance is critical.

@lindahua
Copy link
Contributor

Here is an example:

immutable IsPosFun end
call(::IsPosFun, x) = (x > zero(x))

# people might want to have short-circuit behavior on the following
# but with current implementation, it does not seem to be the case
all(IsPosFun(), x)  

# also, `Predicate(IsPosFun())` does not work

@lindahua
Copy link
Contributor

I think the Predicate type should be defined in a way to allow it to encapsulate a functor, as follows:

immutable Predicate{F}
    f::F
end
Predicate{F}(f::F) = Predicate{F}(f)

call(pred::Predicate, x) = call(pred.f, x)

@fcard
Copy link
Contributor Author

fcard commented Jul 28, 2015

Yeah, that's what I thought I had to do, make both the predicate and any/all capable of receiving types other than functions. It's a pretty simple to do, I am just waiting to see if everyone agrees with the change.

@tkelman
Copy link
Contributor

tkelman commented Jul 29, 2015

+1 to @lindahua's suggestion (on functor type generality, I don't agree with making a separate all_sc or any_sc just yet)

@fcard
Copy link
Contributor Author

fcard commented Jul 29, 2015

Done. Since this commit is not related to short circuiting, should I keep it separate, or should I squash it regardless?

@tkelman
Copy link
Contributor

tkelman commented Jul 29, 2015

I think separate commits are fine for this change, as long as the tests would pass if someone were to happen upon any one of the intermediate commits here in the process of doing git bisect on something else.

@sbromberger
Copy link
Contributor

I'm coming back to this after a while away, but I confess to being completely lost. It seems that we might have diverged a bit from my original "make any() and all() short-circuiting" request and I'd just like to confirm that it's implemented in this PR. :)

Thanks,

Seth.

@fcard
Copy link
Contributor Author

fcard commented Jul 29, 2015

@tkelman The tests pass on all commits, but IIRC the first one triggered a lot of deprecations; would that cause any trouble?

@sbromberger That's still there, don't worry. This functor change is a "might as well do this too while we're here" kinda change, short-circuiting remains the main objective.

tkelman added a commit that referenced this pull request Jul 29, 2015
RFC: Make `any` and `all` short-circuiting
@tkelman tkelman merged commit d51ebdd into JuliaLang:master Jul 29, 2015
@tkelman
Copy link
Contributor

tkelman commented Jul 29, 2015

You've rebased this more than enough times by now. Thanks for all the hard work here! 🎊

@fcard
Copy link
Contributor Author

fcard commented Jul 29, 2015

Thank you very much! I wouldn't mind doing some more work, but I am happy to see this merged. Hopefully it will be useful :)

Thanks again everybody for the kind messages, I didn't mention the new ones before because I didn't want to derail this further, but I am really grateful. Thanks also for your time and patience.

If there is a next time, let's hope it will go a lot smoother. Till then!

@sbromberger
Copy link
Contributor

This. Is. Awesome. Thank you all, especially @fcard, for getting this changed.

@fcard fcard deleted the patch-3 branch July 29, 2015 18:07
@tkelman
Copy link
Contributor

tkelman commented Jul 29, 2015

I certainly hope there will be a next time, this was really well done. But only if you're up for it, of course.

@fcard
Copy link
Contributor Author

fcard commented Jul 29, 2015

I am! And I am feeling a lot better, but I want to make sure I am 100% before I make another contribution. I will give myself a cool down, work on a few personal projects, and then I will resume, starting with that deprecations testing I said I would do.

Thanks again, contributing to Julia was a very positive experience. I learned a lot, and the community was very welcoming. I hope to return soon :)

@fcard fcard mentioned this pull request Jul 29, 2015
@StefanKarpinski
Copy link
Member

We look forward to it – thanks for contributing!

tkelman added a commit that referenced this pull request May 27, 2016
So these will now give MethodError for non boolean input
tkelman added a commit that referenced this pull request May 27, 2016
So these will now give MethodError for non boolean input
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.