RFC: Make `any` and `all` short-circuiting #11774
I don't think this is correct. Wouldn't this make all `mapreduce` invocations short-circuit?
Only the ones where the reduce function is
Well, at the very least that change deserved additional discussion.
I see, sorry. It just seemed that this is what the function was supposed to be in the first place, but I guess I shouldn't jump to conclusions so hastily.
I think we need to determine what the semantics of `mapreduce` should be. It is quite possible that the original function is incorrect in short-circuiting.
Yes. If one wants to have side-effects with `mapreduce` and
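To make the semantic question concrete, here is a minimal sketch in Julia 1.x syntax (not the PR's code; `eager_any`, `sc_any`, `pred` and `calls` are hypothetical names) that counts how often a side-effecting predicate runs under an eager `|` fold versus a loop that returns at the first `true`. Because `|` is an ordinary function, both of its arguments are always evaluated, so the predicate's side effects happen for every element.

```julia
# Eager (non-short-circuiting) `any`: `|` is an ordinary function, so the
# fold calls f(x) on every element before combining the results.
eager_any(f, itr) = foldl((acc, x) -> acc | f(x), itr; init = false)

# Short-circuiting `any`: returns as soon as f(x) is true.
function sc_any(f, itr)
    for x in itr
        f(x) && return true
    end
    return false
end

# Predicate with a side effect: counts how often it is called.
const calls = Ref(0)
pred(x) = (calls[] += 1; x == 3)

calls[] = 0
r_eager = eager_any(pred, 1:10)
n_eager = calls[]   # every element visited

calls[] = 0
r_sc = sc_any(pred, 1:10)
n_sc = calls[]      # stops at the first `true`
```

Both return `true`, but `eager_any` calls `pred` ten times while `sc_any` stops after three calls. That difference is what matters when the predicate is expensive or has side effects.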
OK, last try with "cleverness"; if people don't like this, I am just going with the easy route. Either way, after this I am taking a break to reassess my life.
I realized that the Bool parameter was because So here is the straightforward version, as it should have been from the beginning. I will see if there are any other changes to be made and then squash. Thanks for your patience. *Actually, it can short-circuit sometimes; still, it's good to have them separated. ...? Darn you, lack of impulse control
This is a big problem. Consider a very expensive function (which is exactly the use case I describe in #11750 (comment)). I think the expectation would be "short circuit under every circumstance" or "short circuit under no circumstance", but not some hybrid that's dependent on the length of the
@sbromberger Fixed it already when I separated it from
@fcard Ah, sorry. I read your initial comment without having reviewed the change. Thanks! PS: I'm totally in favor of separating
@sbromberger No big deal, I realized later that it might have been important to keep it consistent, when
I don't understand why the AppVeyor build is failing. Timeout issues?
I guess so. From here:
The worker that started on
w00t - looks like tests passed. Thanks, @fcard - this looks awesome. :)
I was testing the speed of the new Summarized (average times) Here is the code I used to get the results. It's a bit messy.
```julia
i = 1
len = length(itr)
while i <= len
    @inbounds x = itr[i]
```
Is doing this as a `while` loop faster than `for x in itr`?
Apparently. `reduce.jl` is full of tricks like these, and the tests I did seem to confirm this: https://gist.github.com/fcard/aca7b07d24ff24b01d4d#file-sc-any-all-perf-summary-jl-L43-L49
Only for vectors, though. It didn't do much for ranges or tuples.
That's crazy/seems really bad. :( To clarify: the only difference between V1 and V2 in this particular signature is the `while`/`for`? Edit: Is this #11787?
Yeah, that's the main difference; the others were made mostly to accommodate it. As for it being #11787, maybe? Maybe it's making it worse, but these kinds of optimizations have been in `reduce.jl` for a while, right?
I'd think `@inbounds for x in itr` should be equally fast. `@inbounds` is necessary for optimal performance due to #11350, but whether we should actually use it for non-Arrays is another question, since we'll segfault on user-defined types if `done` is incorrectly defined.
I will try changing to this and see if it runs faster, although if #11787 is in effect it may skew the results somewhat. Still, I'm wondering about the other optimizations in `reduce.jl`: were they made before the compiler could generate better code for `for` loops over arrays?
Yep, `@inbounds` makes it fast. I will keep it in the array definitions only; that segfault problem sounds scary (and it doesn't seem to make any difference for the other types anyway).
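For comparison, here is a minimal Julia 1.x sketch (hypothetical names `any_while` and `any_for`) of the two loop shapes discussed in this review thread, with `@inbounds` restricted to `Vector` arguments, in line with keeping it in the array definitions only. No timings are claimed: the measurements above were made on a much older compiler, so any difference should be re-measured on your own Julia version (for instance with BenchmarkTools.jl's `@btime`).

```julia
# Index-based `while` loop, in the shape of the reviewed diff.
function any_while(f, v::Vector)
    i = 1
    n = length(v)
    while i <= n
        @inbounds x = v[i]   # safe: 1 <= i <= length(v) for a Vector
        f(x) && return true  # short-circuit on the first `true`
        i += 1
    end
    return false
end

# Equivalent `for` loop with `@inbounds`, as suggested in the thread;
# on recent Julia versions Vector iteration may already skip these checks.
function any_for(f, v::Vector)
    @inbounds for x in v
        f(x) && return true
    end
    return false
end
```

Restricting the signature to `Vector` is the design choice that keeps `@inbounds` safe here: a user-defined iterable with a buggy iteration protocol never reaches these methods.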
IMO
I addressed the feedback I received so far. Some things are still being debated, so I am going to keep this as a WIP for now and wait until conclusions are reached. Another thing I need to do is solve the problem I ran into when trying to optimize
+1 to @simonster's proposal. Also, I agree
Ditto
Hello, I am mostly done implementing this, but I reached a stumbling block.

```julia
julia> mapreduce(x->(x == 1 ? true : x), |, 1:10)
15
```

Or using

The current implementation actually sometimes assumes that predicates return booleans, so this happens:

```julia
julia> mapreduce(Int, |, trues(15))
1

julia> mapreduce(Int, |, trues(16))
ERROR: TypeError: non-boolean (Int64) used in boolean context
 in mapreduce_impl at reduce.jl:327
 in _mapreduce at reduce.jl:151
 in mapreduce at reduce.jl:158
```
Calling So I'd say the question is: is it OK to fail that way, or should a better error be reported? Maybe that's not an issue as long as the docs clearly state that with
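One possible answer to the error-reporting question, sketched under the assumption that a clearer message is wanted: a hypothetical `checked_any` (Julia 1.x syntax, not part of Base or this PR) that checks each predicate result and throws an `ArgumentError` naming the offending type, instead of the generic `TypeError` in the REPL session above.

```julia
# Hypothetical short-circuiting `any` that validates predicate results.
function checked_any(f, itr)
    for x in itr
        v = f(x)
        # Reject non-Bool results with a descriptive error.
        v isa Bool || throw(ArgumentError(
            "predicate returned a value of type $(typeof(v)); expected Bool"))
        v && return true   # short-circuit on the first `true`
    end
    return false
end
```

The trade-off is one extra type check per element. Relying on `&&` alone gives the same short-circuiting but surfaces the generic "non-boolean (T) used in boolean context" error instead.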
Iunno, there are more than 30 methods in both The way I found to deal with this is the following: In For example, in In the case of Both Unfortunately this made the code much more complex than I would like (although surprisingly not slower in most cases; in fact it's faster in some), so alternate solutions are welcome.
Rebased! @lindahua What you're saying applies to About having separate functions, this was discussed at the beginning of #11750, but the conclusion I think was that people shouldn't be relying on non-short-circuiting behaviour for side-effects while using predicates. One could still use
```julia
type Predicate <: Func{1}
    f::Function
end
```
With such a definition, we are not able to encapsulate a callable functor (not a function) into an instance of `Predicate`. For example:

```julia
immutable IsPosFun end
call(::IsPosFun, x) = (x > zero(x))
# The following statement would cause an error
Predicate(IsPosFun())
```

Use of functors is quite important, especially in cases where performance is critical.
Here is an example:

```julia
immutable IsPosFun end
call(::IsPosFun, x) = (x > zero(x))
# people might want short-circuit behavior on the following,
# but with the current implementation, that does not seem to be the case
all(IsPosFun(), x)
# also, `Predicate(IsPosFun())` does not work
```
I think the

```julia
immutable Predicate{F}
    f::F
end
Predicate{F}(f::F) = Predicate{F}(f)
call(pred::Predicate, x) = call(pred.f, x)
```
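The functor snippets in this thread use Julia 0.4 syntax (`immutable`, overloading `call`). Below is a minimal sketch of the same parametric-wrapper idea in Julia 1.x syntax, where `immutable` became `struct` and callable objects are defined by adding methods to the type itself. `IsPosFun` and `Predicate` come from the thread; `sc_all` is a hypothetical short-circuiting `all` used only for illustration.

```julia
# The functor from the thread, in Julia 1.x syntax.
struct IsPosFun end
(::IsPosFun)(x) = x > zero(x)

# Parametric wrapper: the field type is the parameter F, so it can hold a
# Function or any other callable object (a field typed `f::Function` could not).
struct Predicate{F}
    f::F
end
(p::Predicate)(x) = p.f(x)

# Hypothetical short-circuiting `all` that accepts any callable.
function sc_all(f, itr)
    for x in itr
        f(x) || return false   # stop at the first `false`
    end
    return true
end

p = Predicate(IsPosFun())
```

Because the field type is the parameter `F` rather than `Function`, both `Predicate(IsPosFun())` and `Predicate(iseven)` construct, and the compiler can specialize on the concrete callable type.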
Yeah, that's what I thought I had to do, make both the predicate and
+1 to @lindahua's suggestion (on functor type generality, I don't agree with making a separate
Deprecate nonboolean `any` and `all`.
Done. Since this commit is not related to short-circuiting, should I keep it separate, or should I squash it regardless?
I think separate commits are fine for this change, as long as the tests would pass if someone were to happen upon any one of the intermediate commits here in the process of doing
I'm coming back to this after a while away, but I confess to being completely lost. It seems that we might have diverged a bit from my original "make Thanks, Seth.
@tkelman The tests pass on all commits, but IIRC the first one triggered a lot of deprecations; would that cause any trouble? @sbromberger That's still there, don't worry. This functor change is a "might as well do this too while we're here" kind of change; short-circuiting remains the main objective.
You've rebased this more than enough times by now. Thanks for all the hard work here! 🎊
Thank you very much! I wouldn't mind doing some more work, but I am happy to see this merged. Hopefully it will be useful :) Thanks again everybody for the kind messages, I didn't mention the new ones before because I didn't want to derail this further, but I am really grateful. Thanks also for your time and patience. If there is a next time, let's hope it will go a lot smoother. Till then!
This. Is. Awesome. Thank you all, especially @fcard, for getting this changed.
I certainly hope there will be a next time, this was really well done. But only if you're up for it, of course.
I am! And I am feeling a lot better, but I want to make sure I am 100% before I make another contribution. I will give myself a cool down, work on a few personal projects, and then I will resume, starting with that deprecations testing I said I would do. Thanks again, contributing to Julia was a very positive experience. I learned a lot, and the community was very welcoming. I hope to return soon :)
We look forward to it – thanks for contributing!
So these will now give a `MethodError` for non-boolean input.
Doesn't apply for `n < 16`, but that shouldn't be a big problem, right? There are apparently optimizations for small `n`, so I decided not to mess with that.
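To see why behaviour can depend on input length, here is a hypothetical sketch (Julia 1.x syntax; not the actual `reduce.jl` code) of a reduction with a small-input cutoff at 16, where only the long-input path is specialized to short-circuit. It reproduces the asymmetry in the `trues(15)`/`trues(16)` REPL example earlier in the thread: the sequential path happily combines `Int` results with `|`, while the specialized path assumes `Bool` results.

```julia
const SMALL_N = 16  # hypothetical cutoff mirroring the `n < 16` threshold

function sketch_mapreduce_or(f, v::AbstractVector)
    n = length(v)
    n == 0 && return false
    if n < SMALL_N
        # Sequential path: combines every f(x) with `|`, no early exit,
        # so non-Bool results (e.g. Int) are accepted.
        acc = f(v[firstindex(v)])
        for i in firstindex(v)+1:lastindex(v)
            acc = acc | f(v[i])
        end
        return acc
    else
        # Specialized path: assumes f returns Bool and exits early;
        # `&&` throws a TypeError if f(x) is not a Bool.
        for x in v
            f(x) && return true
        end
        return false
    end
end
```

Here `sketch_mapreduce_or(Int, trues(15))` returns `1`, while `sketch_mapreduce_or(Int, trues(16))` throws a `TypeError`, which is the kind of length-dependent hybrid behaviour objected to earlier in the thread.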