Reducing over bool array fails to optimize somehow #43501
The keyword-variant calls

A similar experiment with

Even

I'm not familiar with how these functions are implemented, but the impression I got is that they eventually call a

I'm not sure I made any mistakes here. Like I said, I'm not too familiar with this code. Tracing back which function is being called exactly, this is what I found using
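The tracing tool referenced above did not survive extraction. A minimal sketch of how one might trace the dispatch (the `data` array here is a hypothetical stand-in for the reproduction):

```julia
using InteractiveUtils  # provides `which`

data = rand(Bool, 1000)

# Ask which method a plain `reduce(|, data)` call dispatches to;
# following the chain from there (mapreduce, mapfoldl, ...) shows
# where the traversal strategy gets chosen.
m = which(reduce, Tuple{typeof(|), Vector{Bool}})
println(m)  # prints the method signature and its source location
```

Other useful tools for this kind of digging are `@which reduce(|, data)` and `@edit reduce(|, data)` from the same stdlib.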
That would mean we are using an implementation of mapreduce that performs a binary-tree-like traversal of the data instead of a linear one. I wasn't yet sure this was the case when I noticed this function, so I opened this other issue: #45000.

Here's a measurement of the time difference between reduce and the for-loop for

My guess is that it's overhead from the algorithm, and it gets exacerbated in the boolean case. I suppose Bool and Int do not benefit from the pairwise summation scheme. It would probably be nice to make sure this algorithm is only used for floating-point element types, while anything else uses a simple linear traversal. Or otherwise, there should be a version of

Alternatively, we could choose the block size based not only on the operation but also on the element type. Although
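To make the two traversals concrete, here is a toy sketch (not Base's actual implementation) contrasting a linear left fold with a pairwise, binary-tree-style reduction that falls back to the linear fold below a block-size cutoff:

```julia
# Linear left-to-right fold.
function linear_fold(op, x)
    acc = x[1]
    @inbounds for i in 2:length(x)
        acc = op(acc, x[i])
    end
    acc
end

# Pairwise reduction: split the range in half recursively until the
# block is small enough, then fold linearly. Base uses a scheme like
# this mainly to improve floating-point summation accuracy; for Bool
# or Int it only adds overhead.
function pairwise_fold(op, x, lo=1, hi=length(x); blocksize=16)
    if hi - lo < blocksize
        acc = x[lo]
        @inbounds for i in lo+1:hi
            acc = op(acc, x[i])
        end
        return acc
    end
    mid = (lo + hi) >>> 1
    op(pairwise_fold(op, x, lo, mid; blocksize),
       pairwise_fold(op, x, mid + 1, hi; blocksize))
end

data = rand(Bool, 1000)
linear_fold(|, data) == pairwise_fold(|, data)  # same result, different traversal
```

For `+` over floats the tree shape pays for itself in accuracy; for `|` over `Bool` both orders give bit-identical results, so the recursion is pure overhead.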
Folds all have some weird performance issues. See #43310.
The pairwise reduction could be turned off by

```julia
Base.pairwise_blocksize(f, ::typeof(|)) = typemax(Int) # enlarge the threshold to avoid the split
```

then we have

```julia
julia> @btime reduce(|, $data)
  88.438 ns (0 allocations: 0 bytes) # 268.111 ns before
```

Still slower than

The remaining difference comes from the initialization:

```julia
function foo(x)
    y = false
    @inbounds for i in 1:length(x)
        y |= x[i]
    end
    y
end

function foo2(x)
    @inbounds y = x[1] | x[2]
    @inbounds for i in 3:length(x)
        y |= x[i]
    end
    y
end

@btime foo2($data) # 83.212 ns (0 allocations: 0 bytes)
@btime foo($data)  # 37.059 ns (0 allocations: 0 bytes)
```
@N5N3 this is great. I fear the problem may be even larger than doing that for this one operation, though. Would you mind commenting on #45000? Perhaps I should not have created a second issue. But my point is: if we start looking around, I believe we might actually want to raise this block length for many more operations. I don't think there are many desirable applications for that pairwise
Okay, it's clear the issue is that
Whoops, looks like I was wrong.
Minimal example
Shouldn't `reduce` without the `init` keyword first find the initial value and then call the second method? In that case, it should not be 5x slower.

Edit: Also, for some reason, this does not vectorize.
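The minimal example itself did not survive extraction. A plausible reproduction of the comparison being discussed, with hypothetical names (`data`, `anybit`), is something like:

```julia
data = rand(Bool, 10_000)  # hypothetical input

# The call under discussion: no `init`, so Base seeds the accumulator
# from the data itself and uses the pairwise traversal.
r1 = reduce(|, data)

# The hand-written loop it was benchmarked against; starting the
# accumulator from a constant `false` lets the compiler vectorize it.
function anybit(x::AbstractArray{Bool})
    y = false
    @inbounds for i in eachindex(x)
        y |= x[i]
    end
    y
end

r1 == anybit(data)  # true: identical result, very different timings
```

The results are bit-identical; only the traversal strategy and the accumulator initialization differ, which is what the timing gap in the comments above isolates.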