Add fast pathway for copy
, collect
, tcollect
, and tcopy
for size-stable operations
#553
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Current State
Fundamentally, Transducers is quite good at doing reductions but collecting results into an output array is a major weakness. The way that it does this currently is essentially just doing
(or
foldxt
for the parallel version). Iff
is expensive to evaluate, then this extra overhead isn't so bad, but for functions that can be done in a CPU cycle or two, it's catastrophic:Here's how it currently looks with a very cheap function (
abs
):And here's a more expensive function (
sin
):This PR
In this PR I made a version of
collect(xf::Transducer, coll)
(and similar forcopy
) operating on transducers that checks ifxf
preserves the size ofcoll
(i.e.Map
is okay, butFilter
is not), and checks ifcoll
has a known (runtime) size. If both of those are satisfied, then we do a more optimized method that involvessetindex!!
on arrays.We can't do the
setindex!!
thing directly fortcollect
since it would cause race conditions if the output object changed, so instead fortcollect
I split thecollection
into a bunch ofchunks
whose size is determined bybasesize
(I useIterators.partition
for this currently and want to fix that before merging to useSplittablesBase.jl
).Now here's what those benchmarks look like with my new changes:
abs
:and
sin
:So that's a nice speedup, though
tcollect
is still leaving some performance on the table, it's still an improvement. This should help alleviate tkf/ThreadsX.jl#196 and tkf/ThreadsX.jl#196, though it still won't be as fast asThreadsX.map!
since the way we combine the results from different arrays is not as efficient as preallocating and then just assigning.