Add fast pathway for `copy`, `collect`, `tcollect`, and `tcopy` for size-stable operations #553

MasonProtter · 2023-05-04T06:05:44Z

Current State

Fundamentally, Transducers is very good at reductions, but collecting results into an output array is a major weakness. Currently, it does this essentially by doing

foldxl(append!!, Map(f), coll)

(or foldxt for the parallel version). If f is expensive to evaluate, this extra overhead isn't so bad, but for functions that take only a CPU cycle or two it's catastrophic (with abs below, collect is over 2x slower than map, and tcollect almost 7x slower):

Here's how it currently looks with a very cheap function (abs):

julia> using BenchmarkTools, Transducers

julia> let A = rand(100_000)
           @btime map(abs, $A)
           @btime collect(Map(abs), $A)
           @btime tcollect(Map(abs), $A)
       end;
  31.440 μs (2 allocations: 781.30 KiB)
  70.460 μs (12 allocations: 1.83 MiB)
  212.270 μs (123 allocations: 4.54 MiB)

And here's a more expensive function (sin):

julia> let A = rand(100_000)
           @btime map(sin, $A)
           @btime collect(Map(sin), $A)
           @btime tcollect(Map(sin), $A)
       end;
  447.810 μs (2 allocations: 781.30 KiB)
  486.680 μs (12 allocations: 1.83 MiB)
  302.360 μs (123 allocations: 4.54 MiB)
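To make the "extra overhead" concrete, here is a simplified, Base-only model of collecting by repeated appending. It is a stand-in for foldxl(append!!, Map(f), coll), not Transducers' actual code: collect_by_append is a hypothetical name, push! replaces BangBang's append!!, and Base.promote_op (an internal Base helper) is used only to guess an initial element type.

```julia
# Hypothetical, simplified model of collect-by-appending; NOT Transducers' code.
# Stand-in for foldxl(append!!, Map(f), coll) using only Base.
function collect_by_append(f, coll)
    T = Base.promote_op(f, eltype(coll))   # internal Base helper: guess output eltype
    # Each step pushes one result onto a growing vector, so the buffer is
    # resized repeatedly instead of being allocated once at its final size
    # (which is what `map` can do when the input length is known).
    return foldl((acc, x) -> push!(acc, f(x)), coll; init = T[])
end
```

For a cheap f like abs, this per-element bookkeeping and regrowth can dominate the cost of f itself, whereas for sin it is a smaller fraction of the total.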

This PR

In this PR I added a method of collect(xf::Transducer, coll) (and similarly for copy) that checks whether xf preserves the size of coll (e.g. Map does, but Filter does not) and whether coll has a size known at runtime. If both conditions hold, we use a more optimized method that writes the results into an output array with setindex!!.
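A minimal plain-Julia sketch of that dispatch logic, under stated assumptions: MapLike and FilterLike are hypothetical stand-ins for Map and Filter; preserves_size, has_known_size, collect_xf, collect_sizestable and fill_from! are hypothetical names; Base.IteratorSize stands in for the PR's known-size check; and the widening step only imitates the spirit of BangBang's setindex!!. It always returns a Vector, so multidimensional inputs are flattened.

```julia
# Hypothetical sketch of a size-stable fast path; NOT the code from this PR.
# MapLike / FilterLike are made-up stand-ins for Map / Filter.
struct MapLike{F}
    f::F
end
struct FilterLike{P}
    pred::P
end

# Trait: does the transform produce exactly one output per input?
preserves_size(::MapLike) = true
preserves_size(::Any) = false   # conservative default (e.g. FilterLike)

# Is the input's length known before iterating it?
has_known_size(coll) = Base.IteratorSize(coll) isa Union{Base.HasLength,Base.HasShape}

# Generic slow path.
collect_generic(xf::MapLike, coll) = [xf.f(x) for x in coll]
collect_generic(xf::FilterLike, coll) = [x for x in coll if xf.pred(x)]

function collect_xf(xf, coll)
    if preserves_size(xf) && has_known_size(coll)
        return collect_sizestable(xf.f, coll)   # only reached for MapLike
    end
    return collect_generic(xf, coll)
end

# Fast path: allocate once using the first output's type, then assign in place.
function collect_sizestable(f, coll)
    it = iterate(coll)
    it === nothing && return collect(Iterators.map(f, coll))   # empty input
    x, state = it
    y = f(x)
    out = Vector{typeof(y)}(undef, length(coll))   # flattens multidimensional input
    out[1] = y
    return fill_from!(out, f, coll, state, 2)
end

function fill_from!(out, f, coll, state, i)
    it = iterate(coll, state)
    while it !== nothing
        x, state = it
        y = f(x)
        if y isa eltype(out)
            out[i] = y
        else
            # Widen the element type (in the spirit of setindex!!): copy what we
            # have into a new array and continue there via a function barrier.
            wider = similar(out, promote_type(eltype(out), typeof(y)))
            copyto!(wider, 1, out, 1, i - 1)
            wider[i] = y
            return fill_from!(wider, f, coll, state, i + 1)
        end
        i += 1
        it = iterate(coll, state)
    end
    return out
end
```

Usage: collect_xf(MapLike(abs), rand(10)) takes the fast path, while collect_xf(FilterLike(iseven), 1:10) falls back to the generic path because the output size is not known up front.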

We can't use setindex!! directly for tcollect: setindex!! may return a new (widened) output object, and if that happens while other tasks are still writing into the old one, we get race conditions. Instead, for tcollect I split the collection into chunks whose size is determined by basesize (I currently use Iterators.partition for this and want to switch to SplittablesBase.jl before merging).
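The chunking idea can be sketched roughly as follows. This is a hypothetical illustration, not the PR's tcollect: tcollect_chunked is a made-up name, each chunk is simply mapped with map, and the default basesize (about one chunk per thread) is an assumption. The key property is that every task writes only into its own freshly allocated array, so no task can replace an array another task is using.

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical sketch of a chunked threaded collect; NOT the PR's tcollect.
function tcollect_chunked(f, coll::AbstractVector;
                          basesize::Integer = max(1, cld(length(coll), nthreads())))
    isempty(coll) && return map(f, coll)
    # Iterators.partition stands in for the chunking (SplittablesBase.jl is the
    # intended replacement); each task maps its own chunk into its own array.
    tasks = [@spawn map(f, chunk) for chunk in Iterators.partition(coll, basesize)]
    parts = fetch.(tasks)
    # Concatenating the per-chunk arrays is an extra pass and allocation that a
    # preallocate-and-assign approach would avoid.
    return reduce(vcat, parts)
end
```

Usage: tcollect_chunked(abs, rand(100_000); basesize = 10_000) spawns ten tasks and concatenates their results in order.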

Now here's what those benchmarks look like with my new changes:
abs:

julia> let A = rand(100_000)
           @btime map(abs, $A)
           @btime collect(Map(abs), $A)
           @btime tcollect(Map(abs), $A)
       end;
  28.860 μs (2 allocations: 781.30 KiB)
  28.870 μs (2 allocations: 781.30 KiB)
  162.670 μs (244 allocations: 3.15 MiB)

and sin:

julia> let A = rand(100_000)
           @btime map(sin, $A)
           @btime collect(Map(sin), $A)
           @btime tcollect(Map(sin), $A)
       end;
  481.480 μs (2 allocations: 781.30 KiB)
  482.801 μs (2 allocations: 781.30 KiB)
  217.760 μs (244 allocations: 3.15 MiB)

So that's a nice speedup: collect now matches map, and tcollect improves from 212 to 163 μs for abs and from 302 to 218 μs for sin. tcollect still leaves some performance on the table (for abs it is still slower than serial map), but it's an improvement. This should help alleviate tkf/ThreadsX.jl#196 and tkf/ThreadsX.jl#198, though it still won't be as fast as ThreadsX.map!, since combining the results from the different chunks is not as efficient as preallocating the output and then just assigning into it.
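For comparison, "preallocating and then just assigning" looks roughly like this. It is a hypothetical sketch (tmap_prealloc! is a made-up name), not ThreadsX.map!'s actual implementation. Note that it needs an output array with a suitable element type up front, which is exactly the information the setindex!!-based path discovers lazily.

```julia
using Base.Threads: @threads

# Hypothetical sketch of preallocate-and-assign; NOT ThreadsX.map!'s code.
# The caller supplies `out` with the right element type and indices.
function tmap_prealloc!(f, out::AbstractVector, coll::AbstractVector)
    idxs = eachindex(out, coll)   # throws DimensionMismatch if indices differ
    # Each index is written by exactly one thread, so no combining step is needed.
    @threads for i in idxs
        @inbounds out[i] = f(coll[i])
    end
    return out
end
```

Usage: A = rand(100_000); tmap_prealloc!(abs, similar(A), A).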


codecov · 2023-05-04T06:19:46Z

Codecov Report

Merging #553 (c616391) into master (f8d0dfe) will increase coverage by 0.11%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #553      +/-   ##
==========================================
+ Coverage   95.43%   95.54%   +0.11%     
==========================================
  Files          32       32              
  Lines        2233     2268      +35     
==========================================
+ Hits         2131     2167      +36     
+ Misses        102      101       -1

Flag	Coverage Δ
Pkg.test	`94.54% <100.00%> (-0.02%)`	⬇️
Run.test	`95.41% <100.00%> (+0.20%)`	⬆️

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ
src/Transducers.jl	`73.33% <ø> (ø)`
src/core.jl	`93.15% <100.00%> (+0.09%)`	⬆️
src/dreduce.jl	`100.00% <100.00%> (ø)`
src/processes.jl	`94.71% <100.00%> (+0.50%)`	⬆️
src/reduce.jl	`96.61% <100.00%> (+0.18%)`	⬆️

... and 2 files with indirect coverage changes

add fast pathway for copy, collect, tcollect, and tcopy on size-stable Arrays

2059c2e

This was referenced May 4, 2023

Performance of map tkf/ThreadsX.jl#196 (Open)

Bug in map? tkf/ThreadsX.jl#198 (Open)

MasonProtter added 14 commits May 4, 2023 15:07

correctly handle small reducibles in tcopy

2d9cf6c

skip ambiguities relating to mapreduce and kwcall

6518a6c

create new env for v1.9 and v1.10

486e18e

fixes

4a6e9a6

remove accidentally included line

57c8625

handle old kwcall syntax

bc0cd13

typo

b13b7ac

more fixes for old julia versions

a7e0946

skip Aqua if kwcall isn't defined

1ab5f42

fix dumb mistakes with multidimensional arrays

ee67340

use length(size) instead of ndims because base is missing methods

470f70b

Tell julia that Reducible has unknown iterator size

9b98edf

maybe fix _collect inference

9ef7bc1

remove superfluous line

7013bc9

MasonProtter enabled auto-merge (squash) May 5, 2023 00:15

MasonProtter mentioned this pull request May 5, 2023

Improve performance of collect on size unstable collections. #554 (Open)

MasonProtter disabled auto-merge May 5, 2023 00:29

MasonProtter added 2 commits May 4, 2023 19:08

factor out split_into_chunks so that dreduce can use it

2fe7849

add test for #552

61dcfed

MasonProtter mentioned this pull request May 5, 2023

foldxd fails over Iterators.Product #552 (Closed)

increment version

c616391

tkf pushed a commit that referenced this pull request May 9, 2023

Merge pull request #553 from JuliaFolds/fixed-size-collect

2a51f8d

Add fast pathway for `copy`, `collect`, `tcollect`, and `tcopy` for size-stable operations

MasonProtter closed this May 13, 2023

sethaxen mentioned this pull request Jun 23, 2023

Multi-threaded multi-path Pathfinder broken with recent Transducers versions mlcolab/Pathfinder.jl#144 (Open)

This was referenced Jun 23, 2023

tcollect on withprogress now fails on v0.4.76 #557 (Open)

tcollect(withprogress(itr)) is currently broken JuliaFolds2/Transducers.jl#10 (Closed)
