
@batch slows down other non-@batched loops with allocations on macOS ARM #89

Closed
@efaulhaber


Some of my simulations regularly stall for about a second when using @batch on macOS ARM.
I reduced the problem to the minimal example below, but I don't know how to proceed from here.

using Polyester
using BenchmarkTools  # provides @benchmark, used for the timings below


function with_batch()
    # Just some loop with @batch with basically no runtime
    @batch for i in 1:2
        nothing
    end

    # This is just to make sure that the allocation in the next loop is not optimized away
    v = [[]]

    # Note that there is no @batch here
    for i in 1:1000
        # Just an allocation
        v[1] = []
    end
end

function without_batch()
    for i in 1:2
        nothing
    end

    v = [[]]

    for i in 1:1000
        v[1] = []
    end
end

Benchmarking yields the following:

julia> @benchmark with_batch()
BenchmarkTools.Trial: 8709 samples with 1 evaluation.
 Range (min … max):   16.416 μs …   1.404 s  ┊ GC (min … max): 0.00% … 0.47%
 Time  (median):      18.041 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   663.460 μs ± 30.068 ms  ┊ GC (mean ± σ):  0.41% ± 0.01%

         ▁▁▄▇█▇▅▃▁▁ ▁▂▂▃▄▄▄▂▃▃▃▃▃▃▄▄▃▄▃▂▁      ▁               ▂
  ▂▂▃▅▅▇███████████████████████████████████████████████████▇▆▆ █
  16.4 μs       Histogram: log(frequency) by time      23.6 μs <

 Memory estimate: 46.98 KiB, allocs estimate: 1002.

julia> @benchmark without_batch()
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  14.625 μs …   5.596 ms  ┊ GC (min … max):  0.00% … 99.31%
 Time  (median):     15.166 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   18.275 μs ± 110.414 μs  ┊ GC (mean ± σ):  12.03% ±  1.98%

     █▇▃                                                        
  ▂▂▅███▆▃▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▅▃▂▂▃▂▂▂▂▂▂▃▅▆▄▃▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  14.6 μs         Histogram: frequency by time         19.8 μs <

 Memory estimate: 46.98 KiB, allocs estimate: 1002.

About one execution out of 2000 takes over one second, which raises the mean to about 36x that of the version without any @batch loops (663 μs vs. 18 μs). This is consistent with what I see in simulations, where most time steps are fast but some take over a second.
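
To check whether the slowdown comes from a few isolated calls rather than a uniform overhead, each call can be timed separately. The following is a minimal sketch using only `time_ns()` from Base. The helper `find_stalls` and its 100 ms default threshold are hypothetical, chosen for illustration, and are not part of Polyester or BenchmarkTools.

```julia
# Hypothetical helper (not part of Polyester or BenchmarkTools):
# call `f` `n` times, timing each call on its own, and collect every call
# that took longer than `threshold_ns` nanoseconds.
function find_stalls(f; n::Integer = 10_000, threshold_ns::Integer = 100_000_000)
    stalls = Tuple{Int, UInt64}[]   # (call index, elapsed nanoseconds)
    for i in 1:n
        t0 = time_ns()
        f()
        elapsed = time_ns() - t0
        # Record only the calls that exceed the threshold
        elapsed > threshold_ns && push!(stalls, (i, elapsed))
    end
    return stalls
end

# Intended usage with the functions from the example above (requires Polyester):
#   stalls = find_stalls(with_batch)
#   println(length(stalls), " of 10000 calls took longer than 100 ms")
```

If the stalls are the cause, this should report a handful of calls for `with_batch` and none for `without_batch`. The recorded call indices also show whether the stalls occur at regular intervals.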

This problem is specific to macOS ARM. The same Julia version on an x86 machine works as expected.
