noncliff: improve performance of `_allthreesumtozero` #377

Fe-r-oz · 2024-10-04T03:04:33Z

This PR aims to improve the performance of `_allthreesumtozero`. Benchmarks are attached below.
As you mentioned some time ago, to avoid falling into the trap of misleading results, I used `evals=1` and a `setup` block.

Benchmarks:

julia> using BenchmarkTools
julia> function _allthreesumtozero(a, b, c)
           @inbounds @simd for i in 1:length(a)
               iseven(a[i] + b[i] + c[i]) || return false
           end
           true
       end

julia> N = 10^6;

julia> a = rand(1:100, N);
julia> b = rand(1:100, N);
julia> c = rand(1:100, N);

julia> @benchmark _allthreesumtozero($a, $b, $c) evals=1 setup=(a_copy=copy(a); b_copy=copy(b); c_copy=copy(c))
BenchmarkTools.Trial: 633 samples with 1 evaluation.
 Range (min … max):  330.000 ns …  19.116 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     943.000 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.086 μs ± 933.113 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

   █▃▆█▇▄▄▁ ▄▃▃▃▅▇▂ ▂▁▃  ▁                                       
  █████████▄███████████▇▇█▆▆█▆▇▇▇▆▇▅▆▄▄▅▃▃▁▃▅▃▃▂▃▂▂▂▂▂▁▃▂▃▂▁▁▁▃ ▅
  330 ns           Histogram: frequency by time         2.89 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> # Optimization

julia> function _allthreesumtozero_optimized(a, b, c)
           n = length(a)
           @inbounds @simd for i in 1:n
               odd = (a[i] + b[i] + c[i]) & 1
               if odd != 0
                   return false
               end
           end
           true
       end

julia> @benchmark _allthreesumtozero_optimized($a, $b, $c) evals=1 setup=(a_copy=copy(a); b_copy=copy(b); c_copy=copy(c))
BenchmarkTools.Trial: 650 samples with 1 evaluation.
 Range (min … max):  234.000 ns … 17.180 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     590.500 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   785.857 ns ±  1.052 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄█▃▄▇ ▅   ▁   ▁                                               
  ███████▇███▆█▄█▆▄▆▄▃▄▄▄▄▄▃▂▄▂▃▃▁▃▂▁▁▂▂▁▁▁▁▂▂▂▁▁▁▁▁▂▂▁▁▁▁▁▁▁▃ ▃
  234 ns          Histogram: frequency by time         3.12 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.
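
The optimized version replaces `iseven(x)` with a low-bit test, `x & 1`. As a minimal sketch of a sanity check that the two parity tests agree, the snippet below uses hypothetical helper names (`parity_iseven`, `parity_mask`) that are not part of this PR, and plain generator expressions rather than the `@inbounds @simd` loops benchmarked above:

```julia
# Hypothetical helpers for illustration only; they are not part of this PR.
# Both return true when every elementwise sum a[i] + b[i] + c[i] is even.
parity_iseven(a, b, c) = all(iseven(a[i] + b[i] + c[i]) for i in eachindex(a, b, c))
parity_mask(a, b, c) = all(((a[i] + b[i] + c[i]) & 1) == 0 for i in eachindex(a, b, c))

all_even = ([2, 4, -6], [2, 2, 0], [0, 2, 4])   # sums: 4, 8, -2
one_odd  = ([2, 4, -6], [2, 2, 1], [0, 2, 4])   # last sum is -1, which is odd

# The two formulations should agree on both inputs, including negative sums.
@assert parity_iseven(all_even...) == parity_mask(all_even...) == true
@assert parity_iseven(one_odd...) == parity_mask(one_odd...) == false
```

This only checks that the two formulations return the same answer on small integer inputs; it says nothing about their relative speed.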

The code is properly formatted and commented.
Substantial new functionality is documented within the docs.
All new functionality is tested.
All of the automated tests on GitHub pass.

Fe-r-oz · 2024-10-04T04:05:35Z

I think the PR is ready for review. Thanks for your tip from some time ago about using `evals=1` and `setup`; it helped me avoid producing misleading results. Thank you!

src/nonclifford.jl

Krastanov · 2024-10-25T02:38:03Z

looks great, thanks!

improve performance of _allthreesumtozero

a094a6f

Fe-r-oz mentioned this pull request Oct 9, 2024

Completing the non-Clifford capabilities [$800] #309

Open

Krastanov reviewed Oct 25, 2024


Update src/nonclifford.jl

61d5666

Krastanov merged commit 6863bd0 into QuantumSavory:nonclif Oct 25, 2024
8 of 12 checks passed

Fe-r-oz deleted the todo branch October 25, 2024 09:20
