Benchmark against 1.0.0 for potential 1.1.0 release #30218

KristofferC · 2018-11-30T20:41:19Z

@nanosoldier runbenchmarks(ALL, vs = ":release-1.0")

Not sure if we should also run against 1.0.0?

nanosoldier · 2018-12-01T03:46:16Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

KristofferC · 2018-12-03T15:19:21Z

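Stuff to look at (each row reads benchmark ID | time ratio (tolerance) | memory ratio (tolerance); a ratio above 1.0 beyond the tolerance is flagged ❌ as a possible regression):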

["array", "index", "(\"sumcartesian_view\", \"100000:-1:1\")"] | 61052.43 (50%) ❌ | 1.00 (1%)
["array", "index", "(\"sumcartesian_view\", \"1:100000\")"] | 535.94 (50%) ❌ | 1.00 (1%)

["array", "index", "(\"sumeach_view\", \"100000:-1:1\")"] | 58277.14 (50%) ❌ | 1.00 (1%)
["array", "index", "(\"sumeach_view\", \"1:100000\")"] | 642.67 (50%) ❌ | 1.00 (1%)

["array", "index", "(\"sumlinear_view\", \"100000:-1:1\")"] | 61051.76 (50%) ❌ | 1.00 (1%)
["array", "index", "(\"sumlinear_view\", \"1:100000\")"] | 567.00 (50%) ❌ | 1.00 (1%)


["array", "setindex!", "(\"setindex!\", 1)"] | 2.24 (5%) ❌ | 1.00 (1%)
["array", "setindex!", "(\"setindex!\", 2)"] | 2.17 (5%) ❌ | 1.00 (1%)
["array", "setindex!", "(\"setindex!\", 3)"] | 2.15 (5%) ❌ | 1.00 (1%)
["array", "setindex!", "(\"setindex!\", 4)"] | 2.19 (5%) ❌ | 1.00 (1%)

Fixed by #30248

["micro", "printfd"] | 2.07 (5%) ❌ | 1.72 (1%) ❌

["misc", "iterators", "zip(1:1, 1:1, 1:1, 1:1)"] | 3.09 (5%) ❌ | 2.43 (1%) ❌

@martinholters ?
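To make the rows above easier to read, here is a minimal sketch of how a ratio is flagged against its tolerance. `classify` is a hypothetical helper written for illustration. It approximates the judgement BenchmarkTools/Nanosoldier applies, but it is not their code.

```julia
# Hypothetical helper: classify a ratio (this build / baseline) against a
# relative tolerance. An approximation of BenchmarkTools' judgement, not its code.
function classify(ratio::Real, tolerance::Real)
    if ratio - tolerance > 1
        return :regression      # slower (or more memory) beyond the tolerance
    elseif ratio + tolerance < 1
        return :improvement     # faster (or less memory) beyond the tolerance
    else
        return :invariant       # within noise
    end
end

# A few rows from the report: (ID, time ratio, time tol, memory ratio, memory tol)
rows = [
    ("sumeach_view 1:100000", 642.67, 0.50, 1.00, 0.01),
    ("setindex! 1",           2.24,   0.05, 1.00, 0.01),
    ("micro printfd",         2.07,   0.05, 1.72, 0.01),
]

for (id, t, ttol, m, mtol) in rows
    println(rpad(id, 24), "time: ", classify(t, ttol), ", memory: ", classify(m, mtol))
end
```

Under this reading, the view and `setindex!` rows regress in time only, while `printfd` (and `zip`) also regress in memory.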

martinholters · 2018-12-03T16:53:24Z

I won't have time to look into the zip thing this week, but please ping me next week if needed.

StefanKarpinski · 2018-12-03T17:48:01Z

@mbauman: can you take a look at the array indexing regressions?

KristofferC · 2018-12-03T17:49:57Z

They are most likely cases where LLVM previously did the math and computed the answer without needing to loop, while now it perhaps can't see through the view abstraction.

KristofferC · 2018-12-04T01:38:57Z

Regarding

["micro", "printfd"] | 2.07 (5%) ❌ | 1.72 (1%) ❌

this was caused by #29907

Before reverting:

1.410 ms (10 allocations: 672 bytes)

After reverting:

717.883 μs (10 allocations: 672 bytes)

So task-local storage seems to be too slow for this: with #29907 in place the benchmark takes roughly twice as long.

cc @JeffBezanson

KristofferC · 2018-12-04T01:52:44Z

Regarding

["array", "index", "(\"sumcartesian_view\", \"100000:-1:1\")"] | 61052.43 (50%) ❌ | 1.00 (1%)

and co.: on 1.0.2 the compiler can indeed do the arithmetic for, e.g.,

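# Sum the elements of A, reading each one through a zero-dimensional view
function perf_sumcartesian_view(A)
    s = zero(eltype(A))
    @inbounds @simd for I in CartesianIndices(size(A))
        val = view(A, I)   # 0-d SubArray wrapping the element at I
        s += val[]         # read it back with no indices
    end
    return s
end

A = 1:100000000
perf_sumcartesian_view(A)  # == sum(A) == 5000000050000000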

while on master it generates beautifully vectorized code, but working hard doesn't beat being smart.

mbauman · 2018-12-04T02:33:09Z

I can kick off a bisect for that one.

KristofferC · 2018-12-04T02:51:10Z

I am doing that right now :)

KristofferC · 2018-12-04T04:16:29Z

Regarding

["array", "index", "(\"sumcartesian_view\", \"100000:-1:1\")"] | 61052.43 (50%) ❌ | 1.00 (1%)

Regression introduced in #29895 cc @mfsch

mbauman · 2018-12-04T04:48:18Z

Regression introduced in #29895 (no Nanosoldier run)

Dang, that is quite the surprise but my bisect agrees. No Nanosoldier run because, well, I don't have a mental model for how such a change would regress anything. ~~Further, I don't think we have any BaseBenchmarks for scalar views, which is what that change was targeting.~~ No need to CC the author (a first-time committer); @mfsch you should not worry about your PR or feel obligated to address this. Heck, I'm not sure how to address it — it's a great change that I want to keep.

Are we going over some magical number of methods? Or a type complexity heuristic?

chethega · 2018-12-04T18:32:48Z

I don't really get why the bisected PR impacts that, but FYI the fast version uses an O(1) summation algorithm (explicit formula) instead of vectorized code.

It is pretty impressive that LLVM sometimes replaces reductions over integer ranges by explicit formulas. But I don't think that is a realistic case to worry about: people should never rely on compiler optimizations for complexity class. In this case it is O(1) vs O(N) for computing sum(1:N) == (N*(N+1))>>1.
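The difference between the two versions can be sketched directly. `loop_sum` and `closed_sum` are hypothetical names for illustration: the first does the O(N) work that even well-vectorized code still has to do, the second is the O(1) formula quoted above. Julia's own `sum` on a range already uses a closed form, so it serves as a cross-check.

```julia
# Explicit loop: O(N) additions, however well vectorized.
function loop_sum(N::Int)
    s = 0
    for i in 1:N
        s += i
    end
    return s
end

# Closed form: O(1), the kind of result LLVM can sometimes derive on its own.
closed_sum(N::Int) = (N * (N + 1)) >> 1

for N in (0, 1, 10, 100_000)
    @assert loop_sum(N) == closed_sum(N) == sum(1:N)
end
```

Whether LLVM turns `loop_sum` into `closed_sum` depends on the compiler version and on how much of the indexing machinery it can see through, which is exactly what changed here.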

KristofferC · 2018-12-04T18:51:08Z

People should never rely on compiler optimizations for complexity class. In this case O(1) vs O(N) for computing sum(1:N) == ( N*(N+1) )>>1.

While true, the regression here means that LLVM understands less about our SubArrays, which likely has effects in other contexts than just turning an O(n) computation into an O(1) one.

mbauman · 2018-12-04T18:54:39Z

Yeah, these indexing benchmarks have always sat on the knife's edge of O(1)-ization, but it's been a good stress test to ensure that we have the complicated indexing machinery as fast as we can make it and expressed in a manner that LLVM likes.

These regressions actually are all scalar views; I was wrong about not having benchmarks for this case. That PR shifts which methods get defined for 0-d views: the method SubArray implements shifts from being v[] to being v[1], but the benchmarks call v[], so now our fallback indexing machinery has to insert that 1 for us and SubArray has to do an addition it didn't need to do before. I wonder if full Cartesian indexing on "Fast" SubArrays via re-indexing would be faster than (or equivalent to) doing the linearization up front. Or perhaps we have something that's not quite optimal in the fallbacks.
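For readers unfamiliar with scalar views, here is a small sketch of the objects involved. It only shows the user-visible behavior that a 0-d view answers both `v[]` and `v[1]`; which internal method each call dispatches to is exactly what #29895 changed and is version-dependent, so it is not asserted here.

```julia
A = collect(1:5)

# A scalar index produces a zero-dimensional view into A, not a copy.
v = view(A, 3)

@assert ndims(v) == 0
@assert size(v) == ()
@assert length(v) == 1

# The benchmarks read it with v[]; a 0-d array also accepts the linear index 1.
@assert v[] == A[3]
@assert v[1] == v[]

# Writing through the view modifies the parent.
v[] = 30
@assert A[3] == 30
```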

martinholters · 2018-12-10T10:38:30Z

["misc", "iterators", "zip(1:1, 1:1, 1:1, 1:1)"] | 3.09 (5%) ❌ | 2.43 (1%) ❌

Bisected to 1324ceb (#28284). I'll try to figure out what the problem is there, but can't promise anything...

EDIT: See #30331.
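#30331 is titled "Force specialization on the type argument of `_similar_for`". As a rough illustration of the general idiom (not the actual diff, which we have not reproduced), Julia's heuristics may decline to specialize a method on a `Type` argument that is only passed through to another call; annotating it as `::Type{T} where {T}` forces a specialization per `T`. All function names below are hypothetical.

```julia
# Hypothetical helper that builds a container of element type T.
make_buffer(T, n) = Vector{T}(undef, n)

# `T` is only passed through, so Julia's heuristics may decline to specialize
# on the concrete type here (a compiler heuristic, not a guarantee).
passthrough(T, n) = make_buffer(T, n)

# Annotating with ::Type{T} where {T} forces a specialization for each T.
forced(::Type{T}, n) where {T} = make_buffer(T, n)

@assert eltype(passthrough(Float64, 3)) == Float64
@assert eltype(forced(Float64, 3)) == Float64
@assert length(forced(Int, 4)) == 4
```

Both versions return the same values; the difference only shows up in the compiled code (for example via `@code_warntype`) and in benchmarks.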

JeffBezanson · 2018-12-11T22:52:08Z

I'll work on the printf regression. I believe the problem is that printf re-fetches the task-local buffer many times, and it should instead be saved in a local variable.
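A minimal sketch of the pattern described above, assuming a per-task scratch buffer stored in `task_local_storage()` under a key we made up (`:digit_buffer_example`). `get_buffer`, `digits_slow`, and `digits_fast` are hypothetical and only mimic the shape of the problem; this is not Base.Printf's code. The slow version does a dictionary lookup for every digit, the fast one looks the buffer up once and passes it down.

```julia
# Hypothetical per-task scratch buffer, looked up by key in task-local storage.
const BUF_KEY = :digit_buffer_example

get_buffer() = get!(() -> Vector{UInt8}(undef, 32), task_local_storage(), BUF_KEY)::Vector{UInt8}

# Slow pattern: every helper call re-fetches the buffer.
function write_digit_slow!(i, d)
    buf = get_buffer()              # task-local lookup on every digit
    buf[i] = UInt8('0' + d)
    return nothing
end

function digits_slow(x::Int)
    x >= 0 || throw(ArgumentError("x must be non-negative"))
    n = ndigits(x)
    for i in n:-1:1
        x, d = divrem(x, 10)
        write_digit_slow!(i, d)
    end
    return String(get_buffer()[1:n])
end

# Faster pattern: fetch once, keep it in a local variable, pass it down.
function write_digit!(buf, i, d)
    buf[i] = UInt8('0' + d)
    return nothing
end

function digits_fast(x::Int)
    x >= 0 || throw(ArgumentError("x must be non-negative"))
    buf = get_buffer()              # single task-local lookup
    n = ndigits(x)
    for i in n:-1:1
        x, d = divrem(x, 10)
        write_digit!(buf, i, d)
    end
    return String(buf[1:n])        # copy out, so the buffer stays reusable
end
```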


KristofferC · 2018-12-13T20:44:15Z

@nanosoldier runbenchmarks(ALL, vs = ":release-1.0")

nanosoldier · 2018-12-14T03:46:36Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

KristofferC · 2018-12-14T04:28:05Z

Ok, printfd is only at an 18% regression now, and the only remaining one is the scalar-view regression, which, from what I understand, we will accept. In that case, benchmarking looks good.

Update VERSION

232e548

KristofferC added the status:DO NOT MERGE Do not merge this PR! label Nov 30, 2018

KristofferC mentioned this pull request Dec 3, 2018

fix perf regression from not specializing on iterate on tuples #29587

Closed

mbauman mentioned this pull request Dec 4, 2018

Try implementing N-dimensional indexing for fast linear SubArrays #30266

Merged

JeffBezanson added this to the 1.1 milestone Dec 7, 2018

martinholters mentioned this pull request Dec 10, 2018

Force specialization on the type argument of _similar_for #30331

Merged

JeffBezanson added a commit that referenced this pull request Dec 12, 2018

improve printf performance by passing digit buffer around

ec868d3

mostly fixes the regression identified in #30218

JeffBezanson mentioned this pull request Dec 12, 2018

improve printf performance by passing digit buffer around #30373

Merged

KristofferC changed the base branch from master to backport-1.1.0 December 13, 2018 00:14

KristofferC pushed a commit that referenced this pull request Dec 13, 2018

improve printf performance by passing digit buffer around (#30373)

e836937

mostly fixes the regression identified in #30218

KristofferC pushed a commit that referenced this pull request Dec 13, 2018

improve printf performance by passing digit buffer around (#30373)

7cbac07

mostly fixes the regression identified in #30218 (cherry picked from commit e836937)

JeffBezanson closed this Dec 17, 2018

ararslan deleted the KristofferC-patch-7 branch December 17, 2018 20:54

DilumAluthge removed the status:DO NOT MERGE Do not merge this PR! label Jun 18, 2021
