[RFC] --stacktrace:noinline: up to 3X speedups with typically same stacktraces #13582

timotheecour · 2020-03-04T08:06:17Z

refs [RFC] [performance] {.inline.} guarantees, --stacktrace:inline, and --ignoreInline RFCs#198 (does not close because more items left)
nim is now faster with ~~--stacktrace:on~~ the new option --stacktrace:noinline

example 1:

see example #13582 (comment) showing 3X speedup compared to default stacktraces, while producing identical stacktraces

example 2:

nim compiling itself. Less drastic speedup than example 1 because it relies less on inlined procs (unlike math heavy code), but still an appreciatable speedup than anyone working on debugging compiler may benefit from.

before PR:

after building nim with --stacktrace:on --opt:speed:
./bin/nim c -o:/tmp/z01 --hints:off -f compiler/nim.nim: 26.5.s
after building nim with --stacktrace:on:
./bin/nim c -o:/tmp/z01 --hints:off -f compiler/nim.nim: 52.s
after building nim with -d:release --stacktrace:on:
./bin/nim c -o:/tmp/z01 --hints:off -f compiler/nim.nim: 18.s
after building nim with -d:danger --stacktrace:on:
./bin/nim c -o:/tmp/z01 --hints:off -f compiler/nim.nim: 17.7s

after PR:

after building nim with --stacktrace:noinline --opt:speed:
./bin/nim c -o:/tmp/z01 --hints:off -f compiler/nim.nim: 19.5.s (1.35X faster)
after building nim with --stacktrace:noinline:
./bin/nim c -o:/tmp/z01 --hints:off -f compiler/nim.nim: 46.9.s (1.09X faster)
after building nim with -d:release --stacktrace:noinline:
./bin/nim c -o:/tmp/z01 --hints:off -f compiler/nim.nim: 14.7.s (1.24X faster)
after building nim with -d:danger --stacktrace:noinline:
./bin/nim c -o:/tmp/z01 --hints:off -f compiler/nim.nim: 14.3.s (1.23X faster)

and for comparison (unchanged before/after this PR):
after building nim with -d:danger --stacktrace:off:
./bin/nim c -o:/tmp/z01 --hints:off -f compiler/nim.nim: 8.5

links

see also -Og, refs https://forum.nim-lang.org/t/7848#49991
see also -d:nimStackTraceOverride --import:libbacktrace --debugger:native + --passc:"-fno-omit-frame-pointer -fno-optimize-sibling-calls", refs fix #9 stacktrace works if triggered by a signal, eg SIGSEGV by timotheecour · Pull Request #10 · status-im/nim-libbacktrace

alaviss · 2020-03-04T08:41:30Z

But why? Isn't the point of turning stacktrace on is to get stacktraces? And you referenced the wrong issue/PR I think.

If you want speed wouldn't you just push stacktrace off wherever needed?

timotheecour · 2020-03-04T09:37:31Z

And you referenced the wrong issue/PR I think.

thanks! fixed

But why?

makes default debug builds faster (or any build with --stacktrace:on), offering a better compromise between speed and debugging-ability

you loose nothing with this PR:

--stacktrace:inline is equivalent to --stacktrace:on before this PR
--stacktrace:off is equivalent to --stacktrace:off before this PR
--stacktrace:on now honors {.inline.} so they don't slow down performance; inlined procs is rarely what you want to debug as these correspond to, typically, performance sensitive code where inlining matters, including a bunch of mathematics operators l. But you can still debug these via --stacktrace:inline

Clyybber · 2020-03-04T10:03:51Z

you loose nothing with this PR:

Sure. We loose sane defaults IMO.

timotheecour · 2020-03-04T10:09:49Z

Sure. We loose sane defaults IMO.

cmon, it's just a matter of flipping the roles of the --stacktrace flags eg this could also be done with very little change from current state of this PR:

--stacktrace:off # never
--stacktrace:on # always
--stacktrace:noinline # only for procs not marked {.inline.}

so your "sane" defaults are preserved that way. I prefer the way I wrote it but willing to change to these if there's consensus

Clyybber · 2020-03-04T10:23:37Z

But this new behaviour is a step back. If you want performance at the cost of stacktraces use {.push: stacktraces: off.}.
No need to introduce surprising and more complex behaviour.

timotheecour · 2020-03-04T15:49:55Z

I've changed the default so that nothing changes except if you pass the new flag --stacktrace:noinline, so there is no "step back"

No need to introduce surprising and more complex behaviour.

there's nothing surprising or complex, this matches how native stacktraces work!

keep using the flags as you did and there will be zero difference vs before this PR.
or use the new --stacktrace:noinline and you get a nice speedup in lots of cases, whether you use it in combination with -d:danger or for debug builds.

Here's an example: you get a 3X speedup compared to --stacktrace:on, and you get the exact same stacktrace here (but this is common case; the things we're inlining are hotspots, well debugged, and unlikely to be where crashes or anything interesting happens).

import tables, times
var tab: Table[int, int]
proc main() =
  let n = 1000
  for i in 0..<n: tab[i] = i # doAssert i < n-1 # insert this to crash and show stacktrace
let t = cpuTime()
for i in 0..<100000: main()
echo cpuTime() - t

nim c -d:danger -r --stacktrace:off main 0.490725 s
nim c -d:danger -r --stacktrace:on main 2.041365 s
nim c -d:danger -r --stacktrace:noinline main 0.658695 s

likewise, we get a 3X speedup between nim c --opt:speed -r --stacktrace:on main and nim c --opt:speed -r --stacktrace:inline main

as for the stacktraces: insert the doAssert line:

with --stacktrace:off you get no stacktrace
with --stacktrace:noinline you get the exact same stacktrace as for --stacktrace:on

in the rare cases where the thing that crashed is inside a {.inline.} proc, just re-run with --stacktrace:on

If you want performance at the cost of stacktraces use {.push: stacktraces: off.}.

as I already explained, this is a worse solution: you'll end up having to find where bottlenecks are, modify sources, rerun, and modify sources back, and it'll affect everything inside those push/pop instead of just the inline bits.

It's the exact analog as why we have both -d:danger and -d:release:

-d:release is a good compromise between safety and speed
--stacktrace:inline is a good compromise between debuggability and speed
(and these can be combined, eg -d:danger --stacktrace:inline)

Clyybber · 2020-03-04T16:01:02Z

We already have -d:danger.
This makes everything just more complicated, the next issue could then be "allow overwriting --stacktrace:noinline with {.push: stacktrace: on.}".
Its a rabbit hole that just doesn't need opening IMO, when you use the features nim already provides you with to accomplish what you need.

…ce:on, but do with --stacktrace:inline

stale · 2021-03-09T06:04:26Z

This pull request has been automatically marked as stale because it has not had recent activity. If you think it is still a valid PR, please rebase it on the latest devel; otherwise it will be closed. Thank you for your contributions.

Varriount · 2021-04-09T07:38:49Z

So I'm a bit confused about the objections here - aside from what I see to be an internal optimization and a slight CLI change, what is the downside to this?

Yes, one can turn stack traces off or inline functions using a pragma, but one can also write their own stack traces manually and inline functions by hand. To me (at least, currently), this seems like an improvement with little in the way of a downside.

Araq · 2021-04-09T07:59:51Z

Advantages:

Simple idea and implementation.

Disadvantages:

Yet another switch.
There are already other ways to speed up strack tracking for production use, based on libunwind or whatever it's called.

Personally I would rather see the effort spend on nlvm, which has the potential to give us really decent debugging support.

Varriount · 2021-04-09T18:00:27Z

@Araq Regarding "yet another switch", couldn't this just be made the default implementation? I also can't find any mention of libUnwind in the repository. If it isn't currently implemented, then (presumably) an implementation would require a switch anyway.

stale · 2022-04-27T12:50:50Z

This pull request has been automatically marked as stale because it has not had recent activity. If you think it is still a valid PR, please rebase it on the latest devel; otherwise it will be closed. Thank you for your contributions.

timotheecour changed the title ~~refs #198 {.inline.} procs now do not generate frames with --stacktrace:on, but do with --stacktrace:inline~~ [performance] refs #198 {.inline.} procs now do not generate frames with --stacktrace:on, but do with --stacktrace:inline Mar 4, 2020

timotheecour changed the title ~~[performance] refs #198 {.inline.} procs now do not generate frames with --stacktrace:on, but do with --stacktrace:inline~~ [WIP] [performance] refs #198 {.inline.} procs now do not generate frames with --stacktrace:on, but do with --stacktrace:inline Mar 4, 2020

timotheecour changed the title ~~[RFC] [WIP] [performance] refs #198 {.inline.} procs now do not generate frames with --stacktrace:on, but do with --stacktrace:inline~~ [RFC] --stacktrace:inline skips {.inline.} procs, up to 3X speedups with same stacktraces typically Mar 4, 2020

timotheecour changed the title ~~[RFC] --stacktrace:inline skips {.inline.} procs, up to 3X speedups with same stacktraces typically~~ [RFC] --stacktrace:inline skips {.inline.} procs, up to 3X speedups with typically same stacktraces Mar 4, 2020

timotheecour changed the title ~~[RFC] --stacktrace:inline skips {.inline.} procs, up to 3X speedups with typically same stacktraces~~ [RFC] --stacktrace:noinline: up to 3X speedups with typically same stacktraces Mar 4, 2020

timotheecour mentioned this pull request Mar 4, 2020

[RFC] [performance] {.inline.} guarantees, --stacktrace:inline, and --ignoreInline nim-lang/RFCs#198

Open

7 tasks

timotheecour added 4 commits March 5, 2020 01:06

remove deadcode N_INLINE_PTR

d4919db

refs #198 {.inline.} procs now do not generate frames with --stacktra…

9655c6b

…ce:on, but do with --stacktrace:inline

use --stacktrace:on,off,noinline to preserve default behavior

bafa7a1

document where seemingly dead-code N_INLINE_PTR is actually used

ac0e77d

timotheecour force-pushed the pr_fix_inline_debug_refs_198 branch from 3892088 to ac0e77d Compare March 5, 2020 09:07

timotheecour mentioned this pull request Mar 6, 2020

fix operators containing percent for VM usage #13536

Merged

stale bot added the stale Staled PR/issues; remove the label after fixing them label Mar 9, 2021

stale bot closed this Apr 8, 2021

timotheecour reopened this Apr 9, 2021

stale bot removed the stale Staled PR/issues; remove the label after fixing them label Apr 9, 2021

timotheecour marked this pull request as draft April 9, 2021 06:39

stale bot added the stale Staled PR/issues; remove the label after fixing them label Apr 27, 2022

stale bot closed this May 31, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RFC] --stacktrace:noinline: up to 3X speedups with typically same stacktraces #13582

[RFC] --stacktrace:noinline: up to 3X speedups with typically same stacktraces #13582

timotheecour commented Mar 4, 2020 •

edited

Loading

alaviss commented Mar 4, 2020 •

edited

Loading

timotheecour commented Mar 4, 2020 •

edited

Loading

Clyybber commented Mar 4, 2020 •

edited

Loading

timotheecour commented Mar 4, 2020 •

edited

Loading

Clyybber commented Mar 4, 2020 •

edited

Loading

timotheecour commented Mar 4, 2020 •

edited

Loading

Clyybber commented Mar 4, 2020

stale bot commented Mar 9, 2021

Varriount commented Apr 9, 2021

Araq commented Apr 9, 2021 •

edited

Loading

Varriount commented Apr 9, 2021

stale bot commented Apr 27, 2022

[RFC] --stacktrace:noinline: up to 3X speedups with typically same stacktraces #13582

[RFC] --stacktrace:noinline: up to 3X speedups with typically same stacktraces #13582

Conversation

timotheecour commented Mar 4, 2020 • edited Loading

example 1:

example 2:

before PR:

after PR:

links

alaviss commented Mar 4, 2020 • edited Loading

timotheecour commented Mar 4, 2020 • edited Loading

Clyybber commented Mar 4, 2020 • edited Loading

timotheecour commented Mar 4, 2020 • edited Loading

Clyybber commented Mar 4, 2020 • edited Loading

timotheecour commented Mar 4, 2020 • edited Loading

Clyybber commented Mar 4, 2020

stale bot commented Mar 9, 2021

Varriount commented Apr 9, 2021

Araq commented Apr 9, 2021 • edited Loading

Varriount commented Apr 9, 2021

stale bot commented Apr 27, 2022

timotheecour commented Mar 4, 2020 •

edited

Loading

alaviss commented Mar 4, 2020 •

edited

Loading

timotheecour commented Mar 4, 2020 •

edited

Loading

Clyybber commented Mar 4, 2020 •

edited

Loading

timotheecour commented Mar 4, 2020 •

edited

Loading

Clyybber commented Mar 4, 2020 •

edited

Loading

timotheecour commented Mar 4, 2020 •

edited

Loading

Araq commented Apr 9, 2021 •

edited

Loading