Lower aborts (incl. panics) to "return from entry-point", instead of infinite loops. #1070

eddyb · 2023-06-05T21:05:05Z

We currently map the abort intrinsic (used almost exclusively for panic) to infinite loops, and they either:

- are optimized out by spirv-opt or drivers (i.e. treated as UB)
  - this obviously changes the semantics, and if a conditional panic protected against some UB, now that UB is unconditional and could affect more code
  - worst case, unwanted side-effects could run (e.g. writes of corrupted values, to buffers)
- are preserved by both spirv-opt and drivers, and cause a timeout when used
  - see these previous discussions for what it takes to intentionally cause this (cc @charles-r-earp):
    - DebugPrintf prevents panics' infinite loops from being (unsoundly) optimized away. #1048
    - Added barrier to panic loop. #1055
  - however, this is a reliable option cross-hardware only for compute-only Vulkan queues (not just compute shaders, but compute shaders on a non-graphical queue), where the timeout can't block other work
  - worst case, one can easily accidentally hang their entire desktop if GPU compositing is involved
    - (yes I've done this to myself while trying out the barrier trick, no it wasn't fun)
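To make the first failure mode concrete, here is a minimal CPU-side sketch (plain Rust, not shader code). Both function names are hypothetical, and wrapping the index is only a stand-in for "some other memory gets written", since a real out-of-bounds GPU write is UB and can't be modeled faithfully:

```rust
/// Abort lowered to "return from entry-point": a failed check exits the
/// invocation before the write can happen.
fn write_with_exit(out: &mut [u32], index: usize, value: u32) {
    if index >= out.len() {
        return; // where the panic's abort used to be
    }
    out[index] = value;
}

/// Model of the check being optimized away together with the infinite loop.
/// Wrapping the index is an assumption made purely so the effect is
/// observable here. Assumes `out` is non-empty.
fn write_with_check_removed(out: &mut [u32], index: usize, value: u32) {
    let somewhere = index % out.len();
    out[somewhere] = value;
}
```

With a 4-element buffer and `index = 9`, the first function leaves the buffer untouched, while the second overwrites an unrelated element.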

With infinite loops being so terrible, I propose we move towards a "well-defined invocation exit" approach, where we keep the "abort" as a custom instruction (using our "extended instruction set", added in #1064), and then effectively emulate the semantics of OpTerminateInvocation for it by:

- inlining any function that uses our custom Abort (either directly, or transitively through some functions it calls) - in the end, we should end up with Aborts only used directly from entry-points
- in entry-points, we rewrite Aborts to a plain OpReturn (from the entry-point, i.e. exiting the invocation)
  - we could potentially have a mode where we generate a debugPrintf call at this point, with the same inlining-aware "backtrace" we use elsewhere, so that the user gets some feedback if they have the validation layers enabled (and/or try to extract a panic message when we generate the abort in the first place, too)
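The two steps above can be sketched on a toy IR. This is only an illustration under stated assumptions, not the code in this PR: `Inst`, `Module`, and every function name below are hypothetical, bodies are straight-line (no control flow), and the call graph is assumed to have no recursion.

```rust
use std::collections::{HashMap, HashSet};

/// Toy instruction set (hypothetical, not rust-gpu's or SPIR-T's real IR).
#[derive(Clone, Debug, PartialEq)]
enum Inst {
    Work(&'static str), // stand-in for any ordinary instruction
    Call(&'static str), // call to another function, by name
    Abort,              // the custom "abort" instruction
    Return,             // plain return (like OpReturn)
}

/// Function name -> straight-line body (control flow omitted for brevity).
type Module = HashMap<&'static str, Vec<Inst>>;

/// Functions that use `Abort` directly, or transitively through callees.
fn abort_users(module: &Module) -> HashSet<&'static str> {
    let mut users: HashSet<&'static str> = module
        .iter()
        .filter(|(_, body)| body.contains(&Inst::Abort))
        .map(|(name, _)| *name)
        .collect();
    // Propagate to callers until nothing changes.
    loop {
        let mut changed = false;
        for (name, body) in module {
            let calls_user = body
                .iter()
                .any(|inst| matches!(inst, Inst::Call(callee) if users.contains(callee)));
            if calls_user && users.insert(*name) {
                changed = true;
            }
        }
        if !changed {
            return users;
        }
    }
}

/// Body of `func` with every call to an abort-using function inlined.
/// Assumes the call graph has no recursion.
fn inline_abort_users(module: &Module, users: &HashSet<&'static str>, func: &str) -> Vec<Inst> {
    let mut out = Vec::new();
    for inst in &module[func] {
        match inst {
            Inst::Call(callee) if users.contains(callee) => {
                // The callee's own `Return` becomes fall-through into the caller.
                let inlined = inline_abort_users(module, users, callee);
                out.extend(inlined.into_iter().filter(|i| *i != Inst::Return));
            }
            other => out.push(other.clone()),
        }
    }
    out
}

/// Step 1 (inline), then step 2 (rewrite each `Abort` to a plain `Return`).
fn lower_entry_point(module: &Module, entry: &str) -> Vec<Inst> {
    let users = abort_users(module);
    inline_abort_users(module, &users, entry)
        .into_iter()
        .map(|inst| if inst == Inst::Abort { Inst::Return } else { inst })
        .collect()
}

/// Example input: `main` reaches `Abort` through `checked_index` -> `panic`.
fn example_module() -> Module {
    HashMap::from([
        ("main", vec![
            Inst::Work("load"),
            Inst::Call("helper"),
            Inst::Call("checked_index"),
            Inst::Work("store"),
            Inst::Return,
        ]),
        ("checked_index", vec![Inst::Work("compare"), Inst::Call("panic"), Inst::Return]),
        ("panic", vec![Inst::Abort]),
        ("helper", vec![Inst::Work("math"), Inst::Return]),
    ])
}
```

Calling `lower_entry_point(&example_module(), "main")` leaves the call to `helper` alone (it never reaches an Abort), inlines `checked_index` and `panic`, and turns the Abort into a Return that sits before `store`. Because the toy bodies are straight-line, that Return is unconditional here; in real code the Abort would sit on the failing branch of a check.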

This PR implements that proposal (but without any debugPrintf conveniences), and so far it seems to work great. I haven't tested the performance impact, though: where the infinite loops were previously optimized away, checks such as bounds checks now have to actually do something on failure, so they carry a real cost.

There are also other ways of implementing this, and we could do the Abort -> OpReturn rewriting very late (if we think it would be better than letting the SPIR-T structurizer see it), so there's some room to explore mitigations to perf issues, if they arise.

~~(I will leave this PR as draft until we're sure about the perf aspects)~~

repi

Approving, with the reservation that this causes no major performance regressions in ark, especially on AMD and Nvidia GPUs, which I believe @VZout was going to try to verify?

VZout · 2023-06-08T13:10:04Z

No noticeable perf difference in ark. Near-identical timings.

eddyb · 2023-06-08T13:57:35Z

Thanks for the confirmation! I will leave this open until I have anything else to land (as I'd rather not stack too many PRs), to give more time to anyone who may have perf-sensitive shaders (e.g. @charles-r-earp @pema99 @Shfty) to test it, but we're likely fine as-is based on what I'm seeing.

eddyb · 2023-07-07T05:33:25Z

Decided to merge this despite wanting to dig further into @pema99's use case, where this PR does seem to have a perf impact. Ideally we will move towards more flexibility, which will make comparisons easier.

eddyb requested review from repi and VZout June 5, 2023 21:05

eddyb mentioned this pull request Jun 5, 2023

rustup: update to nightly-2023-05-27. #1071

Merged

repi approved these changes Jun 8, 2023

eddyb force-pushed the custom-abort branch from bd23356 to e4e9d14 Compare June 8, 2023 12:38

Lower aborts (incl. panics) to "return from entry-point", instead of infinite loops. (756ffc8)

eddyb force-pushed the custom-abort branch from e4e9d14 to 756ffc8 Compare June 8, 2023 13:02

VZout approved these changes Jun 8, 2023

eddyb marked this pull request as ready for review June 8, 2023 13:10

eddyb merged commit ce8c3f8 into EmbarkStudios:main Jul 7, 2023

eddyb deleted the custom-abort branch July 7, 2023 05:33

This was referenced Jul 14, 2023

Add debugPrintf-based panic reporting, controlled via spirv_builder::ShaderPanicStrategy. #1080

Merged

Added barrier to panic loop. #1055

Closed

DebugPrintf prevents panics' infinite loops from being (unsoundly) optimized away. #1048

Closed
