Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lower aborts (incl. panics) to "return from entry-point", instead of infinite loops. #1070

Merged
merged 1 commit into from
Jul 7, 2023

Conversation

eddyb
Copy link
Contributor

@eddyb eddyb commented Jun 5, 2023

We currently map the abort intrinsic (used almost exclusively for panic) to infinite loops, and they either:

  • are optimized out by spirv-opt or drivers (i.e. treated as UB)
    • this obviously changes the semantics, and if a conditional panic protected against some UB, now that UB is unconditional and could affect more code
    • worst case, unwanted side-effects could run (e.g. writes of corrupted values, to buffers)
  • are preserved by both spirv-opt and drivers, and cause a timeout when used

With infinite loops being so terrible, I propose we move towards a "well-defined invocation exit" approach, where we keep the "abort" as a custom instruction (using our "extended instruction set", added in #1064), and then effectively emulate the semantics of OpTerminateInvocation for it by:

  • inlining any function that uses our custom Abort (either directly, or transitively through some functions it calls) - in the end, we should end up with Aborts only used directly from entry-points
  • in entry-points, we rewrite Aborts to a plain OpReturn (from the entry-point, i.e. exiting the invocation)
    • we could potentially have a mode where we generate a debugPrintf call at this point, with the same inlining-aware "backtrace" we use elsewhere, so that the user gets some feedback if they have the validation layers enabled (and/or try to extract a panic message when we generate the abort in the first place, too)

This PR implements that proposal (but without any debugPrintf conveniences), and so far it seems to work great, but I haven't tested the performance impact (i.e. where before the infinite loops were optimized away, now we're seeing an actual cost to various e.g. bounds checks, that need to do something at all).

There are also other ways of implementing this, and we could do the Abort -> OpReturn rewriting very late (if we think it would be better than letting the SPIR-T structurizer see it), so there's some room to explore mitigations to perf issues, if they arise.

(I will leave this PR as draft until we're sure about the perf aspects)

Copy link
Contributor

@repi repi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approving, but with the reservation that there are no major performance regressions with this on esp. AMD and Nvidia GPUs in ark. which I believe @VZout was going to try and verify?

@VZout
Copy link
Member

VZout commented Jun 8, 2023

No noticable perf difference in ark. near identical timings.

@eddyb eddyb marked this pull request as ready for review June 8, 2023 13:10
@eddyb
Copy link
Contributor Author

eddyb commented Jun 8, 2023

Thanks for the confirmation! I will leave this open until I have anything else to land (as I'd rather not stack too many PRs), to give more time to anyone who may have perf-sensitive shaders (e.g. @charles-r-earp @pema99 @Shfty) to test it, but we're likely fine as-is based on what I'm seeing.

@eddyb
Copy link
Contributor Author

eddyb commented Jul 7, 2023

Decided to merge this despite wanting to dig further into @pema99's usecase, where this PR does seem to have a perf impact - ideally we will move towards more flexibility, which will make comparisons easier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants