Fleshing out Execution and Memory models #94
Thanks for working on this doc! I left some wording suggestions and comments that are hopefully helpful.
This update moves Intro.Defs earlier in the introduction and puts more detail into the SPMD programming model and memory model.
f39311b to 3e3fc3e
5965e2f to bbaa286
We don't really have direct host-memory manipulation today in HLSL, so let's keep this simple.
I've put in a bunch of wordsmithing and updates. I think this is mostly in a sane state now.
@Keenuts, the last change I added here is in response to the discussion we had in email this week. Do you think that wording correctly addresses the concerns?
Sure, this block as-is makes sense: no optimization should change the program behavior (except optimizations like fast-math? But maybe that's considered to be a codegen switch, not an optimization switch?) What is IMO the important missing bit (but it's a whole subject) is to define what is expected for actual execution: the spec mentions lock-step is not mandatory, hence the spec as-is allows the same CFG/shader to yield 2 different results. Example: running on the Maxwell and Volta architectures, where the latter is allowed not to reconverge directly after a divergence.
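To make the concern above concrete, here is a toy model in Python rather than HLSL. It is only a sketch: `wave_sum`, `run`, and the even/odd branch are hypothetical names and choices made for illustration, and the model assumes that a wave-wide sum only sees the lanes that reach it together. It does not describe how any particular GPU schedules lanes.

```python
# Toy model (not HLSL): a "wave" is a list of per-lane values, and a wave-wide
# sum only sees the lanes that execute it together.

def wave_sum(values, groups):
    """Return each lane's result when the sum runs once per group of lanes."""
    result = [0] * len(values)
    for group in groups:
        total = sum(values[lane] for lane in group)
        for lane in group:
            result[lane] = total
    return result


def run(values, reconverge):
    """Diverge on an even/odd branch, then execute a wave-wide sum."""
    taken = [lane for lane in range(len(values)) if lane % 2 == 0]
    not_taken = [lane for lane in range(len(values)) if lane % 2 == 1]
    if reconverge:
        groups = [taken + not_taken]  # all lanes rejoin before the sum
    else:
        groups = [taken, not_taken]   # each side reaches the sum on its own
    return wave_sum(values, groups)


values = [1, 2, 3, 4]
print(run(values, reconverge=True))   # [10, 10, 10, 10]
print(run(values, reconverge=False))  # [4, 6, 4, 6]
```

The same control flow produces different per-lane results depending only on when the lanes rejoin, which is the ambiguity being pointed at.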
Within the spec you can't change the program behavior outside what is specified. HLSL has fast-math by default, so that will be the specified behavior for floating point math operations.
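As a small aside on why fast-math comes up in a discussion about "changing program behavior": floating-point addition is not associative, so a compiler that is allowed to reassociate can produce a different result from the same source expression. The sketch below uses Python doubles purely for illustration; it assumes reassociation is among the transformations a fast-math mode permits, and it says nothing about what any specific HLSL compiler actually does.

```python
# Two groupings of the same three-term sum, evaluated in IEEE-754 doubles.
a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c   # the order written in the source
reassociated = a + (b + c)    # an order a reassociating optimizer might pick

# The two groupings round differently, so the results are not bit-identical.
print(left_to_right == reassociated)            # False
print(abs(left_to_right - reassociated) < 1e-15)  # True: they differ by one rounding step
```

If the specification defines the relaxed behavior as the default, both results are within what the spec allows, so neither counts as an optimization changing program behavior.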
Lockstep isn't strictly required by the spec for anything. Even wave operations can be emulated without lockstep execution if you have appropriate synchronization primitives. For Volta, the expectation is that wave operations will effectively require warp synchronization. I think that's a driver problem for Nvidia more than anything else. Maybe that's the missing explicit wording. I can add something like:
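Separately from the spec wording referred to above, the claim that wave operations can be emulated without lockstep execution can be sketched on a CPU. The Python below is an assumption-laden illustration, not HLSL and not a driver implementation: each lane is an independent thread, `emulated_wave_sum` is a hypothetical name, and a `threading.Barrier` stands in for the "appropriate synchronization primitives".

```python
import threading


def emulated_wave_sum(inputs):
    """Emulate a wave-wide sum with one free-running thread per lane."""
    n = len(inputs)
    barrier = threading.Barrier(n)   # stands in for a wave-level sync primitive
    lock = threading.Lock()
    shared = {"total": 0}
    results = [None] * n

    def lane(i):
        with lock:
            shared["total"] += inputs[i]  # publish this lane's contribution
        barrier.wait()                    # wait until every lane has published
        results[i] = shared["total"]      # every lane now reads the same total

    threads = [threading.Thread(target=lane, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(emulated_wave_sum([1, 2, 3, 4]))  # [10, 10, 10, 10]
```

The lanes run in whatever order the scheduler picks, yet every lane observes the same sum, because the barrier supplies the ordering that lockstep execution would otherwise provide.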
Yes, and where those barriers are put should probably be described in the control-flow structures (here using CUDA's syntax mixed with HLSL).
I don't think this is correct. Every |
Yes, of course. If we add a
But it seems this goes beyond this initial PR, so this is probably not the best place to dive into it.
Yeah, I think some of this will come down to the wording as we write the behavior of switch statements and case labels, but at the same time I don't think we need to define synchronization for control flow structures. It isn't illegal to reshape control flow during optimization if it doesn't change program behavior, so synchronization of execution isn't really a property of the control flow structures.
Even within the definitions of C, it is illegal to merge and hoist control flow if it changes the resulting execution. C doesn't have the SIMT/SPMD considerations that HLSL does, but we can lean on similar rules for HLSL and get the expected behavior. The C spec defined case statements from ANSI C through C17 (the wording is slightly different in the latest draft of C23) with the following statement:
If that's the basis for how
I don't want our spec to make statements about expectations for control flow structure that aren't required because we do want conforming compilers to be allowed to optimize in cases where they can.
I've marked the outstanding conversations as resolved. I think that I've addressed all the feedback. I'm happy to revisit anything in subsequent PRs. I'm sure this will all need some more work as we get further along.
The PDF draft of this change is available here.