
Fleshing out Execution and Memory models #94

Merged
merged 10 commits into microsoft:main
Apr 9, 2024

Conversation

llvm-beanz
Collaborator

This update moves Intro.Defs earlier in the introduction and puts more detail into the SPMD programming model and memory model.

The PDF draft of this change is available here.

@llvm-beanz llvm-beanz requested a review from bogner September 7, 2023 19:32

@jeremyong jeremyong left a comment


Thanks for working on this doc! I left some wording suggestions and comments that are hopefully helpful.

This update moves Intro.Defs earlier in the introduction and puts more detail
into the SPMD programming model and memory model.
@llvm-beanz llvm-beanz force-pushed the cbieneman/terminology branch from f39311b to 3e3fc3e on December 1, 2023 15:44
@llvm-beanz llvm-beanz force-pushed the cbieneman/terminology branch from 5965e2f to bbaa286 on December 1, 2023 16:14
We don't really have direct host-memory manipulation today in HLSL, so
let's keep this simple.
@llvm-beanz
Collaborator Author

I've put in a bunch of wordsmithing and updates.

I think this is mostly in a sane state now.

Updated PDF

@llvm-beanz
Collaborator Author

@Keenuts, the last change I added here is in response to the discussion we had in email this week. Do you think that wording correctly addresses the concerns?

@Keenuts
Collaborator

Keenuts commented Feb 22, 2024

@Keenuts, the last change I added here is in response to the discussion we had in email this week. Do you think that wording correctly addresses the concerns?

Sure, this block as-is makes sense: no optimization should change the program behavior (except optimizations like fast-math? But maybe that's considered to be a codegen switch, not an optimization switch?)

What is IMO the important missing bit (but it's a whole subject) is defining what is expected for actual execution: the spec mentions that lock-step execution is not mandatory, hence the spec as-is allows the same CFG/shader to yield two different results. Example: running on the Maxwell and Volta architectures, with the latter allowing threads not to reconverge directly after a divergence.

@llvm-beanz
Collaborator Author

Sure, this block as-is makes sense: no optimization should change the program behavior (except optimizations like fast-math? But maybe that's considered to be a codegen switch, not an optimization switch?)

Within the spec you can't change the program behavior outside what is specified. HLSL has fast-math by default, so that will be the specified behavior for floating point math operations.

What is IMO the important missing bit (but it's a whole subject) is defining what is expected for actual execution: the spec mentions that lock-step execution is not mandatory, hence the spec as-is allows the same CFG/shader to yield two different results. Example: running on the Maxwell and Volta architectures, with the latter allowing threads not to reconverge directly after a divergence.

Lockstep isn't strictly required by the spec for anything. Even wave operations can be emulated without lockstep execution if you have appropriate synchronization primitives.
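To illustrate that point, here is a minimal sketch (not from the spec, and not how any driver actually implements it) of emulating a wave-wide sum without lockstep execution: each "lane" runs as an independent thread, and a barrier provides the synchronization the wave operation requires. The function name and lane model here are hypothetical.

```python
import threading

def wave_active_sum(values, num_lanes):
    """Emulate an all-lanes wave sum without lockstep execution."""
    partial = [0] * num_lanes   # per-lane contributions
    results = [0] * num_lanes   # per-lane view of the reduction
    barrier = threading.Barrier(num_lanes)

    def lane(i):
        partial[i] = values[i]   # each lane contributes its value
        barrier.wait()           # sync: all contributions are now visible
        results[i] = sum(partial)  # every lane observes the same total

    threads = [threading.Thread(target=lane, args=(i,)) for i in range(num_lanes)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The lanes are free to be scheduled in any order; only the barrier point is ordered, which is the sense in which the wave operation, not lockstep, carries the synchronization requirement.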

For Volta, the expectation is that wave operations will effectively require warp synchronization. I think that's a driver problem for Nvidia more than anything else. Maybe that's the missing explicit wording.

I can add something like:

Wave, Quad, and Thread Group operations require execution synchronization of applicable active and helper Lanes as defined by the individual operation.

@Keenuts
Collaborator

Keenuts commented Feb 22, 2024

For Volta, the expectation is that wave operations will effectively require warp synchronization. I think that's a driver problem for Nvidia more than anything else. Maybe that's the missing explicit wording.

I can add something like:

Wave, Quad, and Thread Group operations require execution synchronization of applicable active and helper Lanes as defined by the individual operation.

Yes, and where those barriers are placed should probably be described in the control-flow structures (here using CUDA's syntax mixed with HLSL):

switch (value) {
    case 0:
        WaveActiveSum(1);
    default:
        __syncwarp(MASK_FOR_ALL_THREADS);
        WaveActiveSum(10);
}

@llvm-beanz
Collaborator Author

switch (value) {
    case 0:
        WaveActiveSum(1);
    default:
        __syncwarp(MASK_FOR_ALL_THREADS);
        WaveActiveSum(10);
}

I don't think this is correct. Every Wave* intrinsic needs to have a __syncwarp associated with it. In your example it may have only been required for the default: case, but in a more complicated example they're effectively needed for all Wave and Quad operations with a mask that represents all the participating lanes.
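To make that concrete, a hedged sketch of the placement being argued for, in the same CUDA-mixed-HLSL pseudocode as the example above (the mask names are illustrative placeholders, not real identifiers):

```
switch (value) {
    case 0:
        __syncwarp(MASK_FOR_CASE_0_LANES);  // sync the lanes participating in this wave op
        WaveActiveSum(1);
        // falls through
    default:
        __syncwarp(MASK_FOR_ALL_LANES);     // resync with the full participating set
        WaveActiveSum(10);
}
```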

@Keenuts
Collaborator

Keenuts commented Feb 26, 2024

switch (value) {
    case 0:
        WaveActiveSum(1);
    default:
        __syncwarp(MASK_FOR_ALL_THREADS);
        WaveActiveSum(10);
}

I don't think this is correct. Every Wave* intrinsic needs to have a __syncwarp associated with it. In your example it may have only been required for the default: case, but in a more complicated example they're effectively needed for all Wave and Quad operations with a mask that represents all the participating lanes.

Yes, of course. If we add a case 1 which doesn't branch, then the mask has to be different. What I meant is that the control-flow structure itself needs to define where synchronization is expected, not just the intrinsic:

  • we both agree that we must synchronize as soon as we can when we fall through, but this has to be made explicit IMO, and it's related to the control-flow construct.
    (Same for loop breaks, etc.)

But it seems this goes beyond this initial PR, so it's probably not the best place to dive into this.

@llvm-beanz
Collaborator Author

Yea, I think some of this will come down to the wording as we write the behavior of switch statements and case labels, but at the same time I don't think we need to define synchronization for control flow structures.

It isn't illegal to reshape control flow during optimization if it doesn't change program behavior, so synchronization of execution isn't really a property of the control flow structures. Even within the definitions of C, it is illegal to merge and hoist control flow if it changes the resulting execution. C doesn't have the SIMT/SPMD considerations that HLSL does, but we can lean on similar rules for HLSL and get the expected behavior.

The C spec has defined case statements from ANSI C through C17 (the wording is slightly different in the latest draft of C23) with the following statement:

Labels in themselves do not alter the flow of control, which continues unimpeded across them.

If that's the basis for how case labels are defined, and Wave* operations are defined to have wave sync behavior, I think that defines the expected behavior completely.

I don't want our spec to make statements about expectations for control flow structure that aren't required because we do want conforming compilers to be allowed to optimize in cases where they can.

@llvm-beanz
Collaborator Author

I've marked the outstanding conversations as resolved. I think that I've addressed all the feedback. I'm happy to revisit anything in subsequent PRs. I'm sure this will all need some more work as we get further along.

@llvm-beanz llvm-beanz merged commit 2a8e8da into microsoft:main Apr 9, 2024
3 checks passed
llvm-beanz added a commit to llvm-beanz/hlsl-specs that referenced this pull request Feb 9, 2025
This update moves Intro.Defs earlier in the introduction and puts more detail
into the SPMD programming model and memory model.