
documentation and examples for porting WGSL/GLSL shaders #1096

Open
johnny-smitherson opened this issue Nov 7, 2023 · 2 comments
Labels
t: enhancement A new feature or improvement to an existing one.

Comments


johnny-smitherson commented Nov 7, 2023

I've been porting some WGSL code like these into Rust, and the documentation for actually writing shaders is very sparse: https://embarkstudios.github.io/rust-gpu/book/writing-shader-crates.html

I only managed to get the above program working by reading through the spirv-std test code.

Here are some pain points I have:

  • confusion about the various binding types and how WGSL declaration syntax converts to the rust-gpu #[spirv(...)] attribute equivalents for basic types
  • confusion about the various barrier types: which of these should I use for WGSL's storageBarrier()? What about atomic ops?
  • basic operations, like reading and writing image textures and image sampling, are neither documented nor shown in examples
  • what Rust types are allowed by default? I found out at runtime that u8 isn't supported - not a problem, but it should be outlined for the uninitiated
  • is this assembly here the only way to do this? What even is the thing that's being done?
  • what Rust language features are known not to work? I discovered the hard way that while, loop and/or for don't work - I just can't remember which ones
  • how do I configure capabilities? - this should be a documentation page
  • how do I debug? - this should be a documentation page

Having a small body of real-life examples, ported for instance from the wgpu WGSL examples, would greatly help outside people actually get into your ecosystem, without having to read the SPIR-V specification and diff it against the WGSL specification just to display a couple of spinning triangles.

I get that some concepts will not be translatable 1-to-1, but a few more starting examples showing off basic shader features (all shader types, most binding types, various barrier types, various combinations of globals and uniforms) would all help with porting WGSL and GLSL to Rust.

I propose a before + after Rust conversion for each example, so everyone can see how the porting gets done and what gets translated into what. You can also run benchmarks and correctness testing on the Rust shader vs. the original shader.

Maybe the original wgpu example runner can be used with minimal modification to accept SPIR-V shaders alongside the original WGSL, so you won't have to rebuild all of the runner's bind group code from scratch.

Otherwise, I feel like, without being a GPU pipeline engineer, I can't be the target audience for this project - but I really want to be. WGSL is awful, Rust is great.

edit found

Are there any other resources that exist for this?

@johnny-smitherson johnny-smitherson added the t: enhancement A new feature or improvement to an existing one. label Nov 7, 2023

Cazadorro commented Nov 29, 2023

confusion about the various binding types and how WGSL declaration syntax converts to rust-spirv attribute equivalent for basic types

You probably know this, and probably what I'll say next, but virtually all attributes and binding types in WGSL are, AFAIK, based on the SPIR-V spec. SPIR-V was originally going to be the target language for WebGPU instead of WGSL, until Apple's lawyers got involved and stopped the process over some legal issue with the Khronos Group.

WGSL was put forward in order to still have SPIR-V tools be usable in the web space while satisfying Apple's lawyers. As such, WGSL's documentation used to make explicit mention of how its own types and attributes map to SPIR-V (they seem to have removed it over time...). This means many of the engineers who worked on WGSL also know SPIR-V, and the spec implicitly assumes SPIR-V knowledge, so it's "natural"-ish for them to translate between the two concept sets. Note that WGSL has changed a lot in recent years as well, so part of the issue is that WGSL wasn't "stable" until recently either.

But anyway, I agree: Rust-GPU needs the kinds of resources mentioned. In the meantime, I'll attempt to answer some of the mappings as best I can here.

#[spirv(storage_buffer, ...)] vs. #[spirv(storage, ...)] vs. Image!

These come from SPIR-V, specifically from the storage class specifier; see the full list here: https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#_storage_class Make sure to discount the ones that are OpenCL-only (kernel mode capability, etc.).

From here you can map what belongs where: storage_buffer is equivalent to the storage address space in WGSL.
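As a concrete (hedged) sketch of how the declarations map - the binding numbers, struct layout, and names below are mine, not taken from any official example:

    use spirv_std::glam::{UVec3, Vec4};
    use spirv_std::spirv;

    // WGSL:
    //   @group(0) @binding(0) var<storage, read_write> data: array<vec4<f32>>;
    //   @group(0) @binding(1) var<uniform> params: Params;
    //
    // Rough rust-gpu equivalent: the storage address space becomes the
    // storage_buffer storage class, and uniform becomes uniform.
    #[repr(C)]
    pub struct Params {
        pub scale: f32,
    }

    #[spirv(compute(threads(64)))]
    pub fn scale_data(
        #[spirv(global_invocation_id)] id: UVec3,
        #[spirv(storage_buffer, descriptor_set = 0, binding = 0)] data: &mut [Vec4],
        #[spirv(uniform, descriptor_set = 0, binding = 1)] params: &Params,
    ) {
        let i = id.x as usize;
        if i < data.len() {
            data[i] *= params.scale;
        }
    }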

The Image storage class, meanwhile, is equivalent to handle; it's almost 1:1, and you basically don't deal with that storage class directly (in WGSL, samplers and textures are always in the handle space; in SPIR-V they are always in the "Image" storage class). Opaque handles (samplers and textures) are handled somewhat separately in this way. They are known as "opaque handles"/"opaque pointers" in CUDA and other APIs because they don't live in the same world as a traditional pointer to memory: they are typically fixed for the duration of a kernel/shader execution, and you can't increment, decrement, or convert them to a uint (they are "opaque" - you don't know the address, and you don't know how they're implemented).

The Image! macro exists because it covers for the long-winded declaration necessary in SPIR-V (see https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpTypeImage), but it is just syntax sugar.

There are potentially 8 different parameters you need to stuff in there. See this for the real spirv_std type that does the same thing.
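As a rough sketch (the alias name is mine), the macro lets you write something like this instead of spelling out each OpTypeImage operand on the underlying generic type:

    use spirv_std::Image;

    // Sugar for the underlying generic image type: a 2D image with an f32
    // sampled type, intended to be used with a sampler; the remaining
    // OpTypeImage operands take their defaults.
    type Texture2d = Image!(2D, type=f32, sampled);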

confusion about various barrier types: which of these ones should i use for wgsl storageBarrier() ?

Barriers in SPIR-V follow memory semantics (barrier(scope, given_semantic)). Barriers in WGSL are strictly less powerful IIRC.

If you look there, you'll see that "UniformMemory" is the memory semantic for storage_buffer. The problem is that, for whatever reason, spirv-std doesn't give you one of these as a single, parameter-free function.

See https://github.com/EmbarkStudios/rust-gpu/blob/main/crates/spirv-std/src/arch/barrier.rs for the barriers available.

I'm not sure why they don't do this; they may want you to use the more general barriers, in which case you can choose a memory barrier appropriate to your use case. WGSL defines its control barriers as per-workgroup, so use workgroup_memory_barrier(). Otherwise, you can do the following to get the exact meaning of the storage buffer barrier (note that, according to WGSL, it also uses AcquireRelease memory semantics):

    unsafe {
        spirv_std::arch::memory_barrier::<
            { spirv_std::memory::Scope::Workgroup as u32 },
            { spirv_std::memory::Semantics::UNIFORM_MEMORY.bits()
                | spirv_std::memory::Semantics::ACQUIRE_RELEASE.bits() },
        >();
    }
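If the pre-baked wrapper mentioned above is enough for your case, the call is just the following (a sketch; the exact set of wrapper functions depends on your spirv-std version, and the barrier intrinsics are unsafe fns):

    // Rough stand-in for a WGSL workgroup-scoped barrier; there is also a
    // *_with_group_sync() variant if you additionally need execution sync.
    unsafe { spirv_std::arch::workgroup_memory_barrier() };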

what about atomic ops

Atomic ops work much the same way in SPIR-V as barriers, which also makes them quite different from WGSL. If you want to use atomics, you'll want to use one of these implementations depending on your use case. Note that atomic operations in the real world matter at the subgroup level as well, which WGSL doesn't give you access to.

https://github.com/EmbarkStudios/rust-gpu/blob/main/crates/spirv-std/src/arch/atomics.rs Scope and semantics are the same kind of thing as before, except now you're talking about "None" (relaxed), "Acquire", "Release", and "AcquireRelease", just like Rust and C++, but they aren't attached to the type in these functions. WGSL attaches the scope to its atomics and apparently always uses relaxed (presumably because of mobile GPUs' poor memory models).

Therefore, adding to the count variable in your example might look like:

    unsafe {
        spirv_std::arch::atomic_i_add::<
            u32,
            { spirv_std::memory::Scope::Workgroup as u32 },
            { spirv_std::memory::Semantics::NONE.bits() },
        >(&mut count, 1u32);
    }
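For context (a sketch: the entry-point name, bindings, and the choice of a storage-buffer counter are all made up here, not taken from your example, and the Scope/Semantics names are from spirv_std::memory as of recent versions), the call above could sit in a compute entry point roughly like this:

    use spirv_std::memory::{Scope, Semantics};
    use spirv_std::spirv;

    // Illustrative only: every invocation atomically bumps a counter that
    // lives in a storage buffer, using relaxed ("None") semantics at Device scope.
    #[spirv(compute(threads(64)))]
    pub fn count_invocations(
        #[spirv(storage_buffer, descriptor_set = 0, binding = 0)] count: &mut u32,
    ) {
        unsafe {
            spirv_std::arch::atomic_i_add::<
                u32,
                { Scope::Device as u32 },
                { Semantics::NONE.bits() },
            >(count, 1u32);
        }
    }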

basic operations, like reading and writing to image textures, image sampling, are not documented nor in examples

Yep, it's really weird that they don't show this; I don't think there's a single sampled image in their shader examples. The easiest way to find out is to look up, say, "sample" in spirv-std (https://docs.rs/spirv-std/latest/spirv_std/index.html), then look at the SPIR-V docs and extrapolate from there. There are also examples from other people scattered outside of this repository, e.g. Strolle, or Embark's own Kajiya, specifically here: https://github.com/EmbarkStudios/kajiya/tree/main/crates/lib/rust-shaders/src

For example, to use a sampled image, you'd take a texture and a sampler, plus one of the associated sampling functions, and do something like:


use spirv_std::{spirv, Image, Sampler};
use spirv_std::glam::{Vec2, Vec4};

#[spirv(vertex)]
pub fn foo_vs(
    // inputs
    vertex_attribute_0: Vec4,
    // ...

    // outputs
    fragment_uv: &mut Vec2,
    // ...
) {
}

#[spirv(fragment)]
pub fn foo_fs(
    #[spirv(descriptor_set = 0, binding = 0)] texture: &Image!(2D, type=f32, sampled=true),
    #[spirv(descriptor_set = 0, binding = 1)] sampler: &Sampler,
    // inputs
    fragment_uv: Vec2,
    // outputs
    out_color: &mut Vec4,
) {
    let sampled_color: Vec4 = texture.sample(*sampler, fragment_uv);
    *out_color = sampled_color;
}
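For porting reference (my rough mapping, not from official docs): the &Image!(2D, type=f32, sampled=true) parameter corresponds to a WGSL texture_2d<f32> binding, &Sampler corresponds to a WGSL sampler binding, and @group/@binding line up with descriptor_set/binding.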

what Rust types are allowed by default? I found out at runtime u8 are not a thing - not a problem, but should be outlined for the non-initiated

Yeah, not sure what the deal is, but this is a problem. You have to search the open issues for what is implemented right now; for example, int128 doesn't exist. Though TBH, I'm not sure why u8 would not be implemented, since it is a thing in SPIR-V and would look nearly identical to the code for i32, i64, etc. Strangely, when I look through the codebase it appears that it is implemented, and I can see other projects using it (I suspect it just needs the Int8 capability enabled on the build, which loops back to your capabilities question - see the build-script sketch further down):

https://github.com/EmbarkStudios/kajiya/blob/d373f76b8a2bff2023c8f92b911731f8eb49c6a9/crates/lib/rust-shaders-shared/src/ssgi.rs#L8

https://github.com/Patryk27/strolle/blob/92b042e1c95638c7200ac4b7e894ee0664320ef4/strolle-shaders/reference-shading/src/lib.rs#L57

If you want, you can go implement this yourself per https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpTypeInt - but it seems like another bug if you can't use that type.
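On the capabilities question: as far as I know (hedging here - check the spirv-builder docs for the version you're on), capabilities are enabled from the build script that compiles the shader crate, roughly like this; the crate path and target below are made up:

    // build.rs of the crate driving shader compilation (sketch, not a drop-in file).
    use spirv_builder::{Capability, SpirvBuilder};

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        SpirvBuilder::new("../my-shader-crate", "spirv-unknown-vulkan1.1")
            // e.g. what 8-bit integers like u8 would need
            .capability(Capability::Int8)
            .build()?;
        Ok(())
    }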

is this assembly here the only way to do this thing? What is even this thing that's being done?

It's possible the problem here is that you're coming from WGSL, and unless you're also a GPGPU programmer or have worked extensively with modern Vulkan, you may never have encountered "subgroups".

You probably know a lot or all of this, but it's worth repeating anyway. The GPU doesn't actually execute "threads" in the traditional sense. A GPU is more or less a collection of SIMD units masquerading, for the programmer, as one thread per lane. On the GPU, every set of "n" threads is a "subgroup" (usually a power of 2: 32 on Nvidia; 32 or 64 on AMD; on Intel sometimes 128, sometimes 16; other GPUs have different subgroup sizes). Because the GPU is organized this way, there are many consequences, mainly:

  • Branching within a subgroup that is not optimized out will result in something called "thread divergence"
    • because you're actually executing on a SIMD unit, in order for instructions to execute at the same time, they must all be the same
    • when you have "thread divergence", the instruction pointer must change for each branch, and thus each branch executes independently, in serial, i.e. one after another
  • Branching on the subgroup boundary (i.e., on Nvidia, the first 32 take one path and the next 32 take another) will not result in "thread divergence"
  • each hardware subgroup "knows" about all the other threads in the subgroup, and can therefore use cooperative instructions to work with threads within the subgroup for a significant speed-up versus other methods (such as reaching back to shared memory)

Note that "subgroup" is a generalized, cross-platform term for this concept; in the past they have been referred to as "wavefronts" by AMD and in pre-GPU parallel-processing literature, and they are called "warps" in Nvidia CUDA nomenclature.

You can see all the types of subgroup operations available explained here (it's GLSL, but it maps to SPIR-V and thus to Rust-GPU):

https://www.khronos.org/blog/vulkan-subgroup-tutorial

and the corresponding SPIR-V instructions here

https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#_group_operation

In particular, this function is trying to emulate the following function described in the Vulkan subgroup tutorial:

T subgroupAdd(T value) returns the summation of all active invocations value's across the subgroup.

What this is saying, is that if I have, say, a subgroup size of 32, and I make the following call:

let sum = subgroup_add(1u32);

That will result in sum having a value of 32u32 (assuming all 32 invocations are active); if I instead call subgroup_add(10u32) it will be 320u32. And if I use another variable, say a value from an array of 32 values:

let sum = subgroup_add(my_array[my_subgroup_idx]);

it will be the sum of all 32 values. And not only will this subgroup thread get access to that value, that value is broadcast to every thread in the subgroup.

what Rust language features are known not to work? I discovered the hard way #1076 (comment) - I just can't remember which ones

There are lots of bugs with loops and optimization; I'm also frustrated that this hasn't been collected into one searchable document.

#1094 - this should be a documentation page

It looks like the answer is already there, but I suspect the reason that kind of stuff isn't a priority is that kernel-mode and shader-mode SPIR-V are not compatible, and Rust-GPU was developed for shader/Vulkan SPIR-V, so I'm not even sure basic functionality will work if you try to, say, sample a texture in kernel mode.

Hopefully this helps answer the questions put forth here (and for anyone else who wanders in), and highlights the need for better documentation for rust-gpu.
