make subgroup ops match glsl more closely #32

Firestar99 · 2024-10-12T11:32:51Z

I wasn't a big fan of how I exposed subgroup ops that require a value from the GroupOperation enum. This PR changes all of those functions to be much more like glsl exposes them.

These operations:

spirv_std::arch::subgroup_i_add::<{ GroupOperation::Reduce as u32 }, _>(value)
spirv_std::arch::subgroup_i_add::<{ GroupOperation::InclusiveScan as u32 }, _>(value)
spirv_std::arch::subgroup_i_add::<{ GroupOperation::ExclusiveScan as u32 }, _>(value)
spirv_std::arch::subgroup_i_add_clustered::<8, _>(value)

Got changed to this:

spirv_std::arch::subgroup_i_add(value)
spirv_std::arch::subgroup_inclusive_i_add(value)
spirv_std::arch::subgroup_exclusive_i_add(value)
spirv_std::arch::subgroup_clustered_i_add::<8, _>(value)

The new function names are almost exactly the same as glsl, except that they have the extra i suffix in their name to say that they operate on integers (or f floats, u unsigned, s signed). If anyone got an idea how to unify that, please let me know :D
(Note that the asm instruction must be a string literal, so eg. you can't use trait constants to implement it)

LegNeato · 2024-10-15T17:16:53Z

Is there a way with traits we can flip it? Like adding extension traits:

    use spirv_std::arch::SubgroupOperations as _;

    let float_result = float_val.subgroup_add();
    let float_inclusive = float_val.inclusive_subgroup_add();
    let float_exclusive = float_val.exclusive_subgroup_add();

or something like:

    use spirv_std::arch::ops::SubgroupAdd;
    
    let float_result = <f32 as SubgroupAdd>::add(float_val);
    let float_inclusive = <f32 as SubgroupAdd>::inclusive_add(float_val);
    let float_exclusive = <f32 as SubgroupAdd>::exclusive_add(float_val);

Not sure if those are more ergonomic, but I think they can be implemented?

Firestar99 · 2024-10-16T11:42:49Z

I'll be honest, I'm not a fan of it ergonomic-wise. Also current atomics are named simiarly, eg. atomic_i_add, though they are i_add whereas this PR uses add_i.

Here's also a small example of shader code using the renamed subgroup_ballot_exclusive_bit_count(), unfortunately a ballot op as I have not yet used an arithmetic op anywhere. It's atomically allocating space in a buffer (atomic_counter) to write elements in a compacted manner, so the resulting buffer has one element after the next without some holes in between.

pub fn subgroup_write_non_uniform(&mut self, t: T) {
	let index = unsafe {
		let ballot = subgroup_ballot(true);
		let count = subgroup_ballot_bit_count(ballot);
		let base_index = if subgroup_elect() {
			atomic_i_add::<_, { Scope::Device as u32 }, { Semantics::NONE.bits() }>(self.atomic_counter, count)
		} else {
			0
		};
		let base_index = subgroup_broadcast_first(base_index);
		let inv_index = subgroup_ballot_exclusive_bit_count(ballot);
		base_index + inv_index
	};
	// self.buffer[index] = T;
}

LegNeato · 2024-10-16T12:44:48Z

I agree that is probably better, but bare methods still don't feel very idiomatic to me coming from rust with very little graphics background.

Can we do something like a Subgroup struct you can initialize and then call functions on it? I'm thinking how Mutex gives you a guard or how threads can be scoped or not scoped and then joined or how tokio gives you a handle you can later poke at. Subgroups are a stateful "thing", so it feels weird to me to call bare functions to manipulate the thing rather than having the handle to the thing.

Firestar99 · 2024-10-16T12:55:42Z

Subroups are a stateful "thing"

Subgroup intrinsics are not stateful, they are just plain functions that do special stuff. And in theory come with a control barrier across the subgroup, though in practice that barrier just disappears cause invocations within a subgroup should always progress at the same rate (at least on AMD and most other archs, not sure what the new Nvidia cards do).

The only state there is is SubgroupMask used in subgroup_ballot operations, where each bit represents an invocation in the subgroup (the ballot variable in the example). And that's just a fancy wrapper of an UVec4 I added for convenience, as the spec only speaks of UVec4.

#[derive(Copy, Clone, Default, Eq, PartialEq)]
pub struct SubgroupMask(pub glam::UVec4);

LegNeato · 2024-10-16T13:01:16Z

Subgroups are stateful during a single shader invocation but lose this state once the invocation ends. So for the code running they are stateful, which is what I meant.

LegNeato · 2024-10-16T13:52:27Z

If we have stateful APIs and must_use we can eliminate some correctness / UB issues and footguns:

"Subgroup operations depend on the correct state of invocations... incorrect mask values can result in undefined behavior or synchronization failure"

"Ignoring the result of subgroup operations can lead to invalid or non-deterministic outcomes, especially in communication primitives like broadcast."

Firestar99 · 2024-10-23T13:11:19Z

I pop'ed off the last commit to make the subgroup functions match the atomic functions more closely (subgroup_inclusive_add_i -> subgroup_inclusive_i_add).

I don't think it's worth while spending a lot of time on perfecting this API, especially since not a lot of people will be using this extension. I'd rather prioritize getting a nice atomics API going and then adjusting subgroups to match that. And even then I think we got more important things to take care of first.

I'd like to get this PR merged in the state it is. But if you have any concrete ideas for a better API, that also doesn't alienate those familiar with the glsl API from the little information out there, feel free to make a new PR.

LegNeato

I think we can merge this as-is and do some more design in the future.

Firestar99 force-pushed the subgroup3 branch 2 times, most recently from 418e290 to 64e2c79 Compare October 12, 2024 12:28

Firestar99 marked this pull request as ready for review October 13, 2024 09:33

Firestar99 requested review from eddyb, LegNeato and fornwall as code owners October 13, 2024 09:33

Firestar99 changed the title ~~subgroup ops refactor, matches glsl more closely~~ make subgroup ops match glsl more closely Oct 13, 2024

Firestar99 added 3 commits October 14, 2024 13:19

subgroup ops refactor, matches glsl more closely

f1480b4

subgroup: don't special case ClusteredReduce anymore

9584adc

subgroup: allow missing safety doc in subgroups

c427766

Firestar99 force-pushed the subgroup3 branch from 64e2c79 to 88fe13e Compare October 14, 2024 11:22

Firestar99 force-pushed the subgroup3 branch from 88fe13e to c427766 Compare October 23, 2024 13:03

LegNeato approved these changes Nov 7, 2024

View reviewed changes

LegNeato added this pull request to the merge queue Nov 7, 2024

Merged via the queue into Rust-GPU:main with commit ed32960 Nov 7, 2024
7 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

make subgroup ops match glsl more closely #32

make subgroup ops match glsl more closely #32

Firestar99 commented Oct 12, 2024 •

edited

Loading

LegNeato commented Oct 15, 2024 •

edited

Loading

Firestar99 commented Oct 16, 2024 •

edited

Loading

LegNeato commented Oct 16, 2024 •

edited

Loading

Firestar99 commented Oct 16, 2024

LegNeato commented Oct 16, 2024

LegNeato commented Oct 16, 2024

Firestar99 commented Oct 23, 2024

LegNeato left a comment

make subgroup ops match glsl more closely #32

make subgroup ops match glsl more closely #32

Conversation

Firestar99 commented Oct 12, 2024 • edited Loading

LegNeato commented Oct 15, 2024 • edited Loading

Firestar99 commented Oct 16, 2024 • edited Loading

LegNeato commented Oct 16, 2024 • edited Loading

Firestar99 commented Oct 16, 2024

LegNeato commented Oct 16, 2024

LegNeato commented Oct 16, 2024

Firestar99 commented Oct 23, 2024

LegNeato left a comment

Choose a reason for hiding this comment

Firestar99 commented Oct 12, 2024 •

edited

Loading

LegNeato commented Oct 15, 2024 •

edited

Loading

Firestar99 commented Oct 16, 2024 •

edited

Loading

LegNeato commented Oct 16, 2024 •

edited

Loading