Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make subgroup ops match glsl more closely #32

Merged
merged 3 commits into from
Nov 7, 2024

Conversation

Firestar99
Copy link
Member

@Firestar99 Firestar99 commented Oct 12, 2024

I wasn't a big fan of how I exposed subgroup ops that require a value from the GroupOperation enum. This PR changes all of those functions to be much more like glsl exposes them.

These operations:

spirv_std::arch::subgroup_i_add::<{ GroupOperation::Reduce as u32 }, _>(value)
spirv_std::arch::subgroup_i_add::<{ GroupOperation::InclusiveScan as u32 }, _>(value)
spirv_std::arch::subgroup_i_add::<{ GroupOperation::ExclusiveScan as u32 }, _>(value)
spirv_std::arch::subgroup_i_add_clustered::<8, _>(value)

Got changed to this:

spirv_std::arch::subgroup_i_add(value)
spirv_std::arch::subgroup_inclusive_i_add(value)
spirv_std::arch::subgroup_exclusive_i_add(value)
spirv_std::arch::subgroup_clustered_i_add::<8, _>(value)

The new function names are almost exactly the same as glsl, except that they have the extra i suffix in their name to say that they operate on integers (or f floats, u unsigned, s signed). If anyone got an idea how to unify that, please let me know :D
(Note that the asm instruction must be a string literal, so eg. you can't use trait constants to implement it)

@Firestar99 Firestar99 force-pushed the subgroup3 branch 2 times, most recently from 418e290 to 64e2c79 Compare October 12, 2024 12:28
@Firestar99 Firestar99 marked this pull request as ready for review October 13, 2024 09:33
@Firestar99 Firestar99 changed the title subgroup ops refactor, matches glsl more closely make subgroup ops match glsl more closely Oct 13, 2024
@LegNeato
Copy link
Collaborator

LegNeato commented Oct 15, 2024

Is there a way with traits we can flip it? Like adding extension traits:

    use spirv_std::arch::SubgroupOperations as _;

    let float_result = float_val.subgroup_add();
    let float_inclusive = float_val.inclusive_subgroup_add();
    let float_exclusive = float_val.exclusive_subgroup_add();

or something like:

    use spirv_std::arch::ops::SubgroupAdd;
    
    let float_result = <f32 as SubgroupAdd>::add(float_val);
    let float_inclusive = <f32 as SubgroupAdd>::inclusive_add(float_val);
    let float_exclusive = <f32 as SubgroupAdd>::exclusive_add(float_val);

Not sure if those are more ergonomic, but I think they can be implemented?

@Firestar99
Copy link
Member Author

Firestar99 commented Oct 16, 2024

I'll be honest, I'm not a fan of it ergonomic-wise. Also current atomics are named simiarly, eg. atomic_i_add, though they are i_add whereas this PR uses add_i.

Here's also a small example of shader code using the renamed subgroup_ballot_exclusive_bit_count(), unfortunately a ballot op as I have not yet used an arithmetic op anywhere. It's atomically allocating space in a buffer (atomic_counter) to write elements in a compacted manner, so the resulting buffer has one element after the next without some holes in between.

pub fn subgroup_write_non_uniform(&mut self, t: T) {
	let index = unsafe {
		let ballot = subgroup_ballot(true);
		let count = subgroup_ballot_bit_count(ballot);
		let base_index = if subgroup_elect() {
			atomic_i_add::<_, { Scope::Device as u32 }, { Semantics::NONE.bits() }>(self.atomic_counter, count)
		} else {
			0
		};
		let base_index = subgroup_broadcast_first(base_index);
		let inv_index = subgroup_ballot_exclusive_bit_count(ballot);
		base_index + inv_index
	};
	// self.buffer[index] = T;
}

@LegNeato
Copy link
Collaborator

LegNeato commented Oct 16, 2024

I agree that is probably better, but bare methods still don't feel very idiomatic to me coming from rust with very little graphics background.

Can we do something like a Subgroup struct you can initialize and then call functions on it? I'm thinking how Mutex gives you a guard or how threads can be scoped or not scoped and then joined or how tokio gives you a handle you can later poke at. Subgroups are a stateful "thing", so it feels weird to me to call bare functions to manipulate the thing rather than having the handle to the thing.

@Firestar99
Copy link
Member Author

Subroups are a stateful "thing"

Subgroup intrinsics are not stateful, they are just plain functions that do special stuff. And in theory come with a control barrier across the subgroup, though in practice that barrier just disappears cause invocations within a subgroup should always progress at the same rate (at least on AMD and most other archs, not sure what the new Nvidia cards do).

The only state there is is SubgroupMask used in subgroup_ballot operations, where each bit represents an invocation in the subgroup (the ballot variable in the example). And that's just a fancy wrapper of an UVec4 I added for convenience, as the spec only speaks of UVec4.

#[derive(Copy, Clone, Default, Eq, PartialEq)]
pub struct SubgroupMask(pub glam::UVec4);

@LegNeato
Copy link
Collaborator

Subgroups are stateful during a single shader invocation but lose this state once the invocation ends. So for the code running they are stateful, which is what I meant.

@LegNeato
Copy link
Collaborator

If we have stateful APIs and must_use we can eliminate some correctness / UB issues and footguns:

"Subgroup operations depend on the correct state of invocations... incorrect mask values can result in undefined behavior or synchronization failure"

"Ignoring the result of subgroup operations can lead to invalid or non-deterministic outcomes, especially in communication primitives like broadcast."

@Firestar99
Copy link
Member Author

I pop'ed off the last commit to make the subgroup functions match the atomic functions more closely (subgroup_inclusive_add_i -> subgroup_inclusive_i_add).

I don't think it's worth while spending a lot of time on perfecting this API, especially since not a lot of people will be using this extension. I'd rather prioritize getting a nice atomics API going and then adjusting subgroups to match that. And even then I think we got more important things to take care of first.

I'd like to get this PR merged in the state it is. But if you have any concrete ideas for a better API, that also doesn't alienate those familiar with the glsl API from the little information out there, feel free to make a new PR.

Copy link
Collaborator

@LegNeato LegNeato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can merge this as-is and do some more design in the future.

@LegNeato LegNeato added this pull request to the merge queue Nov 7, 2024
Merged via the queue into Rust-GPU:main with commit ed32960 Nov 7, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants