@johnnynunez

Fix CUDA barrier init crash when num_consumers < NumThreadsPerWarpGroup

Previously, truncating integer division set num_consumer_warpgroups_per_cluster to 0
whenever params.num_consumers (e.g., 32) was less than NumThreadsPerWarpGroup (128),
which led to a compiler failure during barrier initialization. The computation now
uses round-up division, so the value is at least 1 for any nonzero params.num_consumers.
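
For illustration, here is a minimal host-side sketch of the arithmetic described above. It is not the actual diff: the helper names `warpgroups_truncating` and `warpgroups_round_up` are hypothetical, and `NumThreadsPerWarpGroup` is redefined locally as 128 to match the value quoted in the description.

```cpp
#include <cstdint>

// Assumption: local stand-in for the warp-group size quoted in the description (128).
constexpr uint32_t NumThreadsPerWarpGroup = 128;

// Hypothetical helper showing the old behavior: truncating division,
// which yields 0 whenever num_consumers < NumThreadsPerWarpGroup.
constexpr uint32_t warpgroups_truncating(uint32_t num_consumers) {
    return num_consumers / NumThreadsPerWarpGroup;
}

// Hypothetical helper showing the new behavior: round-up (ceiling) division,
// which yields at least 1 for any num_consumers >= 1.
constexpr uint32_t warpgroups_round_up(uint32_t num_consumers) {
    return (num_consumers + NumThreadsPerWarpGroup - 1) / NumThreadsPerWarpGroup;
}

// Compile-time checks using the example value from the description (32 consumers).
static_assert(warpgroups_truncating(32) == 0, "truncation drops to zero");
static_assert(warpgroups_round_up(32) == 1, "round-up keeps one warp group");
static_assert(warpgroups_round_up(128) == 1, "exact multiple is unchanged");
static_assert(warpgroups_round_up(129) == 2, "partial warp group rounds up");
```

Round-up division leaves exact multiples of the warp-group size unchanged, so this sketch only differs from the truncating form when the consumer count is not a multiple of 128.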

@johnnynunez johnnynunez closed this by deleting the head repository Sep 12, 2025