🚀 Feature
Check GridDim under -fblockfusion_level=2 so that it satisfies CUDA's active block limit.
Motivation
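BlockFusion with -fblockfusion_level=2 uses inter-block synchronization primitives. An improper number of BEs (vEUs) may lead to deadlock because of CUDA's active block limit: blocks that cannot become resident on the GPU never reach the synchronization point that the resident blocks are waiting on.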
Pitch
We can use nvcc to check the GridDim after BlockFusion codegen and adaptively change the number of BEs (vEUs) so that it satisfies CUDA's active block limit.
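A minimal sketch of the adaptive step, assuming one thread block per BE (so GridDim.x equals the number of BEs). The helper names `MaxCoResidentBlocks` and `ClampNumBEs` are hypothetical and are not part of the existing codebase. The two inputs are assumed to come from the CUDA runtime, for example `cudaOccupancyMaxActiveBlocksPerMultiprocessor` for the per-SM block count and `cudaDeviceProp::multiProcessorCount` for the SM count.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: upper bound on the number of blocks of one kernel
// that can be resident on the device at the same time.
//   blocks_per_sm: active blocks per SM for the generated kernel
//                  (assumed to be queried from the CUDA occupancy API)
//   sm_count:      number of SMs on the device
inline int MaxCoResidentBlocks(int blocks_per_sm, int sm_count) {
  assert(blocks_per_sm >= 0 && sm_count >= 0);
  return blocks_per_sm * sm_count;
}

// Hypothetical helper: shrink the requested number of BEs so that the
// resulting grid fits within the co-resident block limit.
// Assumption: one thread block per BE.
// Returns 0 when the kernel cannot be resident at all; the caller then
// has to pick a different strategy.
inline int ClampNumBEs(int requested_bes, int blocks_per_sm, int sm_count) {
  const int limit = MaxCoResidentBlocks(blocks_per_sm, sm_count);
  return std::min(requested_bes, limit);
}
```

For example, with 80 SMs and 2 active blocks per SM, a request for 640 BEs would be reduced to 160, while a request for 100 BEs would be left unchanged.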
Alternatives
Fall back to -fblockfusion_level=1 when the GridDim exceeds the active block limit. Note that the overhead of inter-block synchronization grows as the number of blocks increases.
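The fallback rule can be sketched as a small decision function. `ChooseBlockFusionLevel` is a hypothetical name, and the limit argument is assumed to be the number of blocks that can be resident on the device at once (active blocks per SM times the number of SMs).

```cpp
// Hypothetical helper: pick the BlockFusion level for a generated kernel.
//   grid_dim:            number of blocks the fused kernel would launch
//   active_block_limit:  maximum number of blocks that can be resident
//                        on the device simultaneously (assumed input)
// Level 2 relies on inter-block synchronization, so it is only chosen
// when every block can be resident; otherwise fall back to level 1.
inline int ChooseBlockFusionLevel(int grid_dim, int active_block_limit) {
  return (grid_dim <= active_block_limit) ? 2 : 1;
}
```

This keeps level 2 available whenever the grid fits and avoids the deadlock case without changing the number of BEs.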
Additional context