danielvegamyhre commented Jul 6, 2025

Stacked PRs:


Fix uncoalesced global accesses

`scales_colwise` needs the shape `(num_blocks, columns, 1)` so that the column dimension has stride 1. This avoids uncoalesced writes to global memory.
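To see the stride claim concretely, the sketch below computes row-major (contiguous) strides for the shape `(num_blocks, columns, 1)`. The helper `contiguous_strides` and the sizes are hypothetical and for illustration only. The comparison layout `(columns, num_blocks, 1)` is also a hypothetical alternative, not necessarily the layout the kernel used before this change.

```python
def contiguous_strides(shape):
    """Row-major (C-contiguous) strides, in elements, for the given shape."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


num_blocks, columns = 4, 64  # illustrative sizes, not taken from the PR

# Layout from this PR: (num_blocks, columns, 1). The column dim is index 1.
new_strides = contiguous_strides((num_blocks, columns, 1))
# Hypothetical alternative for contrast: (columns, num_blocks, 1). The column dim is index 0.
alt_strides = contiguous_strides((columns, num_blocks, 1))

print(new_strides)  # (64, 1, 1): column stride is 1
print(alt_strides)  # (4, 1, 1): column stride is num_blocks
```

With the `(num_blocks, columns, 1)` layout, neighbouring columns of the same block are adjacent in memory. In the alternative, they are `num_blocks` elements apart.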

The column stride matters because each of the 32 threads in a warp computes the scale for a different column of 32 input values and then writes that scale to global memory, one scale per column. Adjacent threads therefore write to adjacent columns. If the stride along the column dimension is 1, those 32 writes land in contiguous memory and can be coalesced into a single transaction.
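The effect on memory traffic can be approximated by counting how many 32-byte sectors a single warp's 32 scale writes touch. This is a minimal sketch that assumes 1-byte scales (for example an E8M0 exponent), a warp aligned to column 0 of block 0, and illustrative sizes. The function `sectors_touched` is hypothetical. The strided case models the hypothetical `(columns, num_blocks, 1)` layout, not necessarily the pre-change code.

```python
SECTOR_BYTES = 32  # global memory is accessed in 32-byte sectors on recent NVIDIA GPUs
WARP_SIZE = 32
SCALE_BYTES = 1    # assumption: one byte per scale (e.g. an E8M0 exponent)


def sectors_touched(block_stride, col_stride, block_idx=0, col_start=0):
    """Count distinct 32-byte sectors written when lane i of a warp stores
    the scale for column col_start + i of block block_idx.
    Strides are given in elements."""
    sectors = set()
    for lane in range(WARP_SIZE):
        col = col_start + lane
        byte_addr = (block_idx * block_stride + col * col_stride) * SCALE_BYTES
        sectors.add(byte_addr // SECTOR_BYTES)
    return len(sectors)


num_blocks, columns = 128, 4096  # illustrative sizes, not taken from the PR

# (num_blocks, columns, 1): column stride 1, so adjacent lanes write adjacent bytes.
coalesced = sectors_touched(block_stride=columns, col_stride=1)
# Hypothetical (columns, num_blocks, 1): column stride num_blocks, so lanes are 128 bytes apart.
strided = sectors_touched(block_stride=1, col_stride=num_blocks)

print(coalesced, strided)  # 1 32
```

Under these assumptions, the stride-1 layout lets the warp's 32 writes fall into one sector. The strided layout spreads them across 32 sectors.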

NCU before change:

[Screenshot 2025-07-05 at 6:41:49 PM]

NCU after change:

[Screenshot 2025-07-05 at 6:41:53 PM]

stack-info: PR: #11, branch: danielvegamyhre/stack/5