Add padding to reduce shared memory bank conflicts #193
This PR adds padding to shared memory allocations to minimize bank conflicts.
Signed-off-by: Harsh Menon <harsh@nod-labs.com>
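To illustrate why padding the innermost dimension helps, here is a minimal sketch that counts bank conflicts for a column-wise read of a row-major tile. It is not code from this PR. The bank model (32 banks, each 4 bytes wide) is an assumption that matches many GPUs but should be checked against the target hardware, and `max_bank_conflict` is a hypothetical helper name.

```python
from collections import defaultdict

NUM_BANKS = 32        # assumption: 32 shared memory banks
BANK_WIDTH_BYTES = 4  # assumption: each bank is 4 bytes wide


def max_bank_conflict(row_stride_elems, elem_bits, num_threads=32, column=0):
    """Worst-case number of distinct words mapped to one bank when
    thread t reads element (t, column) of a row-major tile."""
    elem_bytes = elem_bits // 8
    words_per_bank = defaultdict(set)
    for row in range(num_threads):
        byte_offset = (row * row_stride_elems + column) * elem_bytes
        word = byte_offset // BANK_WIDTH_BYTES
        words_per_bank[word % NUM_BANKS].add(word)
    return max(len(words) for words in words_per_bank.values())


# f16 tile with 64 columns: every row starts in the same bank.
unpadded = max_bank_conflict(row_stride_elems=64, elem_bits=16)

# Same tile with the padding used in this PR: 64 // 16 = 4 extra elements per row.
padded = max_bank_conflict(row_stride_elems=64 + 64 // 16, elem_bits=16)

print(unpadded, padded)
```

Under these assumptions the unpadded tile serializes all 32 accesses on one bank, while the padded tile spreads them so that at most two accesses share a bank. The conflicts are reduced rather than eliminated.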
would involve swizzling of the shared memory access patterns.
"""
padding = 64 // dtype.bitwidth()
return tuple(
what if it's already aligned?
For this approach, we just apply a blanket padding value that is independent of whether the shape is aligned or not. We could also make this a tuning parameter to see which value gives the best performance for a given shape.
makes sense!
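The blanket padding and the tuning-parameter idea from this thread could be sketched as follows. This is an illustration, not the code in the PR: `padded_shape` and its `padding` argument are hypothetical names, `elem_bits` stands in for `dtype.bitwidth()`, and the generator is a guess at how the truncated `return tuple(` in the diff continues.

```python
def padded_shape(shape, elem_bits, padding=None):
    """Return a copy of `shape` with the innermost dimension padded.

    When `padding` is None, fall back to the blanket value from the
    diff, 64 // elem_bits, regardless of whether the shape is aligned.
    """
    if padding is None:
        padding = 64 // elem_bits
    return tuple(
        dim + padding if i == len(shape) - 1 else dim
        for i, dim in enumerate(shape)
    )


# Blanket padding for f16: 64 // 16 = 4 elements on the last dimension.
print(padded_shape((32, 64), elem_bits=16))             # (32, 68)

# The same padding is applied even when the last dimension is not aligned.
print(padded_shape((32, 66), elem_bits=16))             # (32, 70)

# Treating the padding as a tuning parameter.
print(padded_shape((32, 64), elem_bits=16, padding=8))  # (32, 72)
```

Exposing `padding` as an argument would let an autotuner sweep a few values per shape while keeping the current behavior as the default.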
"""
padding = 64 // dtype.bitwidth()
return tuple(
value + padding if i == len(shape) - 1 else value
can we do something like shape[-1] += padding?
Shape is a tuple so unfortunately not.
ah I see
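For reference, a short sketch of why the in-place update does not work on a tuple, together with one alternative that builds a new tuple by slicing. The values of `shape` and `padding` are made up for illustration, and the slicing form is not what the PR uses.

```python
shape = (32, 64)
padding = 4

# Tuples are immutable, so item assignment raises a TypeError.
raised = False
try:
    shape[-1] += padding
except TypeError:
    raised = True

# Alternative: build a new tuple from all but the last dimension,
# plus the padded last dimension. Assumes `shape` is non-empty.
padded = shape[:-1] + (shape[-1] + padding,)

print(raised, shape, padded)
```

The original tuple is left unchanged, and the slicing form produces the same result as the conditional generator in the diff for any non-empty shape.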
Overall looks great, just a couple of minor nits/questions.