Does it still make sense to align structs for AVX? #1243
Comments
The struct is not packed, so its size is 20, not 18. Having said all that, I don't know much about AVX, so I'm not sure how aligning to 32/64 bytes would help.
float is 4 bytes.
I mean, we could store the deltas in a separate memory buffer to allow faster aligned AVX operations on the quants. I'm not an AVX expert either, so maybe with a sequential access pattern (like a matmul pipeline reading one struct after another) there is no cache penalty at all on modern CPUs. There definitely was a penalty on older CPUs, hence the separate AVX instructions for aligned and unaligned vectors.
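A minimal sketch of what such a split layout might look like, assuming 16 bytes of quants per block and a 32-byte (AVX2-width) alignment target. The names `split_blocks`, `split_blocks_init`, `split_blocks_free` and `pair_is_aligned` are hypothetical and not part of ggml; the only library calls used are the standard C11 `malloc`, `aligned_alloc` and `free`. It only shows the memory layout and says nothing about whether it is actually faster.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define QUANT_BYTES 16   /* assumed quant bytes per block */
#define VEC_ALIGN   32   /* AVX2 register width in bytes */

/* Hypothetical split layout: scales and quants live in separate buffers. */
typedef struct {
    float   *d;   /* n scales ("deltas"), one per block */
    uint8_t *qs;  /* n * QUANT_BYTES quant bytes, VEC_ALIGN-aligned */
    size_t   n;   /* number of blocks */
} split_blocks;

/* Round size up to a multiple of VEC_ALIGN, as C11 aligned_alloc expects. */
static size_t round_up(size_t size) {
    return (size + VEC_ALIGN - 1) / VEC_ALIGN * VEC_ALIGN;
}

/* Allocate both buffers for n > 0 blocks; returns 0 on success, -1 on failure. */
static int split_blocks_init(split_blocks *b, size_t n) {
    b->n  = n;
    b->d  = malloc(n * sizeof(float));
    b->qs = aligned_alloc(VEC_ALIGN, round_up(n * QUANT_BYTES));
    if (!b->d || !b->qs) {
        free(b->d);
        free(b->qs);
        return -1;
    }
    return 0;
}

static void split_blocks_free(split_blocks *b) {
    free(b->d);
    free(b->qs);
}

/* Returns 1 if the 32 bytes covering the quants of blocks i and i+1
   start on a VEC_ALIGN boundary, 0 otherwise. */
static int pair_is_aligned(const split_blocks *b, size_t i) {
    return ((uintptr_t)(b->qs + i * QUANT_BYTES)) % VEC_ALIGN == 0;
}
```

With 16-byte quants, every even-numbered block starts a 32-byte aligned pair in this layout, so a loop that consumes two blocks per iteration could in principle use aligned loads throughout.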
This issue was closed because it has been inactive for 14 days since being marked as stale.
It seems Q4 / Q8 weights are not aligned to memory boundaries / the cache line size.
For example, here it is 4 bytes + 16 bytes:
I used to think that it is better to align to 32/64 bytes for faster AVX2 / AVX512 (and there are special ops for working with aligned vectors).
So I'm not sure: maybe modern CPUs handle misaligned data easily, or maybe we lose some performance here?
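To make the layout concern concrete, here is a small sketch that computes where the quants of each block start in an interleaved array. It assumes the 4 bytes + 16 bytes layout mentioned above (a `float` scale followed by 16 quant bytes) and that the array itself starts on an aligned boundary; `demo_block`, `quants_offset` and `quants_aligned` are hypothetical names used only for illustration. It shows offsets only and does not measure any performance cost.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QUANT_BYTES 16  /* assumed from the "16 bytes" figure above */

/* Hypothetical block layout: one float scale followed by packed quants. */
typedef struct {
    float   d;                /* scale, 4 bytes on common ABIs */
    uint8_t qs[QUANT_BYTES];  /* packed quantized values */
} demo_block;

/* Byte offset of the quants of block i from the start of the array. */
static size_t quants_offset(size_t i) {
    return i * sizeof(demo_block) + offsetof(demo_block, qs);
}

/* Returns 1 if the quants of block i start on an `align`-byte boundary,
   assuming the array itself starts on such a boundary. */
static int quants_aligned(size_t i, size_t align) {
    return quants_offset(i) % align == 0;
}
```

On common ABIs `sizeof(demo_block)` is 20, so the quants start at offsets 4, 24, 44, 64, and so on. Only one block in eight lands on a 32-byte boundary (the first is block 3, at offset 64), and only one in sixteen lands on a 64-byte boundary.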