
Best practice for small-vector linear algebra? #8281

Answered by abadams
allemangD asked this question in Q&A

Here's my usual approach: https://github.com/halide/Halide/blob/main/apps/bgu/bgu_generator.cpp

In the solves there are serial dependencies between the matrix elements, so it's pointless to try to use SIMD across the matrix itself, e.g. to represent its columns (plus 4 elements is way too small for SIMD on x86). I just vectorize across a different axis - one that's truly data-parallel.
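
A minimal sketch of that pattern (not the actual bgu_generator.cpp code - the 2x2 system size, the buffer layouts, and the `SmallSolve` name are just assumptions for illustration): write the tiny solve out element-by-element, unroll it, and spend the SIMD lanes on the pixel axis.

```cpp
// Hypothetical generator: solve an independent 2x2 system A*r = b at every
// pixel. The per-pixel solve (Cramer's rule) is serial, but the pixels are
// data-parallel, so that's the axis that gets vectorized.
#include "Halide.h"

using namespace Halide;

class SmallSolve : public Generator<SmallSolve> {
public:
    // Assumed layouts: A(i, j, x, y) is element (row i, col j) of the 2x2
    // matrix at pixel (x, y); b(i, x, y) is a 2-vector per pixel.
    Input<Buffer<float>> A{"A", 4};
    Input<Buffer<float>> b{"b", 3};
    Output<Buffer<float>> result{"result", 3};

    void generate() {
        Var i{"i"}, x{"x"}, y{"y"};

        // Cramer's rule, written out element-by-element. No attempt to map
        // the matrix itself onto SIMD lanes.
        Expr det = A(0, 0, x, y) * A(1, 1, x, y) - A(0, 1, x, y) * A(1, 0, x, y);
        Expr r0 = (b(0, x, y) * A(1, 1, x, y) - A(0, 1, x, y) * b(1, x, y)) / det;
        Expr r1 = (A(0, 0, x, y) * b(1, x, y) - A(1, 0, x, y) * b(0, x, y)) / det;

        result(i, x, y) = select(i == 0, r0, r1);

        // Schedule: fully unroll the tiny per-pixel dimension, and vectorize
        // across x, which is genuinely data-parallel.
        result.bound(i, 0, 2)
              .unroll(i)
              .vectorize(x, natural_vector_size<float>())
              .parallel(y);
    }
};

HALIDE_REGISTER_GENERATOR(SmallSolve, small_solve)
```

The same shape applies to a 4x4 solve: write the elimination out explicitly (or unroll it) and put the vector lanes on whatever axis indexes the independent problems.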

I think there might be 4x4 matrix acceleration on some mobile GPUs, but I don't think that has been a thing on desktop GPUs for a while now. CUDA supports low-precision 16x16 matrix multiplies on the tensor cores, but that's probably not good enough for the classic graphics 4x4 homogeneous transform matrix…

Replies: 1 comment 1 reply

Answer selected by allemangD