Support for complex arithmetics #2047
We are happy to maintain contributed functions. Assuming only SVE supports these instructions natively, it is actually pretty easy to implement a fallback for other platforms: it can be written once in generic_ops-inl.h rather than repeated for each platform. One general principle is that we want the code to be reasonably efficient on all platforms. Without the SVE instructions, would it be better to organize complex numbers into two registers, re and im, rather than into the odd/even lanes of one vector? Imagine an app willing to have a special case for SVE and a second codepath for other platforms. Would that be faster than always using the odd/even layout for complex (Z) numbers? If so, it sounds like an …
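A minimal scalar sketch of the split (re/im) layout, assuming interleaved input with the real part in even lanes and the imaginary part in odd lanes. `Vec`, `kLanes`, `Deinterleave` and `MulAddSplit` are hypothetical names, and plain arrays stand in for SIMD registers; this is not Highway code. Once the data is deinterleaved, a complex multiply-add needs only lane-wise multiplies and adds, whereas the odd/even layout needs lane swaps or duplications for each multiplication unless the ISA has something like CMLA.

```cpp
#include <array>
#include <cstddef>

// Hypothetical vector width; plain arrays stand in for SIMD registers.
constexpr std::size_t kLanes = 8;
using Vec = std::array<float, kLanes>;

// Interleaved input: lane 2k holds Re(z_k), lane 2k+1 holds Im(z_k).
// Two interleaved vectors (kLanes complex numbers in total) become one
// vector of real parts and one vector of imaginary parts.
void Deinterleave(const Vec& lo, const Vec& hi, Vec& re, Vec& im) {
  constexpr std::size_t kHalf = kLanes / 2;
  for (std::size_t k = 0; k < kHalf; ++k) {
    re[k] = lo[2 * k];
    im[k] = lo[2 * k + 1];
    re[k + kHalf] = hi[2 * k];
    im[k + kHalf] = hi[2 * k + 1];
  }
}

// Split layout: acc += x * y for kLanes complex numbers. Every statement is
// a vertical (lane-wise) operation, so no shuffles are needed.
void MulAddSplit(const Vec& xr, const Vec& xi, const Vec& yr, const Vec& yi,
                 Vec& acc_r, Vec& acc_i) {
  for (std::size_t j = 0; j < kLanes; ++j) {
    acc_r[j] += xr[j] * yr[j] - xi[j] * yi[j];
    acc_i[j] += xr[j] * yi[j] + xi[j] * yr[j];
  }
}
```

The deinterleave is paid once per load (and a matching interleave once per store), so it can pay off when several operations are applied to the same data before it is written back.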
I see your point. We indeed found that de-interleaving the complex numbers first was faster with Highway on NEON and SVE; I'm not sure about the x86 side of things, though. Even if that is the case, it would be nice to be able to access the SVE instructions from Highway, since they seem to perform significantly better. Either way, it sounds like x86 needs further investigation.
The F16/F32/F64 AddSub op should be re-implemented using svcadd on SVE targets, as svcadd is more efficient there than the default AddSub implementation in generic_ops-inl.h.
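Why svcadd can implement AddSub: a scalar sketch, assuming AddSub subtracts in even lanes and adds in odd lanes, and that svcadd with rotation 90 computes a + i*b on interleaved (re, im) pairs. Swapping each lane pair of b turns one into the other. The functions are hypothetical scalar models, not Highway or ACLE APIs, and not necessarily how the SVE code in Highway is written.

```cpp
#include <array>
#include <cstddef>

// Hypothetical vector width; plain arrays stand in for SIMD registers.
constexpr std::size_t kLanes = 8;
using Vec = std::array<float, kLanes>;

// Model of AddSub: a - b in even lanes, a + b in odd lanes.
Vec AddSubModel(const Vec& a, const Vec& b) {
  Vec r{};
  for (std::size_t i = 0; i < kLanes; ++i) {
    r[i] = (i % 2 == 0) ? a[i] - b[i] : a[i] + b[i];
  }
  return r;
}

// Model of svcadd with rotation 90 on (re, im) pairs: a + i*b, i.e.
// re = a.re - b.im and im = a.im + b.re.
Vec CAddRot90Model(const Vec& a, const Vec& b) {
  Vec r{};
  for (std::size_t k = 0; k < kLanes; k += 2) {
    r[k] = a[k] - b[k + 1];
    r[k + 1] = a[k + 1] + b[k];
  }
  return r;
}

// Swap lanes 2k and 2k+1 (the effect of Highway's Reverse2).
Vec SwapPairs(const Vec& v) {
  Vec r{};
  for (std::size_t k = 0; k < kLanes; k += 2) {
    r[k] = v[k + 1];
    r[k + 1] = v[k];
  }
  return r;
}

// AddSub(a, b) == CAddRot90(a, SwapPairs(b)):
//   even lane: a.re - swapped.im = a.re - b.re
//   odd lane:  a.im + swapped.re = a.im + b.im
Vec AddSubViaCAdd(const Vec& a, const Vec& b) {
  return CAddRot90Model(a, SwapPairs(b));
}
```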
Thanks @johnplatts for pointing out that we can already target svcadd with existing (Mul)AddSub. |
I have re-implemented AddSub and MulAddSub on SVE using svcadd in pull request #2054. |
It's good to know that svcadd is already being used in highway! |
Hm. It seems that the CMLA instruction is 'exotic' in the sense that other ISAs do not provide an equivalent instruction. Do you have any suggestions on how we could handle that without a performance cliff on any one ISA?
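To make the cost gap concrete, here is a sketch that models CMLA next to a shuffle-based fallback. It assumes the FCMLA definition for rotations 0 and 90 (issuing both yields a full complex multiply-add) and builds the fallback from lane duplications, a pair swap and an add/subtract, mirroring Highway's DupEven, DupOdd, Reverse2 and AddSub. All functions are hypothetical scalar models on interleaved (re, im) lanes, not Highway code.

```cpp
#include <array>
#include <cstddef>

// Hypothetical vector width; lanes hold interleaved (re, im) pairs.
constexpr std::size_t kLanes = 8;
using Vec = std::array<float, kLanes>;

// Model of FCMLA (svcmla) for rotation 0 or 90 on each (re, im) pair.
//   rot 0:  acc.re += n.re * m.re;  acc.im += n.re * m.im
//   rot 90: acc.re -= n.im * m.im;  acc.im += n.im * m.re
Vec CMlaModel(Vec acc, const Vec& n, const Vec& m, int rot) {
  for (std::size_t k = 0; k < kLanes; k += 2) {
    if (rot == 0) {
      acc[k] += n[k] * m[k];
      acc[k + 1] += n[k] * m[k + 1];
    } else {  // rot == 90
      acc[k] -= n[k + 1] * m[k + 1];
      acc[k + 1] += n[k + 1] * m[k];
    }
  }
  return acc;
}

// With CMLA: acc + x * y in two instructions (rotation 0, then 90).
Vec ComplexMulAddCmla(const Vec& acc, const Vec& x, const Vec& y) {
  return CMlaModel(CMlaModel(acc, x, y, 0), x, y, 90);
}

// Lane-wise stand-ins, named after the Highway ops they mimic.
Vec Mul(const Vec& a, const Vec& b) {
  Vec r{};
  for (std::size_t i = 0; i < kLanes; ++i) r[i] = a[i] * b[i];
  return r;
}
Vec Add(const Vec& a, const Vec& b) {
  Vec r{};
  for (std::size_t i = 0; i < kLanes; ++i) r[i] = a[i] + b[i];
  return r;
}
Vec AddSub(const Vec& a, const Vec& b) {  // even: a - b, odd: a + b
  Vec r{};
  for (std::size_t i = 0; i < kLanes; ++i)
    r[i] = (i % 2 == 0) ? a[i] - b[i] : a[i] + b[i];
  return r;
}
Vec DupEven(const Vec& v) {  // (v0, v0, v2, v2, ...)
  Vec r{};
  for (std::size_t k = 0; k < kLanes; k += 2) r[k] = r[k + 1] = v[k];
  return r;
}
Vec DupOdd(const Vec& v) {  // (v1, v1, v3, v3, ...)
  Vec r{};
  for (std::size_t k = 0; k < kLanes; k += 2) r[k] = r[k + 1] = v[k + 1];
  return r;
}
Vec SwapPairs(const Vec& v) {  // (v1, v0, v3, v2, ...), like Reverse2
  Vec r{};
  for (std::size_t k = 0; k < kLanes; k += 2) {
    r[k] = v[k + 1];
    r[k + 1] = v[k];
  }
  return r;
}

// Without CMLA: t = (x.re*y.re, x.re*y.im), u = (x.im*y.im, x.im*y.re),
// so AddSub(t, u) = (x.re*y.re - x.im*y.im, x.re*y.im + x.im*y.re) = x*y.
Vec ComplexMulAddGeneric(const Vec& acc, const Vec& x, const Vec& y) {
  const Vec t = Mul(DupEven(x), y);
  const Vec u = Mul(DupOdd(x), SwapPairs(y));
  return Add(acc, AddSub(t, u));
}
```

With CMLA the multiply-add is two instructions; the fallback needs three lane shuffles on top of the arithmetic (part of which could be fused, e.g. with MulAddSub).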
Here is a link to a generic implementation of the ComplexAddRot90/270 ops (equivalent to SVE svcadd with a rotation of 90 or 270). There are also … The generic implementations of the ComplexAdd/ComplexMulAdd ops linked above are efficient on most SIMD targets, including SSSE3/SSE4/AVX2/AVX3/NEON. SSSE3/SSE4/AVX2/AVX3 have AddSub instructions for F32/F64 vectors of 32 bytes or smaller, which help improve the performance of the ComplexAdd/ComplexMulAdd ops.
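A scalar sketch of how ComplexAddRot90/270 can be expressed with a pair swap plus AddSub on targets without svcadd, checked against std::complex. It assumes AddSub subtracts in even lanes and adds in odd lanes, and that lanes hold interleaved (re, im) pairs. This is an illustration, not the linked implementation; `Vec`, `kLanes` and the helper functions are hypothetical scalar models.

```cpp
#include <array>
#include <complex>
#include <cstddef>

// Hypothetical vector width (4 complex numbers as interleaved re/im lanes).
constexpr std::size_t kLanes = 8;
using Vec = std::array<float, kLanes>;

// Lane-wise stand-ins, named after the Highway ops they mimic.
Vec AddSub(const Vec& a, const Vec& b) {  // even: a - b, odd: a + b
  Vec r{};
  for (std::size_t i = 0; i < kLanes; ++i)
    r[i] = (i % 2 == 0) ? a[i] - b[i] : a[i] + b[i];
  return r;
}
Vec Neg(const Vec& v) {
  Vec r{};
  for (std::size_t i = 0; i < kLanes; ++i) r[i] = -v[i];
  return r;
}
Vec SwapPairs(const Vec& v) {  // like Reverse2
  Vec r{};
  for (std::size_t k = 0; k < kLanes; k += 2) {
    r[k] = v[k + 1];
    r[k + 1] = v[k];
  }
  return r;
}

// a + i*b: re = a.re - b.im, im = a.im + b.re.
Vec ComplexAddRot90(const Vec& a, const Vec& b) {
  return AddSub(a, SwapPairs(b));
}

// a - i*b: re = a.re + b.im, im = a.im - b.re. Negating the swapped operand
// flips both of AddSub's signs.
Vec ComplexAddRot270(const Vec& a, const Vec& b) {
  return AddSub(a, Neg(SwapPairs(b)));
}

// Reference: the same computation with std::complex, pair by pair.
Vec ComplexAddRotReference(const Vec& a, const Vec& b, bool rot90) {
  const std::complex<float> imag_unit(0.0f, 1.0f);
  Vec r{};
  for (std::size_t k = 0; k < kLanes; k += 2) {
    const std::complex<float> za(a[k], a[k + 1]);
    const std::complex<float> zb(b[k], b[k + 1]);
    const std::complex<float> z =
        rot90 ? za + imag_unit * zb : za - imag_unit * zb;
    r[k] = z.real();
    r[k + 1] = z.imag();
  }
  return r;
}
```

Rotation 270 differs from rotation 90 only by negating the swapped operand, so both rotations can reuse the same AddSub instruction where the target has one.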
Thanks, those implementations look good to me! Are we proposing to add those as new ops, with single-instruction implementations for SVE? That seems fine, provided we are confident that apps would want to use those ops as defined. One remaining concern I have (because I am not familiar with complex arithmetic): are there perhaps other, equivalent ways of implementing the desired formulas that would be more efficient than these generic implementations on non-SVE targets?
Hi,
I would like to propose the addition of complex arithmetic instructions to Highway. This would allow us to take advantage of the SVE complex arithmetic instructions (svcadd, svcmla and svcdot), improving the performance of complex arithmetic on Arm. I imagine the difficulty would be the need to implement and maintain equivalent functions for x86 and NEON, where these instructions do not exist natively.