fix: reset the cross product w component to 0 in neon #213

yono-main · 2024-03-12T09:40:35Z

Would it be better to set the w component to 0 before returning?
The return values on the ARM platform now differ from what they were before.

CLAassistant · 2024-03-12T09:40:41Z

All committers have signed the CLA.

nfrechette · 2024-03-12T13:20:46Z

Hello and thank you for your interest in RTM!

In RTM, when a container is wider than what it contains, the extra SIMD lanes are ignored. For example, a 3x4 matrix is composed of four vector4 rows whose last SIMD lane is left undefined, as it is implicitly [0, 0, 0, 1] (as a column). Similarly, 3D vector functions (dot product, cross product, etc.) ignore the unused W lane of their inputs, and the W lane of their output is undefined. This should be consistent across the library unless explicitly stated otherwise in the documentation (the function header comment).

This is an executive decision that I made as the author. While it would be entirely valid to explicitly set all undefined output lanes to zero, as you have done here, I chose to leave them undefined to ensure that no unnecessary work is performed. In typical 3D math, where the 4th lane isn't needed, its undefined value never really matters. Only in edge cases, such as when you need to turn it into a mask of some sort, might you need a specific value there. In practice, I have seen the unused 4th lane used for various things, and it isn't uncommon to repurpose it. In such scenarios, explicitly setting the last lane to a known value would be redundant work.
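As a concrete illustration of the convention, here is a scalar C++ sketch. The `vec4` type and `cross3` function are hypothetical, not RTM's types or implementation. `cross3` follows one common shuffle-based formulation (a.yzxw * b.zxyw - a.zxyw * b.yzxw) written out lane by lane, so the W lane is never part of the 3D result and simply ends up as a.w * b.w - a.w * b.w: usually zero, but NaN whenever that product is infinite or NaN.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical 4-lane vector standing in for a SIMD register (not an RTM type).
struct vec4 { float x, y, z, w; };

// Scalar model of a shuffle-based cross product (hypothetical, not RTM's code).
// Only X, Y and Z of the result are meaningful. The W lane falls out of the
// same lane-wise arithmetic as a.w * b.w - a.w * b.w, which is usually zero
// but becomes NaN when that product is infinite or NaN (e.g. leftover data
// in an input's W lane). Callers must treat the output W lane as undefined.
inline vec4 cross3(const vec4& a, const vec4& b)
{
    return vec4{
        a.y * b.z - a.z * b.y,
        a.z * b.x - a.x * b.z,
        a.x * b.y - a.y * b.x,
        a.w * b.w - a.w * b.w, // leftover lane, not part of the 3D result
    };
}
```

With inputs X = (1, 0, 0, inf) and Y = (0, 1, 0, 2), the XYZ result is (0, 0, 1) while W comes out as NaN, which is why callers should not read it.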

Note that the SSE2 version of the cross product also makes no guarantee about the returned W lane. There, you would likely need 2 extra instructions to set W to zero in a function that otherwise takes 6, a 25% overhead (2 out of 8 instructions). Even if these instructions are very cheap (1 cycle or less each), they tend to add up. This may have no measurable impact on performance in some cases, but in others the impact can be dramatic, because the extra instructions can prevent a larger calling function (e.g. whoever calls cross3) from being inlined. Compilers generally use instruction count, register usage, and stack space as inlining heuristics, not the cycle cost of the instructions. Inlining is perhaps the biggest performance win possible in hot math code, as it enables further optimizations. RTM does its best to pass arguments in registers where it can, but that isn't always possible, as it is dictated by the calling convention.
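To illustrate what "setting W to zero" involves, here is a scalar C++ sketch of one common way to do it: a lane-wise bitwise AND with the constant mask [~0, ~0, ~0, 0], which is what an SSE2 `andps` against a loaded mask constant computes. The `vec4` type and `zero_w` function are hypothetical, and the actual instruction count depends on the compiler and target.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical 4-lane vector standing in for a SIMD register (not an RTM type).
struct vec4 { float x, y, z, w; };
static_assert(sizeof(vec4) == 4 * sizeof(std::uint32_t), "vec4 must be 16 bytes");

// Scalar model of forcing W to zero with a bitwise AND against [~0, ~0, ~0, 0].
// On SSE2 this is typically a mask load plus an andps: cheap, but if a cross
// product did this unconditionally, every caller would pay for it, including
// callers that never read W.
inline vec4 zero_w(const vec4& v)
{
    static const std::uint32_t xyz_mask[4] = {0xFFFFFFFFu, 0xFFFFFFFFu, 0xFFFFFFFFu, 0x0u};

    std::uint32_t bits[4];
    std::memcpy(bits, &v, sizeof(bits)); // reinterpret the lanes as raw bits
    for (int i = 0; i < 4; ++i)
        bits[i] &= xyz_mask[i];          // keep the XYZ bits, clear every W bit

    vec4 out;
    std::memcpy(&out, bits, sizeof(out));
    return out;
}
```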

For those reasons, leftover lanes in outputs are explicitly left undefined so that functions that don't need them are as lean as possible. I do make an effort to set them to zero when it is free to do so (from an instruction/cycle perspective), but that is not guaranteed by the API and is not generally applicable.

If you have unit tests that rely on the leftover lane, I suggest changing them to use the 3D versions of the vector comparison/testing functions, as those ignore the 4th lane.
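A minimal C++ sketch of why the 3D comparison matters in tests. The names `all_near_equal4` and `all_near_equal3` and the `vec4` type are hypothetical stand-ins for whatever comparison helpers a test suite uses; the point is only that the 3D form skips W, so an undefined W lane (here a NaN) cannot fail the check.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical 4-lane vector standing in for a SIMD register (not an RTM type).
struct vec4 { float x, y, z, w; };

// Hypothetical 4D comparison: every lane, including W, must be within threshold.
// A NaN in any lane makes the comparison false.
inline bool all_near_equal4(const vec4& a, const vec4& b, float threshold)
{
    return std::fabs(a.x - b.x) <= threshold && std::fabs(a.y - b.y) <= threshold &&
           std::fabs(a.z - b.z) <= threshold && std::fabs(a.w - b.w) <= threshold;
}

// Hypothetical 3D comparison: the W lane is ignored, matching the convention
// that 3D functions neither read nor define it.
inline bool all_near_equal3(const vec4& a, const vec4& b, float threshold)
{
    return std::fabs(a.x - b.x) <= threshold && std::fabs(a.y - b.y) <= threshold &&
           std::fabs(a.z - b.z) <= threshold;
}
```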

Note that not a lot of functions support 2D vectors (yet) but the same rules will apply there and the ZW lanes will be undefined/unused.

Cheers,
Nicholas

yono-main · 2024-03-12T13:44:53Z

Hi Nicholas~

I previously used the returned vector to construct a 4x4 matrix. Based on your description, I should set the leftover lanes myself, outside of the cross function.
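A minimal sketch of that approach in C++, with hypothetical `vec4`, `mat4`, `cross3`, `with_w`, and `make_transform` names (not RTM's API): the cross product's W lane is never trusted, and the code that assembles the matrix writes every W lane explicitly (0 for the direction axes, 1 for the translation axis).

```cpp
#include <cassert>
#include <limits>

// Hypothetical 4-lane vector and 4x4 matrix (not RTM types).
struct vec4 { float x, y, z, w; };
struct mat4 { vec4 x_axis, y_axis, z_axis, w_axis; };

// Hypothetical 3D cross product: only XYZ of the result are defined.
inline vec4 cross3(const vec4& a, const vec4& b)
{
    return vec4{a.y * b.z - a.z * b.y,
                a.z * b.x - a.x * b.z,
                a.x * b.y - a.y * b.x,
                a.w * b.w - a.w * b.w}; // W: leftover, not guaranteed
}

// Replace the W lane with a known value, keeping XYZ.
inline vec4 with_w(const vec4& v, float w) { return vec4{v.x, v.y, v.z, w}; }

// Build an affine transform from two orthonormal axes and a translation.
// The caller, not cross3, decides the W lanes, so whatever the inputs or the
// cross product left in W never reaches the matrix.
inline mat4 make_transform(const vec4& x_axis, const vec4& y_axis, const vec4& translation)
{
    const vec4 z_axis = cross3(x_axis, y_axis);
    return mat4{with_w(x_axis, 0.0f), with_w(y_axis, 0.0f),
                with_w(z_axis, 0.0f), with_w(translation, 1.0f)};
}
```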

Thank you for your detailed explanation, and also the excellent RTM library.

fix: reset the cross product w component to 0 in neon

3d5df94

yono-main closed this Mar 12, 2024

yono-main mentioned this pull request Mar 12, 2024

feat: optimize vector_cross3 for ARM NEON #210

Merged

yono-main reopened this Mar 12, 2024

nfrechette closed this Mar 12, 2024
