Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New subtraction family of vector functions #364

Merged
merged 7 commits into from
Dec 10, 2023
Merged

Conversation

BeeverFeever
Copy link
Contributor

Addresses: #363

I believe that this should be most of what is required for these functions to be implemented. I don't think I have missed anything but please take a close look at the vec4 functions to ensure they are correct as I wasn't entirely certain about what I was doing.

I wrote tests but when run the new functions do not show up. I don't know if I am just missing something or if I wrote the tests wrong. I am also unable to test on multiple platforms, I only have access to linux. However, I don't believe these changes are major enough to break anything platform specific.

I wasn't able to generate the documentation for the new functions. I am unfamiliar with sphinx and when I runt the commands from the README, I get an error:

Theme error:
no theme named 'sphinx_rtd_theme' found (missing theme.conf?)

Please inform me of anything that requires changing or adjustments, especially the tests and docs on how to get them working correctly.

@gottfriedleibniz
Copy link

gottfriedleibniz commented Dec 4, 2023

While glancing over this, noticed a few things:

  1. The glm_vec4_addsub documentation and/or name do not match its implementation (reference glm_vec3_addsub).
  2. glm_vec4_mulsub's glmm_fmsub use may require a second look, i.e., glmm_fnmadd.
  3. The assert statement in TEST_IMPL(GLM_PREFIX, vec4_subsub) may require some parenthesis (a - (b - c)).

@BeeverFeever
Copy link
Contributor Author

Ok cool, I will address 1 and 3 later today. As for 2, I'm not very familiar with simd so could you expand on this a little bit please?

@gottfriedleibniz
Copy link

glm_vec4_mulsub is documented as: dest -= a * b. Referencing these functions (without __FMA__):

  1. glmm_fmsub is defined as: _mm_sub_ps(_mm_mul_ps(a, b), c): a*b - c
  2. glmm_fnmadd is defined as: _mm_sub_ps(c, _mm_mul_ps(a, b)): c - a*b

@BeeverFeever
Copy link
Contributor Author

Ahh ok i see. Thanks for the explanation. I will change that later today as well.

@BeeverFeever
Copy link
Contributor Author

I think I fixed all the things you mentioned. I'm still unable to get the tests to run for the new functions though.

@recp
Copy link
Owner

recp commented Dec 6, 2023

I'm still unable to get the tests to run for the new functions though.

You must add the tests declared with TEST_DECLARE() to TEST_LIST { like

TEST_LIST {
  TEST_ENTRY(glm_vec3_addadd)
  TEST_ENTRY(glm_vec3_subadd)
  TEST_ENTRY(glm_vec3_muladd)

maybe we should make the process of defining tests more easy later. For instance TEST_IMPL could add itself to test-list dynamically.


Comment top of the file :)

/*
 * To register a test:
 *   1. use TEST_DECLARE() to forward declare test
 *   2. use TEST_ENTRY() to add test to list
 */

@BeeverFeever
Copy link
Contributor Author

Yeah making it so you only have to edit one place for the tests would make it easier. However, I am just dumb and blind and didnt read comments so thats my fault sorry.

I will fix that up later. Everything else should be good to be merged though right?

@recp
Copy link
Owner

recp commented Dec 7, 2023

@BeeverFeever thanks yes it looks ok to me at first glance 👍

@BeeverFeever
Copy link
Contributor Author

So I got the tests working however the vec4 mulsub and mulsubs fail and I can not figure out what is wrong with them.

After v1, v2 and v4 are run through the mulsub function (in the tests), v4[0] = 5 when it should be 7. I've looked at it but the code is exactly the same as all the other mulsub functions and I don't know what is wrong. I hope I'm just missing something simple but I don't know what else to do.

@recp
Copy link
Owner

recp commented Dec 10, 2023

@BeeverFeever as @gottfriedleibniz suggested glmm_fnmadd does c - (a * b) which is correct operation for mulsub but you used glmm_fnmsub which does -c -(a * b).

I also fixed glmm_fmsub(): 8a1d1cf and will check other sub operations too asap

@gottfriedleibniz
Copy link

gottfriedleibniz commented Dec 10, 2023

I also fixed glmm_fmsub

Both implementations still look incorrect (we are looking for (a * b) - c):

  1. vfmsq_f32(c, a, b) -> -(a * b) + c.
  2. vmlsq_f32(vnegq_f32(c), a, b) -> -(a * b) - c

@BeeverFeever
Copy link
Contributor Author

@recp Oh interesting, I forgot to change those around after @gottfriedleibniz's comment. After changing those the tests pass. I didn't realise simd was being tested in those tests. Anyway, everything should be good now.

@recp
Copy link
Owner

recp commented Dec 10, 2023

@gottfriedleibniz just tried to quick fix but seems wrong, what about this?:

static inline
float32x4_t
glmm_fmsub(float32x4_t a, float32x4_t b, float32x4_t c) {
#if CGLM_ARM64
  return vfmsq_f32(c, vnegq_f32(a), b);
#else
  return vmlsq_f32(c, vnegq_f32(a), b);
#endif
}

it should produce a * b -c now

@recp
Copy link
Owner

recp commented Dec 10, 2023

@BeeverFeever can you add test (TEST_ENTRY) and push latest changes?

Also It would be nice to register glmc_ versions in TESTs you can see similar tests. glmc_ is just wrapper but sometimes there can be copy-paste errors or calling wrong inline functions...

@BeeverFeever
Copy link
Contributor Author

@recp Just did.

@gottfriedleibniz
Copy link

Another alternative could be: vfmaq_f32/vmlaq_f32 with vnegq_f32(c)

@recp
Copy link
Owner

recp commented Dec 10, 2023

@gottfriedleibniz just did like:

static inline
float32x4_t
glmm_fmsub(float32x4_t a, float32x4_t b, float32x4_t c) {
  return glmm_fmadd(vnegq_f32(c), a, b);
}

thanks

@recp recp merged commit 4f88a02 into recp:master Dec 10, 2023
@recp
Copy link
Owner

recp commented Dec 10, 2023

@BeeverFeever the PR is merged thanks for your contributions 🚀

Also any chance to get these updates for existing ivec[2|3|4] too? Otherwise I will add to TODOs

@BeeverFeever
Copy link
Contributor Author

Yeah sure, I can do the ivec's.

@gottfriedleibniz
Copy link

return glmm_fmadd(vnegq_f32(c), a, b);

Should this not be glmm_fmadd(a, b, vnegq_f32(c))?

@recp
Copy link
Owner

recp commented Dec 12, 2023

AVX/FMA:

__m128 _mm_fmadd_ps (__m128 a, __m128 b, __m128 c)
dst[i+31:i] := (a[i+31:i] * b[i+31:i]) + c[i+31:i]
static inline
__m128
glmm_fmadd(__m128 a, __m128 b, __m128 c) {
#ifdef __FMA__
  return _mm_fmadd_ps(a, b, c);
#else
  return _mm_add_ps(c, _mm_mul_ps(a, b));
#endif
}

static inline
__m128
glmm_fmsub(__m128 a, __m128 b, __m128 c) {
#ifdef __FMA__
  return _mm_fmsub_ps(a, b, c);
#else
  return _mm_sub_ps(_mm_mul_ps(a, b), c);
#endif
}

NEON:

vmlaq_f32  (float32x4_t a, float32x4_t b, float32x4_t c)
RESULT[I] = a[i] + (b[i] * c[i]) for i = 0 to 3
static inline
float32x4_t
glmm_fmadd(float32x4_t a, float32x4_t b, float32x4_t c) {
#if CGLM_ARM64
  return vfmaq_f32(c, a, b); /* why vfmaq_f32 is slower than vmlaq_f32 ??? */
#else
  return vmlaq_f32(c, a, b);
#endif
}

static inline
float32x4_t
glmm_fmsub(float32x4_t a, float32x4_t b, float32x4_t c) {
  return glmm_fmadd(vnegq_f32(c), a, b);
}

Wow, I thought glmm_fmadd as a + b * c as NEON recently but glmm_fmadd works as AVX/FMA: a * b + c. In this case yes it should be glmm_fmadd(a, b, vnegq_f32(c)).

I'll add intents of operations of glmm_ funs as documentation for the future, TODOs.

@gottfriedleibniz many thanks for helping to fix things

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants