qa_volk_32f_s32f_convertpuppet_8u is flaky #646

Closed
argilo opened this issue Oct 13, 2023 · 6 comments · Fixed by #647

@argilo
Member

argilo commented Oct 13, 2023

CI often fails, and the culprit seems to be qa_volk_32f_s32f_convertpuppet_8u. This was added in #617.

/cc @daniestevez

The failure can be demonstrated locally:

$ while ctest -R convertpuppet --output-on-failure; do :; done
Test project /home/argilo/git/volk/build
    Start 49: qa_volk_32f_s32f_convertpuppet_8u
1/1 Test #49: qa_volk_32f_s32f_convertpuppet_8u ...   Passed    0.04 sec

100% tests passed, 0 tests failed out of 1

Total Test time (real) =   0.05 sec
[12 more passing runs of the same test omitted]
Test project /home/argilo/git/volk/build
    Start 49: qa_volk_32f_s32f_convertpuppet_8u
1/1 Test #49: qa_volk_32f_s32f_convertpuppet_8u ...***Failed    0.03 sec
RUN_VOLK_TESTS: volk_32f_s32f_convertpuppet_8u(131071,1)
generic completed in 2.02759 ms
u_avx2_fma completed in 0.161178 ms
a_avx2_fma completed in 0.160147 ms
u_avx2 completed in 0.16983 ms
a_avx2 completed in 0.169176 ms
u_sse2 completed in 0.273428 ms
a_sse2 completed in 0.273223 ms
u_sse completed in 0.615186 ms
a_sse completed in 0.613146 ms
offset 564 in1: 110 in2: 111 tolerance was: 0
volk_32f_s32f_convertpuppet_8u: fail on arch u_avx2_fma
offset 564 in1: 110 in2: 111 tolerance was: 0
volk_32f_s32f_convertpuppet_8u: fail on arch a_avx2_fma
Best aligned arch: a_avx2_fma
Best unaligned arch: u_avx2_fma


0% tests passed, 1 tests failed out of 1

Total Test time (real) =   0.03 sec

The following tests FAILED:
	 49 - qa_volk_32f_s32f_convertpuppet_8u (Failed)
Errors while running CTest
@daniestevez
Member

daniestevez commented Oct 13, 2023

Hmm. I'll try to look into this in more detail, but the first place I would start is probably #617 (comment)

Edit: It would be nice to be able to see the input vector when the test fails. I suspect that this is caused by values that come out very close to a half-integer (and so they might round differently).

@argilo
Member Author

argilo commented Oct 13, 2023

I added a bit of extra debug logging, and it does appear to be rounding differences that are causing the failures.

offset 25788 in: 0.230887 in1: 204 in2: 203 tolerance was: 0
volk_32f_s32f_convertpuppet_8u: fail on arch u_avx2_fma
offset 25788 in: 0.230887 in1: 204 in2: 203 tolerance was: 0
volk_32f_s32f_convertpuppet_8u: fail on arch a_avx2_fma

$ python3
>>> 0.230887 * 327.0 + 128
203.500049

327.0 is the default scalar value used in tests:

lv_32fc_t def_scalar = 327.0;
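
The Python session above works in double precision on the six printed digits, while the kernel works in single precision on the full input. Here is a minimal sketch of the single-precision arithmetic, using exact fractions so that each rounding step is explicit. It assumes the kernel computes rint(in * scale + 128), and that the FMA variants round the multiply-add once while the other variants round the product and the sum separately. The names round_to_f32 and convert are hypothetical, and x is a float32 value constructed to print as 0.230887; it is not necessarily the value from the failing run.

```python
from fractions import Fraction


def round_to_f32(q: Fraction) -> Fraction:
    """Round a positive rational to the nearest float32 value, ties to even.

    Only the normal range is handled, which is all this example needs.
    """
    e = q.numerator.bit_length() - q.denominator.bit_length()
    if Fraction(2) ** e > q:
        e -= 1                       # now 2**e <= q < 2**(e + 1)
    ulp = Fraction(2) ** (e - 23)    # spacing of float32 values in that binade
    return round(q / ulp) * ulp      # round() on a Fraction rounds half to even


def convert(x: Fraction, scale: Fraction, bias: Fraction, fused: bool) -> int:
    """Model rint(x * scale + bias) in float32, with or without a fused multiply-add."""
    if fused:
        y = round_to_f32(x * scale + bias)                # one rounding
    else:
        y = round_to_f32(round_to_f32(x * scale) + bias)  # two roundings
    return round(y)                                       # ties to even, like rint


# Hypothetical input: a float32 value (24-bit mantissa) that prints as 0.230887.
x = Fraction(15494552, 2**26)
scale, bias = Fraction(327), Fraction(128)

print(f"in: {float(x):.6f}")                              # in: 0.230887
print("unfused:", convert(x, scale, bias, fused=False))   # unfused: 204
print("fused:  ", convert(x, scale, bias, fused=True))    # fused:   203
```

Under those assumptions the unfused path lands exactly on 203.5 and rounds to the even neighbour 204, while the fused path lands one float32 step below 203.5 and rounds to 203. That is the same 204 versus 203 split as in the log.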

@argilo
Member Author

argilo commented Oct 13, 2023

The test framework is very rigid, so the only workable fix I can see is to increase the tolerance to 1. I see you already suggested that here: #617 (comment)
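
To illustrate what a tolerance of 1 changes, here is a small sketch of an element-wise comparison with an absolute tolerance. It is not the VOLK QA code; first_mismatch and the sample vectors are hypothetical, and the assumption is that two outputs count as different only when they differ by more than the tolerance.

```python
def first_mismatch(ref, out, tol):
    """Return (offset, ref_value, out_value) for the first pair that differs
    by more than tol, or None if every pair is within tolerance."""
    for offset, (a, b) in enumerate(zip(ref, out)):
        if abs(a - b) > tol:
            return offset, a, b
    return None


# Hypothetical 8-bit outputs from two implementations of the same kernel.
generic = [110, 203, 17]
fma = [111, 203, 17]

print(first_mismatch(generic, fma, tol=0))  # (0, 110, 111)
print(first_mismatch(generic, fma, tol=1))  # None
```

With a tolerance of 1, an off-by-one result from a rounding tie no longer counts as a failure, while larger discrepancies are still reported.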

@argilo
Member Author

argilo commented Oct 13, 2023

PR: #647

@daniestevez
Member

The debugging you did shows that I was on the right track in the original PR about rounding of values close to a half-integer. It makes some sense in hardware that FMA behaves differently from non-FMA: maybe FMA carries more precision internally before doing the add, or needs a different kind of rounding to meet performance targets. I also have the impression that this does not behave the same on all machines, because I could not make it fail on my AMD machine. I'll try once more with your loop just to make sure.

@daniestevez
Member

I've run the test loop for 20 minutes on my AMD machine (a Ryzen 7 5800X) and it never failed. So apparently there isn't even a spec for how FMA rounding should behave.
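
For reference, IEEE 754-2008 does define fusedMultiplyAdd: it computes a*b+c with a single rounding, whereas a separate multiply and add rounds twice. Given the same input, both results are therefore deterministic, and they disagree only for inputs in a narrow window next to a half-integer. The sketch below scans seven adjacent float32 inputs around one such point using integer arithmetic on the mantissa. It assumes the kernel computes rint(in * 327 + 128) in single precision; round_half_even, outputs, and the scanned range are hypothetical choices made for illustration.

```python
SCALE = 327
BIAS_UNITS = 128 << 26   # 128.0 expressed in units of 2**-26


def round_half_even(n: int, step: int) -> int:
    """Round integer n to the nearest multiple of step, ties to the even multiple."""
    q, r = divmod(n, step)
    if 2 * r > step or (2 * r == step and q % 2 == 1):
        q += 1
    return q * step


def outputs(m: int) -> tuple[int, int]:
    """Return (unfused, fused) 8-bit results for the float32 input m * 2**-26.

    Valid while the product lies in [64, 128), where float32 values are
    2**-17 apart (512 units), so the biased sum lies in [192, 256], where
    they are 2**-16 apart (1024 units).
    """
    exact_product = SCALE * m
    assert m < (1 << 24) and (1 << 32) <= exact_product < (1 << 33)
    unfused_sum = round_half_even(
        round_half_even(exact_product, 512) + BIAS_UNITS, 1024)   # two roundings
    fused_sum = round_half_even(exact_product + BIAS_UNITS, 1024)  # one rounding

    def to_int(s: int) -> int:
        return round_half_even(s, 1 << 26) >> 26                   # rint, ties to even

    return to_int(unfused_sum), to_int(fused_sum)


# Seven adjacent float32 inputs whose scaled, biased value straddles 203.5.
for m in range(15494551, 15494558):
    unfused, fused = outputs(m)
    flag = "MISMATCH" if unfused != fused else ""
    print(f"{m * 2.0**-26:.9f} unfused={unfused} fused={fused} {flag}")
```

Under these assumptions only one of the seven neighbouring inputs produces different results, which would be consistent with a test on random input vectors failing only occasionally.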
