-
Notifications
You must be signed in to change notification settings - Fork 171
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug in fir_decimate_cc NEON implementation at higher decimation rates #15
Comments
After some more investigation I found that the NEON implementation outputs increasingly large values, then starts outputting -Inf and +Inf, and eventually outputs NaN after some time. |
Hi Rico, Thanks for finding that! That's definitely a bug. The FIR filter should be stable. 73! Andras |
Hi Andras, The RPi will use PARAMS_RASPI and therefore not the NEON implementation, right? Does the RPi even support NEON? 73, |
I developed those routines on the RPi 2. It does have NEON. Let's suppose that the problem happens when we write zero to the accumulators at the beginning of the loop: asm volatile(
" vmov.f32 q4, #0.0\n\t" //another way to null the accumulators
" vmov.f32 q5, #0.0\n\t" I've just read that using the VMOV.F32 this way is only supported if the ARM SoC has VFPv3. What is the output of |
Jup, it does. If the Raspi has NEON then why isn't it enabled for raspi in the Makefile? On do 29 sep. 2016 at 21:27, András Retzler notifications@github.com
|
Could it have sonething to do with the different amount of taps between the two implementations? |
It is. NEON is available in Raspberry Pi 2. |
Ah ok, my bad. I'm curious if it works on your Pi using a higher sample rate. Let me know if you want me to test anything on the Odroid. |
BTW, this is my cpuinfo, so the Oroid also has vfp and vfpv3. `rico@odroid:~$ cat /proc/cpuinfo processor : 1 processor : 2 processor : 3 Hardware : ODROID-U2/U3 |
I'm going to have a go at fixing it myself, should be a fun assembly exercise. Andras, could you explain why the NEON impl has a different amount of taps than the non-NEON impl? Is it because of alignment restrictions, and if so, what are they? |
OK, I'm quite sure it has something to do with the zeroing of the accumulators |
Great!
The NEON instructions used here work on 4 pieces of 32 bit floats at once (called a quad register), so both the number of taps and the number of input samples have to be multiples of 4. See my explanation of the algorithm over here: |
When I zero the accumulators in another (very indirect) way as suggested here using Does this have any drawbacks as far as you can see? |
…ors in fir_decimate_cc. Also removed some unused variables
I think it is a very good solution. |
Disregard that solution, it doesn't work. Still silent audio. Same when So, the zeroing is not the problem, but the filter is unstable for some reason. |
Next clue: When I use the same taps calculated for the NEON implementation with the non-NEON implementation the problem also occurs. So it seems that the NEON implementation itself is innocent, it is the filter design for the NEON impl that breaks. |
…ize in amount of taps
OK, I finally found and fixed the issue, see commit 12d7db8. I've created a pull request. |
I encountered this issue when trying to run OpenWebRX on an Odroid-U3. When running at a sample rate of 0.25 Msps everything works fine, while at a sample rate of 2.048 Msps the audio would drop out, even though the CPU usage was very low.
After some investigation this seems to be an issue with fir_decimate_cc, it started outputting zeros with higher decimation rates (185 at 2.048 Msps) after about 1 second of operation while it works fine with lower rates (22 at 0.25 Msps). When not using the NEON optimized implementation of fir_decimate_cc by removing -DNEON_OPTS in PARAMS_NEON this issue does not arise.
Without the optimized implementation the Odroid-U3 uses 22% CPU when sampling at 2.048 Msps with 1 client connected, so this workaround is (pretty much) usable.
This issue could of course be specific to the Odroid-U3, please report if anyone has the current NEON-implementation working using another board/cpu with NEON FPU, then the solution is to use the workaround when using an Odroid-U3. :)
The text was updated successfully, but these errors were encountered: