I'm trying to track down a bug causing high CPU usage with the JACK backend. Running librespot with identical options and only changing the backend from ALSA to JACK, the process goes from ~20% CPU to ~50% CPU while playing music on both my Pi 2 and Pi 3 running Raspbian. The increase shows up on the exact same binary (only the backend is changed on the command line). For a non-Pi reference, my SoCFPGA development board with a dual-core Cortex-A9 (Cyclone V) runs librespot at under 10% CPU with the JACK backend. A9s are faster, but not that much. Since JACK runs with all samples as single-precision floats by default, my theory is that the difference comes from improper use of the Cortex-A53's floating-point hardware: the main jackd process and a custom IIR filterbank engine for REW both run with very low CPU usage despite doing a non-trivial number of FLOPS (20 biquads per channel plus mixing), under 20% CPU combined according to top. It doesn't make any sense that the JACK backend of librespot would use significantly more than that.
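For scale, here's my own back-of-the-envelope sketch of what that filterbank costs per sample (my illustration, not code from librespot or REW). A Direct Form I biquad is 5 multiplies and 4 adds per sample, so 20 biquads times 2 channels at 44.1 kHz is roughly 16 MFLOPS of multiply-adds, which an A53 whose FP hardware is actually being used should barely notice:

```rust
// Back-of-the-envelope illustration (my own, not from librespot or REW):
// one Direct Form I biquad is 5 multiplies + 4 adds per sample.
// 20 biquads x 2 channels x 44,100 Hz ~= 16 MFLOPS of multiply-adds,
// trivial for a Cortex-A53 with working VFP/NEON.
struct Biquad {
    b0: f32, b1: f32, b2: f32, a1: f32, a2: f32, // coefficients
    x1: f32, x2: f32, y1: f32, y2: f32,          // filter state
}

impl Biquad {
    fn process(&mut self, x: f32) -> f32 {
        let y = self.b0 * x + self.b1 * self.x1 + self.b2 * self.x2
            - self.a1 * self.y1 - self.a2 * self.y2;
        self.x2 = self.x1;
        self.x1 = x;
        self.y2 = self.y1;
        self.y1 = y;
        y
    }
}
```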
I have mostly been toying with compiler flags. I make sure to compile the JACK code with: -mcpu=cortex-a53 -mfloat-abi=hard -mfpu=neon-fp-armv8 -mneon-for-64bits
for both C and C++. I am much less familiar with Rust, but I'm very confident I'm correctly passing floating point optimizations for the armv7-unknown-linux-gnueabihf target: RUSTFLAGS="-C target-cpu=cortex-a53 -C target-feature=+v8,+vfp4,+neon,-d16"
(the -d16 is there to work around an LLVM register-allocation issue that appears with NEON enabled; the fact that it was needed reassures me that NEON support is actually in the binaries).
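As a sanity check that those features actually land in the build, a tiny throwaway Rust program (my own sketch, nothing to do with librespot itself) can be cross-built with the same RUSTFLAGS and run on the Pi:

```rust
// Throwaway sanity check: build with the same RUSTFLAGS/target as librespot.
// cfg!(target_feature = ...) is resolved at compile time, so this reports
// which float features the compiler believed it was allowed to emit.
fn main() {
    println!(
        "neon: {}, vfp4: {}",
        cfg!(target_feature = "neon"),
        cfg!(target_feature = "vfp4"),
    );
}
```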
I'm very curious to hear if anyone has been troubleshooting or has resolved a similar issue. If you want to test, feel free to clone my raspotify fork and build my docker image. On my machine, it's just a docker build . and then a docker run -it <image>. Just mark the build.sh as executable, run it, and get a coffee. The binary is built against the version of jack2 in the Raspbian repos so just make sure you have jackd2 installed if you're also testing on a Pi.
Sincerely appreciate any help!
EDIT: Also wanted to add that I've tried a native build (on the Pi 3 itself) and saw no changes
Hm, librespot's code for Jack is much more complicated than the ALSA code, thanks to Jack's callback mechanism and librespot's architecture.
Briefly glancing at the code, it looks like the Jack back-end communicates through two 32-bit float ports. It would be interesting to find out whether it's possible to pass a single interleaved stereo 16-bit int port through to the Jack server, which is the format of decoded Spotify audio. I'm not familiar enough with Jack to know how feasible that is.
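From a quick look at the Rust jack crate, this is roughly the shape of those two float ports (a sketch of typical jack-crate usage, not librespot's actual code; client and port names here are made up):

```rust
// Minimal sketch of typical `jack` crate usage (names made up, not
// librespot's code). The process callback hands out one mono f32 buffer
// per registered output port.
fn main() {
    let (client, _status) =
        jack::Client::new("demo", jack::ClientOptions::NO_START_SERVER).unwrap();
    let mut out_l = client.register_port("out_0", jack::AudioOut::default()).unwrap();
    let mut out_r = client.register_port("out_1", jack::AudioOut::default()).unwrap();

    let handler = jack::ClosureProcessHandler::new(
        move |_: &jack::Client, ps: &jack::ProcessScope| -> jack::Control {
            let left: &mut [f32] = out_l.as_mut_slice(ps);  // per-port f32 buffers
            let right: &mut [f32] = out_r.as_mut_slice(ps);
            left.fill(0.0);  // silence; a real back-end writes decoded samples here
            right.fill(0.0);
            jack::Control::Continue
        },
    );
    let _active = client.activate_async((), handler).unwrap();
    std::thread::park(); // keep the client alive
}
```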
The ALSA back-end is able to specify its data format quite precisely and need not do any conversion. I would be surprised if that weren't a significant portion of the CPU difference you're seeing.
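To make that suspected cost concrete, here's roughly the per-callback work an i16-in / f32-out path implies (helper name and layout are mine, not librespot's actual code):

```rust
// Rough illustration (not librespot's code): every Jack callback has to
// split interleaved stereo i16 frames into two f32 port buffers and
// rescale them. Cheap if this vectorises, noticeably more expensive if
// the int -> float conversion ends up scalar or otherwise slow.
fn deinterleave_i16_to_f32(interleaved: &[i16], left: &mut [f32], right: &mut [f32]) {
    const SCALE: f32 = 1.0 / 32768.0;
    for (i, frame) in interleaved.chunks_exact(2).enumerate() {
        left[i] = f32::from(frame[0]) * SCALE;
        right[i] = f32::from(frame[1]) * SCALE;
    }
}
```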
Please give my work at #660 a go. In the current dev branch there are two conversions going on, assuming you're using the lewton decoder: first lewton converts from f32 to i16, then the jackaudio backend converts back from i16 to f32. My branch keeps everything in f32, without the back-and-forth conversions.
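In other words, with an f32 pipeline the Jack callback only needs to split channels, something like the sketch below (illustrative only, not the actual code in #660):

```rust
// Illustrative contrast (not the actual #660 code): with f32 end to end,
// the callback just splits channels, with no int <-> float conversion.
fn split_stereo_f32(interleaved: &[f32], left: &mut [f32], right: &mut [f32]) {
    for (i, frame) in interleaved.chunks_exact(2).enumerate() {
        left[i] = frame[0];
        right[i] = frame[1];
    }
}
```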