
Having trouble sending audio and video at the same time on aarch64 systems #28

samiamlabs opened this issue Jul 30, 2021 · 4 comments


@samiamlabs

Hi @dkumor!

I've been working on a ROS2-based telepresence robot for a couple of weeks now using this really convenient and nicely abstracted library. Everything works great out of the box on my laptop, but I'm having some trouble getting bi-directional audio and video to work on more limited aarch64 systems like the Raspberry Pi 4 or Jetson Xavier.

The issue I'm currently having is with sending audio (it is choppy and falls increasingly behind as time passes). It seems to be caused by the recv function in _audioSenderTrack slowing down from being called at about 60Hz to about 30Hz when I try to send video at the same time (from CVCamera(width=848, height=480) in this case). At that rate it can no longer keep up with the data coming from the microphone.
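
For reference, the sending side of my setup is roughly the following (a stripped-down sketch - signaling and the receive direction are omitted, and the putSubscription calls just follow the rtcbot examples, so treat it as an approximation rather than my exact code):

```python
# Rough sketch of the sending side (signaling code omitted).
from rtcbot import RTCConnection, CVCamera, Microphone

cam = CVCamera(width=848, height=480)  # the resolution mentioned above
mic = Microphone()

conn = RTCConnection()
conn.video.putSubscription(cam)  # stream video frames over the connection
conn.audio.putSubscription(mic)  # stream audio samples over the connection
```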

Have you run into this issue yourself?

I used log_slow_callbacks.enable(0.01) to try to figure out what was slowing down the asyncio event loop. I guess an execution time of 0.029 s for _run_rtp() could throttle the loop to around 30Hz if everything gets called with the same priority and it tries to run at around 30Hz itself? (I'm not very familiar with asyncio, so my intuitions could be really off here...)
WARNING:aiodebug.log_slow_callbacks:Executing <Task pending name='Task-47' coro=<RTCRtpSender._run_rtp() running at /opt/overlay_ws/container_workspace_files/forked_python_deps/aiortc/src/aiortc/rtcrtpsender.py:295> wait_for=<Task pending name='Task-1012' coro=<NoClosedSubscription.get() running at /opt/overlay_ws/container_workspace_files/forked_python_deps/rtcbot/rtcbot/base/base.py:26> cb=[<TaskWakeupMethWrapper object at 0x7f402e4b20>()]>> took 0.029 seconds
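
For completeness, the slow-callback logging above is just aiodebug enabled at startup (nothing rtcbot-specific):

```python
# Log any asyncio callback/task step that blocks the event loop
# for longer than 10 ms (requires the aiodebug package).
from aiodebug import log_slow_callbacks

log_slow_callbacks.enable(0.01)  # threshold in seconds
```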

@dkumor
Owner

dkumor commented Jul 31, 2021

Yikes! It looks like the CPU isn’t keeping up there! Asyncio runs all callbacks in the same thread, so all messaging as well as video/audio shuffling happens on a single core. This is unfortunately one of those problems that doesn’t really have a good solution. Here are a couple of things you can try to narrow down the underlying issue or work around it:

  • Try lowering the video resolution, to see how low you need to make it before the system starts to keep up.
  • It’s been a while since I tried it, but you might be able to lower the sample rate of the microphone, and then also lower the sample rate of the sending connection. This would be done by calling Microphone(samplerate=22050) to set up the microphone, and then conn.audio.addTrack(samplerate=22050) before you add the audio stream to the connection, so the connection uses the lower sampling rate (see the sketch after this list).
  • What’s the CPU usage on the Xavier/Pi? Currently all video encoding/decoding is done on the CPU, so it is possible that the software video encoding/decoding at the same time is a bit more than can be handled at that resolution. This is an ongoing problem that I hope might be solved in the future with hardware encoding/decoding - but that doesn’t help you now!
  • Finally, you might get a bit of a free performance boost by using uvloop (also included in the sketch below).
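
A rough sketch combining the sample rate and uvloop suggestions (the samplerate= arguments are from memory, so double-check them against the current rtcbot API before relying on this):

```python
# Sketch only: lower audio sample rate + uvloop event loop policy.
import asyncio
import uvloop
from rtcbot import RTCConnection, Microphone

asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())  # possible free speedup

mic = Microphone(samplerate=22050)     # capture audio at a lower rate
conn = RTCConnection()
conn.audio.addTrack(samplerate=22050)  # ask the connection to use the same rate
# ...then add the audio stream to the connection as usual.
```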

I have run into similar issues on the Pi, and attributed them to the software video encoding/decoding - but it is possible that the issue is actually the in-thread processing done by aiortc when sending/receiving large frames. In my own tests, audio/video was unidirectional, so it was a bit easier for the system to handle.

@samiamlabs
Author

samiamlabs commented Jul 31, 2021

Try lowering the video resolution, to see how low you need to make it before the system starts to keep up.

I tried lowering it, but it did not seem to make much of a difference. I also disabled receiving video and audio on the robot entirely, and that did not help either.

You might be able to lower the sample rate of the microphone

I lowered it to 16000. It seemed to work and maybe helped a little, but I'm still not able to transmit audio and video at the same time.

What’s the CPU usage on the Xavier/Pi?

I don't have the RPi 4 up and running at the moment, but on the Jetson it looks like this with 480p video and 16000 Hz audio being transmitted:
[screenshot: per-core CPU usage on the Jetson]
So none of the cores are at 100%. (I'm also running NoMachine and vscode-server on the Jetson, so a fair share of the cycles goes to those.)

Finally, you might get a bit of a free performance boost by using uvloop

I already tried to use uvloop, but it seems to stop working when I add asyncio.set_event_loop_policy(uvloop.EventLoopPolicy()). I suspect it is not supported by aiortc or something in rtcbot? Or have you gotten it to work before?
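
For reference, this is roughly all I added, at the very top of the script before any event loop gets created (uvloop.install() should be the equivalent shorthand, if I read the uvloop docs right):

```python
# What I tried: switch asyncio over to uvloop before anything creates a loop.
import asyncio
import uvloop

asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
# uvloop.install() is the one-line equivalent.
```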

I have run into similar issues on the Pi, and attributed them to the software video encoding/decoding

Yeah, I thought that too at first and spent a bunch of time trying to get FFmpeg hardware acceleration in aiortc to work.
As far as I can tell, aiortc now actually supports hardware-accelerated video encoding using the omx encoder in FFmpeg on a Raspberry Pi with a 32-bit OS (but I have not tested it). It is pretty easy to integrate with rtcbot following this example: aiortc/aiortc#502

Unfortunately, the omx encoder is considered deprecated and won't be supported on 64-bit Ubuntu for the Raspberry Pi 4.
I need arm64 for ROS2, so that was not an option for me. The encoder that is supposed to replace it segfaults when used through aiortc (see this issue: PyAV-Org/PyAV#798).

I moved on to the Jetson after that and tried to use the Nvidia encoder through FFmpeg. I went through quite a lot of trouble building a patched FFmpeg that supports nvenc. It seemed to work, but I ended up with a pretty large delay in the output video on my website. I suspect I need to configure the codec somehow; I tried asking around but haven't gotten an answer yet... (jocover/jetson-ffmpeg#91)

As far as I can tell, the codecs don't run on the main thread with asyncio. It is possible that they still affect things, but they are probably not the root cause of this issue.

This is unfortunately one of those problems that doesn’t really have a good solution.

Software running on robots, like localisation and computer vision, has a tendency to max out the processors, so I really think we need a solution that is robust against a slow-running main thread.

Scaling the sample rate and the audio packetization time based on how fast the loop is able to process callbacks seems like the only solution if we want to keep this architecture mostly as-is. Changing the sample rate alone does not seem to be enough as far as I can tell. I tried changing the sample parameter in RebatchSubscription to something other than AUDIO_PTIME * sampleRate, but I have not had any luck getting that to work yet.
Do you know if aiortc only handles exactly 20 ms audio packetization for some reason? I could just be doing something wrong here too...
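
To put numbers on the 20 ms packetization (assuming aiortc's AUDIO_PTIME constant really is 0.020, i.e. 20 ms per packet):

```python
# Samples per audio packet = packetization time * sample rate.
AUDIO_PTIME = 0.020  # 20 ms, as used by aiortc

samples_at_48k = AUDIO_PTIME * 48000  # = 960 samples per packet
samples_at_16k = AUDIO_PTIME * 16000  # = 320 samples per packet

# Lowering the sample rate shrinks each packet but not the packet rate:
# the sender still has to produce a packet every 20 ms (50 per second),
# which an event loop that only gets around to the audio task ~30 times
# per second can never keep up with.
```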

I really don't want to give up and use a NUC or something, so I'm hoping this is fixable :)

@dkumor
Owner

dkumor commented Aug 1, 2021

Wow, looks like you were very thorough in your debugging of the issue!

Unfortunately, I am away from home for the next several weeks and don’t have a Pi with me to test things on. All of the cores being at >60% suggests that something is really maxing out CPU usage. I wonder if it would somehow be possible to pin the asyncio loop thread to a single core, and have video encoding/decoding run entirely on other cores, without overlap.
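
To sketch the pinning idea (pure speculation on my part - on Linux the whole process can be pinned with os.sched_setaffinity, but that would also pin any encoder threads running inside the same process, so real separation would probably require moving the encoding to a separate process):

```python
# Speculative sketch: pin this Python process (and therefore the asyncio
# loop thread) to core 0 on Linux, leaving the other cores free.
import os

os.sched_setaffinity(0, {0})  # pid 0 = current process, allowed CPUs = {0}
```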

Without the ability to test this myself right now I can only speculate, but based on what you said (the microphone callback being limited to a 30Hz rate), it does look like something in the video processing pipeline is causing the delay - the video is sent at 30Hz. This is also corroborated by the delay warning you gave in your original post - as I understand it, when the _run_rtp() function returned control to the event loop, which through a series of calls ends up waiting on a task in rtcbot’s base.py, 0.029 seconds elapsed. The problem seems like it might be somewhere on the path _get_frame -> recv (in rtcbot’s tracks.py) -> frameSubscription.get() -> … -> base.py:26, or otherwise in the video sending loop of _run_rtp.

https://github.com/aiortc/aiortc/blob/d5d1d1f66c4c583a3d8ebf34f02d76bc77a6d137/src/aiortc/rtcrtpsender.py#L295

img = await self._frameSubscription.get()

If this delay is consistently showing up for each frame of video sent (not just once or twice), then I suspect that either there is a bug in rtcbot somewhere along the above path, or something is taking longer than expected. Or maybe even an issue with the GIL messing with asyncio performance, when the audio/video preparation threads in CVCamera and Microphone are processing data in the background…

As I mentioned before, I won’t be able to dive into this problem for several more weeks, but I would personally approach it by going through the await calls and seeing if there is any point where a non-asyncio thing is used, or where there is a large amount of processing (perhaps by printing out timestamps at each level - see the sketch below).
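
Something generic like this wrapped around each await in that path would show where the time actually goes (plain Python timing, nothing rtcbot-specific; the label and usage are just illustrative):

```python
# Time an awaitable and report it if it took longer than 5 ms.
import time

async def timed(label, awaitable):
    start = time.monotonic()
    result = await awaitable
    elapsed = time.monotonic() - start
    if elapsed > 0.005:
        print(f"{label}: {elapsed * 1000:.1f} ms")
    return result

# Illustrative usage inside a track's recv():
#   frame = await timed("frameSubscription.get", self._frameSubscription.get())
```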

Sorry I can’t be of more immediate help - if you don’t have time to play with this, please post as basic an example as possible which reproduces this issue (I will be debugging on the Pi 4), and I will get to it once I am back home!

As for 20ms in aiortc, I don’t remember why - but I do remember that 960 samples was the only thing that worked :/

@cHemingway

I also have this issue on my Raspberry Pi 4, with the Raspberry Pi camera and a USB speakerphone for audio. Delays can be in the tens of seconds.

Changing the sample rate didn't seem to help much, nor did reducing the camera resolution (though it did reduce resolution).

I could make a basic example if you are still interested?
