Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Brief drop outs from an M1 mac #1515

Closed
dasl- opened this issue Aug 18, 2022 · 16 comments
Closed

Brief drop outs from an M1 mac #1515

dasl- opened this issue Aug 18, 2022 · 16 comments

Comments

@dasl-
Copy link

dasl- commented Aug 18, 2022

I've found yet another instance of what I suspect is bugginess when connecting to SPS via an M1 mac (macOS 12.4, MacBook Air M1, 2020). I suspect that this is a bug with the M1 mac, rather than a bug with SPS. As such, I doubt that there is much you can do to address the bug. I'm just posting in case others come across the same issue and my report can be helpful to them.

My setup was: playing music on the Music app of my M1 mac, connected to SPS via the system sound output controls. Because I connected via the system sound output controls, SPS used a realtime audio stream rather than a buffered audio stream. Approximately every 10 minutes, there would be a brief (1-2 second) drop out in the music. The logs would say:

Aug 18 04:10:07 piwall10 shairport-sync:        725.989287771 "audio_alsa.c:1774" alsa: recovering from a previous underrun.
Aug 18 04:10:07 piwall10 shairport-sync:          0.002138370 "player.c:2696" Large positive sync error of 77853 frames (1.765374 seconds), at frame: 443252387.

More logs demonstrating this occurring roughly every 10 minutes (sometimes it would also occur outside of the regular 10 minute intervals though):

Aug 18 03:01:47 piwall10 shairport-sync:        625.405392701 "audio_alsa.c:1774" alsa: recovering from a previous underrun.
Aug 18 03:01:47 piwall10 shairport-sync:          0.000827334 "player.c:2696" Large positive sync error of 76300 frames (1.730159 seconds), at frame: 925027323.
Aug 18 03:10:38 piwall10 shairport-sync:        530.990988479 "audio_alsa.c:1774" alsa: recovering from a previous underrun.
Aug 18 03:10:38 piwall10 shairport-sync:          0.001251889 "player.c:2696" Large positive sync error of 77841 frames (1.765102 seconds), at frame: 948442363.
...
Aug 18 03:20:06 piwall10 shairport-sync:          6.000308774 "audio_alsa.c:1774" alsa: recovering from a previous underrun.
Aug 18 03:20:06 piwall10 shairport-sync:          0.000741037 "player.c:2696" Large positive sync error of 74994 frames (1.700544 seconds), at frame: 1935627896.

Another commonality is the sync errors often mention a duration of ~1.7 seconds (although again, this is not always the case).

One strange aspect of this bug was that when I connected my M1 mac in the same manner to a different SPS instance running on another raspberry pi, I could not reproduce the drop outs. If the bug was caused by the M1 mac, I might expect my laptop to have these drop outs no matter which SPS instance it was connected to.

A restart of my M1 mac solved the issue. This was the first time I've observed this particular issue happening, so at least it appears to be a relatively rare issue to encounter.

Software versions:

% nqptp -V ; shairport-sync -V
Version: 1.1-dev-171-ga07e9cb. Shared Memory Interface Version: 7.
4.1-dev-499-g0b2ebd25-AirPlay2-OpenSSL-Avahi-ALSA-soxr-metadata-dbus-sysconfdir:/etc

SPS running on raspberry pi 4. Config file:

general =
{
  // More info about volume on pis:
  // https://github.com/dasl-/piwall2/blob/d357c3766979d473f8135448ccf36935a4fa608a/piwall2/volumecontroller.py#L22
  volume_range_db = 40; // make volume line up approximately with my own logarithmic volume algorithm on pis
  volume_max_db = 0.0; // prevent clipping on raspberry pis
  name = "piwall";
};

alsa =
{
  output_device = "hw:Headphones";
  mixer_control_name = "Headphone";
};

diagnostics =
{
  log_verbosity = 1; // "0" means no debug verbosity, "3" is most verbose.
};
@dasl-
Copy link
Author

dasl- commented Aug 18, 2022

I'm realizing that this may be a very similar issue to the other M1 related issue I posted about previously: #1496

One big difference though is that in the previous issue, the drop outs were frequent enough that they could be called "stuttering" (i.e. every few seconds). Whereas in this issue, the drop outs are every ~10 minutes.

@dasl-
Copy link
Author

dasl- commented Aug 18, 2022

I also realized there is a macOS update available. I've just upgraded my M1 mac from 12.4 to 12.5.1 (21G83). I'll see if that helps.

@dasl-
Copy link
Author

dasl- commented Aug 18, 2022

I've just started noticing similar drop outs coming from an intel mac: MacBook Pro (13-inch, 2020, Four Thunderbolt 3 ports), macOS 12.4 (21F79).

Playing Pandora music via a firefox tab, connected to SPS via system sound output menu (realtime stream).

Logs:

Aug 18 19:01:30 piwall10 shairport-sync:        173.005695656 "audio_alsa.c:1774" alsa: recovering from a previous underrun.
Aug 18 19:01:30 piwall10 shairport-sync:          0.000716815 "player.c:2696" Large positive sync error of 76553 frames (1.735896 seconds), at frame: 574858431.
Aug 18 19:02:33 piwall10 shairport-sync:         63.228813777 "rtsp.c:3100" Connection 10. AP2 Realtime Audio Stream.
Aug 18 19:05:07 piwall10 shairport-sync:        154.019083501 "audio_alsa.c:1774" alsa: recovering from a previous underrun.
Aug 18 19:05:07 piwall10 shairport-sync:          0.004148963 "player.c:2696" Large positive sync error of 79497 frames (1.802653 seconds), at frame: 551499315.
Aug 18 19:08:23 piwall10 shairport-sync:        195.432440568 "rtsp.c:3100" Connection 10. AP2 Realtime Audio Stream.
Aug 18 19:15:09 piwall10 shairport-sync:        406.023850110 "rtsp.c:3100" Connection 10. AP2 Realtime Audio Stream.
Aug 18 19:16:52 piwall10 shairport-sync:        103.020100994 "audio_alsa.c:1774" alsa: recovering from a previous underrun.
Aug 18 19:16:52 piwall10 shairport-sync:          0.000983056 "player.c:2696" Large positive sync error of 78170 frames (1.772562 seconds), at frame: 525788371.
Aug 18 19:17:33 piwall10 shairport-sync:         40.978344253 "audio_alsa.c:1774" alsa: recovering from a previous underrun.
Aug 18 19:17:33 piwall10 shairport-sync:          0.000958686 "player.c:2696" Large positive sync error of 79266 frames (1.797415 seconds), at frame: 527594483.
Aug 18 19:18:12 piwall10 shairport-sync:         39.740519198 "rtsp.c:3100" Connection 10. AP2 Realtime Audio Stream.
Aug 18 19:22:18 piwall10 shairport-sync:        245.414858243 "rtsp.c:3100" Connection 10. AP2 Realtime Audio Stream.
Aug 18 19:24:19 piwall10 shairport-sync:        121.019759076 "audio_alsa.c:1774" alsa: recovering from a previous underrun.
Aug 18 19:24:19 piwall10 shairport-sync:          0.002844722 "player.c:2696" Large positive sync error of 79463 frames (1.801882 seconds), at frame: 439525996.
Aug 18 19:25:58 piwall10 shairport-sync:         98.691188274 "rtsp.c:3100" Connection 10. AP2 Realtime Audio Stream.

I restarted my intel macbook, and after a single recurrence of the drop out, it did not recur. Logs after the restart:

Aug 18 19:31:26 piwall10 shairport-sync:        105.922877511 "rtsp.c:3100" Connection 11. AP2 Realtime Audio Stream.
Aug 18 19:32:59 piwall10 shairport-sync:         92.997611815 "audio_alsa.c:1774" alsa: recovering from a previous underrun.
Aug 18 19:32:59 piwall10 shairport-sync:          0.000989463 "player.c:2696" Large positive sync error of 77623 frames (1.760159 seconds), at frame: 1289149059.
Aug 18 19:36:54 piwall10 shairport-sync:        234.795173379 "rtsp.c:3100" Connection 11. AP2 Realtime Audio Stream.
Aug 18 19:36:56 piwall10 shairport-sync:          1.700582406 "rtsp.c:3100" Connection 11. AP2 Realtime Audio Stream.
Aug 18 19:40:57 piwall10 shairport-sync:        241.658168430 "rtsp.c:3100" Connection 11. AP2 Realtime Audio Stream.
Aug 18 19:43:45 piwall10 shairport-sync:        167.356678882 "rtsp.c:3100" Connection 11. AP2 Realtime Audio Stream.
Aug 18 19:48:01 piwall10 shairport-sync:        256.830240643 "rtsp.c:3100" Connection 11. AP2 Realtime Audio Stream.
Aug 18 19:51:34 piwall10 shairport-sync:        213.016208021 "rtsp.c:3100" Connection 11. AP2 Realtime Audio Stream.
Aug 18 19:51:36 piwall10 shairport-sync:          1.613034647 "rtsp.c:3100" Connection 11. AP2 Realtime Audio Stream.

Makes me wonder if airplay bugginess is present in intel as well as M1 macbooks.

@mikebrady
Copy link
Owner

Thanks for the information. It's really hard to know where the problems may lie -- the clients, the network, general delay on the Internet, Shairport Sync itself... so I guess that the best thing is to keep monitoring it, see if there is anything definite we can pin down...

@dasl-
Copy link
Author

dasl- commented Aug 19, 2022

so I guess that the best thing is to keep monitoring it, see if there is anything definite we can pin down...

Agreed, I'm going to see if I can find any patterns in the long run. I may post updates in this thread from time to time, but feel free to ignore me until I have something more conclusive :)

FWIW, connecting to SPS via iOS (whether in streaming or buffered mode) seems more reliable than connecting via macOS devices in my experience.

@dasl-
Copy link
Author

dasl- commented Aug 30, 2022

Hi again, I've been testing various configurations for the past week to attempt to isolate the issue. I've determined that this issue only occurs when the raspberry pi is connected via ethernet. When it's connected via wifi, I don't get these drop outs.

I tested on a new raspberry pi, SD card, and ethernet cable as well, so I don't think it's that I got a "dud".

Based on all the tests I ran, I believe the problem must lie in one of three places:

  1. My router isn't handling ethernet traffic properly. I don't have a spare router to test with unfortunately.
  2. The raspberry pi hardware / NIC driver / kernel has some bug in handling ethernet traffic
  3. SPS has a bug handling ethernet traffic. This seems unlikely... I expect SPS probably doesn't even know if it's connected via wifi or ethernet

Here are two full logs with log level 2 and statistics enabled that show the dropouts occurring in case you are interested to see (cmd + F for Large positive sync error):

  1. https://gist.githubusercontent.com/dasl-/c99cc95d8a993493f98ef6c1b8d87534/raw/6df508ce2441b4140e4f3ae9f090c196dc9e9115/gistfile1.txt
  2. https://gist.githubusercontent.com/dasl-/780b733993d8938a37ba2407652734ac/raw/ba784214c86c6d449dfb8591d0df9e67ace907ea/gistfile1.txt

When these logs were recorded, I was connecting to SPS from an intel macbook client. The macbook was connected to my network via wifi. I was streaming pandora music from a firefox tab. The macbook was connected to SPS via the system-wide audio output setting. SPS was running on a raspberry pi that was connected to my network via ethernet cable.

software versions:

% shairport-sync -V ; nqptp -V
4.1-dev-515-g3354f66c-AirPlay2-OpenSSL-Avahi-ALSA-soxr-metadata-dbus-sysconfdir:/etc
Version: 1.1-dev-171-ga07e9cb. Shared Memory Interface Version: 7.

hardware:

% cat /sys/firmware/devicetree/base/model
Raspberry Pi 4 Model B Rev 1.2

MacBook Pro (13-inch, 2020, Four Thunderbolt 3 ports), macOS 12.4 (21F79)

@mikebrady
Copy link
Owner

mikebrady commented Aug 30, 2022

Wow, thank you so much for all this, which I will study in the morning.

@mikebrady
Copy link
Owner

mikebrady commented Aug 31, 2022

Okay, so there is a fault in Shairport Sync.

In a Realtime Stream, packets of timing information are sent roughly every two seconds. If a packet has not been received for more than three seconds, Shairport Sync works on the basis that the sender has disappeared, at least temporarily. (This happens, for example, if you shut a Mac -- putting it to sleep -- while it is playing a YouTube video and the Mac is not plugged in to a power source.) From your experience, it seems that three seconds is too short, since packets can occasionally be lost in "normal" operation.

So the wait time has been changed from three to seven seconds, giving the opportunity for at least three packets to be received before concluding the sender is gone. (The idea of the code is to detect that the sender has disappeared, and when that happens, it is normally for at least tens of seconds; from that perspective, seven seconds should be fine.)

I've just pushed an update on the development branch which makes this change and cleans up a few other small things.

If you got a chance to try it and report back, it would be appreciated.

Super thanks for your really thorough and very helpful investigation!

@mikebrady mikebrady mentioned this issue Aug 31, 2022
1 task
@dasl-
Copy link
Author

dasl- commented Aug 31, 2022

Very interesting! I am currently testing with your latest changes. So far so good, but I'll keep testing for a few days before declaring victory :). Sometimes it takes a while for the issue to occur.

I am now curious why my network might be losing these timing packets. Looks like they are sent over UDP. I may try to run a packet capture on both ends (client and server) to see if anything unusual happens when these drop outs occur. Netstat does not show any UDP receive buffer errors on my pi...

@mikebrady
Copy link
Owner

We live in hope 🤞. Yeah, they are UDP packets, so it was foolish of me to wait for only one missing packet, TBH. The other thing, of course, is that the packets might not be lost but just delayed by a little more than one second and allowing the three-second timeout to mature.

@dasl-
Copy link
Author

dasl- commented Sep 1, 2022

Just for fun, I decided to rebuild SPS before your recent fix and get a packet capture of what was happening on the network during the audio drop outs.

TLDR: there were two drop out occurrences while I was packet capturing. The first was caused by a misbehaving client, and the second was caused by packets being dropped somewhere in the network.

Software versions:

% shairport-sync -V ; nqptp -V
4.1-dev-515-g3354f66c-AirPlay2-OpenSSL-Avahi-ALSA-soxr-metadata-dbus-sysconfdir:/etc
Version: 1.1-dev-171-ga07e9cb. Shared Memory Interface Version: 7.

During my packet capture, I got 2 occurrences of the drop out.

Note that in my write up here, all timestamps in the logs are in UTC (logs from the pi). But all timestamps in the wireshark screenshots are in ET. So there's a 4 hour time difference.

First drop out occurrence

Logs: https://gist.github.com/dasl-/5f3cc790e2dd89ec7f3c5a5a2b85b846#file-gistfile1-txt-L6

Packet capture on client (macbook):
ssss

We can see at the time of the drop out in the logs, there was a gap of just over 3 seconds in the client sending timing packets. So a "misbehaving" client may have caused this drop out.

Packet capture on the server (pi) also sees a gap of over 3 seconds in between timing packets, as expected:
Screen Shot 2022-08-31 at 9 15 48 PM

Second drop out occurrence

Logs: https://gist.github.com/dasl-/72ad4261d0de30202d035fbcae4292e6#file-gistfile1-txt-L8

This time, let's look at the packet capture on the server (pi) first. Packet capture on the server:
Screen Shot 2022-08-31 at 9 22 46 PM

We can see at the time of the drop out in the logs, there was a gap of just over 4 seconds in between timing packets, as recorded by the server.

Next, let's look at the packet capture on the client (macbook):
Screen Shot 2022-08-31 at 9 31 20 PM

This time, it appears the cause of the drop out is different: the client appears to have sent a timing packet that the server never received. I guess it got lost in the network somehow.

The server does not appear to have any UDP receive buffer errors:

% netstat -anus
...
Udp:
    613261 packets received
    209 packets to unknown port received
    0 packet receive errors
    1548 packets sent
    0 receive buffer errors
    0 send buffer errors
    IgnoredMulti: 1375
...

I'm less familiar with the version of netstat on the client (macOS) - there does not appear to be a heading for send buffer errors in the datagrams output section. Although the datagrams received section does have a the entry 26 dropped due to full socket buffers, but I'm not sure if that only applies to send buffer errors or if it counts receive buffer errors as well. Here's the client output:

% netstat -p udp -ans
udp:
	22593092 datagrams received
		0 with incomplete header
		0 with bad data length field
		0 with bad checksum
		668 with no checksum
		34970 checksummed in software
			27376 datagrams (4123194 bytes) over IPv4
			7594 datagrams (1562597 bytes) over IPv6
		28080 dropped due to no socket
		26271 broadcast/multicast datagrams undelivered
		0 time multicast source filter matched
		26 dropped due to full socket buffers
		0 not for hashed pcb
		22538715 delivered
	57379462 datagrams output
		85188 checksummed in software
			31281 datagrams (3303953 bytes) over IPv4
			53907 datagrams (17537226 bytes) over IPv6

So, the cause of the packets lost in the network remains a mystery... I'm guessing it could easily be wifi though. The client is connected via wifi, and the server is connected via ethernet cable.

I'm really happy we seem to have figured out what was going on here! Thank you for investigating! I'll continue running SPS with your recent fixes for a few days and then I'll report back whether everything appears to be fixed :)

@mikebrady
Copy link
Owner

Thanks for the writeup. If I'm not wrong, the first example shows a packet being send a second or so early and then leaving a corresponding gap of over three seconds before the subsequent one -- very interesting.

I have learned a lot, thanks.

@danhansen
Copy link

@dasl-

I've been having similar issues, and recently discovered that the ethernet power saving features on the rpi4 may be causing my grief:

raspberrypi/linux#3292 (comment)

I'm going to try this workaround, might be worth a try in your case as well...

@dasl-
Copy link
Author

dasl- commented Sep 7, 2022

Interesting, I had not heard of the ethernet powersave feature before. On my pi, I see this by default:

% sudo ethtool --show-eee eth0
EEE Settings for eth0:
	EEE status: enabled - inactive
	Tx LPI: disabled
	Supported EEE link modes:  100baseT/Full
	                           1000baseT/Full
	Advertised EEE link modes:  100baseT/Full
	                            1000baseT/Full
	Link partner advertised EEE link modes:  Not reported

After running sudo ethtool --set-eee eth0 eee off, I see it change to say:

% sudo ethtool --show-eee eth0
EEE Settings for eth0:
	EEE status: disabled
	Tx LPI: disabled
	Supported EEE link modes:  100baseT/Full
	                           1000baseT/Full
	Advertised EEE link modes:  Not reported
	Link partner advertised EEE link modes:  Not reported

Since it originally showed enabled - inactive, I wonder if EEE was impacting me? In this forum post, it is mentioned that it should say enabled - active if EEE is in effect: https://forums.raspberrypi.com/viewtopic.php?t=305820#p1831032 . But it's hard to know how trustworthy random internet sources are though. I'm curious if you see similar on your pi.

UPDATE: it seems that when my network path is router -> ethernet cable -> switch -> ethernet cable -> raspberry pi, then the output says: EEE status: enabled - inactive. But when the signal path is router -> ethernet cable -> raspberry pi, then the output says: EEE status: enabled - active. So maybe the switch in my network path doesn't support EEE?

In any case, I just tested by downgrading to before Mike's recent fix for this issue. In particular, I downgraded to building shairport-sync on the development branch at this commit: 3354f66

Here are the exact versions I was running after the downgrade:

% shairport-sync -V ; nqptp -V
4.1-dev-515-g3354f66c-AirPlay2-OpenSSL-Avahi-ALSA-soxr-metadata-dbus-sysconfdir:/etc
Version: 1.1-dev-171-ga07e9cb. Shared Memory Interface Version: 7.

I then played some music through SPS for a while, and observed occasional drop outs as expected. Logs from this period: https://gist.github.com/dasl-/194c387bf17711b208e48e9bd326f33b

Next, I ran sudo ethtool --set-eee eth0 eee off to disable ethernet power saving. I continued playing music through SPS for a while, and I continued to observe occasional drop outs. Logs from this period: https://gist.github.com/dasl-/092718c5f6cd04d0005a77e56c9a6df6

So it seems disabling ethernet powersave did not help. That said, this is not totally unexpected. In my earlier testing, I found that sometimes the drop outs were caused by a "misbehaving client", and other times they were caused by timing packets getting lost somewhere in the network. I think ethernet powersave could potentially explain the latter case, but not the former. So I wouldn't expect turning off ethernet powersave to fully solve this issue. That said, it's also a mystery to me why I didn't notice this issue more when the pi was connected via wifi. A misbehaving client should cause drop outs no matter whether the pi was connected to the network via wifi or ethernet.

I believe that Mike's fix has fully solved the issue for me. In fact I was planning to close this issue soon. Did the fix not solve your drop outs?

@danhansen
Copy link

Alas, I think my issue is distinct, because it appears to be happening at least with buffered audio streams:

shairport-sync_1  | 2022-09-07T18:19:09.818838000Z       2801.529248671 "../rtsp.c:4174" Connection 1: Current volume (-24.000000) requested
shairport-sync_1  | 2022-09-07T18:19:18.575148000Z          8.762317966 "../rtsp.c:3154" Connection 1. AP2 Buffered Audio Stream.
shairport-sync_1  | 2022-09-07T18:19:19.650962000Z          1.074114583 "../player.c:1119" Gone past starting time for 3493315618 by 9080669 nanoseconds.
shairport-sync_1  | 2022-09-07T18:19:19.681689000Z          0.031986718 "../player.c:2652" Connection 1: Playback Started -- AirPlay 2 Buffered.
shairport-sync_1  | 2022-09-07T18:19:19.860555000Z          0.178166719 "../player.c:2692" Large positive sync error of 3813 frames (0.086463 seconds), at frame: 3493333218.
shairport-sync_1  | 2022-09-07T18:19:25.512513000Z          5.651754477 "../player.c:1119" Gone past starting time for 2819708795 by 22119238 nanoseconds.
shairport-sync_1  | 2022-09-07T18:19:25.515913000Z          0.000446094 "../player.c:2547" Delay error 19 when checking running latency.
shairport-sync_1  | 2022-09-07T18:19:25.543577000Z          0.030931927 "../player.c:2692" Large positive sync error of 2292 frames (0.051973 seconds), at frame: 2819710203.
shairport-sync_1  | 2022-09-07T18:19:46.076969000Z         20.532790357 "../player.c:1650" Connection 1: Playback Stopped. Total playing time 00:00:26.
shairport-sync_1  | 2022-09-07T18:19:48.469690000Z          2.393828072 "../rtsp.c:4174" Connection 3: Current volume (-14.625000) requested
shairport-sync_1  | 2022-09-07T18:19:50.700725000Z          2.230532551 "../rtsp.c:3154" Connection 3. AP2 Buffered Audio Stream.
shairport-sync_1  | 2022-09-07T18:19:51.400259000Z          0.698892656 "../player.c:2652" Connection 3: Playback Started -- AirPlay 2 Buffered.
shairport-sync_1  | 2022-09-07T18:19:52.653727000Z          1.253770624 "../rtp.c:1316" Connection 3: Note: anchor parameters have changed before clock b41bb0ac45e30008 has stabilised.
shairport-sync_1  | 2022-09-07T18:19:57.184996000Z          4.530581978 "../player.c:2692" Large positive sync error of 2223 frames (0.050408 seconds), at frame: 3594226927.
shairport-sync_1  | 2022-09-07T18:20:07.452469000Z         10.267546767 "../player.c:1650" Connection 3: Playback Stopped. Total playing time 00:00:16.

What is particularly strange is that the drop out occurs while playing a song from Apple Music, is repeatable when playing that particular song, and occurs at the same point during the playback of the song. It looks like I'll need to open a separate issue.

@dasl-
Copy link
Author

dasl- commented Sep 12, 2022

I'm closing this issue as I've not had the drop outs recur in the same manner since Mike's fix: #1515 (comment)

a617e60

@dasl- dasl- closed this as completed Sep 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants