Rewrite Xbox audio to use the default SDL audio thread #27

JayFoxRox · 2020-04-27T03:21:38Z

This is a rewrite of the crashy audio driver that was recently introduced.

This new code is run on a different thread, which lifts many restrictions from the SDL callback (although we potentially waste more CPU cycles on context switches now).

Initialization:

We create a semaphore which keeps track of how many buffers are ready to be queued by the mainloop. No buffers will qualify for that initially: we'll always reserve 1 buffer to be filled by the application, and the rest of the buffers will be queued with silence initially (this is to start consistent timing right away).
So we allocate all audio buffers and we queue them all, except the first one, which is reserved for the first application data. We remember the first buffer as the next buffer to be queued.
We then start playback.

To summarize: We start playing silence at the second buffer (and following), until playback wraps around and reaches the first buffer with the actual application data (which was hopefully queued by the mainloop by then). This means we force a couple of milliseconds of silence on application startup, and we do force a higher latency, but we get more headroom for buffer underruns and more consistent timing at startup.

During the mainloop:

We return the next buffer to be queued as device buffer, so it can be modified by the application.
When the application finishes writing the audio data, we queue that buffer and remember the following buffer as the one to be queued next.
SDL will then wait for more buffers to become available to continue the mainloop; this is done by looking at our semaphore.
Once the semaphore count is greater than 0 (= buffer available), it will decrement the count to reserve a buffer.

When a buffer finishes playback:

We simply increment the semaphore to mark another buffer as available (this leads to another mainloop iteration and the SDL callback being called).

Some remaining issues which should be looked at during review:

The driver was only tested in XQEMU, and not for long.
The buffer is still very short. So underruns could happen quickly. I did not test if the buffer size can be increased without breaking the driver (existing code uses 1 page per buffer and 0x400 samples for an individual buffer size of 0x1000, so page boundaries aren't an issue).
I was surprised that I had to queue the first audio chunk before XAudioPlay works; this is why the semaphore is initialized with zero (and some buffers are queued initially), unlike for other SDL audio drivers. I wonder if AC97 pauses when running out of buffers, and if there might be a risk of this happening (with underruns for example). I'll probably test this and create an issue on the nxdk repository so we can investigate this behavior with XAudio.
If two or more buffers run out too quickly around the same time, I wonder if we might only get a single interrupt. If this was the case, we'd quickly get broken audio or a full hang as buffers vanish. I'll probably create an issue on the nxdk repository so we can investigate this behavior with XAudio.
I did not test this on a debug kernel (I worry that this code might assert in the kernel, like previous AC97 code).
Buffer-memory is still write-combined, so reads from SDL callback could be bad for performance. I think this is fine as most people will write sequentially and I doubt reading is that bad on such a small buffer. We could remove this flag to have more guarantee-able performance but with lower maximum performance.

I'll create issues after merge, if we don't deal with them during review.

JayFoxRox · 2020-05-15T22:04:49Z

src/audio/xbox/SDL_xboxaudio.c

+XBOXAUDIO_PlayDevice(_THIS)
+{
+    /* Send samples to XAudio */
+    XAudioProvideSamples(_this->hidden->buffers[_this->hidden->next_buffer], _this->spec.size, FALSE);


This is write-combined memory, so this requires __asm__ __volatile__ ("sfence"); to flush the buffer probably.

I know that @thrimbor looked into this for the MCPX NIC and he found something about the device peeking for memory changes (is write-combined memory excluded from this maybe?). Would be interesting to know if he found similar comments about the MCPX ACI.

The NV2A actually requires a more complicated set of steps (click here to see code in pbkit)

Same issue exists in XBOXAUDIO_OpenDevice after making the memory silent.

Even if I removed the PAGE_WRITECOMBINE I'd not be sure how to best flush the cache so the ACI can see the new data. PAGE_NOCACHE would probably ruin the performance.

I checked this code now by doing this in the SDL callback before filling with actual data:

// Try to inject cache errors by making the signal in RAM constant memset(stream, rand() & 0xFF, len); __asm__ __volatile__ ("sfence"); #if defined(NXDK) __asm__ __volatile__ ("wbinvd"); #endif __asm__ __volatile__ ("sfence");

However, I never saw a flat signal in my output.

To doublecheck I also added this just before XAudioProvideSamples:

// Try to inject cache errors by making signal in CPU cache random uint8_t* data = _this->hidden->buffers[_this->hidden->next_buffer]; for(int i = 0; i < _this->spec.size; i++) { data[i] = rand() & 0xFF; }

However, I only saw random values and nothing of the signal (a clean sine wave) survived.

I have confirmed this for PAGE_READWRITE | PAGE_WRITECOMBINE and PAGE_READWRITE

So buffer updates without sfence or wbinvd still work absolutely fine.

Do I need to take any action?

The stuff I read about was cache snooping, done by the PCI host controller to check whether there's data in the CPU cache that's more recent than RAM when a device wants to do DMA reads (afaik it can also mark CPU cache data as invalid when the device does a DMA write).
I know it can behave differently for AGP (it is or can be disabled there for performance reasons), and I think it also doesn't work for write combining. I didn't find much good documentation about this, but I was able to find this MSDN entry which indicates that a memory barrier is enough to ensure proper operation.
I can't tell whether an sfence is enough or whether we'd require mfence, but I think a fence instruction should be added.

https://stackoverflow.com/questions/25985698/is-it-necessary-to-flush-write-combine-memory-explicitly-by-programmer cites some intel document (not sure if applicable), that sfence isn't even required when uncached memory is written (such as MMIO).

So unless we are updating an already submitted buffer, it shouldn't be necessary.

Not sure if I understand you correctly - do you mean that we don't need the explicit fence because the MMIO write in xaudio to update the descriptor will flush the WC buffer anyway? At least that's how I understand the excerpt from the Intel manuals, and I'd be fine with not having a fence here then.

do you mean that we don't need the explicit fence because the MMIO write in xaudio to update the descriptor will flush the WC buffer anyway?

Exactly. This is what I meant.

JayFoxRox · 2020-05-28T15:07:10Z

I have documented some of the issues in https://discord.com/channels/428359196719972353/428360618102226946/711290999091101766 on XboxDev Discord::

the highlighted part is broken. the high-part [32 bytes] is the end of an AC97 XAudioProvideSamples call [offset by 32 bytes to the right]. the flatline near 7.310 is the line between 2 separate SDL callbacks / buffer fills
I also tried increasing the number of buffers and the problem persisted

There are still some issues with this, but it's "better than master".
While the audio is still bad, it no longer appears to crash.

I feel like we should merge this after a quick review, and then someone will have to debug it.
I currently lack the tools to debug it properly and I have also stared at this problem for too long.

A good way to check if this might be a problem with math functions, would be to dump the generated audio to disk or network (I haven't done either of that yet).
A full log of events (interrupts, SDL callbacks, AC97 submissions) would also be helpful.

thrimbor

I've added some review comments.
Even though I'm generally unhappy about the audio issues we currently have, this all may very well be an xaudio problem, and I don't think it makes sense to delay merging this much more, especially since it fixes hangs (in fact it should've gotten merged much sooner).

I intend to merge as soon as the feedback is addressed.

src/audio/xbox/SDL_xboxaudio.c

thrimbor · 2020-06-05T01:17:36Z

src/audio/xbox/SDL_xboxaudio.c

+XBOXAUDIO_PlayDevice(_THIS)
+{
+    /* Send samples to XAudio */
+    XAudioProvideSamples(_this->hidden->buffers[_this->hidden->next_buffer], _this->spec.size, FALSE);


The stuff I read about was cache snooping, done by the PCI host controller to check whether there's data in the CPU cache that's more recent than RAM when a device wants to do DMA reads (afaik it can also mark CPU cache data as invalid when the device does a DMA write).
I know it can behave differently for AGP (it is or can be disabled there for performance reasons), and I think it also doesn't work for write combining. I didn't find much good documentation about this, but I was able to find this MSDN entry which indicates that a memory barrier is enough to ensure proper operation.
I can't tell whether an sfence is enough or whether we'd require mfence, but I think a fence instruction should be added.

JayFoxRox · 2020-06-09T03:04:27Z

@thrimbor I'd like to delay this a bit.

I'd like to wait for XboxDev/nxdk#364. That should resolve the issues with SDL audio.
But this will need testing, and might require changes to this SDL audio PR.

With the current XAudio design, it would have been impossible to implement SDL audio correctly.
I've explained some of the issues in the PR (originally on XboxDev Discord in https://discord.com/channels/428359196719972353/428360618102226946/719720606756634675, too).

While the current code will work with my planned XAudio changes, we might still have to increase the buffer size to support some commonly used formats, due to slow CPU conversion and stupid logic in SDL (it doesn't convert while waiting for buffer playback to finish, so we might need a longer buffer to give the CPU more time).

thrimbor · 2020-07-19T22:07:20Z

Since the XAudio changes have landed quite a while ago, I gave this another round of testing, and I'm going ahead with merging.

JayFoxRox force-pushed the audio-rewrite branch 5 times, most recently from c449ae6 to a043355 Compare April 27, 2020 16:11

audio: Rewrite Xbox audio to use the default SDL audio thread

a6e8fc9

JayFoxRox force-pushed the audio-rewrite branch from a043355 to a6e8fc9 Compare April 30, 2020 14:04

Ryzee119 mentioned this pull request May 2, 2020

Audio can cause a crash Ryzee119/SDLPoP#2

Closed

JayFoxRox commented May 15, 2020

View reviewed changes

thrimbor reviewed Jun 5, 2020

View reviewed changes

Ryzee119 mentioned this pull request Jun 9, 2020

Fix many issues in OpenXDK XAudio XboxDev/nxdk#364

Merged

thrimbor merged commit d5d814e into XboxDev:nxdk Jul 19, 2020

JayFoxRox deleted the audio-rewrite branch July 27, 2020 22:21

JayFoxRox mentioned this pull request Apr 12, 2021

SDL Audio driver is unstable and potentially hangs or crashes #26

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rewrite Xbox audio to use the default SDL audio thread #27

Rewrite Xbox audio to use the default SDL audio thread #27

JayFoxRox commented Apr 27, 2020 •

edited

Loading

JayFoxRox May 15, 2020

JayFoxRox May 16, 2020

thrimbor Jun 5, 2020

JayFoxRox Jun 9, 2020

thrimbor Jun 9, 2020

JayFoxRox Jun 9, 2020

JayFoxRox commented May 28, 2020

thrimbor left a comment

thrimbor Jun 5, 2020

JayFoxRox commented Jun 9, 2020 •

edited

Loading

thrimbor commented Jul 19, 2020

Rewrite Xbox audio to use the default SDL audio thread #27

Rewrite Xbox audio to use the default SDL audio thread #27

Conversation

JayFoxRox commented Apr 27, 2020 • edited Loading

JayFoxRox May 15, 2020

Choose a reason for hiding this comment

JayFoxRox May 16, 2020

Choose a reason for hiding this comment

thrimbor Jun 5, 2020

Choose a reason for hiding this comment

JayFoxRox Jun 9, 2020

Choose a reason for hiding this comment

thrimbor Jun 9, 2020

Choose a reason for hiding this comment

JayFoxRox Jun 9, 2020

Choose a reason for hiding this comment

JayFoxRox commented May 28, 2020

thrimbor left a comment

Choose a reason for hiding this comment

thrimbor Jun 5, 2020

Choose a reason for hiding this comment

JayFoxRox commented Jun 9, 2020 • edited Loading

thrimbor commented Jul 19, 2020

JayFoxRox commented Apr 27, 2020 •

edited

Loading

JayFoxRox commented Jun 9, 2020 •

edited

Loading