Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite Xbox audio to use the default SDL audio thread #27

Merged
merged 1 commit into from
Jul 19, 2020

Conversation

JayFoxRox
Copy link
Member

@JayFoxRox JayFoxRox commented Apr 27, 2020

This is a rewrite of the crashy audio driver that was recently introduced.

This new code is run on a different thread, which lifts many restrictions from the SDL callback (although we potentially waste more CPU cycles on context switches now).

Initialization:

  • We create a semaphore which keeps track of how many buffers are ready to be queued by the mainloop. No buffers will qualify for that initially: we'll always reserve 1 buffer to be filled by the application, and the rest of the buffers will be queued with silence initially (this is to start consistent timing right away).
  • So we allocate all audio buffers and we queue them all, except the first one, which is reserved for the first application data. We remember the first buffer as the next buffer to be queued.
  • We then start playback.

To summarize: We start playing silence at the second buffer (and following), until playback wraps around and reaches the first buffer with the actual application data (which was hopefully queued by the mainloop by then). This means we force a couple of milliseconds of silence on application startup, and we do force a higher latency, but we get more headroom for buffer underruns and more consistent timing at startup.

During the mainloop:

  • We return the next buffer to be queued as device buffer, so it can be modified by the application.
  • When the application finishes writing the audio data, we queue that buffer and remember the following buffer as the one to be queued next.
  • SDL will then wait for more buffers to become available to continue the mainloop; this is done by looking at our semaphore.
  • Once the semaphore count is greater than 0 (= buffer available), it will decrement the count to reserve a buffer.

When a buffer finishes playback:

  • We simply increment the semaphore to mark another buffer as available (this leads to another mainloop iteration and the SDL callback being called).

Some remaining issues which should be looked at during review:

  • The driver was only tested in XQEMU, and not for long.
  • The buffer is still very short. So underruns could happen quickly. I did not test if the buffer size can be increased without breaking the driver (existing code uses 1 page per buffer and 0x400 samples for an individual buffer size of 0x1000, so page boundaries aren't an issue).
  • I was surprised that I had to queue the first audio chunk before XAudioPlay works; this is why the semaphore is initialized with zero (and some buffers are queued initially), unlike for other SDL audio drivers. I wonder if AC97 pauses when running out of buffers, and if there might be a risk of this happening (with underruns for example). I'll probably test this and create an issue on the nxdk repository so we can investigate this behavior with XAudio.
  • If two or more buffers run out too quickly around the same time, I wonder if we might only get a single interrupt. If this was the case, we'd quickly get broken audio or a full hang as buffers vanish. I'll probably create an issue on the nxdk repository so we can investigate this behavior with XAudio.
  • I did not test this on a debug kernel (I worry that this code might assert in the kernel, like previous AC97 code).
  • Buffer-memory is still write-combined, so reads from SDL callback could be bad for performance. I think this is fine as most people will write sequentially and I doubt reading is that bad on such a small buffer. We could remove this flag to have more guarantee-able performance but with lower maximum performance.

I'll create issues after merge, if we don't deal with them during review.

@JayFoxRox JayFoxRox force-pushed the audio-rewrite branch 5 times, most recently from c449ae6 to a043355 Compare April 27, 2020 16:11
XBOXAUDIO_PlayDevice(_THIS)
{
/* Send samples to XAudio */
XAudioProvideSamples(_this->hidden->buffers[_this->hidden->next_buffer], _this->spec.size, FALSE);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is write-combined memory, so this requires __asm__ __volatile__ ("sfence"); to flush the buffer probably.

I know that @thrimbor looked into this for the MCPX NIC and he found something about the device peeking for memory changes (is write-combined memory excluded from this maybe?). Would be interesting to know if he found similar comments about the MCPX ACI.

The NV2A actually requires a more complicated set of steps (click here to see code in pbkit)

Same issue exists in XBOXAUDIO_OpenDevice after making the memory silent.

Even if I removed the PAGE_WRITECOMBINE I'd not be sure how to best flush the cache so the ACI can see the new data. PAGE_NOCACHE would probably ruin the performance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked this code now by doing this in the SDL callback before filling with actual data:

  // Try to inject cache errors by making the signal in RAM constant
  memset(stream, rand() & 0xFF, len);
  __asm__ __volatile__ ("sfence");
#if defined(NXDK)
  __asm__ __volatile__ ("wbinvd");
#endif
  __asm__ __volatile__ ("sfence");

However, I never saw a flat signal in my output.

To doublecheck I also added this just before XAudioProvideSamples:

// Try to inject cache errors by making signal in CPU cache random
uint8_t* data = _this->hidden->buffers[_this->hidden->next_buffer];
for(int i = 0; i < _this->spec.size; i++) {
  data[i] = rand() & 0xFF;
}

However, I only saw random values and nothing of the signal (a clean sine wave) survived.

I have confirmed this for PAGE_READWRITE | PAGE_WRITECOMBINE and PAGE_READWRITE

So buffer updates without sfence or wbinvd still work absolutely fine.

Do I need to take any action?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stuff I read about was cache snooping, done by the PCI host controller to check whether there's data in the CPU cache that's more recent than RAM when a device wants to do DMA reads (afaik it can also mark CPU cache data as invalid when the device does a DMA write).
I know it can behave differently for AGP (it is or can be disabled there for performance reasons), and I think it also doesn't work for write combining. I didn't find much good documentation about this, but I was able to find this MSDN entry which indicates that a memory barrier is enough to ensure proper operation.
I can't tell whether an sfence is enough or whether we'd require mfence, but I think a fence instruction should be added.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://stackoverflow.com/questions/25985698/is-it-necessary-to-flush-write-combine-memory-explicitly-by-programmer cites some intel document (not sure if applicable), that sfence isn't even required when uncached memory is written (such as MMIO).

So unless we are updating an already submitted buffer, it shouldn't be necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if I understand you correctly - do you mean that we don't need the explicit fence because the MMIO write in xaudio to update the descriptor will flush the WC buffer anyway? At least that's how I understand the excerpt from the Intel manuals, and I'd be fine with not having a fence here then.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean that we don't need the explicit fence because the MMIO write in xaudio to update the descriptor will flush the WC buffer anyway?

Exactly. This is what I meant.

@JayFoxRox
Copy link
Member Author

I have documented some of the issues in https://discord.com/channels/428359196719972353/428360618102226946/711290999091101766 on XboxDev Discord::

Broken wave
the highlighted part is broken. the high-part [32 bytes] is the end of an AC97 XAudioProvideSamples call [offset by 32 bytes to the right]. the flatline near 7.310 is the line between 2 separate SDL callbacks / buffer fills
I also tried increasing the number of buffers and the problem persisted

There are still some issues with this, but it's "better than master".
While the audio is still bad, it no longer appears to crash.

I feel like we should merge this after a quick review, and then someone will have to debug it.
I currently lack the tools to debug it properly and I have also stared at this problem for too long.

A good way to check if this might be a problem with math functions, would be to dump the generated audio to disk or network (I haven't done either of that yet).
A full log of events (interrupts, SDL callbacks, AC97 submissions) would also be helpful.

Copy link
Member

@thrimbor thrimbor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added some review comments.
Even though I'm generally unhappy about the audio issues we currently have, this all may very well be an xaudio problem, and I don't think it makes sense to delay merging this much more, especially since it fixes hangs (in fact it should've gotten merged much sooner).

I intend to merge as soon as the feedback is addressed.

src/audio/xbox/SDL_xboxaudio.c Show resolved Hide resolved
src/audio/xbox/SDL_xboxaudio.c Show resolved Hide resolved
XBOXAUDIO_PlayDevice(_THIS)
{
/* Send samples to XAudio */
XAudioProvideSamples(_this->hidden->buffers[_this->hidden->next_buffer], _this->spec.size, FALSE);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stuff I read about was cache snooping, done by the PCI host controller to check whether there's data in the CPU cache that's more recent than RAM when a device wants to do DMA reads (afaik it can also mark CPU cache data as invalid when the device does a DMA write).
I know it can behave differently for AGP (it is or can be disabled there for performance reasons), and I think it also doesn't work for write combining. I didn't find much good documentation about this, but I was able to find this MSDN entry which indicates that a memory barrier is enough to ensure proper operation.
I can't tell whether an sfence is enough or whether we'd require mfence, but I think a fence instruction should be added.

@JayFoxRox
Copy link
Member Author

JayFoxRox commented Jun 9, 2020

@thrimbor I'd like to delay this a bit.

I'd like to wait for XboxDev/nxdk#364. That should resolve the issues with SDL audio.
But this will need testing, and might require changes to this SDL audio PR.

With the current XAudio design, it would have been impossible to implement SDL audio correctly.
I've explained some of the issues in the PR (originally on XboxDev Discord in https://discord.com/channels/428359196719972353/428360618102226946/719720606756634675, too).

While the current code will work with my planned XAudio changes, we might still have to increase the buffer size to support some commonly used formats, due to slow CPU conversion and stupid logic in SDL (it doesn't convert while waiting for buffer playback to finish, so we might need a longer buffer to give the CPU more time).

@thrimbor
Copy link
Member

Since the XAudio changes have landed quite a while ago, I gave this another round of testing, and I'm going ahead with merging.

@thrimbor thrimbor merged commit d5d814e into XboxDev:nxdk Jul 19, 2020
@JayFoxRox JayFoxRox deleted the audio-rewrite branch July 27, 2020 22:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants