Rewrite Xbox audio to use the default SDL audio thread #27
XBOXAUDIO_PlayDevice(_THIS)
{
/* Send samples to XAudio */
XAudioProvideSamples(_this->hidden->buffers[_this->hidden->next_buffer], _this->spec.size, FALSE);
This is write-combined memory, so this probably requires `__asm__ __volatile__ ("sfence");` to flush the buffer.
I know that @thrimbor looked into this for the MCPX NIC and he found something about the device peeking for memory changes (is write-combined memory excluded from this maybe?). It would be interesting to know if he found similar comments about the MCPX ACI.
The NV2A actually requires a more complicated set of steps (see the code in pbkit).
The same issue exists in `XBOXAUDIO_OpenDevice` after making the memory silent.
Even if I removed the `PAGE_WRITECOMBINE`, I'd not be sure how best to flush the cache so the ACI can see the new data. `PAGE_NOCACHE` would probably ruin the performance.
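The ordering suggested in this comment can be sketched as follows. This is a minimal sketch, not the driver's code: `store_fence`, `fill_and_submit`, `record_submit`, `last_buf`, and `last_len` are hypothetical names, and `record_submit` only stands in for `XAudioProvideSamples`. Whether the fence is actually needed on this hardware is the open question in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store fence: orders earlier stores (including write-combined ones) before
 * later stores. Falls back to a full barrier on non-x86 hosts so the sketch
 * still builds there. */
static inline void store_fence(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__ ("sfence" ::: "memory");
#else
    __sync_synchronize();
#endif
}

/* Hypothetical stand-in for XAudioProvideSamples: records what was queued. */
static const uint8_t *last_buf;
static size_t last_len;

static void record_submit(const uint8_t *buf, size_t len)
{
    last_buf = buf;
    last_len = len;
}

typedef void (*submit_fn)(const uint8_t *buf, size_t len);

/* Fill a buffer, fence, then hand it to the device. */
static void fill_and_submit(uint8_t *buf, size_t len, uint8_t value, submit_fn submit)
{
    assert(buf != NULL && submit != NULL);
    memset(buf, value, len); /* CPU writes; on WC memory these may be buffered */
    store_fence();           /* drain pending stores before the device is told */
    submit(buf, len);
}
```

Calling `fill_and_submit(buf, len, 0x80, record_submit)` fills the buffer, issues the fence, and only then passes the buffer on, which is the order a real submit path would need if the fence turns out to matter.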
I checked this code now by doing this in the SDL callback before filling with actual data:
// Try to inject cache errors by making the signal in RAM constant
memset(stream, rand() & 0xFF, len);
__asm__ __volatile__ ("sfence");
#if defined(NXDK)
__asm__ __volatile__ ("wbinvd");
#endif
__asm__ __volatile__ ("sfence");
However, I never saw a flat signal in my output.
To double-check, I also added this just before `XAudioProvideSamples`:
// Try to inject cache errors by making signal in CPU cache random
uint8_t* data = _this->hidden->buffers[_this->hidden->next_buffer];
for(int i = 0; i < _this->spec.size; i++) {
data[i] = rand() & 0xFF;
}
However, I only saw random values, and nothing of the signal (a clean sine wave) survived.
I have confirmed this for both `PAGE_READWRITE | PAGE_WRITECOMBINE` and `PAGE_READWRITE`, so buffer updates without `sfence` or `wbinvd` still work absolutely fine.
Do I need to take any action?
The stuff I read about was cache snooping, done by the PCI host controller to check whether there's data in the CPU cache that's more recent than RAM when a device wants to do DMA reads (afaik it can also mark CPU cache data as invalid when the device does a DMA write).
I know it can behave differently for AGP (it is or can be disabled there for performance reasons), and I think it also doesn't work for write combining. I didn't find much good documentation about this, but I was able to find this MSDN entry which indicates that a memory barrier is enough to ensure proper operation.
I can't tell whether an `sfence` is enough or whether we'd require `mfence`, but I think a fence instruction should be added.
https://stackoverflow.com/questions/25985698/is-it-necessary-to-flush-write-combine-memory-explicitly-by-programmer cites an Intel document (not sure if applicable) saying that `sfence` isn't even required when uncached memory is written (such as MMIO).
So unless we are updating an already submitted buffer, it shouldn't be necessary.
Not sure if I understand you correctly - do you mean that we don't need the explicit fence because the MMIO write in xaudio to update the descriptor will flush the WC buffer anyway? At least that's how I understand the excerpt from the Intel manuals, and I'd be fine with not having a fence here then.
> do you mean that we don't need the explicit fence because the MMIO write in xaudio to update the descriptor will flush the WC buffer anyway?
Exactly. This is what I meant.
I have documented some of the issues in https://discord.com/channels/428359196719972353/428360618102226946/711290999091101766 on the XboxDev Discord.
There are still some issues with this, but it's "better than master". I feel like we should merge this after a quick review, and then someone will have to debug it. A good way to check whether this might be a problem with math functions would be to dump the generated audio to disk or network (I haven't done either yet).
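For the disk variant, a dump helper could look roughly like this. It is only a sketch under assumptions: `dump_pcm` is a hypothetical name, nothing like it exists in this PR, and opening the file on every call is far too slow for the real audio thread, so a real version would keep the file open.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Append `len` bytes of raw PCM to the file at `path`.
 * Returns 0 on success, -1 on any I/O error. */
static int dump_pcm(const char *path, const uint8_t *data, size_t len)
{
    assert(path != NULL && data != NULL);

    FILE *f = fopen(path, "ab"); /* append, binary */
    if (f == NULL) {
        return -1;
    }

    size_t written = fwrite(data, 1, len, f);
    int close_failed = (fclose(f) != 0);

    return (written == len && !close_failed) ? 0 : -1;
}
```

The buffer would be passed to the helper right after the SDL callback has filled it. The resulting file is headerless PCM, so it has to be opened with a tool that can import raw samples in the format the application requested.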
I've added some review comments.
Even though I'm generally unhappy about the audio issues we currently have, this all may very well be an xaudio problem, and I don't think it makes sense to delay merging this much more, especially since it fixes hangs (in fact it should've gotten merged much sooner).
I intend to merge as soon as the feedback is addressed.
@thrimbor I'd like to delay this a bit. I'd like to wait for XboxDev/nxdk#364. That should resolve the issues with SDL audio. With the current XAudio design, it would have been impossible to implement SDL audio correctly. While the current code will work with my planned XAudio changes, we might still have to increase the buffer size to support some commonly used formats, due to slow CPU conversion and stupid logic in SDL (it doesn't convert while waiting for buffer playback to finish, so we might need a longer buffer to give the CPU more time).
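To put numbers on the buffer-size trade-off: the time the CPU gets per buffer is simply the buffer length divided by the sample rate. A small sketch, assuming `samples` counts sample frames per channel (as in `SDL_AudioSpec`); `buffer_duration_ms` is a hypothetical helper and the 48 kHz in the usage note is only an example rate.

```c
#include <assert.h>

/* Milliseconds of audio that one buffer of `samples` frames holds at
 * `freq_hz` frames per second. */
static double buffer_duration_ms(unsigned samples, unsigned freq_hz)
{
    assert(freq_hz > 0);
    return 1000.0 * (double)samples / (double)freq_hz;
}
```

For example, `buffer_duration_ms(1024, 48000)` is roughly 21.3 ms, and doubling the buffer to 2048 frames doubles both the time available for conversion and the added latency.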
Since the XAudio changes landed quite a while ago, I gave this another round of testing, and I'm going ahead with merging.
This is a rewrite of the crashy audio driver that was recently introduced.
This new code is run on a different thread, which lifts many restrictions from the SDL callback (although we potentially waste more CPU cycles on context switches now).
Initialization:
To summarize: We start playing silence at the second buffer (and following), until playback wraps around and reaches the first buffer with the actual application data (which was hopefully queued by the mainloop by then). This means we force a couple of milliseconds of silence on application startup, and we do force a higher latency, but we get more headroom for buffer underruns and more consistent timing at startup.
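The startup order described in this summary can be sketched as a small model. This is not the driver code: `ring_state`, `ring_init`, `ring_submit`, `BUFFER_COUNT`, and `BUFFER_SIZE` are hypothetical, and the model only records the order in which buffers would be queued.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BUFFER_COUNT 4 /* assumption: the real count may differ */
#define BUFFER_SIZE  8 /* tiny on purpose, for illustration only */

typedef struct {
    uint8_t buffers[BUFFER_COUNT][BUFFER_SIZE];
    int play_order[BUFFER_COUNT]; /* order in which buffers were queued */
    int queued;                   /* number of entries in play_order */
    int next_buffer;              /* buffer the application fills next */
} ring_state;

/* Queue silence in every buffer except the first, which stays reserved
 * for the first block of application data. */
static void ring_init(ring_state *r, uint8_t silence)
{
    assert(r != NULL);
    memset(r, 0, sizeof *r);
    for (int i = 1; i < BUFFER_COUNT; i++) {
        memset(r->buffers[i], silence, BUFFER_SIZE);
        r->play_order[r->queued++] = i;
    }
    r->next_buffer = 0;
}

/* Queue the buffer the application just filled and advance to the next one.
 * The model only covers the first pass through the ring. */
static void ring_submit(ring_state *r)
{
    assert(r != NULL && r->queued < BUFFER_COUNT);
    r->play_order[r->queued++] = r->next_buffer;
    r->next_buffer = (r->next_buffer + 1) % BUFFER_COUNT;
}
```

After `ring_init`, the queue holds buffers 1, 2, 3 (silence). The first `ring_submit` appends buffer 0, so the application data is only reached once playback has gone through the silent buffers, which is where the extra startup latency and the underrun headroom come from.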
During the mainloop:
When a buffer finishes playback:
Some remaining issues which should be looked at during review:
`XAudioPlay` works; this is why the semaphore is initialized with zero (and some buffers are queued initially), unlike for other SDL audio drivers. I wonder if AC97 pauses when running out of buffers, and if there might be a risk of this happening (with underruns for example). I'll probably test this and create an issue on the nxdk repository so we can investigate this behavior with XAudio.
I'll create issues after merge, if we don't deal with them during review.