
Implement SDL3 Audio backend #6002

Draft · wants to merge 156 commits into base: master
Conversation

@hwsmm (Contributor) commented Sep 23, 2023

Please note that this is my first proper C# project. I got interested in #2784, and I started learning C# just because I wanted to try making it.
I ended up making it work somehow, but I have no idea what I need to do to take it further.

Description

This PR separates AudioManager into BassAudioManager and AudioManager, and adds SDL3AudioManager and some more components.
Setting AudioDriver to SDL3 in framework.ini enables this feature; BASS remains the default without that option.
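For reference, this is the kind of line framework.ini would contain (a sketch based on the description above; exact key syntax may differ):

```ini
; framework.ini: opt in to the experimental SDL3 audio backend.
; Remove this line to fall back to the default BASS backend.
AudioDriver = SDL3
```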

Track/Sample instances now actually hold their raw audio data, so they are managed by C# rather than by BASS. Because of this, it's technically an SDL audio implementation, but SDL doesn't manage samples/tracks. Most logic, including mixing, is now in C# code, so it should be fairly easy to switch to another audio library as long as it supports queueing audio directly like SDL does.
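To illustrate the kind of managed mixing described here, a minimal sketch (hypothetical types and names, not this PR's actual code; the real mixer also handles volume, balance, tempo, filters and more):

```csharp
using System;
using System.Collections.Generic;

// Minimal illustration of summing several managed sample buffers into a
// single output buffer before it gets queued to the device through SDL.
public class ManagedMixer
{
    private readonly List<float[]> sources = new();

    public void Add(float[] samples) => sources.Add(samples);

    // Sum all sources into `output`, clamping to [-1, 1] to avoid clipping.
    // `output` would then be handed to SDL's audio queue.
    public void MixInto(float[] output)
    {
        Array.Clear(output, 0, output.Length);

        foreach (var source in sources)
        {
            int count = Math.Min(source.Length, output.Length);
            for (int i = 0; i < count; i++)
                output[i] = Math.Clamp(output[i] + source[i], -1f, 1f);
        }
    }
}
```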

This mainly benefits Linux users as SDL supports PipeWire, PulseAudio, ALSA and JACK natively, whereas BASS only supports ALSA.

Choosing an audio decoder

Either BASS or FFmpeg can be used. If you want to use FFmpeg, you need to provide proper FFmpeg binaries yourself: osu!framework only ships a cut-down FFmpeg meant for videos, and libswresample is also needed.

You can enable FFmpeg by removing the BASS init lines in SDL3AudioDecoderManager; BASS is used by default otherwise.

Used libraries

SoundTouch.NET is used to adjust track tempo. NAudio is used to apply BiQuad filters, adjust the frequency of tracks and samples, and perform FFT for Waveform/CurrentAmplitude. SDL is used to push audio to the actual audio server. (No native libraries added!)

They are all open-source, so it should be much easier to track down bugs.
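As an illustration of the kind of processing NAudio handles here, a low-pass BiQuad filter over a float buffer might look like this (a sketch using NAudio's public BiQuadFilter API; cutoff and Q values are arbitrary, and the helper name is hypothetical):

```csharp
using NAudio.Dsp;

public static class FilterExample
{
    // Apply a low-pass BiQuad filter in place to a mono float buffer
    // using NAudio's built-in BiQuadFilter.
    public static void LowPass(float[] samples, int sampleRate)
    {
        var filter = BiQuadFilter.LowPassFilter(sampleRate, 1000f, 1f);

        for (int i = 0; i < samples.Length; i++)
            samples[i] = filter.Transform(samples[i]);
    }
}
```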

Tested platforms

Tested on Windows 11, Linux and Android. It should also be usable on iOS and macOS, but I don't have any Apple device to test on.

Notable changes to existing components

  • VideoDecoder can now decode audio (I meant to separate this, but it would have made the diff too large)
  • Waveform no longer depends on BASS, but benchmark performance is about 1.3× worse (70 ms versus 53 ms on the previous implementation); a sketch of the managed FFT approach follows below
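The FFT for Waveform/CurrentAmplitude now goes through NAudio. A rough sketch of computing a magnitude spectrum with it (hypothetical helper, not the PR's actual Waveform code; the buffer length must be a power of two):

```csharp
using System;
using NAudio.Dsp;

public static class FftExample
{
    public static float[] Magnitudes(float[] samples)
    {
        int m = (int)Math.Log2(samples.Length); // FFT size exponent
        var buffer = new Complex[samples.Length];

        for (int i = 0; i < samples.Length; i++)
            buffer[i] = new Complex { X = samples[i], Y = 0 };

        FastFourierTransform.FFT(true, m, buffer);

        // Only the first half of the bins is meaningful for real input.
        var magnitudes = new float[samples.Length / 2];
        for (int i = 0; i < magnitudes.Length; i++)
            magnitudes[i] = (float)Math.Sqrt(buffer[i].X * buffer[i].X + buffer[i].Y * buffer[i].Y);

        return magnitudes;
    }
}
```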

TODO

This PR requires no changes in osu! itself, but may in the future, as the game uses ManagedBass.Fx directly.

What's not working

  • Any audio effects other than BiQuad ones

Some quirks

  • Audio stutters a bit when a GC happens while the buffer is small

Needs to be done in the future

  • Switch to the original SoundTouch instead of the .NET port for some optimisation: my old code used it, and it works as well as the .NET one, but I didn't want to add a native library in this PR.
  • Make VideoDecoder abstract, and separate the audio part from it
  • BASS tests are reused to reduce the diff; they may need to be separated in the future.

@smoogipoo (Contributor)

This is super cool, though I don't see osu! using this in the near future, if ever. I'd rather see this implemented as an additive NuGet package, exposed through some means other than the framework config. For example, it could be included in HostOptions or simply as a virtual method in Game.

@peppy (Member) commented Sep 25, 2023

I'd actually like to experiment with this and see how good support is for the upcoming changes I want to make to WASAPI initialisation, so I don't want to throw this out. Moving away from BASS would be a huge decision, but I wouldn't throw this PR away.

I'm going to mark this as a draft as I don't see it getting reviewed or merged anytime soon, but it's still useful to have around as a reference for what's involved in making this work, and for potential performance/latency cross-checking in the future.

peppy marked this pull request as draft on September 25, 2023
@hwsmm (Author) commented Sep 29, 2023

I'll maintain this until some of you want to do something with it, mostly because I'm now so used to playing the game with these patches that I can't really go back to BASS...
FWIW, on Linux with PipeWire at a very small buffer size, hitsound latency was almost on par with the patched Wine that most osu! Linux players use. It produces some artifacts, though, so it shouldn't be the default.

@smoogipoo (Contributor)

Last time I tried this branch, it still had a few bugs and unimplemented bits. But now that SDL3's API has stabilised, how do you feel about getting this in at some point? Even if it's not complete, I think having something in the working set will help development and get more eyes on it.

@hwsmm (Author) commented Oct 24, 2024

To be honest, I'm a bit unsure about this PR at this point.

It can bring some goodies to the framework, such as:

  • A free audio stack without introducing a new native library (though the FFmpeg build needs to support audio decoding)
  • All components except decode/output are managed, so it should be easy to debug
  • Better support for Linux audio servers, and potential support for niche architectures

However, from what I've seen over the past year, most people who want this merged want low-latency audio, which this PR fails to deliver due to GC stutters. There's no improvement on Windows (only good down to a 10 ms buffer), and it's maybe even worse on Linux (only good down to 20 ms) if you want 'good' output.
Gameplay is fine, probably because of the LowLatency GC option, which we can't keep enabled for long periods.
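For reference, that option is the standard .NET GC latency mode setting, which the runtime docs recommend only for short critical sections (a minimal usage sketch):

```csharp
using System.Runtime;

// LowLatency suppresses blocking gen-2 collections while active, but is
// only meant for short, latency-critical windows, not a whole session.
GCLatencyMode previous = GCSettings.LatencyMode;
GCSettings.LatencyMode = GCLatencyMode.LowLatency;

try
{
    // ... latency-sensitive audio/gameplay window ...
}
finally
{
    GCSettings.LatencyMode = previous;
}
```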

As a solution to this, I have been writing a C library in the meantime, as I mentioned above, and it has turned out quite nice for me so far. Using it would also remove the raw audio processing code from this PR, effectively making the framework consume an external package for alternative audio backends; but obviously it's native, so potentially unsafe.

I could have tried improving an existing audio library like I said above, but I ended up making a new one, because decoding needs to happen asynchronously so it doesn't delay playback, and it needs to support tempo adjustment, BiQuad filters and resampling. I couldn't find any open-source library that can do all of those, so I did the 'new standard' thing... It was pretty fun, so I don't regret it, and it wasn't that hard since all I really did was rewrite what I had already written in C.

I know this isn't the best way, but audio stutters are way more annoying than visual stutters: you get pop noises right in your ears.

I want this PR to be good enough for what users expect, and a viable alternative to BASS for most use cases.
Note that it is not a replacement for BASS: it just re-implements my C# functions in C, so almost everything is one or two calls away without unit conversions and the like, because it was pretty much made for osu!framework. It's also standalone, though, so I think it makes more sense to maintain and package it in my separate repository and periodically bump the version here.

I will have to do some work (and probably a lot of work on mobile toolchains) to actually introduce it here, so I'd like to know what you think about this option.

I'll understand if you don't want this, because I also think it sounds like a step backwards in some ways. The current C# version can still live on as a free alternative for the framework.

@smoogipoo (Contributor) commented Oct 25, 2024

Beyond just latency, we still have ongoing reports of offset gradually falling behind (long trail starting at ppy/osu-stable-issues#306; it's a stable issue, but the same happens on lazer, see the referencing issues) and stutters at the start of maps (BASS startup time). The second we can resolve by swapping the reference clock, but the first is kind of hard to do anything about: un4seen doesn't work closely enough with us to get anywhere.

As far as I've been told, both of those issues are fixed with SDL3. I'm not against using native code, as long as the bulk of the core logic stays in C#.

@smoogipoo (Contributor)

I'm thinking you might even be able to make it a C# NativeAOT project if people are uncomfortable with C. I have a bunch of experience setting that up on all relevant platforms, including mobile, if needed.

@hwsmm (Author) commented Oct 28, 2024

Sorry for not responding sooner.

That's actually not surprising, since I started writing this backend because I was consistently hitting early in-game no matter what offset I used.

Honestly, I don't really know what to do, which is why I couldn't reply to your comment in time. I'm fine with maintaining either the C or the C# version, but if we end up with the C# one, I think we'd get more bug reports about audio stutters.
The native library unfortunately contains most of the core logic, to avoid GC stutters as much as possible. It is just an audio library: you create a channel, add it to a mixer, and then it plays.

I guess NativeAOT might work if people dislike using C, but we are not compiling the entire game with it, right? If we are compiling the audio framework into a shared object with NativeAOT, should we make bindings for it, or is there a way to dynamically load a NativeAOT-compiled library in C#?

@smoogipoo (Contributor)

> I'm fine with maintaining either the C or the C# version, but if we end up with the C# one, I think we'd get more bug reports about audio stutters

The hopeful idea is for the NativeAOT lib to do things in such a way that every allocation is accounted for, while also keeping it approachable for core developers (who are most experienced in C#).

> I guess NativeAOT might work if people dislike using C, but we are not compiling the entire game with it, right?

Nah, and I don't think we ever will: the JIT is actually quite useful for us optimisation-wise. But that wouldn't help in this case anyway, because the idea is to isolate the osu! GC from the NativeAOT GC.

> If we are compiling the audio framework into a shared object with NativeAOT, should we make bindings for it, or is there a way to dynamically load a NativeAOT-compiled library in C#?

You need to expose functions using [UnmanagedCallersOnly] (EntryPoint is required) (example). Then you use it just like a normal native lib (example).
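A minimal sketch of that pattern (hypothetical library and function names, not the actual bindings):

```csharp
using System;
using System.Runtime.InteropServices;

// Inside the NativeAOT-compiled audio library: export a function under a
// stable native name. EntryPoint is required so DllImport can find it.
public static class NativeExports
{
    [UnmanagedCallersOnly(EntryPoint = "mixer_create")]
    public static IntPtr MixerCreate(int sampleRate)
    {
        // ... allocate and return an opaque handle to a mixer ...
        return IntPtr.Zero;
    }
}

// Inside osu!framework: consume it exactly like any other native library.
public static class AudioNative
{
    [DllImport("audiolib", EntryPoint = "mixer_create")]
    public static extern IntPtr MixerCreate(int sampleRate);
}
```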

@hwsmm (Author) commented Oct 28, 2024

Yeah, I agree on that point, too.

But I'm worried about how we should pass objects between .NET and NativeAOT. Do we need to create ObjectHandles for everything, since .NET objects are not blittable?

@smoogipoo (Contributor)

Yeah. You'd basically interface with it as if it were a normal C lib: there's no passing of objects or hackily dereferencing into a matching managed signature (except for blittable/unmanaged types, of course).
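In other words, the interop surface stays all opaque handles and blittable parameters, something like this (hypothetical names; the native side owns the actual channel/mixer state):

```csharp
using System;
using System.Runtime.InteropServices;

// The managed side only ever holds opaque IntPtr handles and passes
// blittable values across the boundary; no managed objects cross it.
public static class ChannelNative
{
    [DllImport("audiolib")] public static extern IntPtr channel_create(IntPtr mixer);
    [DllImport("audiolib")] public static extern void channel_set_volume(IntPtr channel, float volume);
    [DllImport("audiolib")] public static extern void channel_play(IntPtr channel);
    [DllImport("audiolib")] public static extern void channel_destroy(IntPtr channel);
}
```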

@hwsmm (Author) commented Oct 28, 2024

Well, if I'm understanding correctly, that sounds like rewriting my C library back into C# again.

I don't think it should take too long, since I already have the .NET part from my native project (unless I run into weird issues). I'll start writing it sooner or later if you're fine with that.

@smoogipoo (Contributor)

Hold off on that one for a bit. Do you have a sample of the C lib we could look at, to see what needs to be done?

@hwsmm (Author) commented Oct 28, 2024

https://gist.github.com/hwsmm/c2bb0e55694e14c83372b8bdcb73830b

Here are the (hand-written) function bindings. Once all of them are implemented, it should work well.
I hope the function names are clear enough to tell what they do.

It doesn't have fancy things like SDLBool because I am the only consumer of this library...

@smoogipoo (Contributor)

I was hoping to see the implementation, because people may start to get uncomfortable if the library is very large. In general, I think people will be comfortable as long as most of the implementation is still managed and the native part only covers the truly fundamental processing.

@hwsmm (Author) commented Oct 28, 2024

Oh, if that's the reason, I can send you the compressed source tree. I don't want it to be public if it isn't eventually going to be released, and it still needs a lot of polishing.

Can I send you a mail at the address in your GitHub profile?

I'm not sure where the border of 'fundamental processing' lies, though. The native part has to know everything it needs to play audio without relying on the C# part, to avoid stutters; that includes volume, balance, tempo, frequency and more.

At best, 'most of the implementation' can only amount to exposing information (duration, channel counts, current time and so on) to the game, telling native channels to play/stop, and creating/destroying native channels when needed.

@smoogipoo (Contributor)

> Can I send you a mail at the address in your GitHub profile?

Yep, that's fine. I'll have a look at it with peppy and see whether the scope is something we're comfortable with, and whether we want it at all.

@hwsmm (Author) commented Oct 29, 2024

I've just sent a mail!

I forgot to add this in the mail: I'm planning to implement the decoder on the .NET side, because keeping it native adds a bit of complexity and may cause linking problems with FFmpeg.
