is this API appropriate, especially for real time use #3
The link to the alternate design: https://github.com/egonelbre/exp/tree/master/audio
The 101 of real-time audio: http://www.rossbencina.com/code/real-time-audio-programming-101-time-waits-for-nothing
@egonelbre would you mind squashing your commits for the proposal, or maybe sending a PR? GitHub really makes it hard to comment on different parts of the code coming from different commits :(
Typically when using audio my needs have been:
@mattetti sure no problem.
@kisielk sure, if you have such a sample-based synth you probably need to track what notes are playing, etc. anyway, so you would have a …
@mattetti Here you go: egonelbre/exp@81ba19e
Yes, but the "synth" is not going to be limited to one sample; usually you have some number of channels, say 8-16, and each one can choose any part of any sample to play at any time. In my opinion, processing audio in float64 is pretty niche, relegated to some high-precision or high-quality filters which aren't commonly used. Even in that case, the data can be converted to float64 for processing just within that filter block; there's little reason to store it in anything but float32 otherwise. Even so, most DSP is performed using float32 even on powerful architectures like x86, the reason being that you can do twice as much with SIMD instructions. Of course I'm totally fine with having float64 as an option for a buffer type when appropriate, but I believe that float32 should be on par. I feel it would certainly be the primary format for any real-time application, and even for batch processing you are likely to see performance gains from using it.
@kisielk Yes, and for my own needs float32 would be completely sufficient. Forums seemed to agree that in most cases float64 isn't a significant improvement. However, if one of the intended targets is writing audio plugins, then many plugin APIs include a float64 version (e.g. VST3) and DAWs have an option to switch between float32 and float64. I agree that, if only one should be chosen, then …
I agree that …
Again, I don't think it's a binary choice, I just think that both should have equal support within the API. And yes, if I were using Go for realtime processing of audio I would definitely want a 32-bit version of the math package. I don't think the math package needs to dictate any limitations on any potential audio API.
@kisielk sounds fair. Just to be clear, would you be interested in using Go for realtime processing, or at least giving it a try? You obviously do that for a living using C++, so your expertise would be invaluable.
How many math functions are needed in practice? Initially the package could be a wrapper around … I suspect some of the first bottlenecks and candidates for "asm optimized" code will be []int16 -> []float32 conversion, buffer multiplication, and/or adding two buffers together.
@mattetti that is something I'm definitely interested in. I'm not exactly a DSP expert, but I work with it enough day to day to be fairly familiar with the domain. @egonelbre Gain is also a big one that benefits from optimization. (edit: maybe that's what you meant by buffer multiplication, or did you mean convolution?)
@kisielk yeah, I meant gain :), my brain's language unit seems to be severely malfunctioning today.
A math package (trigonometric, logarithmic, etc.) with float32 support and SIMD optimization for any data type are two different things. In many cases just mult/add/sub/div are needed, and for those package math is not needed. I think that math32 and SIMD are best kept separate from this proposal. If we are thinking of performance, then converting buffers without needing to allocate can be important: for example, have one input buffer and one output buffer for the conversion, instead of allocating a new output buffer each time.
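For illustration, an allocation-free int16-to-float32 conversion along those lines could look like this (a minimal sketch; the package and function names are invented here, not part of any proposal):

```go
package convert

// Int16ToFloat32 converts PCM samples from src into dst, scaling int16
// values into the [-1, 1) float range. It only allocates when dst is too
// small, so a caller that reuses the same dst avoids per-call allocations.
func Int16ToFloat32(dst []float32, src []int16) []float32 {
	if cap(dst) < len(src) {
		dst = make([]float32, len(src))
	}
	dst = dst[:len(src)]
	const scale = 1.0 / 32768.0 // map the int16 range onto [-1, 1)
	for i, s := range src {
		dst[i] = float32(s) * scale
	}
	return dst
}
```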
@taruti +:100:
Speaking of conversion between buffers, I think it's important that the API has a way to facilitate conversion between buffers of different data types and sizes without allocation (e.g. 2 channels to 1, etc.). The actual conversion method would be determined by the application, but at least the API should be able to facilitate this without too much additional complexity.
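As a hypothetical example of such a helper, a 2-channel-to-1-channel fold into a preallocated destination might be:

```go
package convert

// StereoToMono folds an interleaved stereo buffer into a caller-provided
// mono buffer by averaging each L/R pair. It performs no allocation and
// returns the number of mono samples written.
func StereoToMono(dst, src []float32) int {
	n := len(src) / 2
	if n > len(dst) {
		n = len(dst)
	}
	for i := 0; i < n; i++ {
		dst[i] = (src[2*i] + src[2*i+1]) * 0.5
	}
	return n
}
```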
Alright, here is my suggestion. I'll add you guys to the organization and we can figure out an API for real-time processing and from there see how it works for offline. Ideally I would love to end up with:
@rakyll and I also discussed adding wrappers to things like CoreAudio on Mac so we could have an end-to-end experience without having to rely on things like portaudio. This is outside the scope of what I have in mind, but I figured I should mention it. I like designing APIs against real usage, so maybe a good first step is to define an example we would like to build and from there define the components we need. Thoughts?
That sounds like a good idea to me. However, I would propose we limit the scope of the core audio package to the first two points (and perhaps a couple of very general utilities from point 3). I feel like the rest would be better suited for other packages. My main reasoning behind this is that I feel the first two items can be achieved (relatively) objectively and there can be one canonical implementation. As you go down the list it becomes increasingly application-dependent.
I think the audio API should be in its own package and each of those things in separate packages. For instance I have the wav and aiff packages isolated. That's another reason why having a GitHub organization is nice.
Just noticed that when looking at the org page. Looks good to me 👍
There's the original proposal. @egonelbre has an alternative proposal. Here are a couple more (conflicting) API ideas for a Buffer type. I'm not saying that either of them is any good, but there might be a useful core in there somewhere. See also another API design in the github.com/azul3d/engine/audio package.

Reader/Writer-ish:

Have Buffer be a concrete type, not an interface type:
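Purely as an illustration of those two directions (the names below are hypothetical and not taken from any proposal), the shapes might be roughly:

```go
package bufsketch

// Reader/Writer-ish: a source fills a caller-owned slice with samples,
// in the spirit of io.Reader.
type SampleReader interface {
	// ReadSamples fills p with up to len(p) samples and reports how many
	// were written; err is io.EOF once the source is exhausted.
	ReadSamples(p []float32) (n int, err error)
}

// Concrete, non-interface Buffer: plain data plus a format description.
type Format struct {
	NumChannels int
	SampleRate  int
}

type Buffer struct {
	Format Format
	Data   []float32 // interleaved samples
}
```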
In addition, here is another comment from @nigeltao about the math library:
I really like this idea, which could also apply to log. It might come at an extra memory cost, but I am personally OK with that. Let's try to summarize the pros and cons of these different approaches and discuss what we value and the direction we want to take. I am now convinced that my initial proposal, while fitting my needs, doesn't work well in other scenarios and shouldn't be left as is.
A broader point, re the proposal to add packages to the Go standard library or under golang.org/x, is that I think it is too early to say what the 'right' API should be just by looking at an interface definition. As rsc said on golang/go#18497 (comment): "The right way to start is to create a package somewhere else (github.com/go-audio is great) and get people to use it. Once you have experience with the API being good, then it might make sense to promote to a subrepo or eventually the standard library (the same basic path context followed)." Emphasis added. The right way might actually involve letting a hundred API flowers bloom, and trying a few different APIs before making a push for any particular flower. I'd certainly like to see more experience with how audio codecs fit into any API proposal: how does the Buffer type (whatever it is) interact with sources (which can block on I/O, e.g. playing an mp3 stream over the network) and sinks (which you don't want to glitch)? WAV and AIFF are a good start, but handling some sort of compressed audio would be even better. A full-blown mp3 decoder is a lot of work, but as far as kicking API tyres, it might suffice to write a decoder for a toy audio codec where "c3d1e3c1e2c2e4" decoded to "play a C sine wave for 3 seconds, D for 1 second, E for 3 seconds, etc", i.e. to play a really ugly version of "doe a deer". |
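To make the tyre-kicking concrete, a decoder for such a toy codec could be as small as this (entirely illustrative; the note table, package, and function names are invented for this sketch):

```go
package toycodec

import (
	"fmt"
	"math"
	"strconv"
)

// freq maps the toy codec's note letters to frequencies in Hz (assumed values).
var freq = map[byte]float64{
	'c': 261.63, 'd': 293.66, 'e': 329.63, 'f': 349.23,
	'g': 392.00, 'a': 440.00, 'b': 493.88,
}

// Decode turns a string such as "c3d1e3" into mono float32 PCM at the given
// sample rate: each letter is a note, each digit its duration in seconds.
func Decode(src string, sampleRate int) ([]float32, error) {
	var out []float32
	for i := 0; i+1 < len(src); i += 2 {
		f, ok := freq[src[i]]
		if !ok {
			return nil, fmt.Errorf("unknown note %q", src[i])
		}
		secs, err := strconv.Atoi(string(src[i+1]))
		if err != nil {
			return nil, fmt.Errorf("bad duration %q", src[i+1])
		}
		n := secs * sampleRate
		for j := 0; j < n; j++ {
			t := float64(j) / float64(sampleRate)
			out = append(out, float32(math.Sin(2*math.Pi*f*t)))
		}
	}
	return out, nil
}
```

Feeding the output of such a decoder through a candidate Buffer type and out to a sink would exercise exactly the source/sink interactions mentioned above, without the effort of a real codec.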
Back on API design brainstorming and codecs, there might be some more inspiration in the golang.org/x/text/encoding/... and golang.org/x/text/transform packages, which let you e.g. convert between character encodings like Shift JIS, Windows 1252 and UTF-8. Text encodings are far simpler than audio codecs, though, so it might not end up being relevant. |
Some more API inspiration, from C++: https://www.juce.com/doc/classAudioBuffer (JUCE is one of the most-used audio processing libraries out there).
Obviously the API isn't very Go-like since it's C++ (and has a fair amount of pre-C++11 legacy, though it is gradually being modernized), but it's worth taking a look at how they put things together.
JUCE uses overloading quite heavily and, as mentioned, isn't very Go-like (it's also a framework more than a suite of libraries, but it is well written and very popular). My hope is that we can come up with a more modern and accessible API rather than a "port"; I would really want audio in Go to be much easier for new developers. On a side note, I did port over some parts of JUCE, such as https://www.juce.com/doc/classValueTree, for better interop with audio plugins.
I'm not suggesting porting it, but I think the concepts in the library are pretty well thought out and cover most of what you would want to do with audio processing. It's worth getting familiar with. I don't think the use of overloading really matters; it's pretty easy to do that in other ways with Go.
@nigeltao I agree with rsc, and to be honest my goal was more to get momentum than to get the proposal accepted. I'm very happy to have found a group of motivated people who are interested in tackling the same issue. I'll open a couple of issues to discuss code styling and the "core values" of this project.
@mattetti I just converted my experiment to use split channels; the full difference can be seen here: egonelbre/exp@8c77c79?diff=split#diff-b1e66adfee4cfc554526b30559e7e612
I'm back at looking at what we can do to design a generic buffer API. @egonelbre I don't think isolating channels is the way to go; we can more than likely implement an API similar to your … I did hit an issue today when I needed to feed a PCM data chunk as int32 or float32 and my current API was only providing int. So I'm going to explore the image and text packages to see if there is a good, flexible solution there. I looked at azul3d, which is quite well done, but I'm not a fan of their buffer/slice implementation: https://github.com/azul3d/engine/blob/master/audio/slice.go
Taking notes:

Text transformer & chain interfaces: https://godoc.org/golang.org/x/text/transform#Transformer (similar to what an audio transformer interface could be, but passing a buffer).

In regards to the Draw package, it starts from an Image interface:

```go
// Image is a finite rectangular grid of color.Color values taken from a color
// model.
type Image interface {
	// ColorModel returns the Image's color model.
	ColorModel() color.Model
	// Bounds returns the domain for which At can return non-zero color.
	// The bounds do not necessarily contain the point (0, 0).
	Bounds() Rectangle
	// At returns the color of the pixel at (x, y).
	// At(Bounds().Min.X, Bounds().Min.Y) returns the upper-left pixel of the grid.
	// At(Bounds().Max.X-1, Bounds().Max.Y-1) returns the lower-right one.
	At(x, y int) color.Color
}
```

The interface is implemented by many concrete types such as:

```go
// NRGBA is an in-memory image whose At method returns color.NRGBA values.
type NRGBA struct {
	// Pix holds the image's pixels, in R, G, B, A order. The pixel at
	// (x, y) starts at Pix[(y-Rect.Min.Y)*Stride + (x-Rect.Min.X)*4].
	Pix []uint8
	// Stride is the Pix stride (in bytes) between vertically adjacent pixels.
	Stride int
	// Rect is the image's bounds.
	Rect Rectangle
}

func (p *NRGBA) ColorModel() color.Model { return color.NRGBAModel }

func (p *NRGBA) Bounds() Rectangle { return p.Rect }

func (p *NRGBA) At(x, y int) color.Color {
	return p.NRGBAAt(x, y)
}
```

or

```go
// Paletted is an in-memory image of uint8 indices into a given palette.
type Paletted struct {
	// Pix holds the image's pixels, as palette indices. The pixel at
	// (x, y) starts at Pix[(y-Rect.Min.Y)*Stride + (x-Rect.Min.X)*1].
	Pix []uint8
	// Stride is the Pix stride (in bytes) between vertically adjacent pixels.
	Stride int
	// Rect is the image's bounds.
	Rect Rectangle
	// Palette is the image's palette.
	Palette color.Palette
}

func (p *Paletted) ColorModel() color.Model { return p.Palette }

func (p *Paletted) Bounds() Rectangle { return p.Rect }

func (p *Paletted) At(x, y int) color.Color {
	if len(p.Palette) == 0 {
		return nil
	}
	if !(Point{x, y}.In(p.Rect)) {
		return p.Palette[0]
	}
	i := p.PixOffset(x, y)
	return p.Palette[p.Pix[i]]
}
```

A GIF image is implemented as such:

```go
type GIF struct {
	Image []*image.Paletted
	//...
}
```

When drawing using the generic Image interface, the implementation can type-switch to optimized paths for the concrete types it knows about. But there is always a slow fallback.
@mattetti that's a good find. Maybe the audio interface could have functions that return / set values for a particular (channel, sample) pair as either float64 or int. Then the underlying data could be in a more optimized form, and functions that need the highest performance can use a type switch and operate on the data directly.
I'm thinking something like:
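A rough sketch of how that idea could translate to audio, following the image-package pattern (all names here are hypothetical, not an existing go-audio API):

```go
package audiosketch

// Format describes how samples are laid out (a deliberately minimal stand-in).
type Format struct {
	NumChannels int
	SampleRate  int
}

// Buffer is the generic view: any concrete sample storage exposes its data
// sample-by-sample as float64, the way image.Image exposes color.Color.
type Buffer interface {
	Format() Format
	NumFrames() int
	// At returns the sample for a given channel and frame index.
	At(channel, frame int) float64
	Set(channel, frame int, v float64)
}

// Float32Interleaved is one concrete implementation, analogous to image.NRGBA.
type Float32Interleaved struct {
	Fmt  Format
	Data []float32 // interleaved: frame-major, channel-minor
}

func (b *Float32Interleaved) Format() Format { return b.Fmt }
func (b *Float32Interleaved) NumFrames() int { return len(b.Data) / b.Fmt.NumChannels }
func (b *Float32Interleaved) At(ch, fr int) float64 {
	return float64(b.Data[fr*b.Fmt.NumChannels+ch])
}
func (b *Float32Interleaved) Set(ch, fr int, v float64) {
	b.Data[fr*b.Fmt.NumChannels+ch] = float32(v)
}

// Gain shows the fast-path idea: type-switch to touch the raw slice directly,
// and fall back to the generic accessors otherwise.
func Gain(buf Buffer, g float64) {
	switch b := buf.(type) {
	case *Float32Interleaved:
		for i := range b.Data {
			b.Data[i] *= float32(g)
		}
	default:
		for fr := 0; fr < buf.NumFrames(); fr++ {
			for ch := 0; ch < buf.Format().NumChannels; ch++ {
				buf.Set(ch, fr, buf.At(ch, fr)*g)
			}
		}
	}
}
```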
There seem to be two uses:
With callbacks and a call per sample, the overhead is an issue; if it could be inlined and better optimized, it would make some things much nicer. For case 1, you don't always have random access, or it is expensive for compressed streams, so building a buffer around random sample-based access doesn't make sense... I don't think random access is necessary there; you want to read or write a chunk of samples: interleaved when you want to immediately output or do basic processing, deinterleaved for more complicated things. For case 2, you want deinterleaved buffers to make processing simpler. To ensure that processing nodes can communicate you want at most three different buffer formats that work with it. The argument against swizzling on read/write is performance... but when you want that level of performance, you probably need to use the native format anyway, which might be uint16... and then there might also be issues with sample rate or mono-stereo conversions.
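For reference, the two layouts being contrasted look roughly like this (illustrative types, not proposed names):

```go
package layout

// Interleaved packs frames together, which is convenient for I/O with
// devices and files. For stereo, Data is [L0, R0, L1, R1, ...].
type Interleaved struct {
	NumChannels int
	Data        []float32
}

// Deinterleaved keeps one slice per channel, which is convenient for
// per-channel DSP. Data[0] is the first channel, Data[1] the second, etc.
type Deinterleaved struct {
	Data [][]float32
}
```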
I was apparently wrong:
But the API clearly exposes a way to get the channel number:

```js
// Stereo
var channels = 2;

// Create an empty two second stereo buffer at the
// sample rate of the AudioContext
var frameCount = audioCtx.sampleRate * 2.0;
var myArrayBuffer = audioCtx.createBuffer(channels, frameCount, audioCtx.sampleRate);

button.onclick = function() {
  // Fill the buffer with white noise;
  // just random values between -1.0 and 1.0
  for (var channel = 0; channel < channels; channel++) {
    // This gives us the actual array that contains the data
    var nowBuffering = myArrayBuffer.getChannelData(channel);
    for (var i = 0; i < frameCount; i++) {
      // Math.random() is in [0; 1.0]
      // audio needs to be in [-1.0; 1.0]
      nowBuffering[i] = Math.random() * 2 - 1;
    }
  }

  // Get an AudioBufferSourceNode.
  // This is the AudioNode to use when we want to play an AudioBuffer
  var source = audioCtx.createBufferSource();
  // set the buffer in the AudioBufferSourceNode
  source.buffer = myArrayBuffer;
  // connect the AudioBufferSourceNode to the
  // destination so we can hear the sound
  source.connect(audioCtx.destination);
  // start the source playing
  source.start();
}
```
Quick update: this interleaved vs. non-interleaved issue got me stuck. I instead opted to do a lot of work on my wav and aiff decoders, to make sure the buffered approach worked and was documented / had examples. I spent a decent amount of time improving the 24-bit audio support for the codecs (my implementation was buggy but hard to verify; tests were added). At this point, I still think Nigel's/the image pkg approach is the most interesting, but I don't have the bandwidth to build a full implementation. Egonelbre's implementation shows some of the challenges we are facing depending on the design decisions we take. I'll focus on my current edge cases and real-world implementation to see how a generic API would best benefit my own usage. On a different note, @brettbuddin wrote a very interesting synth in Go and I think he would be a good addition to this discussion: https://github.com/brettbuddin/eolian
Took me a bit to get caught up on the state of the discussion. One of the early decisions I made with Eolian was to focus on a single channel of audio. This is mostly because I didn't want to deal with interleaving in processing and I didn't have a decent implementation for keeping channels separate at the time. I've wanted to implement 1-to-2 (and back) conversion modules for some time now, but have been stuck in a similar mode of trying to decide how not to disrupt the current mono structure too drastically.
While implementing different things I realized there is one important case where int16 is preferable -- ARM, or mobile devices in general. Looking at pure op stats there's around a 2x performance difference between using float32 and int16. Of course, I'm not sure how directly these measurements translate into audio code, but it is something to be considered.
Here is the API Superpowered uses to deal with offline processing: … Here is how they deal with real-time audio: https://github.com/superpoweredSDK/Low-Latency-Android-Audio-iOS-Audio-Engine/blob/master/Examples_iOS/SuperpoweredFrequencyDomain/SuperpoweredFrequencyDomain/ViewController.mm#L19 (with a hint at interleaved vs. not)
Yup, interleaved is faster to process... I was concerned about whether using buffer[i+1] etc. would introduce a bounds check in the tight loop, but it seems it does not: https://play.golang.org/p/EkNPEjU3bS
It seems that the optimizer can deal with it nicely; disabling bounds checks had no effect on the results. However, using a "variable" number of channels has quite an impact. Still, it is more convenient to write code for deinterleaved :D. Maybe there is a nice way of automatically generating code based on a single-channel example for different types and channel counts? Superpowered has chosen to support only stereo, which makes things much easier, and as far as I understand it only supports float32.
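The kind of tight loop in question, with fixed versus variable channel count, looks roughly like this (an illustration, not the playground snippet itself):

```go
package loops

// GainStereoInterleaved hard-codes two channels: the index arithmetic stays
// simple, which is the shape of loop the benchmark above refers to.
func GainStereoInterleaved(buf []float32, g float32) {
	for i := 0; i+1 < len(buf); i += 2 {
		buf[i] *= g   // left
		buf[i+1] *= g // right
	}
}

// GainInterleaved takes the channel count as a parameter, which is more
// general but, per the measurement above, noticeably slower.
func GainInterleaved(buf []float32, channels int, g float32) {
	for i := 0; i < len(buf); i += channels {
		for c := 0; c < channels && i+c < len(buf); c++ {
			buf[i+c] *= g
		}
	}
}
```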
Edit: Nevermind. I misread the example. Disregard this question.
(Reposting from the Pixel gitter.) Just a quick point here from me (got some work right now, will come with more ideas later). Looking at go-audio, I don't really like the core Buffer interface at all. It's got way too many methods, none of which enables reading data from it without converting it to one of those 3 standard formats, which makes it kind of pointless to use any other format. That is very bad, because a playback library might want to use a different format, which would require extensive conversions for every piece of audio data, maybe even 2 conversions for each piece, and that's a lot of mess.
Hi guys, as you may have noticed on Reddit, a month ago we started working on an audio package in Pixel and we want to make it a separate library. Throughout the month, we learned numerous things. One of the most important things we learned is: buffers in the high-level API are not a very good idea for real-time. Why? Managing them, swapping them, queueing them, unqueueing them, and so on, costs significant CPU resources. OpenAL uses buffers and we weren't able to get below 1/20s latency, which is quite high. Also, buffers don't work very well: they don't compose, and it's hard to build nice abstractions around them. They're simply data. In Pixel audio, we chose a different approach. We built the whole library (not that it's finished) around this abstraction:

```go
type Streamer interface {
	Stream(samples [][2]float64) (n int, ok bool)
	Err() error
}
```

The thing is, this abstraction is extremely flexible and composes very well, just like io.Reader. I'd like to suggest replacing the current go-audio with the audio package from Pixel. I know this is a big suggestion, but I think it would be worth it for go-audio. In case you don't want to do that, I'd at least suggest you change your abstractions away from buffers and towards this streaming approach. Thanks! If you have any questions, please ask!
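To show how such an interface composes, a gain decorator over any Streamer can be written in a few lines (a sketch against the interface above, not Beep's actual implementation):

```go
// gain wraps any Streamer and scales every sample it produces.
type gain struct {
	src  Streamer
	Gain float64
}

func (g *gain) Stream(samples [][2]float64) (n int, ok bool) {
	// Pull samples from the wrapped streamer, then scale them in place.
	n, ok = g.src.Stream(samples)
	for i := range samples[:n] {
		samples[i][0] *= g.Gain
		samples[i][1] *= g.Gain
	}
	return n, ok
}

func (g *gain) Err() error { return g.src.Err() }
```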
I wasn't aware of this new audio package. I'll definitely check it out. One thing that surprised me is the fact that your streamer interface seems to be locked to stereo. Is that correct?
Yes, that is correct. This is no problem for mono, as mono can always be easily turned into stereo. It is only a problem for more channels than stereo offers. I'm open to solutions for this. So far, it's been "stereo is enough", but that might not be true.
The idea of replacing go-audio was a bit too rushed; upon second thought, it's probably not a good idea. However, I still suggest that go-audio changes its core interfaces away from buffers and towards streaming.
We definitely need to have support for surround/multichannel audio formats. Let me take a look next week and write down my thoughts and see if we can go from there.
@faiface I'm unclear about the comment that "buffers in the high-level API are not a very good idea for real-time". I mean, your design still uses buffers, i.e. the samples slice passed to Stream. I do agree that the end user of things like player, effects, decoding, streaming, and encoding shouldn't have to worry about buffers. But internal handling of buffers is unfortunate, yet necessary. E.g. when you get a 16-track audio sequencer, most likely you will need some way to generate/mix the tracks separately, because one might cause ducking on another track. Note that go-audio/audio is not the only attempt at an audio lib, e.g. https://github.com/loov/audio. I will try to go over the package in more detail later, but here are some preliminary comments based on seeing the API:
I do agree that as a simple game audio package it will be sufficient... And not handling the diversity can be a good trade-off.
@egonelbre Of course, internally there usually are some buffers; however, this API minimizes their use. This is in contrast to OpenAL, which requires you to create and fill a buffer if you want to play anything. That results in additional copying between buffers and so on. Maybe that can be implemented efficiently, but OpenAL does not implement it efficiently enough. Note that when calling … Regarding channels: yeah, that's a trade-off. Regarding sample rates: yes, there can be multiple sample rates. The reason we adopted the "unified sample rate" approach is that it simplifies signal processing code in major ways. For example, if the … However, a unified sample rate is not a problem, IMHO. Audio sources, such as files, can always be resampled on the fly, or the sample rate can be adjusted according to them. The unified sample rate is only important for the intermediate audio processing stage. The same holds for different sample types: any sample type can be converted to float64.
Yup, I agree that within a single pipeline you will most likely use a single sample rate. However, a server that does audio processing in multiple threads might need different sample rates. But I'm also not completely clear how big of a problem this is in the real world. With regards to … I understand very well that handling all these cases complicates the implementation. (To the extent that for a game, I will probably just use stereo float32 myself.) With regards to performance, I would like to specialize on all the different parameters. Effectively, to have … I do have some thoughts: 1. code-gen; the Nile stream processing language for defining effects, 2. optimization passes similar to the Go compiler's SSA that generate SIMD code, 3. a single interfaced package that is backed by multiple implementations, etc. But, generally, every decision has trade-offs -- whether you care about a trade-off depends on the domain and your use cases -- and it's completely fine for some domain not to care about some of these trade-offs.
I believe that you and I agree that having to implement each filter/effect/compositor for each sampling format (mono, stereo, int8, int16, float32, ...) is awful. So yeah, one way probably is code generation, although I'm not sure how feasible this is. The question I think is in place is: is it worth supporting all the formats within the audio processing stage? I decided to answer this question with: no. Of course, it's necessary to support all the formats for decoding and encoding audio files, but I think it ends there. Let me show you. Here's the implementation of the "gain effect" (a non-sophisticated volume slider): https://github.com/faiface/beep/blob/master/effects.go#L8. Now, the important thing is: this is all I had to write, and it works with everything. I can use it to adjust the volume of music, sound effects, individual sounds, and I can change the volume in real time. If I were to support all the different formats in the audio processing stage, I can't really see how I would achieve this level of convenience. And convenience like this makes it possible to implement all of the complicated effects, such as 3D sound and Doppler effect things, in very few lines of code. Beep is currently less than 1K LOC (not counting https://github.com/hajimehoshi/oto, which implements the actual playback backend) and already supports loading WAV files, mixing, sequencing, controlling playback (pausing, tracking time), playing only parts of an audio file or any streamer, and makes it easy to create your own streamers and effects. I'm sorry if I sounded like I wanted to destroy go-audio before :). I eventually came to the conclusion that it's best to keep Beep and go-audio separate. I just want to point out one way of doing audio and show its benefits. And you guys can take inspiration, or not. No problem there. EDIT: And I don't see why a server with multiple threads could possibly need to use different sample rates anywhere.
Go audio is very much a work in progress and far from its final form. Your feedback and library are very useful and very much appreciated. Creating a solid API for very different use cases is really hard. For instance, I am dealing with VST plugins and hosts, and there are cases where not using float32 isn't an option. I also want to be able to eventually optimize our compiled code, as Egon was talking about, and run Go DSP code on optimized architectures. What we agreed to do is to spend time experimenting and figuring out the limits of the design.
I completely understand the reasoning for not supporting the different options I described. I think it's important to examine the potential design space as fully as possible. Note: the Gain function you have there can produce a clicking noise, i.e. try switching between gain 0 and 1 every ~100ms. I'm not sure where the gain value comes from, how it's modified, and how big your buffers are... so it might not happen in practice. And, oh yes, I know all the pain of writing the processor code for multiple formats/options: https://github.com/loov/audio/blob/master/example/internal/effect/gain.go#L20. It avoids some of the aliasing, but can still have problems with it. Also, it doesn't do control signal input. But I digress. Do note, this is also the reason we keep go-audio/audio and loov/audio separate -- so we can experiment independently and come up with unique solutions. (E.g. I noticed this in the wave decoder: https://github.com/faiface/beep/blob/master/wav/decode.go#L125 -- see the similar issue I posted against go-audio/wav: go-audio/wav#5 (comment).) Eventually we can converge on some of the issues... or maybe we will create multiple packages for different domains, but there is still code that could potentially be shared between all the implementations (e.g. ASM/SIMD code on []float32 and []float64 arrays for different platforms, or converting a MULAW byte stream to []float64).
Example: imagine a resampling service where you submit a wave file and specify the sample rate of the end result, which you can later download.
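One common way to avoid that kind of click, independent of any particular API, is to ramp the gain across the buffer instead of applying the new value instantly; a minimal sketch (names invented here, mono buffer for simplicity):

```go
package ramp

// ApplyGainRamp interpolates from the previous gain to the target gain over
// one buffer of samples, so a sudden change in the control value cannot
// produce a step (and therefore a click) in the output.
func ApplyGainRamp(buf []float32, from, to float32) {
	if len(buf) == 0 {
		return
	}
	step := (to - from) / float32(len(buf))
	g := from
	for i := range buf {
		buf[i] *= g
		g += step
	}
}
```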
One of the reasons why I concluded that Beep is not a good fit for go-audio is that I deliberately make compromises to enable simplicity and remove the pain of audio programming, but that gets in the way of universality, so we're aiming at slightly different goals. Regarding clicking in Gain, I don't think it's Gain's responsibility to fix that. You know, if you start playing PCM data in the middle of a wave (at value 0.4, for example), a click is what's supposed to happen. I think it's the user's responsibility to adjust the gain value smoothly, e.g. by lerping. And the buffer size can be as small as 1/200s (only on Linux at the moment, but we're working on getting the latency down on other platforms too). The decoding design, that's interesting, although I think WAVE only supports uint8, int16 and float32, right? So I'm not sure it's worth it, but I'll think about it. And the resampling server: if you take a look at our roadmap, some of the things to be done are a Buffer struct and a Resample decorator. Don't be confused, the Buffer struct is more like a bytes.Buffer for samples and less like an OpenAL buffer. So the sample conversion will be done something like this:

```go
buf := beep.NewBuffer(fmt2)
buf.Collect(beep.Resample(source, fmt1.SampleRate, fmt2.SampleRate))
```

or even written directly to the file (in which case the file is never fully in memory; note that wav.Encode is not implemented yet).
@faiface Some WAVE samples here: https://en.wikipedia.org/wiki/WAV#WAV_file_audio_coding_formats_compared. And a list of things libsndfile supports: http://www.mega-nerd.com/libsndfile/. Although you can get pretty far by just supporting PCM u8, s16, s24, s32, s64, IEEE_FLOAT 32 and 64, mulaw, and alaw.
@egonelbre With WAVE, only PCM support is planned so far, as PCM seems to be close to 100% of existing usage. Currently supported are u8 and s16, but float32 support will be added shortly. I believe these cover an overwhelming majority of what's out there.
@faiface Audacity uses float32 WAV.
Hi all, there is an issue about pre-emption in the Go runtime which can influence the reliability of audio I/O. Also, zc/sio is a work in progress to deal with audio callback APIs like CoreAudio, AAudio, and JACK, which operate on a foreign, usually real-time, thread.
This discussion is a follow-up from this initial proposal. The 2 main arguments that were raised are:
@egonelbre @kisielk @nigeltao @taruti all brought up good points, and Egon is working on a counter-proposal focusing on smaller interfaces with compatibility with types commonly found in the wild (int16, float32).
As mentioned in the original proposal, I'd like this organization to act as a special interest group of people interested in doing more/better audio in Go. I have to admit my focus hasn't been real-time audio and I very much appreciate the provided feedback. We all know this is a challenging issue which usually results in a lot of libraries doing things in very different ways. However, I do want to believe that we, as a community and with the support of the core team, can come up with a solid API for all Go audio projects.