Support variable number of channels #9

Open
MarkKremer opened this issue Oct 9, 2023 · 1 comment
Assignees
Labels
enhancement New feature or request proposal Requesting opinions

Comments

@MarkKremer
Contributor

MarkKremer commented Oct 9, 2023

Current state

I would say that the 2 most important types in Beep are the following:

// Streamer is able to stream a finite or infinite sequence of audio samples.
type Streamer interface {
	Stream(samples [][2]float64) (n int, ok bool)
	Err() error
}
// Format is the format of a Buffer or another audio source.
type Format struct {
	SampleRate SampleRate
	NumChannels int
	Precision int
}

Streamer allows us to define operations on samples. Using the composite pattern it is possible to combine operations to create more complex operations.

Format, besides storing the format information, is used to encode/decode samples into different representations.

These types are very powerful and can be used to do a lot of things with very little. However, there are some details about them that make me wonder if something better is possible:

  • Even though the Format specifies the number of channels, within the interface of the Streamer, the number of channels is hardcoded to 2. I suspect this is done to keep the library simple and this is an important consideration. Please try to keep this in mind when reading the rest of this proposal.
  • Within Beep, Precision and SampleRate are mostly used at endpoints: when encoding/decoding a file and when using an in-memory buffer (which is similar to a WAV file). In addition, SampleRate can be used when resampling samples.

The number of channels seems like an inherent property of the samples, while the Format is only used at specific parts of the application. It is metadata that is exposed when decoding a file format, or it can be passed as configuration to encode audio. Format is, however, never directly used by Streamers and is completely separate from the composite pattern that is core to Beep.

Proposal

Move NumChannels to Samples.

The samples are stored in an interleaved format in a 1D slice. We lose the syntactic sugar of 2D slices which I solved by using methods (BOOO!). I think the benefits could very well outweigh the drawbacks but I would like to invite you to think about the developer experience for the users of Beep when, say, they want to implement a custom Streamer.

For reference, this is what the types will look like (approximately):

// Samples contains a finite sequence of audio samples for one or more channels.
type Samples struct {
	Samples []float64 // interleaved
	NumChannels int
}

// Get a single sample.
func (s Samples) Get(index, channel int) float64 {
	return s.Samples[index*s.NumChannels + channel]
}

// Set the value of a sample.
func (s *Samples) Set(index, channel int, value float64) {
	s.Samples[index*s.NumChannels + channel] = value
}

// Streamer is able to stream a finite or infinite sequence of audio samples.
type Streamer interface {
	Stream(samples Samples) (n int, ok bool)
	Err() error
}
// Format describes the stored format of an audio stream, as a file or in-memory.
type Format struct {
	SampleRate SampleRate
	Precision int
}

In this scenario, Format can still be used to encode/decode individual samples. However, it no longer deals with framing the samples of the different channels together.
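To get a feel for the developer experience of writing a custom Streamer against these types, here is a minimal sketch. It copies Samples, Get, Set and Streamer from the proposal above; the `constant` source and the `Gain` effect are hypothetical names made up for this example and are not part of Beep.

```go
package main

import "fmt"

// Samples and Streamer are copied from the proposal above.
type Samples struct {
	Samples     []float64 // interleaved
	NumChannels int
}

func (s Samples) Get(index, channel int) float64 {
	return s.Samples[index*s.NumChannels+channel]
}

func (s *Samples) Set(index, channel int, value float64) {
	s.Samples[index*s.NumChannels+channel] = value
}

type Streamer interface {
	Stream(samples Samples) (n int, ok bool)
	Err() error
}

// constant is a hypothetical source that fills every channel with one value.
type constant struct{ value float64 }

func (c constant) Stream(samples Samples) (int, bool) {
	for i := range samples.Samples {
		samples.Samples[i] = c.value
	}
	return len(samples.Samples) / samples.NumChannels, true
}

func (c constant) Err() error { return nil }

// Gain is a hypothetical effect that scales every channel by Factor.
// It never needs to know the channel count ahead of time.
type Gain struct {
	Streamer Streamer
	Factor   float64
}

func (g Gain) Stream(samples Samples) (int, bool) {
	n, ok := g.Streamer.Stream(samples)
	for i := 0; i < n; i++ {
		for c := 0; c < samples.NumChannels; c++ {
			samples.Set(i, c, samples.Get(i, c)*g.Factor)
		}
	}
	return n, ok
}

func (g Gain) Err() error { return g.Streamer.Err() }

func main() {
	// Four frames of three channels each.
	buf := Samples{Samples: make([]float64, 4*3), NumChannels: 3}
	g := Gain{Streamer: constant{value: 0.5}, Factor: 0.5}
	n, ok := g.Stream(buf)
	fmt.Println(n, ok, buf.Get(3, 2)) // 4 true 0.25
}
```

The inner loop over channels is the main visible cost compared to indexing a `[][2]float64` directly; in exchange the same effect works for mono, stereo and surround input.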

What do we gain?

One obvious benefit is that the number of channels isn't constant anymore:

  • It is possible to read Vorbis 5.1 surround sound files without Beep choosing for you which channels to keep. I don't know what they're doing with 5.1 audio files in Beep, but that sounds fun.
  • One of the use cases that has been on my mind lately is Beep within games. In games, a lot of source audio only requires a single channel. For example, enemy attack/grunt/movement sounds can be stored using a single channel. It is only once the sound is placed in the world that it gains a position. Then, using the position information and the Doppler effect, the audio is converted to 2 or more different channels for the speakers to play and your brain to interpret.

Furthermore: operations on channels.

Operations on channels

Because the channel count is stored in the Samples struct, Streamer operations that act on those channels become a possibility. This gives the user better control of what they want to do:

streamer, format, err := vorbis.Decode(myFileReader)
if err != nil {
	panic(err)
}

channels := SplitChannels(streamer)
desiredChannels := MergeChannels(channels[0], channels[2]) // keep only the front left and front right channel

err = speaker.Init(format.SampleRate, format.SampleRate.N(time.Second))
if err != nil {
	panic(err)
}
speaker.Play(desiredChannels)

I suspect the implementation of SplitChannels() and MergeChannels() will be a bit more complex than it may look at first. But I think it is doable.
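As a rough idea of where the complexity comes from, here is a sketch of a simpler, single-streamer variant that keeps a subset of channels. It is not the SplitChannels()/MergeChannels() pair itself: `selectChannels`, its fields and the `ramp` test source are hypothetical. The sketch also assumes the wrapper is told the source's channel count up front, because in the proposed interface the caller's buffer decides NumChannels.

```go
package main

import "fmt"

// Samples and Streamer are copied from the proposal above.
type Samples struct {
	Samples     []float64 // interleaved
	NumChannels int
}

func (s Samples) Get(index, channel int) float64 {
	return s.Samples[index*s.NumChannels+channel]
}

func (s *Samples) Set(index, channel int, value float64) {
	s.Samples[index*s.NumChannels+channel] = value
}

type Streamer interface {
	Stream(samples Samples) (n int, ok bool)
	Err() error
}

// ramp is a hypothetical test source: each sample is frame*10 + channel.
type ramp struct{ frame int }

func (r *ramp) Stream(samples Samples) (int, bool) {
	n := len(samples.Samples) / samples.NumChannels
	for i := 0; i < n; i++ {
		for c := 0; c < samples.NumChannels; c++ {
			samples.Set(i, c, float64(r.frame*10+c))
		}
		r.frame++
	}
	return n, true
}

func (r *ramp) Err() error { return nil }

// selectChannels (hypothetical) reads frames with SrcChannels channels from
// Source and copies only the channels listed in Keep into the caller's buffer.
type selectChannels struct {
	Source      Streamer
	SrcChannels int
	Keep        []int
	scratch     []float64
	err         error
}

func (s *selectChannels) Stream(samples Samples) (int, bool) {
	if samples.NumChannels == 0 || samples.NumChannels != len(s.Keep) {
		s.err = fmt.Errorf("selectChannels: got %d output channels, want %d",
			samples.NumChannels, len(s.Keep))
		return 0, false
	}
	frames := len(samples.Samples) / samples.NumChannels
	need := frames * s.SrcChannels
	if cap(s.scratch) < need {
		s.scratch = make([]float64, need)
	}
	// Pull the wide frames into a scratch buffer, then copy the kept channels.
	src := Samples{Samples: s.scratch[:need], NumChannels: s.SrcChannels}
	n, ok := s.Source.Stream(src)
	for i := 0; i < n; i++ {
		for out, in := range s.Keep {
			samples.Set(i, out, src.Get(i, in))
		}
	}
	return n, ok
}

func (s *selectChannels) Err() error {
	if s.err != nil {
		return s.err
	}
	return s.Source.Err()
}

func main() {
	sel := &selectChannels{Source: &ramp{}, SrcChannels: 6, Keep: []int{0, 2}}
	out := Samples{Samples: make([]float64, 3*2), NumChannels: 2}
	n, ok := sel.Stream(out)
	fmt.Println(n, ok, out.Samples) // 3 true [0 2 10 12 20 22]
}
```

A true SplitChannels() would additionally have to buffer the source so that several independent consumers can pull their channel at different paces, which is likely where most of the extra complexity sits.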

Cons

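  • As mentioned above, implementing a Streamer becomes slightly more complex: samples are accessed through methods on an interleaved slice instead of by indexing a 2D slice.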
  • Implementations of Streamer must support different values for NumChannels or return an error if the channel count is unsupported.
  • The speaker/Oto doesn't support more than 2 channels currently. It will be required to manually transform whatever Streamer you have to the required number of channels. However, the tools to do so will be available to you (see previous code snippet).
  • These changes are not backwards compatible.
@MarkKremer MarkKremer added enhancement New feature or request future Planned for a future version of Beep labels Oct 9, 2023
@MarkKremer MarkKremer self-assigned this Oct 9, 2023
@MarkKremer MarkKremer added proposal Requesting opinions and removed future Planned for a future version of Beep labels Oct 9, 2023
@dusk125
Contributor

dusk125 commented Oct 9, 2023

Speaking to the speaker/Oto 2 channel problem: in addition to split and merge, there could be a, let's call it, MapChannels where you could specify how the n channels get merged into 1 or 2 channels.

Something like

MapChannels(leftRightMapper{Left: []channel{channels[0], channels[2]}, Right: []channel{channels[1], channels[3]}})

Something like this could be a mono to stereo mapper

// Maps mono audio to stereo output
MapChannels(leftRightMapper{Left: []channel{channels[0]}, Right: []channel{channels[0]}})
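To make the idea a bit more concrete, here is a rough sketch of what such a mapper could do to a buffer of samples. It is simplified to work on channel indices and on Samples directly rather than on streamers, and it assumes that "merging" means averaging the listed source channels; both of those are assumptions for the sake of the example, as is the `Map` method name.

```go
package main

import "fmt"

// Samples is copied from the proposal above.
type Samples struct {
	Samples     []float64 // interleaved
	NumChannels int
}

func (s Samples) Get(index, channel int) float64 {
	return s.Samples[index*s.NumChannels+channel]
}

func (s *Samples) Set(index, channel int, value float64) {
	s.Samples[index*s.NumChannels+channel] = value
}

// leftRightMapper lists, per output side, the source channel indices to merge.
// Using indices instead of channel streamers is a simplification.
type leftRightMapper struct {
	Left, Right []int
}

// Map (hypothetical) returns a stereo copy of src in which each side is the
// average of the listed source channels. Averaging is an assumption.
func (m leftRightMapper) Map(src Samples) Samples {
	frames := len(src.Samples) / src.NumChannels
	dst := Samples{Samples: make([]float64, frames*2), NumChannels: 2}
	for i := 0; i < frames; i++ {
		dst.Set(i, 0, average(src, i, m.Left))
		dst.Set(i, 1, average(src, i, m.Right))
	}
	return dst
}

// average returns the mean of the given channels in one frame, or 0 if none are listed.
func average(src Samples, frame int, channels []int) float64 {
	if len(channels) == 0 {
		return 0
	}
	sum := 0.0
	for _, c := range channels {
		sum += src.Get(frame, c)
	}
	return sum / float64(len(channels))
}

func main() {
	// One frame of four channels mapped down to stereo.
	quad := Samples{Samples: []float64{1, 0.5, 0, -0.5}, NumChannels: 4}
	down := leftRightMapper{Left: []int{0, 2}, Right: []int{1, 3}}
	fmt.Println(down.Map(quad).Samples) // [0.5 0]

	// Two frames of mono duplicated to both sides.
	mono := Samples{Samples: []float64{0.25, 0.75}, NumChannels: 1}
	up := leftRightMapper{Left: []int{0}, Right: []int{0}}
	fmt.Println(up.Map(mono).Samples) // [0.25 0.25 0.75 0.75]
}
```

A real mapper would probably want per-channel weights instead of a plain average, for example to mix a center channel into both sides at reduced volume.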

For the proposal as a whole, I think it makes sense to have the channel information near the samples (and thus allow samples to have n channels). I've had a project where having the methods alone would've made it much easier to think about (I was mapping audio sent across the network to Beep).

I wonder, then, if it would be worth giving the parts that only support 1 or 2 channels a special-case streamer type, so that it's not possible to feed a 6-channel streamer into a speaker (for example). I feel that allowing that could breed confusion and annoying-to-find bugs.

This was referenced Oct 11, 2023