Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🎡 Implement deserialization for ogg files #72

Merged
merged 8 commits into from
Oct 16, 2022
Merged

Conversation

tomrijnbeek
Copy link
Member

✨ What's this?

This PR adds support for extracting sound buffer data from OGG files, including a snapshot test.

πŸ”— Relationships

Closes #8

πŸ” Why do we want this?

OGG is an open format and is commonly used for compressed audio.

πŸ— How is it done?

The deserialization is largely done by the NVorbis library, so logically this PR is pretty straightforward. After looking at NAudio, a different audio library not built on OpenAL, I kinda like the idea of putting the actual file parsing logic in a separate class. The idea is that this class is a stateful object, so it can be used to stream the files rather than loading the entire file in memory at once.

My intent is to have a future PR migrate the wave file loading to a similar class. Hopefully we can extract a common interface, and have some streaming loading logic layer on top of that interface, so that the streaming can be applied to any file format you may use, as long as you have a stream for that file type.

All in all, I am aware that this is very WIP, and it wouldn't surprise me if there will be several changes to the underlying logic, though having the current static methods to load as convenience method is not something that will go away quickly. I have left the OggStream class as internal for the time being while we work on this.

πŸ’₯ Breaking changes

None.

πŸ”¬ Why not another way?

The current approach opens up the use cases I want to support (particularly streaming in #17), and contributes to a better reusable framework (see #16). There are probably other solutions, but this one seems the best for the time being. We probably need to start using this to get a better sense of what works and what doesn't, so I'd like to commit to this approach from now, and take this a bit agile.

πŸ¦‹ Side effects

WAV file loading needs to be migrated. Will create an issue if this PR is approved to validate the approach.

πŸ’‘ Review hints

Stream is a super overloaded term, so open to suggestions for better naming. Reader is a possibility, though also very vague. Decoder doesn't really imply that it's holding the lock to a file open.

Copy link
Member

@paulcscharf paulcscharf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly fine. Left some comments to consider.


namespace Bearded.Audio;

sealed class OggStream : IDisposable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stream is a specific .net class that most(all?) streams inherit from and that has a pretty specific interface and semantics.

Provides a generic view of a sequence of bytes.

https://learn.microsoft.com/en-us/dotnet/api/system.io.stream?view=net-7.0

As such I think calling this a Reader makes more sense - that is much more in line semantically with the variety of different readers in .net and libraries like json.net.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, renamed.

reader.Dispose();
}

public static OggStream FromFile(Stream file)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public static OggStream FromFile(Stream file)
public static OggStream FromStream(Stream file)

Or

Suggested change
public static OggStream FromFile(Stream file)
public static OggStream FromFileStream(Stream file)

Or simply

Suggested change
public static OggStream FromFile(Stream file)
public static OggStream From(Stream file)

?

No strong opinion, perhaps we can look up what classes like TextReader do.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They use constructor overloads, which I don't like much. Using FromStream now.


public int SampleRate => reader.SampleRate;

public bool Ended => reader.IsEndOfStream;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the name of this kind of property in .net classes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StreamReader calls this EndOfStream, StringReader and BinaryReader don't have it, and Stream itself only exposes the position and length, so you have to do it yourself. The VorbisReader from NVorbis uses IsEndOfStream as you can see. So... really no consistency there.

this.reader = reader;
}

public IList<short[]> ReadAllRemainingBuffers(int maxBufferSize)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return types should be specific I believe?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're completely right.

{
TryReadSingleBuffer(out var buffer, maxBufferSize);
return buffer!;
}).ToImmutableArray();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the size is known, use an immutable array builder for fewer allocations?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good shout, done.

throw new ArgumentException("Max buffer size must be positive.", nameof(maxBufferSize));
}

if (reader.IsEndOfStream)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use Ended like we do in the method above?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


public bool TryReadSingleBuffer([NotNullWhen(true)] out short[]? buffer, int maxBufferSize)
{
if (maxBufferSize <= 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the above method should have a check like this too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough

return false;
}

var floatBuffer = new float[largestBufferSizeDivisibleByChannelCount(maxBufferSize)];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do anything to prevent reallocation of this temporary array? Or can we perhaps allocate it on the stack?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have now been introduced to the wonders of stackalloc. That's pretty cool actually!


var numSamplesRead = reader.ReadSamples(new Span<float>(floatBuffer));

buffer = new short[numSamplesRead];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not asking to change it now, but it would be great if we can later read into a set of provided buffers instead of allocations new ones.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Since sound buffer data is immutable though, right now you'll always have to allocate an array somewhere, so might as well keep it here for now.


return Enumerable.Range(0, totalBuffersNeeded).Select(_ =>
{
TryReadSingleBuffer(out var buffer, maxBufferSize);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this method repeats some of the checks and calculations - perhaps we can extract a common private method for the actual reading from the reader?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored a bit.

src/Bearded.Audio/Core/OggReader.cs Outdated Show resolved Hide resolved
src/Bearded.Audio/Core/OggReader.cs Outdated Show resolved Hide resolved
var builder = ImmutableArray.CreateBuilder<short[]>((int) totalBuffersNeeded);
for (var i = 0; i < builder.Capacity; i++)
{
builder.Add(readSamples(bufferSize));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we need to treat a potentially partial last buffer differently?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The readSamples already handles the partial buffer correctly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see - I did not realise that Reader.ReadSamples did this automatically. I was looking at the array/spam allocation and couldn't find it there. It's fine then (and so was the name of the readSamples parameter) though it is perhaps slightly unfortunate that we may allocate a big float array even if we are only gonna use a part of it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly that's the way it is, and we try to limit this problem by not making the buffer size too big. In practice - especially for ogg files - the files will hopefully be so long that most of the time you're filling at least a few buffers fully.

return true;
}

private short[] readSamples(int maxSampleCount)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The argument is called 'max', but is it now simply the sampleCount at this point? (or even just 'count' given the context of the method)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may be reading fewer samples than the actual sample count, if we are filling the last partial buffer. In other words, numSamplesRead (renamed to readSampleCount now for consistency in naming) can be less than or equal to maxSampleCount, I have now called it sampleCount, but it should be noted that the number of samples returned may be fewer than the specified parameter.

@tomrijnbeek tomrijnbeek merged commit 1aa491e into main Oct 16, 2022
@tomrijnbeek tomrijnbeek deleted the read-ogg branch October 16, 2022 14:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ogg file loading
2 participants