Audio file creation tests. #72

CharlesHolbrow · 2024-01-04T01:28:15Z

We want to add support for writing data-compressed audio formats. However, there are some obstacles to reliably writing compressed audio from Python.

Avoid MP3s

I would like to avoid MP3s because MP3 encoding introduces enough delay to put files out of sync. That makes it unsuitable for multitrack audio or audio that needs to loop seamlessly. We are working with multitracks (producer model), so this is a non-starter.
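To make the sync problem concrete, here is a minimal sketch of how the delay could be measured by cross-correlating the original signal with the decoded one. It does not call any encoder: the 37-sample offset is a made-up stand-in for codec padding, and `estimate_delay` is a hypothetical helper, not something in this repo.

```python
import random


def estimate_delay(reference, decoded, max_lag):
    """Return the lag (in samples) at which `decoded` best lines up with `reference`."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Correlate reference[i] against decoded[i + lag] over the overlapping region.
        overlap = min(len(reference), len(decoded) - lag)
        score = sum(reference[i] * decoded[i + lag] for i in range(overlap))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Noise-like test signal, so the correlation peak is unambiguous.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(500)]

# Simulated "decoded" audio: the same signal with 37 samples of leading padding
# (an illustrative number, not a measured MP3 delay).
decoded = [0.0] * 37 + reference

print(estimate_delay(reference, decoded, max_lag=100))
```

Any nonzero result for a real encode/decode round trip means that stems written separately will drift against each other unless the offset is compensated.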

Viable alternatives:

.ogg (vorbis or opus)
.opus (opus)

We need a python library that can write either format reliably in-memory. Tragically, librosa and torchaudio are not good at this. Using the disk is too slow for the scale that we need.

Option 1. Python `soundfile`

The Python soundfile package can write .ogg files with the vorbis codec, but:

You have no control over the quality/bitrate
You have to watch out for this bug (which is carefully handled in our current numpy_to_ogg implementation)
I believe there is opus support in the pipeline (but I need to double check this)

Option 2. Python `pydub`

Depends on ffmpeg
You have to use a version of ffmpeg that is compiled with support for the needed codecs
The ffmpeg version installed with conda in our klay-beam docker containers is not compiled with the needed codecs, so we would have to apt install ffmpeg or compile it from scratch in the docker image.
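One way to catch a badly built ffmpeg early is to ask it which encoders it has. The sketch below is an assumption-laden illustration: `parse_encoder_names` and `missing_encoders` are hypothetical helpers, and the parser assumes the usual `ffmpeg -encoders` layout (a legend, a dashed separator line, then one encoder per row with the name in the second column).

```python
import shutil
import subprocess


def parse_encoder_names(encoders_output):
    """Extract encoder names from the text printed by `ffmpeg -encoders`."""
    names = set()
    in_table = False
    for line in encoders_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("---"):
            # Rows after the dashed separator describe actual encoders.
            in_table = True
            continue
        if in_table and stripped:
            parts = stripped.split()
            if len(parts) >= 2:
                names.add(parts[1])  # column 1 is the flags, column 2 the name
    return names


def missing_encoders(required, ffmpeg="ffmpeg"):
    """Return the subset of `required` encoders that this ffmpeg build lacks."""
    if shutil.which(ffmpeg) is None:
        # No ffmpeg binary on PATH at all: everything is missing.
        return set(required)
    result = subprocess.run(
        [ffmpeg, "-hide_banner", "-encoders"],
        capture_output=True,
        text=True,
        check=False,
    )
    return set(required) - parse_encoder_names(result.stdout)


if __name__ == "__main__":
    print(missing_encoders(["libvorbis", "libopus", "libmp3lame"]))
```

Running something like this inside each docker image would show whether the conda-installed ffmpeg is usable before any audio is written.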

Testing

This PR adds tests that verify that files encoded with lossy formats were written correctly. Once this is finalized, we are set up to reliably pursue one of the options above.

The current implementation uses a few different mechanisms for testing audio file encoding. These mechanisms are ready for review.

cyrusvahidi

LGTM

mxkrn

Besides the recommendation about using pytest fixtures, I'm wondering where you got the various hard-coded tolerance / divergence values from that you are using, especially in test_wav_file, test_mp3_file, and test_ogg_file

mxkrn · 2024-01-04T10:39:27Z

klay_beam/bin/create_test_audio_files.py

This isn't required, but it would clean things up somewhat. If we're going to be touching the audio file tests, I would recommend taking this opportunity to move the contents of this file out of an offline script and use the fixture functionality provided by pytest, so that we don't have to commit the audio files.

This can be done by following these steps:

Move all helper methods to tests/conftest.py.

Still in tests/conftest.py, or in the test module itself, the actual fixture methods can use the pytest-provided tmp_path fixture, which provides a temporary directory that is unique to each test. For example:

    from pathlib import Path

    import pytest


    @pytest.fixture
    def mp3_filepath(tmp_path: Path) -> Path:
        # create_test_audio_stereo and numpy_to_mp3 are the existing helpers
        stereo_audio, sr = create_test_audio_stereo()
        mp3_buffer = numpy_to_mp3(stereo_audio, sr)
        save_path = tmp_path / "test_stereo.mp3"
        with open(save_path, "wb") as out_file:
            out_file.write(mp3_buffer.getvalue())
        mp3_buffer.seek(0)
        return save_path

Use mp3_filepath as input to test_mp3_file to test the file loading functionality.

Some of the older tests for LoadWithLibrosa do use the files that were created manually via create_test_audio_files.py. These tests could be replaced now with the fixture method, which would allow us to remove some of the wavs from this repo. That would be nice, but I'm going to treat it as a separate issue.

The tests in this PR operate on two kinds of audio:

in-memory test signals

audio files with music content (on disk, in repo) which are used as a known baseline. These shouldn't be automatically generated, because we need a baseline to test against (otherwise our test just verifies that the audio file encoders output the same thing when called in succession, as opposed to verifying that they output the correct thing)

CharlesHolbrow · 2024-01-04T22:08:05Z

I'm wondering where you got the various hard-coded tolerance / divergence values from that you are using, especially in test_wav_file, test_mp3_file, and test_ogg_file

All the hard-coded values are based on my empirical observations of values that reflect a known-working condition.

For wav files, we could manually calculate the expected delta between floating point audio and discrete 16/24 bit PCM audio. That seems like overkill to me, especially when it's the compressed file format writers that come with the most risk. Can you think of a better way to do this? If not, I think the current implementation is a reasonable balance of safety/effort...but I welcome suggestions.
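For reference, the manual calculation for wav files is short. The sketch below assumes the writer scales by 2^(bits-1), rounds to the nearest integer, and clips; under that assumption the round-trip error for in-range samples is at most half a quantization step, i.e. 2^-bits. Real writers may scale or round differently, so treat this as a ballpark for choosing a tolerance rather than a description of what numpy_to_wav does. The function names are hypothetical.

```python
import math


def pcm_round_trip_bound(bits):
    """Half of one quantization step for signed PCM with full scale at +/-1.0."""
    return 1.0 / (1 << bits)


def quantize(sample, bits):
    """Float -> signed integer PCM -> float, assuming round-to-nearest and clipping."""
    scale = 1 << (bits - 1)
    code = round(sample * scale)
    code = max(-scale, min(scale - 1, code))  # clip to the representable range
    return code / scale


def max_round_trip_error(samples, bits):
    return max(abs(quantize(s, bits) - s) for s in samples)


# A 440 Hz sine at 0.9 amplitude, so no sample is clipped.
signal = [0.9 * math.sin(2 * math.pi * 440 * n / 44100) for n in range(2048)]

for bits in (16, 24):
    print(bits, max_round_trip_error(signal, bits), pcm_round_trip_bound(bits))
```

Under these assumptions the bound is about 1.5e-5 for 16-bit and about 6e-8 for 24-bit, which gives a sanity check for the empirically chosen wav tolerances.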

The main danger with numpy_to_mp3 and numpy_to_wav has to do with environment dependencies such as ffmpeg and libsndfile. If we (for example) build a docker container that has an older version of libsndfile, then numpy_to_ogg will fail silently and just write digital black to our output audio file. I want to be able to run our tests in ALL our docker images before publishing them. That gives us some protection against nasty surprises when writing millions of audio files.
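The silent-failure case is cheap to guard against on its own. A minimal sketch, assuming the decoded audio is available as a flat sequence of floats in [-1, 1]: `peak_dbfs` and `assert_not_silent` are hypothetical helpers, and the -60 dBFS floor is an arbitrary illustrative threshold, not a value taken from the tests in this PR.

```python
import math


def peak_dbfs(samples):
    """Peak level in dB relative to full scale; -inf for pure digital silence."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return float("-inf")
    return 20.0 * math.log10(peak)


def assert_not_silent(samples, floor_dbfs=-60.0):
    """Fail loudly if decoded audio is (near) digital silence."""
    level = peak_dbfs(samples)
    if level < floor_dbfs:
        raise AssertionError(
            f"decoded audio peaks at {level} dBFS, below the {floor_dbfs} dBFS floor"
        )


# Usage: a healthy signal passes, an all-zero buffer would raise.
assert_not_silent([0.0, 0.25, -0.5, 0.1])
```

A check like this only catches the all-zeros failure; the tolerance comparisons against the known baseline files are still needed to catch subtler encoder problems.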

With that in mind, the next steps I'm imagining after confirming these tests are satisfactory:

Choose a method for writing data-compressed audio (see top of this PR)
Run all these tests during the docker build process before publishing to DockerHub

CharlesHolbrow added 3 commits December 20, 2023 12:10

tests: text wav file generation for #1

4341b7c

tests: test mp3 file generation

3786f2f

tests: test numpy_to_ogg for #1

a32117b

CharlesHolbrow requested review from cyrusvahidi and mxkrn January 4, 2024 01:28

code-style

1913115

cyrusvahidi approved these changes Jan 4, 2024

mxkrn reviewed Jan 4, 2024

CharlesHolbrow added 4 commits January 4, 2024 15:18

ensure py.typed is present in pip packages.

3780c5a

small fixes, including type-check

739f75f

Ask mypy to all more speciffic type in DoFn subclasses

f5c2f14

mypy: allow more speciffic types in DoFn subclass process method

afde1c9

CharlesHolbrow merged commit 6a637b6 into main Jan 5, 2024
4 checks passed
