Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Audio file creation tests. #72

Merged
merged 8 commits into from
Jan 5, 2024
Merged

Audio file creation tests. #72

merged 8 commits into from
Jan 5, 2024

Conversation

CharlesHolbrow
Copy link
Contributor

@CharlesHolbrow CharlesHolbrow commented Jan 4, 2024

We want to add support for writing data-compressed audio formats. However, there are some obstacles associated with reliably writing compressed audio from python

Avoid MP3s

I would like to avoid mp3s because mp3 compression introduces enough delay to make files out-of-sync. It's not suitable for multitrack audio or audio that needs to loop seamlessly. We are working with multitracks (producer model) so this is a non-starter.

Viable alternatives:

  • .ogg (vorbis or opus)
  • .opus (opus)

We need a python library that can write either format reliably in-memory. Tragically, librosa and torchaudio are not good at this. Using the disk is too slow for the scale that we need.

Option 1. Python soundfile

The python soundfile package can write .ogg files with the vorbis codec, but

  • You have no control over the quality/bitrate
  • You have to watch out for this bug (which is carefully handled in our current numpy_to_ogg implementation)
  • I believe there is opus support int he pipeline (but I need to double check this)

Option 2. Python pydub

  • Depends on ffmpeg
  • You have to use a version of ffmpeg that is compiled with support for the needed codecs
  • The ffmpeg version installed with conda in our klay-beam docker containers are not compiled with the needed codecs, necessitating apt install or compiling ffmpeg from scratch in the docker image.

Testing

This PR adds support for testing that files encoded with lossy formats worked correctly. Once this is finalized we are setup to reliably pursue one of the options above.

The current implementation uses a few different mechanisms for testing that audio file encoding. These mechanisms are ready for review.

Copy link
Contributor

@cyrusvahidi cyrusvahidi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@mxkrn mxkrn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides the recommendation about using pytest fixtures, I'm wondering where you got the various hard-coded tolerance / divergence values from that you are using, especially in test_wav_file, test_mp3_file, and test_ogg_file

Copy link
Collaborator

@mxkrn mxkrn Jan 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't required but would clean this up somewhat. That is, if we're going to be touching the audio file tests, I would recommend taking this opportunity to also remove the contents of this file out of an offline script and use the fixture creation functionality provided by pytest so that we don't have to commit the audio files.

This can be done using by following these steps:

  1. Move all helper methods to tests/conftest.py.
  2. Still in tests/conftest.py or in the test module itself, the actual fixture methods can use the pytest provided tmp_path fixture which provides a temporary filepath that's persisted within the scope of the tests. For example:
@pytest.fixture
def mp3_filepath(tmp_path: Path) -> Path:
    stereo_audio, sr = create_test_audio_stereo()
    mp3_buffer = numpy_to_mp3(stereo_audio, sr)
    
    save_path = tmp_path / "test_stereo.mp3"
    with open(save_path, "wb") as out_file:
        out_file.write(mp3_buffer.getvalue())
    mp3_buffer.seek(0)
    return save_path
    
  1. Use mp3_filepath as input to the test_mp3_file to test the file loading functionality

Copy link
Contributor Author

@CharlesHolbrow CharlesHolbrow Jan 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the older tests for LoadWithLibrosa do use the files that were created manually viacreate_test_audio_files.py. These tests could be replaced now with the fixture method, which would allow us to remove some of the wavs from this repo. That would be nice, but I'm going to treat it as a separate issue.

The tests in this PR operate on two kinds of audio

  • in-memory test signals
  • audio files with music content (on disk, in repo) which are used as a known baseline. These shouldn't be automatically generated, because we need a baseline to test against (otherwise our test just verifies that the audio file encoders output the same thing when called in succession, as opposed to verifying that they output the correct thing)

@CharlesHolbrow
Copy link
Contributor Author

CharlesHolbrow commented Jan 4, 2024

I'm wondering where you got the various hard-coded tolerance / divergence values from that you are using, especially in test_wav_file, test_mp3_file, and test_ogg_file

All the hard coded values are based on my empirical observations about values that reflected known-working condition.

For wav files, we could manually calculate the expected delta between floating point audio and discrete 16/24 bit PCM audio. That seems like overkill to me, especially when it's the compressed file format writers that come with the most risk. Can you think of a better way to do this? If not, I think the current implementation is a reasonable balance of safety/effort...but I welcome suggestions.

The main danger with the numpy_to_mp3 and numpy_to_wav has to do with environment dependencies such as ffmpeg and libsndfile. If we (for example) build a docker container that has a older version of libsndfile, then numpy_to_ogg will fail silently and just write digital black to our output audio file. I want to be able run our tests in ALL our docker images before publishing them. That gives us some protection against nasty surprises when writing millions of audio files.

With that in mind, the next steps I'm imagining after confirming these tests are satisfactory:

  • Choose a method for writing data-compressed audio (see top of this PR)
  • Run all these tests during the docker build process before publishing to DockerHub

@CharlesHolbrow CharlesHolbrow merged commit 6a637b6 into main Jan 5, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants