Support 24bit wave in sox_io `load` function. #1376

mthrok · 2021-03-09T22:22:56Z

Currently the save and info functions of the "sox_io" backend support 24-bit signed integer linear PCM, but the load function does not.

import io
import torch
import torchaudio


tensor = torch.randn([2, 256])
buffer_ = io.BytesIO()
torchaudio.save(buffer_, tensor, 256, format='wav', encoding='PCM_S', bits_per_sample=24)

buffer_.seek(0)
print(torchaudio.info(buffer_))

buffer_.seek(0)
torchaudio.load(buffer_)

AudioMetaData(sample_rate=256, num_frames=256, num_channels=2, bits_per_sample=24, encoding=PCM_S)
Traceback (most recent call last):
  File "foo.py", line 14, in <module>
    torchaudio.load(buffer_)
  File "torchaudio/backend/sox_io_backend.py", line 147, in load
    return torchaudio._torchaudio.load_audio_fileobj(
RuntimeError: Only 16 and 32 bits are supported for signed PCM.

We can add support for 24-bit signed integer linear PCM. It is straightforward for the case normalize=True, as the returned Tensor will be float type with value range [-1.0, 1.0], but there is an open question of what to do when normalize=False. When normalize=False and the audio format is WAVE, our load function picks the corresponding Tensor type, such as torch.int16 for 16-bit signed integer linear PCM. PyTorch does not have a 24-bit data type, so we need to decide what to do in that case. The options are:

1. Raise an error.
2. Use 32-bit signed integer (and maybe add a warning).
3. Something else.

In the following, we assume that we will take approach 2.
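To make approach 2 concrete, here is a minimal sketch of what widening 24-bit samples to 32-bit integers could look like. The helpers `pcm24_to_int32` and `pcm24_to_float` are hypothetical and written only for illustration; they are not part of torchaudio or sox. Whether the values should be left-aligned in the 32-bit word (multiplied by 256) or kept as-is is an assumption left open here, so the sketch exposes it as a flag.

```python
from typing import List


def pcm24_to_int32(raw: bytes, left_align: bool = True) -> List[int]:
    """Decode packed little-endian 24-bit signed PCM into integers that fit in int32.

    Hypothetical helper for illustration only; not a torchaudio API.
    """
    if len(raw) % 3 != 0:
        raise ValueError("24-bit PCM data must be a multiple of 3 bytes long")
    samples = []
    for i in range(0, len(raw), 3):
        # Each sample is 3 bytes, least significant byte first, two's complement.
        value = int.from_bytes(raw[i:i + 3], byteorder="little", signed=True)
        # Multiplying by 256 moves the 24 significant bits to the top of a 32-bit word.
        samples.append(value * 256 if left_align else value)
    return samples


def pcm24_to_float(raw: bytes) -> List[float]:
    """Decode 24-bit signed PCM into floats in [-1.0, 1.0), as in the normalize=True case."""
    return [v / float(2 ** 23) for v in pcm24_to_int32(raw, left_align=False)]
```

With left alignment, a 24-bit sample of 1 becomes 256 and the most negative 24-bit value maps to the most negative int32 value, so the full int32 range is used even though the low 8 bits are always zero.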

Steps

Update the get_dtype function, which is used by apply_effects_file and apply_effects_fileobj.

Testing

We need to add a 24-bit case to the test function, but our test fixtures for WAVE I/O rely on scipy, which cannot handle 24-bit audio, so we will need a workaround. This is not straightforward. One option is to do something similar to how mp3 loading is tested: generate a 24-bit audio file with the sox command and load it with torchaudio.load, then create a reference value by first converting the file to 32-bit signed integer and loading that with scipy. For the details of this approach, please refer to the docstring of the mp3 test above.
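The shape of that test can be sketched with the standard library alone. This is only a stand-in: Python's `wave` module plays the role of both the sox command (writing the 24-bit file and its 32-bit conversion) and scipy (reading the 32-bit reference), and a plain decode plays the role of torchaudio.load. The helper names `write_wav` and `read_wav` are hypothetical, and scaling by 256 when widening to 32 bits is an assumption about how the conversion behaves.

```python
import io
import wave


def write_wav(samples, sample_width, sample_rate=8000):
    """Return the bytes of a mono WAV file storing `samples` as signed little-endian PCM."""
    buffer_ = io.BytesIO()
    with wave.open(buffer_, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(sample_width)  # 3 bytes = 24-bit, 4 bytes = 32-bit
        f.setframerate(sample_rate)
        f.writeframes(
            b"".join(s.to_bytes(sample_width, "little", signed=True) for s in samples)
        )
    return buffer_.getvalue()


def read_wav(data):
    """Decode the PCM samples of a WAV file given as bytes into a list of ints."""
    with wave.open(io.BytesIO(data), "rb") as f:
        width = f.getsampwidth()
        frames = f.readframes(f.getnframes())
    return [
        int.from_bytes(frames[i:i + width], "little", signed=True)
        for i in range(0, len(frames), width)
    ]


# Test signal covering zero, small values, and both extremes of the 24-bit range.
samples_24 = [0, 1, -1, 8388607, -8388608]

wav_24 = write_wav(samples_24, 3)                        # file under test
wav_32 = write_wav([s * 256 for s in samples_24], 4)     # stand-in for the 32-bit conversion

loaded = [s * 256 for s in read_wav(wav_24)]             # stand-in for the loader under test
reference = read_wav(wav_32)                             # stand-in for the scipy reference

assert loaded == reference
```

In the real test, the two stand-in steps would be replaced by the sox conversion and by torchaudio.load with normalize=False, while the comparison against the 32-bit reference stays the same.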

Setting up dev env

Please refer to CONTRIBUTING.md for setting up a dev env.


iseessel · 2021-03-11T18:50:43Z

Hi I'll be working on this :)!

mthrok · 2021-03-12T05:14:59Z

@iseessel Thanks, let me know if you need help.


mthrok added help wanted C++ contributions welcome labels Mar 9, 2021

mthrok linked a pull request Mar 14, 2021 that will close this issue

Add support for 24-bit signed LPCM wav in sox_io backend #1389

Merged

mthrok closed this as completed in #1389 Mar 15, 2021

mthrok pushed a commit to mthrok/audio that referenced this issue Dec 13, 2022

Added TorchScript label (pytorch#1376)

5877006

Added TorchScript label to "Introduction to TorchScript" and "Loading a TorchScript Model in C++" tutorials.
