AudioMetaData(sample_rate=256, num_frames=256, num_channels=2, bits_per_sample=24, encoding=PCM_S)
Traceback (most recent call last):
  File "foo.py", line 14, in <module>
    torchaudio.load(buffer_)
  File "torchaudio/backend/sox_io_backend.py", line 147, in load
    return torchaudio._torchaudio.load_audio_fileobj(
RuntimeError: Only 16 and 32 bits are supported for signed PCM.
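The script that produced the output above is not included here, so the following is only a rough sketch of how a comparable input could be built. It uses the standard-library wave module to write a silent 24-bit WAVE file into an in-memory buffer, with the same parameters as the metadata shown above. The helper name make_wav24_buffer is hypothetical, and the torchaudio calls appear only in comments because they depend on the installed version and backend.

```python
import io
import wave


def make_wav24_buffer(sample_rate=256, num_frames=256, num_channels=2):
    """Return a BytesIO holding a silent 24-bit signed PCM WAVE file.

    Hypothetical helper for illustration; not part of torchaudio.
    """
    buffer_ = io.BytesIO()
    with wave.open(buffer_, "wb") as writer:
        writer.setnchannels(num_channels)
        writer.setsampwidth(3)  # 3 bytes per sample = 24-bit
        writer.setframerate(sample_rate)
        # All-zero bytes are valid (silent) 24-bit samples.
        writer.writeframes(bytes(3 * num_channels * num_frames))
    buffer_.seek(0)
    return buffer_


buffer_ = make_wav24_buffer()

# Assumption: with torchaudio installed and the "sox_io" backend selected,
# calls along these lines would correspond to the output shown above:
#   torchaudio.info(buffer_)   # metadata is reported
#   buffer_.seek(0)
#   torchaudio.load(buffer_)   # raises the RuntimeError on affected versions
```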
Currently the save and info functions of the "sox_io" backend support 24-bit signed integer linear PCM, but the load function does not.
We can add support for 24-bit signed integer linear PCM in load. The normalize=True case is straightforward, because the returned Tensor is of float type with values in the range [-1.0, 1.0]. The open question is what to do when normalize=False. In that case, when the audio format is WAVE, our load function picks the Tensor type that corresponds to the sample format, such as torch.int16 for 16-bit signed integer linear PCM. PyTorch does not have a 24-bit data type, so we need to decide how to handle this case. The options are:
1. Raise an error.
2. Use a 32-bit signed integer (and maybe add a warning).
3. Something else.
In the following, we assume that we will take approach 2.
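To make approach 2 concrete, here is a minimal pure-Python sketch of widening 24-bit samples to values that fit a 32-bit signed integer. It is not the torchaudio implementation. The function names, the dtype table, and the left_justify flag are all hypothetical, and the issue does not say whether the 24-bit value should be kept as-is or shifted into the upper bits, so the sketch exposes both conventions.

```python
from typing import List

# Hypothetical lookup for normalize=False: bits per sample -> dtype name.
# Under approach 2, 24-bit shares the 32-bit signed integer container.
DTYPE_NAME_FOR_SIGNED_PCM = {16: "int16", 24: "int32", 32: "int32"}


def decode_pcm24_to_int32(raw: bytes, left_justify: bool = True) -> List[int]:
    """Decode little-endian 24-bit signed PCM bytes into int32-range values.

    left_justify=True shifts each sample left by 8 bits so that full-scale
    24-bit audio spans (almost) the full 32-bit range. left_justify=False
    keeps the original 24-bit magnitudes. Which one is wanted is an open
    choice that this sketch does not settle.
    """
    if len(raw) % 3 != 0:
        raise ValueError("24-bit PCM data must be a multiple of 3 bytes")
    samples = []
    for i in range(0, len(raw), 3):
        value = int.from_bytes(raw[i:i + 3], byteorder="little", signed=True)
        samples.append(value << 8 if left_justify else value)
    return samples


# Smallest, largest, and -1 as 24-bit little-endian samples.
raw = bytes([0x00, 0x00, 0x80, 0xFF, 0xFF, 0x7F, 0xFF, 0xFF, 0xFF])
print(decode_pcm24_to_int32(raw, left_justify=False))
print(decode_pcm24_to_int32(raw, left_justify=True))
```

Either convention loses no information, but the choice changes the numeric values that callers and tests see, so it would need to be documented alongside any warning.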
Steps
get_dtype function, which is used by apply_effects_file and apply_effects_fileobj.
Testing
We need to add a test for 24-bit to the test function, but our test fixtures for WAVE I/O rely on scipy, which cannot handle 24-bit audio, so we will need a workaround. This is not straightforward. One workaround is to do something similar to how mp3 loading is tested: generate a 24-bit audio file with the sox command, load it with torchaudio.load, and create a reference value by first converting the file to 32-bit signed integer and then loading it with scipy. For the details of this approach, please refer to the docstring of the mp3 test above.
Setting up dev env
Please refer to CONTRIBUTING.md for setting up a dev env.
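For the testing workaround described in the Testing section, the two sox invocations could be assembled roughly as follows. This is a sketch under assumptions: the helper names, file paths, and the choice of a 300 Hz sine are hypothetical and are not taken from the existing test utilities. The functions only build argument lists, so they can be inspected without sox installed.

```python
from typing import List


def gen_pcm24_command(path: str, sample_rate: int = 8000,
                      num_channels: int = 2, duration: float = 1.0) -> List[str]:
    """Build a sox command that synthesizes a 24-bit signed PCM file."""
    return [
        "sox", "-n",                      # no input file; audio comes from synth
        "-r", str(sample_rate),
        "-c", str(num_channels),
        "-b", "24", "-e", "signed-integer",
        path,
        "synth", str(duration), "sine", "300",
    ]


def convert_to_int32_command(src: str, dst: str) -> List[str]:
    """Build a sox command that rewrites src as 32-bit signed PCM."""
    return ["sox", src, "-b", "32", "-e", "signed-integer", dst]


print(gen_pcm24_command("pcm24.wav"))
print(convert_to_int32_command("pcm24.wav", "pcm32.wav"))

# Assumption: in a test, each list would be run with
# subprocess.run(cmd, check=True); the 24-bit file would then be loaded with
# torchaudio.load(..., normalize=False) and the 32-bit file with
# scipy.io.wavfile.read to obtain the reference value.
```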