Support 24bit wave in sox_io load function. #1376

Closed
mthrok opened this issue Mar 9, 2021 · 2 comments · Fixed by #1389

Comments

@mthrok
Collaborator

mthrok commented Mar 9, 2021

Currently the save and info functions of the "sox_io" backend support 24-bit signed integer linear PCM, but the load function does not.

import io
import torch
import torchaudio


tensor = torch.randn([2, 256])
buffer_ = io.BytesIO()
torchaudio.save(buffer_, tensor, 256, format='wav', encoding='PCM_S', bits_per_sample=24)

buffer_.seek(0)
print(torchaudio.info(buffer_))

buffer_.seek(0)
torchaudio.load(buffer_)

AudioMetaData(sample_rate=256, num_frames=256, num_channels=2, bits_per_sample=24, encoding=PCM_S)
Traceback (most recent call last):
  File "foo.py", line 14, in <module>
    torchaudio.load(buffer_)
  File "torchaudio/backend/sox_io_backend.py", line 147, in load
    return torchaudio._torchaudio.load_audio_fileobj(
RuntimeError: Only 16 and 32 bits are supported for signed PCM.

We can add support for 24-bit signed integer linear PCM. The case normalize=True is straightforward, as the returned Tensor will be float type with value range [-1.0, 1.0]. The open question is what to do when normalize=False. In that case, when the audio format is WAVE, our load function picks the corresponding Tensor type, such as torch.int16 for 16-bit signed integer linear PCM. PyTorch does not have a 24-bit data type, so we need to decide among the following options.

  1. Raise an error.
  2. Use 32-bit signed integer (and maybe add warning)
  3. Something else.

In the following, we assume that we will take approach 2.
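
To make approach 2 concrete, here is a minimal pure-Python sketch of what storing 24-bit samples in a 32-bit container could look like, together with the normalize=True case. The helper names (decode_pcm_s24le, to_int32, to_float) are hypothetical and are not part of torchaudio. Shifting left by 8 bits so the samples fill the 32-bit range is an assumption made here for illustration; keeping the sign-extended 24-bit values unscaled would be another valid choice, and the issue does not say which one the fix uses.

```python
from typing import List


def decode_pcm_s24le(raw: bytes) -> List[int]:
    """Decode little-endian signed 24-bit PCM bytes into Python ints."""
    if len(raw) % 3 != 0:
        raise ValueError("24-bit PCM data must be a multiple of 3 bytes")
    return [
        int.from_bytes(raw[i:i + 3], byteorder="little", signed=True)
        for i in range(0, len(raw), 3)
    ]


def to_int32(samples: List[int]) -> List[int]:
    """Assumed convention: shift left by 8 bits to fill the 32-bit range."""
    return [s << 8 for s in samples]


def to_float(samples: List[int]) -> List[float]:
    """Normalize 24-bit samples to floats in [-1.0, 1.0)."""
    return [s / float(1 << 23) for s in samples]


raw = bytes([0x00, 0x00, 0x00,   # 0
             0xFF, 0xFF, 0x7F,   # largest positive value, 2**23 - 1
             0x00, 0x00, 0x80])  # most negative value, -2**23
samples = decode_pcm_s24le(raw)
print(samples)            # [0, 8388607, -8388608]
print(to_int32(samples))  # [0, 2147483392, -2147483648]
print(to_float(samples))
```

With this convention every shifted value fits in a signed 32-bit integer, and the most negative 24-bit sample maps to exactly -1.0 after normalization.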

Steps

  1. Update the get_dtype function, which is used by apply_effects_file and apply_effects_fileobj.
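
As a rough illustration of the change this step implies, the sketch below maps a signed PCM bit depth to a Tensor dtype name under approach 2. The real get_dtype is not shown in this issue, so the function name get_dtype_name, its signature, and the string return values are all hypothetical stand-ins rather than the torchaudio implementation.

```python
def get_dtype_name(encoding: str, bits_per_sample: int) -> str:
    """Hypothetical stand-in: pick a dtype name for signed integer PCM."""
    if encoding != "PCM_S":
        raise ValueError("This sketch only covers signed integer PCM.")
    # Approach 2: 24-bit audio is returned in a 32-bit container,
    # because PyTorch has no 24-bit integer dtype.
    mapping = {16: "int16", 24: "int32", 32: "int32"}
    if bits_per_sample not in mapping:
        raise ValueError("Only 16, 24 and 32 bits are supported for signed PCM.")
    return mapping[bits_per_sample]


print(get_dtype_name("PCM_S", 16))  # int16
print(get_dtype_name("PCM_S", 24))  # int32
```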

Testing

  1. We need to add a test for 24-bit audio to the test function, but our test fixtures for WAVE I/O rely on scipy, which cannot handle 24-bit audio, so we will need a workaround. This is not straightforward. One option is to do something similar to how mp3 loading is tested: generate a 24-bit audio file with the sox command and load it with torchaudio.load, then create a reference value by first converting the file to 32-bit signed integer and loading it with scipy. For the details of this approach, please refer to the docstring of the mp3 test.
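
The shape of that comparison can be sketched with the standard library alone. This is only an illustration of the idea: the stdlib wave module stands in for both the sox command and scipy, the helpers write_wav and read_wav are hypothetical, and the left shift by 8 bits is an assumed convention for how a 24-bit sample relates to its 32-bit counterpart. A real test would call torchaudio.load on the 24-bit file instead of read_wav.

```python
import io
import sys
import wave


def write_wav(samples, sampwidth, sample_rate=8000):
    """Return the bytes of a mono signed-PCM WAV file, `sampwidth` bytes per sample."""
    # The wave module takes frames in native byte order and swaps them
    # itself on big-endian hosts, so pack with sys.byteorder.
    frames = b"".join(
        s.to_bytes(sampwidth, sys.byteorder, signed=True) for s in samples
    )
    buffer_ = io.BytesIO()
    with wave.open(buffer_, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(sampwidth)
        f.setframerate(sample_rate)
        f.writeframes(frames)
    return buffer_.getvalue()


def read_wav(data):
    """Return (sample width in bytes, integer samples) for a mono WAV file."""
    with wave.open(io.BytesIO(data), "rb") as f:
        width = f.getsampwidth()
        raw = f.readframes(f.getnframes())
    samples = [
        int.from_bytes(raw[i:i + width], sys.byteorder, signed=True)
        for i in range(0, len(raw), width)
    ]
    return width, samples


original = [0, 1, -1, 2**23 - 1, -2**23]
wav24 = write_wav(original, sampwidth=3)                    # file under test
wav32 = write_wav([s << 8 for s in original], sampwidth=4)  # 32-bit reference

width24, loaded24 = read_wav(wav24)
width32, reference32 = read_wav(wav32)

# The 24-bit data, widened to 32 bits, should match the 32-bit reference.
assert [s << 8 for s in loaded24] == reference32
print(width24, width32)  # 3 4
```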

Setting up dev env

Please refer to CONTRIBUTING.md for setting up a dev env.

@iseessel
Contributor

Hi I'll be working on this :)!

@mthrok
Collaborator Author

mthrok commented Mar 12, 2021

@iseessel Thanks, let me know if you need help.

@mthrok mthrok linked a pull request Mar 14, 2021 that will close this issue
mthrok pushed a commit to mthrok/audio that referenced this issue Dec 13, 2022
Added TorchScript label  to "Introduction to TorchScript" and "Loading a TorchScript Model in C++" tutorials.