Resample can't process waveforms whose dtype=int16 #2294

Closed
Mashiro009 opened this issue Mar 26, 2022 · 3 comments

Comments

@Mashiro009

🐛 Describe the bug

I found that torchaudio's Resample transform expects waveforms with dtype=float32 by default.
As a result, Resample can't process waveforms with dtype=int16.
I'm not sure whether this counts as a bug.

To Reproduce

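import torchaudio
import torchaudio.backend.sox_io_backend as sox

wav = 'example.wav'  # mono, sample_rate=48000
# normalize=False keeps the native sample type instead of converting to float32
waveforms, sample_rate = sox.load(wav, normalize=False)
# waveforms.dtype is torch.int16 (short)
resample = 16000
if sample_rate != resample:
    # pass the loaded tensor; the original snippet referenced an undefined name here
    audio = torchaudio.transforms.Resample(
        sample_rate, resample)(waveforms)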

This raises the error below, because kernel.dtype is float32 while waveform.dtype is int16.
The two dtypes conflict when both tensors are passed to torch.nn.functional.conv1d.

Traceback (most recent call last):
  File "tmp_sox.py", line 176, in <module>
    audio = torchaudio.transforms.Resample(
  File "/home/usr/anaconda3/envs/wenet_cpu/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/usr/anaconda3/envs/wenet_cpu/lib/python3.8/site-packages/torchaudio/transforms.py", line 898, in forward
    return _apply_sinc_resample_kernel(
  File "/home/usr/anaconda3/envs/wenet_cpu/lib/python3.8/site-packages/torchaudio/functional/functional.py", line 1593, in _apply_sinc_resample_kernel
    resampled = torch.nn.functional.conv1d(waveform[:, None], kernel, stride=orig_freq)
RuntimeError: expected scalar type Short but found Float

Versions

Versions of relevant libraries:
[pip3] numpy==1.22.3
[pip3] torch==1.10.0
[pip3] torchaudio==0.10.0
[pip3] torchvision==0.11.1
[conda] blas 1.0 mkl
[conda] cpuonly 2.0 0 pytorch
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py38h497a2fe_0 conda-forge
[conda] mkl_fft 1.3.1 py38hd3c417c_0
[conda] mkl_random 1.2.2 py38h1abd341_0 conda-forge
[conda] numpy 1.22.3 pypi_0 pypi
[conda] numpy-base 1.21.2 py38h79a1101_0
[conda] pytorch 1.10.0 py3.8_cpu_0 pytorch
[conda] pytorch-mutex 1.0 cpu pytorch
[conda] torchaudio 0.10.0 py38_cpu [cpuonly] pytorch
[conda] torchvision 0.11.1 py38_cpu [cpuonly] pytorch

@mthrok
Collaborator

mthrok commented Mar 26, 2022

Hi @Mashiro009

The Resample transform caches its kernel at construction time, and by default the cached kernel is float32. To resample other dtypes, the kernel has to be converted, i.e.

resampler = torchaudio.transforms.Resample(...)
resampler.to(dtype=<DTYPE>, device=<DEVICE>)

However, integer types are not supported by the underlying PyTorch features (nn.Module and torch.nn.functional.conv1d).

>>> resampler.to(dtype=torch.int16)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/moto/miniforge3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 892, in to
    raise TypeError('nn.Module.to only accepts floating point or complex '
TypeError: nn.Module.to only accepts floating point or complex dtypes, but got desired dtype=torch.int16

Similarly, trying torchaudio.functional.resample causes an error.

>>> torchaudio.functional.resample(torch.randint(256, (2, 250)), 8000, 16000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/moto/Development/torchaudio/torchaudio/functional/functional.py", line 1472, in resample
    resampled = _apply_sinc_resample_kernel(waveform, orig_freq, new_freq, gcd, kernel, width)
  File "/Users/moto/Development/torchaudio/torchaudio/functional/functional.py", line 1411, in _apply_sinc_resample_kernel
    resampled = torch.nn.functional.conv1d(waveform[:, None], kernel, stride=orig_freq)
RuntimeError: expected scalar type Long but found Float

I gave this some thought. The underlying convolution implementation requires the input and the kernel to have the same dtype, so resampling an integer waveform would require an integer kernel. But an integer kernel cannot represent the sinc interpolation kernel, so I do not think it is feasible to resample with an integer type. Keep the default normalize=True when calling torchaudio.load and it should work. (Note: normalize is a misnomer. It does not normalize the volume; it simply converts the values from the native type to float32 in the range [-1.0, 1.0].)
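
To illustrate why an integer kernel cannot hold sinc taps, here is a simplified pure-Python sketch. It is not torchaudio's actual kernel construction; the function name, tap count, and cutoff are made up for the example. Every tap of this low-pass sinc has magnitude below 1, so casting to an integer type leaves nothing.

```python
import math

def sinc_kernel(num_taps, cutoff):
    """Unwindowed low-pass sinc taps centered on the middle index (illustrative only)."""
    center = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - center
        if x == 0:
            # Limit of sin(pi * cutoff * x) / (pi * x) as x approaches 0.
            taps.append(cutoff)
        else:
            taps.append(math.sin(math.pi * cutoff * x) / (math.pi * x))
    return taps

float_taps = sinc_kernel(num_taps=9, cutoff=1 / 3)

# Truncate toward zero, as an integer cast would do.
int_taps = [int(t) for t in float_taps]
```

Here int_taps is all zeros, so convolving with it would discard the signal entirely.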

@Mashiro009
Author

Hi @mthrok

Thanks for your effort.

Would it be possible to add code later to catch or handle this case, rather than letting conv1d report the error?
Of course, this is just a suggestion.

@mthrok
Collaborator

mthrok commented Mar 28, 2022

@carolineechen Could you follow up on this? There is no is_int in PyTorch, but I think we can use is_floating_point, i.e. if not t.is_floating_point(): raise an error.
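
A minimal sketch of the kind of guard suggested above might look like this. The helper name check_floating_point and the error message are hypothetical and are not taken from the torchaudio source.

```python
import torch

def check_floating_point(waveform: torch.Tensor) -> None:
    """Hypothetical guard: reject non-floating-point input before resampling."""
    if not waveform.is_floating_point():
        raise TypeError(
            f"Expected floating point tensor for waveform, but received {waveform.dtype}."
        )

# A float32 waveform passes silently; an int16 waveform would raise TypeError.
check_floating_point(torch.zeros(1, 8, dtype=torch.float32))
```

Failing early this way gives the caller a message about the waveform dtype instead of a conv1d error from deeper in the call stack.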

xiaohui-zhang pushed a commit to xiaohui-zhang/audio that referenced this issue May 4, 2022
Summary:
Resolves pytorch#2294

Raise an error if the waveform to be resampled is not of floating point type. The `conv1d` operation used in resampling and `nn.Module` used for the transforms don't support integer type.

Pull Request resolved: pytorch#2318

Reviewed By: mthrok

Differential Revision: D35379276

Pulled By: carolineechen

fbshipit-source-id: f8f9539a051e7c3d22bcb45ca6a34aaef67abed0