Resample can't process waveforms whose dtype=int16 #2294

Mashiro009 · 2022-03-26T16:55:25Z

🐛 Describe the bug

I found that torchaudio's Resample transform expects waveforms with dtype=float32 by default.
Therefore, Resample can't process waveforms whose dtype=int16.
I'm not sure whether this should be considered a bug.

To Reproduce

import torchaudio
import torchaudio.backend.sox_io_backend as sox

wav = 'example.wav' # mono, sample_rate=48000
# normalize=False keeps the native sample type, so waveforms.dtype == torch.int16
waveforms, sample_rate = sox.load(wav, normalize=False)
resample = 16000
if sample_rate != resample:
    audio = torchaudio.transforms.Resample(
        sample_rate, resample)(waveforms)

This raises the following error, because kernel.dtype=float32 while waveform.dtype=int16.
The two dtypes conflict when both tensors are passed to torch.nn.functional.conv1d.

Traceback (most recent call last):
  File "tmp_sox.py", line 176, in <module>
    audio = torchaudio.transforms.Resample(
  File "/home/usr/anaconda3/envs/wenet_cpu/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/usr/anaconda3/envs/wenet_cpu/lib/python3.8/site-packages/torchaudio/transforms.py", line 898, in forward
    return _apply_sinc_resample_kernel(
  File "/home/usr/anaconda3/envs/wenet_cpu/lib/python3.8/site-packages/torchaudio/functional/functional.py", line 1593, in _apply_sinc_resample_kernel
    resampled = torch.nn.functional.conv1d(waveform[:, None], kernel, stride=orig_freq)
RuntimeError: expected scalar type Short but found Float
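The same conflict can be reproduced with conv1d alone. This is a minimal sketch with made-up tensors: an int16 input shaped (batch, channel, time) and a random float32 kernel standing in for the cached resample kernel. The exact error message differs between PyTorch versions, so the sketch only checks that a RuntimeError is raised.

```python
import torch
import torch.nn.functional as F

x = torch.randint(-100, 100, (1, 1, 32), dtype=torch.int16)  # int16 "waveform"
kernel = torch.randn(1, 1, 5)  # float32 kernel, like the cached resample kernel

# Mismatched dtypes: conv1d refuses to mix int16 input with a float32 kernel.
try:
    F.conv1d(x, kernel)
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True

# Casting the input to the kernel's dtype makes the call valid.
y = F.conv1d(x.to(kernel.dtype), kernel)
print(mismatch_raised, y.dtype, y.shape)
```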

Versions

Versions of relevant libraries:
[pip3] numpy==1.22.3
[pip3] torch==1.10.0
[pip3] torchaudio==0.10.0
[pip3] torchvision==0.11.1
[conda] blas 1.0 mkl
[conda] cpuonly 2.0 0 pytorch
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py38h497a2fe_0 conda-forge
[conda] mkl_fft 1.3.1 py38hd3c417c_0
[conda] mkl_random 1.2.2 py38h1abd341_0 conda-forge
[conda] numpy 1.22.3 pypi_0 pypi
[conda] numpy-base 1.21.2 py38h79a1101_0
[conda] pytorch 1.10.0 py3.8_cpu_0 pytorch
[conda] pytorch-mutex 1.0 cpu pytorch
[conda] torchaudio 0.10.0 py38_cpu [cpuonly] pytorch
[conda] torchvision 0.11.1 py38_cpu [cpuonly] pytorch

The text was updated successfully, but these errors were encountered:

mthrok · 2022-03-26T17:13:58Z

Hi @Mashiro009

The resample transform caches the kernel at construction time, and by default the cached kernel is float32. To resample other dtypes, the kernel has to be converted, i.e.

resampler = torchaudio.transforms.Resample(...)
resampler.to(dtype=<DTYPE>, device=<DEVICE>)

However, integer types are not supported by the underlying PyTorch features (nn.Module and torch.nn.functional.conv1d).

>>> resampler.to(dtype=torch.int16)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/moto/miniforge3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 892, in to
    raise TypeError('nn.Module.to only accepts floating point or complex '
TypeError: nn.Module.to only accepts floating point or complex dtypes, but got desired dtype=torch.int16

Similarly, calling torchaudio.functional.resample directly causes an error.

>>> torchaudio.functional.resample(torch.randint(256, (2, 250)), 8000, 16000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/moto/Development/torchaudio/torchaudio/functional/functional.py", line 1472, in resample
    resampled = _apply_sinc_resample_kernel(waveform, orig_freq, new_freq, gcd, kernel, width)
  File "/Users/moto/Development/torchaudio/torchaudio/functional/functional.py", line 1411, in _apply_sinc_resample_kernel
    resampled = torch.nn.functional.conv1d(waveform[:, None], kernel, stride=orig_freq)
RuntimeError: expected scalar type Long but found Float

I gave this some thought. The underlying convolution implementation requires the input and the kernel to have the same dtype, so resampling an integer waveform would need an integer kernel. But an integer kernel cannot express the sinc interpolation kernel, so I do not think it is feasible to perform resampling on integer types. Keep the normalize parameter of torchaudio.load at its default (True) and it should work. (Note: normalize is a misnomer. It does not normalize the volume; it simply converts the values from the native sample type to float32 with values in the range [-1.0, 1.0].)
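To illustrate why an integer kernel cannot represent the sinc interpolation kernel: every tap of a windowed-sinc lowpass filter has magnitude below 1, so truncating the taps to integers zeroes the whole kernel. The sketch below uses a small hand-built 9-tap kernel with a Hann-style window in plain Python; it is an illustration only, not torchaudio's exact kernel.

```python
import math


def sinc(x: float) -> float:
    # Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1.
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


cutoff = 1 / 3  # illustrative cutoff, e.g. for 48 kHz -> 16 kHz
half_width = 5  # Hann window half-length (illustrative)

# Windowed-sinc lowpass taps; all have magnitude <= cutoff < 1.
taps = [
    cutoff * sinc(cutoff * n) * (0.5 + 0.5 * math.cos(math.pi * n / half_width))
    for n in range(-4, 5)
]

# Truncating toward zero, as a cast to an integer dtype would, loses every tap.
int_taps = [int(t) for t in taps]
print(taps)
print(int_taps)
```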

Mashiro009 · 2022-03-26T17:25:37Z

Hi @mthrok

Thanks for your effort.

Is it possible to add code later that checks for or handles this situation, rather than letting conv1d report the error?
Of course, this is just a suggestion.

mthrok · 2022-03-28T05:16:37Z

@carolineechen Could you follow up on this? There is no is_int in PyTorch, but I think we can use is_floating_point, i.e. if not t.is_floating_point(): raise an error.
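A minimal sketch of the suggested check. The helper name check_resample_input is hypothetical, and the choice of TypeError and its message are assumptions for illustration, not necessarily what the merged fix uses.

```python
import torch


def check_resample_input(waveform: torch.Tensor) -> None:
    # Hypothetical helper: reject non-floating-point waveforms before conv1d sees them.
    if not waveform.is_floating_point():
        raise TypeError(
            f"Expected floating point type for waveform tensor, but received {waveform.dtype}."
        )


# Usage: float32 passes, int16 is rejected with a clear message.
results = {}
for dtype in (torch.float32, torch.int16):
    try:
        check_resample_input(torch.zeros(1, 8, dtype=dtype))
        results[dtype] = "ok"
    except TypeError:
        results[dtype] = "rejected"
print(results)
```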

Summary: Resolves pytorch#2294
Raise an error if the waveform to be resampled is not of floating point type. The `conv1d` operation used in resampling and `nn.Module` used for the transforms don't support integer type.
Pull Request resolved: pytorch#2318
Reviewed By: mthrok
Differential Revision: D35379276
Pulled By: carolineechen
fbshipit-source-id: f8f9539a051e7c3d22bcb45ca6a34aaef67abed0

carolineechen mentioned this issue Apr 4, 2022

Raise error for resampling int waveform #2318

Closed

facebook-github-bot closed this as completed in 11328d2 Apr 5, 2022

Mashiro009 mentioned this issue Apr 23, 2022

[tools] normalize the audio before resample wenet-e2e/wenet#1056

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Resample can't process waveforms whose dtype=int16 #2294

Resample can't process waveforms whose dtype=int16 #2294

Mashiro009 commented Mar 26, 2022

mthrok commented Mar 26, 2022 •

edited

Loading

Mashiro009 commented Mar 26, 2022

mthrok commented Mar 28, 2022

Resample can't process waveforms whose dtype=int16 #2294

Resample can't process waveforms whose dtype=int16 #2294

Comments

Mashiro009 commented Mar 26, 2022

🐛 Describe the bug

To Reproduce

Versions

mthrok commented Mar 26, 2022 • edited Loading

Mashiro009 commented Mar 26, 2022

mthrok commented Mar 28, 2022

mthrok commented Mar 26, 2022 •

edited

Loading