Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move resample to functional and add librosa comparison #1402

Merged
merged 4 commits into from
Mar 22, 2021

Conversation

carolineechen
Copy link
Contributor

@carolineechen carolineechen commented Mar 19, 2021

  • add resample to torchaudio.functional, copied over from kaldi compliance resample_waveform
  • adds batching to kaldi compliance interface resample_waveform, which now wraps functional.resample
  • add test to compare torchaudio resample against librosa.resample

lr_upsampled = librosa.resample(waveform.squeeze(0).numpy(), sample_rate, upsample_rate)
lr_upsampled = torch.from_numpy(lr_upsampled).unsqueeze(0)

self.assertEqual(ta_upsampled, lr_upsampled, atol=1e-4, rtol=1e-5)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vincentqb @cpuhrsch what's an acceptable margin of error for this comparison? using these values gives an assertion error Tensors failed to compare as equal!With rtol=1e-05 and atol=0.0001, found 28142 element(s) (out of 128000) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 0.006743520498275757 (0.23187145590782166 vs. 0.2386149764060974), which occurred at index (0, 127992).

Copy link
Contributor

@vincentqb vincentqb Mar 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can look at the kaldi compliance as guidance. As you can see, the tolerances are fairly high there too.

Since we are not changing the algorithm as part of this PR, these new tests are informational, and we take the threshold that makes them pass (after we make sure that we match the correct parameters of course). The threshold therefore becomes new signal/information that we learned from this exercise.

Based on the numbers you gave, assuming we use the correct corresponding parameters for each resampling functions, I'd say atol=1e-2 which is also used here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw @carolineechen, as discussed offline with @cpuhrsch, what other parameters have you tried in librosa's resample? let's follow-up by opening a pull request that calls them explicitly :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had only tried the librosa default values for this test, but yup, I can look into the other parameters

@carolineechen carolineechen marked this pull request as ready for review March 22, 2021 14:09
Copy link
Contributor

@vincentqb vincentqb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks! I'm glad to see this happening :)

torchaudio/compliance/kaldi.py Show resolved Hide resolved
test/torchaudio_unittest/compliance_kaldi_test.py Outdated Show resolved Hide resolved
docs/source/compliance.kaldi.rst Outdated Show resolved Hide resolved
torchaudio/compliance/kaldi.py Outdated Show resolved Hide resolved
torchaudio/compliance/kaldi.py Outdated Show resolved Hide resolved
self.assertEqual(ta_upsampled, lr_upsampled, atol=1e-2, rtol=1e-5)

ta_downsampled = F.resample(waveform, sample_rate, downsample_rate)
lr_downsampled = librosa.resample(waveform.squeeze(0).numpy(), sample_rate, downsample_rate)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question for future work: why is squeeze needed before passing to librosa?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have you looked into this @carolineechen ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

waveform has dimensions (1, n), but librosa.resample requires it to be of shape (n,) or (2, n)

@@ -649,17 +648,7 @@ def forward(self, waveform: Tensor) -> Tensor:
Tensor: Output signal of dimension (..., time).
"""
if self.resampling_method == 'sinc_interpolation':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: we'll leave the discussion around migrating the resampling_method parameter to a follow-up PR

Copy link
Contributor

@vincentqb vincentqb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alrighty, LGTM overall :)

torchaudio/compliance/kaldi.py Outdated Show resolved Hide resolved
@carolineechen carolineechen merged commit 14dd917 into pytorch:master Mar 22, 2021
@carolineechen carolineechen deleted the resample branch March 22, 2021 20:08
mthrok pushed a commit to mthrok/audio that referenced this pull request Dec 13, 2022
The Building a Convolution/Batch Norm fuser in FX tutorial (https://pytorch.org/tutorials/intermediate/fx_conv_bn_fuser.html) was added for 1.8. This is a minor update to index.rst to have the tutorial show up in the left hand nav under the "Code Transforms with FX" category.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants