CQT, iCQT, and VQT implementations and testing #3804

d-dawg78 · 2024-06-27T16:24:05Z

Hey everyone,

I am happy to propose the addition of the CQT, iCQT, and VQT. The first two were requested in issue 588. Since the CQT is a VQT with gamma=0, I figured the VQT should be added to the package too. It also features prominently in the research community, including as a time-frequency representation for neural networks. Here are a few important details.
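
To illustrate why the CQT is the gamma=0 case of the VQT, here is a minimal sketch. It assumes the common variable-Q bandwidth rule B = alpha * f + gamma; the helper names and the exact definition of alpha are illustrative assumptions, not code taken from this PR or from librosa.

```python
def bin_frequencies(f_min, n_bins, bins_per_octave):
    """Geometrically spaced center frequencies, starting at f_min."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def q_factors(freqs, bins_per_octave, gamma):
    """Q = f / B, with bandwidth B = alpha * f + gamma (assumed rule)."""
    # Assumed relative bandwidth; other definitions of alpha exist.
    alpha = 2.0 ** (1.0 / bins_per_octave) - 2.0 ** (-1.0 / bins_per_octave)
    return [f / (alpha * f + gamma) for f in freqs]


freqs = bin_frequencies(f_min=32.703, n_bins=108, bins_per_octave=12)
cqt_q = q_factors(freqs, bins_per_octave=12, gamma=0.0)   # constant Q
vqt_q = q_factors(freqs, bins_per_octave=12, gamma=20.0)  # Q grows with frequency
```

With gamma=0 every bin has the same Q; a positive gamma widens the low-frequency filters, which lowers their Q and shortens them in time.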

General

The proposed transforms follow, and are tested against, the librosa implementations. Note that, since the algorithms are based on recursive sub-sampling and the resampling algorithms differ, the results of the proposed transforms and librosa gradually diverge as the number of resampling iterations increases. The librosa comparison test thresholds are adapted accordingly. The implementation being matched is the following:

librosa_vqt = vqt(
    y=<Y>,
    sr=<SAMPLE_RATE>,
    hop_length=<HOP_LENGTH>,
    fmin=<F_MIN>,
    n_bins=<N_BINS>,
    intervals="equal",
    gamma=<GAMMA>,
    bins_per_octave=<BINS_PER_OCTAVE>,
    tuning=0.,
    filter_scale=1,
    norm=1,
    sparsity=0.,
    window=<WINDOW>,
    scale=False,
    pad_mode="constant",
    res_type=<RES_TYPE>,
    dtype=<DTYPE>,
)

The <ARGUMENTS> (similar across all three transforms) are the ones that are controllable in the proposed code. The others are hard-coded. In my opinion, they should stay that way to avoid unnecessary complexity. Future iterations of the transforms could expose some of these arguments, however, if requested by the community!
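
The recursive sub-sampling mentioned above can be sketched as follows. This is a simplified illustration that assumes one halving of the sample rate and hop length per octave; `octave_plan` is a hypothetical helper, and the actual implementation (and librosa's early-downsampling logic) may differ in detail.

```python
import math


def octave_plan(sample_rate, hop_length, n_bins, bins_per_octave):
    """List (octave index, sample rate, hop length) for each processing pass.

    Assumes the signal is decimated by 2 after each octave, so the hop
    length must be divisible by 2 ** (n_octaves - 1).
    """
    n_octaves = math.ceil(n_bins / bins_per_octave)
    required = 2 ** (n_octaves - 1)
    if hop_length % required != 0:
        raise ValueError(
            f"hop_length must be a multiple of {required} for {n_octaves} octaves"
        )
    plan = []
    sr, hop = float(sample_rate), hop_length
    for octave in range(n_octaves):
        plan.append((octave, sr, hop))
        sr /= 2.0   # halve the sample rate for the next (lower) octave
        hop //= 2   # keep the frame rate constant
    return plan


plan = octave_plan(sample_rate=44100, hop_length=512, n_bins=108, bins_per_octave=12)
```

Each additional octave adds one more resampling pass, which is why differences between resamplers accumulate toward the lowest bins.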

Tests

I was unable to make the transforms torch-scriptable. Maybe this should be the focus of a future PR. For the rest, I was able to test on CPU but not on GPU, for installation reasons. Feel free to let me know if any tests are lacking.

Speed

On the audio snippet from here, over 100 iterations, with dtype=torch.float64:

VQT - torchaudio: 15.208; librosa: 50.121 (seconds)
CQT - torchaudio: 15.188; librosa: 47.686 (seconds)
iCQT - torchaudio: 7.029; librosa: 200.069 (seconds)
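
For a quick read of these numbers, the snippet below computes the librosa / torchaudio time ratio for each transform. It only uses the timings reported above; the `speedups` helper is illustrative.

```python
# Reported wall-clock totals in seconds: (torchaudio, librosa).
timings = {
    "VQT": (15.208, 50.121),
    "CQT": (15.188, 47.686),
    "iCQT": (7.029, 200.069),
}


def speedups(timings):
    """Return librosa time divided by torchaudio time for each transform."""
    return {name: lib / ta for name, (ta, lib) in timings.items()}


ratios = speedups(timings)
for name, ratio in ratios.items():
    print(f"{name}: librosa / torchaudio = {ratio:.2f}x")
```

The forward transforms come out roughly 3x faster and the inverse roughly 28x faster in this particular measurement.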

Sanity Check

Here's an image of the CQT-gram generated using the following parameters:

SAMPLE_RATE = 44100
HOP_LENGTH = 512
F_MIN = 32.703
N_BINS = 108
BINS_PER_OCTAVE = 12

The results are pretty much identical! Feel free to request changes or ask me any questions on this PR. I'll be happy to answer, and am excited to get these transforms to the package 🫡
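
As a side check on the parameters above, the sketch below verifies that the highest bin's center frequency stays below Nyquist. The helper names are hypothetical, and the check in the PR may also account for the top filter's bandwidth, which this sketch ignores.

```python
def top_bin_frequency(f_min, n_bins, bins_per_octave):
    """Center frequency of the highest bin on an equal-tempered grid."""
    return f_min * 2.0 ** ((n_bins - 1) / bins_per_octave)


def check_below_nyquist(sample_rate, f_min, n_bins, bins_per_octave):
    """Raise if the top center frequency exceeds half the sample rate."""
    f_max = top_bin_frequency(f_min, n_bins, bins_per_octave)
    nyquist = sample_rate / 2.0
    if f_max > nyquist:
        raise ValueError(f"top bin at {f_max:.1f} Hz exceeds Nyquist ({nyquist:.1f} Hz)")
    return f_max


f_max = check_below_nyquist(sample_rate=44100, f_min=32.703, n_bins=108, bins_per_octave=12)
```

With these settings the grid spans nine octaves from C1 and the top bin sits comfortably below 22050 Hz.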

pytorch-bot · 2024-06-27T16:24:08Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/audio/3804

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

zaptrem · 2024-07-01T04:02:37Z

Awesome contribution! A bunch of torch.ones tensors are created on CPU regardless of the input tensor's device. Also, it would be nice to have an inverse VQT function as well. Also also, do you know a set of parameters that would result in perfect or nearly perfect reconstruction? I had to fiddle with the filter-length code to get something that was even close, but there's still a high-frequency buzzing sound and increased loudness at the start/end. I also noticed that my 262,144-sample input to a CQT with hop_size 512 had an output size of 513 instead of 512 unless I set the hop_size to 513, but that may be a result of the aforementioned fiddling.

d-dawg78 · 2024-07-01T11:12:44Z

Hey, here's to addressing the feedback ☝️

Good catch on the torch.ones front - the most recent commit should address this issue.
We are following the librosa VQT, CQT, and iCQT algorithms, and they opted not to implement the inverse VQT for good reason. I think we should do the same, at least for now.
Here are parameters that led to decent waveform reconstruction on my end:

SAMPLE_RATE = 16000
HOP_LENGTH = 256
F_MIN = 32.703
N_BINS = 672
BINS_PER_OCTAVE = 96

Increasing N_BINS and BINS_PER_OCTAVE together increases the CQT resolution and, by extension, makes the reconstruction much better 🙂

I don't really have a good answer for the output size question. It is probably a result of the set of parameters you're using.
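
One possible explanation for the 513-frame output, offered as an assumption rather than a confirmed diagnosis: if the transform uses centered framing (padding the signal on both sides, as STFT-based implementations commonly do), the frame count is 1 + n_samples // hop_length. The helper below is illustrative.

```python
def centered_frame_count(n_samples, hop_length):
    """Number of frames when the signal is padded so frames are centered."""
    return 1 + n_samples // hop_length


frames_hop_512 = centered_frame_count(262144, 512)  # 513
frames_hop_513 = centered_frame_count(262144, 513)  # 512
```

Under that assumption, both observations (513 frames with a hop of 512, and 512 frames with a hop of 513) follow from the same formula.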

zaptrem · 2024-07-07T02:30:30Z

Here's an example of the high-frequency artifacts/aliasing(?) in the reconstruction that I can't get rid of (using your implementation without my adjustments):

sample_rate = 44100
hop_length = 512
f_min = 20
n_bins = 1280
bins_per_octave = 128

Original:
https://github.com/pytorch/audio/assets/1612230/3888b6b4-0695-4475-a89f-8db0bd22c552

Recon:
https://github.com/pytorch/audio/assets/1612230/e4d8934e-abcf-4419-a1ba-a09e0c562fc1

Even if I apply a low-pass filter to remove frequencies above 8000 Hz before passing it into the above, I still get distortion when the bass beats:

recon.mp4

d-dawg78 · 2024-07-12T16:46:20Z

Hey,

Thank you for your patience - busy week! I managed to get decent reconstruction, without too many audible artefacts, using the following parameters, and without any transformations to the original signal:

SAMPLE_RATE = 44100
HOP_LENGTH = 256
F_MIN = 32.703
N_BINS = 1728
BINS_PER_OCTAVE = 192

There are two issues with the parameters you are using:

Your f_min should be mapped to a note frequency, so that your CQT bins are properly aligned with the equal temperament tuning system.
Your bins_per_octave parameter should be a multiple of 12, so that your frequency bins align with tones, semitones, etc.
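
The two recommendations above can be checked with a small sketch. It assumes A4 = 440 Hz equal temperament; the function names and the one-cent tolerance are illustrative choices, not part of the PR.

```python
import math


def cents_from_nearest_note(freq_hz, a4_hz=440.0):
    """Signed deviation, in cents, from the nearest equal-tempered note."""
    midi = 69.0 + 12.0 * math.log2(freq_hz / a4_hz)
    return 100.0 * (midi - round(midi))


def check_cqt_grid(f_min, bins_per_octave, tolerance_cents=1.0):
    """Return a list of human-readable problems with the bin grid."""
    problems = []
    deviation = cents_from_nearest_note(f_min)
    if abs(deviation) > tolerance_cents:
        problems.append(f"f_min is {deviation:+.1f} cents away from the nearest note")
    if bins_per_octave % 12 != 0:
        problems.append("bins_per_octave is not a multiple of 12")
    return problems


print(check_cqt_grid(f_min=20.0, bins_per_octave=128))     # both checks fail
print(check_cqt_grid(f_min=32.703, bins_per_octave=192))   # no problems reported
```

Here 32.703 Hz corresponds to C1, whereas 20 Hz falls almost halfway between two notes.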

Of course, feel free to play around with lower resolutions!

icqt.mp4

zaptrem · 2024-08-09T19:56:36Z

Thanks! However, this has 1728 * 256 size, whereas a normal spectrogram can store enough info for a perfect strong COLA reconstruction in only 512 * 256 numbers (or 1024 * 512, etc). Shouldn't a CQT be able to do the same without significant artifacting?
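
To put rough numbers on this comparison, the sketch below estimates how many real values each representation stores per input sample. It assumes one complex coefficient per bin per frame and ignores edge frames; the helper names and the STFT settings (n_fft=512, hop 256) are assumptions chosen for illustration.

```python
def cqt_redundancy(n_bins, hop_length):
    """Real values stored per input sample for a uniform-hop complex CQT."""
    return 2.0 * n_bins / hop_length


def stft_redundancy(n_fft, hop_length):
    """Real values stored per input sample for a one-sided complex STFT."""
    return 2.0 * (n_fft // 2 + 1) / hop_length


cqt_ratio = cqt_redundancy(n_bins=1728, hop_length=256)   # 13.5
stft_ratio = stft_redundancy(n_fft=512, hop_length=256)   # about 2.0
```

Under these assumptions the suggested CQT settings store several times more values per sample than a 50%-overlap STFT.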

d-dawg78 and others added 30 commits January 28, 2024 23:42

VQT outline with base args.

f4dc893

Equal temparament frequencies set.

bd75d4d

Merge branch 'pytorch:main' into main

73f54b7

Raise error if max frequency is superior to Nyquist.

40d6581

VQT wavelet filter creation.

3f2689e

Top bin filter cutoff frequencies.

f13b46d

Merge branch 'pytorch:main' into main

d42cd12

Warnings for hop length and sample rate values.

bfcca8a

Forward loop outline.

213e2d8

Wavelet basis function implemented.

517e6c0

First shot at entire VQT done.

1ed2271

Sparsified rows.

eeba783

Removed sparsity and matched stft to librosa vqt implementation.

76afee0

Fixing dot product operation.

86d462c

Fixed resampling.

7aa9b43

Object-oriented optimizations!

b6f8b3c

CQT implementation.

09ffe6c

Splitting functions from classes to be used by iCQT.

1529f0a

iCQT outline and VQT batch computation.

b76ad57

iCQT algorithm start and outline.

78a5169

Pre-computations done :)

e06ac1d

Make frequencies float32 to avoid icqt einsum issues.

878eb26

Basis projection.

53334af

iCQT for 2D tensors :)

da65ec3

Comments on the iCQT and a few other spots.

d146d1a

Proper iCQT import.

170e9fa

Code cleanup in functional.

fa3298f

Librosa compatibility functional tests.

9a824a3

Small comment removed.

2ca6bd9

Fixing functional tests with new dtype method.

9edda98

d-dawg78 and others added 13 commits June 24, 2024 02:19

Proper transform tests.

0eee7ad

Updated src code for float and double transforms.

8a5cbe8

Batch consistency tests for transforms.

a9ec66b

Librosa VQT compatibility test.

bcff964

Typing change.

4209aaa

CQT librosa tests.

e764efe

Inverse CQT librosa compatibility tests.

0680828

Merge branch 'pytorch:main' into main

0d58ebd

Make sure CQT is VQT with gamma set to 0 test.

c17e90d

Typo.

ede80a2

Bug fixes and top notch librosa matching.

8e3c9ef

Higher frequency librosa q-transform tests.

3217b19

Updated VQT and CQT params in tests.

f2632a0

facebook-github-bot added the CLA Signed label Jun 27, 2024

d-dawg78 added 2 commits June 27, 2024 16:39

Removing useless white space change.

da23687

Small changes.

6b510f9

d-dawg78 marked this pull request as ready for review July 1, 2024 02:32

d-dawg78 requested a review from a team as a code owner July 1, 2024 02:32

Creating ones on correct device.

6b3f20f

Ones dtype.

71778c5
