about hop_length #1
Hi, the hop_length variable in config.py is set to 32, though in past commits it has been larger.
I mean, considering a frame length of 32 ms and a frame shift of 16 ms, shouldn't hop_length be set to 256?
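For reference, the arithmetic behind that suggestion (the sample rate is not stated in the thread; 16 kHz is assumed here):

```python
sample_rate = 16000        # Hz, assumed; not stated in the thread
frame_length_ms = 32       # analysis window length
frame_shift_ms = 16        # frame shift between successive windows

win_length = sample_rate * frame_length_ms // 1000  # 512 samples
hop_length = sample_rate * frame_shift_ms // 1000   # 256 samples
print(win_length, hop_length)  # 512 256
```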
I've made sure to keep the hop_length a power of 2 less than fft_size so I'm at liberty to experiment with various window types. I'd be reluctant to make hop_length 256, as that would come at the cost of time resolution. I padded (0,0,0,1) so that I'm padding DC with 0; I'd rather do this than add 0 to the top frequency bin, which would have altered the signal. Great spot on the missing window in the ISTFT, I'm implementing that now.
Thank you for your reply, I understand what you mean. However, I have experimented with the padding step in your code. Since you remove the DC component after the STFT, you need to add it back before the ISTFT, but the correct padding should be (0,0,1,0): using the (0,0,0,1) you provided, I could not recover the speech correctly.
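A minimal sketch of the corrected round trip, assuming spectrograms shaped (batch, freq, time) and a 512-point FFT with hop 256 (the repo's exact shapes may differ):

```python
import torch
import torch.nn.functional as F

n_fft, hop_length = 512, 256
window = torch.hann_window(n_fft)
x = torch.randn(1, 4 * n_fft)  # dummy waveform, shape (batch, samples)

# STFT gives (batch, n_fft // 2 + 1, frames); drop the DC bin (freq index 0)
spec = torch.stft(x, n_fft, hop_length, window=window, return_complex=True)
spec = spec[:, 1:, :]

# ... the network would operate on the remaining 256 bins here ...

# (0, 0, 1, 0) pads the second-to-last (frequency) dim at the front,
# restoring a zero DC bin. (0, 0, 0, 1) would instead append a bin at the
# top of the spectrum, shifting every frequency up by one row.
spec = F.pad(spec, (0, 0, 1, 0))
y = torch.istft(spec, n_fft, hop_length, window=window, length=x.shape[-1])
```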
I just checked this and you're right. I was assuming the DC bin was the last frequency bin rather than the first.
Hi, I plotted spectrograms of the original speech and the transformed speech to check whether the signal can be perfectly reconstructed.
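For what it's worth, a self-contained version of that check, with the STFT parameters assumed to be a 512-point FFT, hop 256, and a Hann window:

```python
import torch

n_fft, hop_length = 512, 256
window = torch.hann_window(n_fft)
x = torch.randn(1, 16000)  # one second of dummy speech at an assumed 16 kHz

# Round trip: an STFT followed by an ISTFT with matching parameters
# should reconstruct the waveform up to floating-point error.
spec = torch.stft(x, n_fft, hop_length, window=window, return_complex=True)
y = torch.istft(spec, n_fft, hop_length, window=window, length=x.shape[-1])

print(torch.max(torch.abs(x - y)))  # roughly 1e-6 in float32
```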
I've been using https://github.com/wavefrontshaping/complexPyTorch/blob/master/complexPyTorch/complexFunctions.py as a reference, which looks similar to what you've sent. Good spot though, I must not have changed the function names when copying them over from there.
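For context, the pattern used in that file applies the real-valued nonlinearity to the real and imaginary parts independently; a minimal paraphrase (not the file's exact code):

```python
import torch
import torch.nn.functional as F

def complex_relu(z: torch.Tensor) -> torch.Tensor:
    """ReLU applied separately to the real and imaginary parts."""
    return torch.complex(F.relu(z.real), F.relu(z.imag))

z = torch.randn(4, dtype=torch.complex64)
print(complex_relu(z))
```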
The reason for using max pooling, I assume, is to implement the CBAM attention mechanism. There are many attention mechanisms out there now, and your attempt will make a great contribution. I also have a doubt: there is no way to use DDP acceleration with the current X.real, X.imag form of the data, because the NCCL backend does not support PyTorch's complex arithmetic. I don't know if PyTorch Lightning supports complex tensors.
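For anyone hitting the same NCCL limitation, the torch.view_as_real workaround mentioned in the reply below looks roughly like this:

```python
import torch

z = torch.randn(2, 3, dtype=torch.complex64)

# Reinterpret the complex tensor as a real one with a trailing dim of
# size 2; NCCL can all-reduce this real view, unlike the complex tensor.
r = torch.view_as_real(z)        # shape (2, 3, 2), dtype float32

# Convert back once gradient sync is done.
z2 = torch.view_as_complex(r)    # shape (2, 3), dtype complex64
assert torch.equal(z, z2)
```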
Yes, my use of max pooling is inspired by the channel attention modules in https://arxiv.org/abs/2102.01993, which are themselves inspired by CBAM. Regarding DDP, I ran into the same issue of not being able to use DDP with complex tensors; I actually created an issue in NCCL's repo. I considered using torch.view_as_real throughout the repo to make DDP possible, but it didn't really seem worth it. That diagram is correct, except that the input is 256x256 to account for variable sample length.
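For reference, a rough sketch of a CBAM-style channel attention block (after https://arxiv.org/abs/1807.06521, which the modules above build on); the channel count and reduction ratio are illustrative, not the repo's values:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention gate using both average- and max-pooled descriptors."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # global average pooling -> shared MLP
        mx = self.mlp(x.amax(dim=(2, 3)))    # global max pooling -> same MLP
        scale = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * scale

x = torch.randn(1, 64, 256, 256)  # 256x256 input, as in the comment above
print(ChannelAttention(64)(x).shape)  # torch.Size([1, 64, 256, 256])
```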
Hello author, in the config.py file, should hop_length be 256 instead of 32?