Hi, I get an error when I try to run Inference.py. I think it is just a dimension problem.
However, I don't understand lines 46 and 49 in Inference.py, since audio_file.shape[0] of the wav file will be 1.
You then zero-pad if "audio_file.shape[0] < (SAMPLE_RATE * set_length)".
I cannot understand what this part is handling. Can you explain it?
The error I see is the following:
root@c4f3aefb5d65:/workspace/Prefix_AAC_ICASSP2023# python3 Inference.py 2 1 ./AudioCaps/test/c3nlaAkv9bA.wav
/opt/conda/lib/python3.10/site-packages/torchvision/datapoints/__init__.py:12: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: pytorch/vision#6753, and you can also check out pytorch/vision#7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
warnings.warn(_BETA_TRANSFORMS_WARNING)
/opt/conda/lib/python3.10/site-packages/torchvision/transforms/v2/__init__.py:54: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: pytorch/vision#6753, and you can also check out pytorch/vision#7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
warnings.warn(_BETA_TRANSFORMS_WARNING)
/opt/conda/lib/python3.10/site-packages/torchlibrosa/stft.py:686: UserWarning: Empty filters detected in mel frequency basis. Some channels will produce empty responses. Try increasing your sampling rate (and fmax) or reducing n_mels.
self.melW = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels,
use GPT2 Tokenizer
temporal feature ver's mapping network : num_head = 8 num_layers = 4 prefix_vector_length = 15
global feature ver's mapping network : num_head = 8 num_layers = 4 prefix_vector_length = 11
Encoder freezing
GPT2 freezing
header trainable!
Traceback (most recent call last):
File "/workspace/Prefix_AAC_ICASSP2023/Inference.py", line 63, in <module>
audio_file = torch.cat((audio_file, pad_val), dim=0)
RuntimeError: Tensors must have same number of dimensions: got 2 and 1
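For reference, here is a minimal sketch of padding logic that avoids this mismatch. It assumes the wav is loaded as a (channels, samples) tensor (as torchaudio.load returns), so the sample count lives on dim 1, not dim 0; the SAMPLE_RATE and set_length values are placeholders, not the repo's actual config. This is an illustration of the torch.cat dimension rule, not the repository's code:

```python
import torch

SAMPLE_RATE = 16000  # assumed value, not taken from the repo
set_length = 10      # assumed clip length in seconds

# Stand-in for a loaded mono wav: shape (channels, samples) = (1, 48000).
# With this layout, audio_file.shape[0] is the channel count (1), so
# comparing it against SAMPLE_RATE * set_length checks the wrong axis.
audio_file = torch.zeros(1, 3 * SAMPLE_RATE)

target_len = SAMPLE_RATE * set_length
num_samples = audio_file.shape[1]  # samples are on dim 1

if num_samples < target_len:
    # pad_val must have the same number of dimensions as audio_file
    # (2-D here) or torch.cat raises exactly the RuntimeError above.
    pad_val = torch.zeros(audio_file.shape[0], target_len - num_samples)
    audio_file = torch.cat((audio_file, pad_val), dim=1)  # pad along samples
else:
    audio_file = audio_file[:, :target_len]  # truncate long clips

print(audio_file.shape)  # torch.Size([1, 160000])
```

The original error ("got 2 and 1") suggests the loaded audio_file is 2-D while pad_val was built 1-D; either squeezing the channel dimension first or building pad_val with a matching channel dimension, as above, resolves it.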