Hello,
I am training the HTSAT-BART captioning model on AudioCaps only, as a baseline. On the validation set, the metrics almost match those reported in the paper (SPIDEr: 44.1, CIDEr: 71.1). However, these numbers are reached at epoch 6. Toward epoch 20 the metrics drop significantly (SPIDEr: 0.36, CIDEr: 0.57), and they fluctuate a lot throughout training. Is this expected behaviour? If by any chance you have access to the output file of the HTSAT-BART baseline on AudioCaps, would you mind sharing it?
Thank you!
Since both the audio encoder and the text decoder are pretrained models, the system tends to start overfitting quickly, within fewer than 10 epochs. Additionally, the reported scores are computed on the test set, not the validation set. Please find the attached output log for more details.
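Given this early overfitting, a common approach is to keep the checkpoint with the best validation score rather than the last one, and then evaluate that checkpoint on the test set. Below is a minimal Python sketch of that selection logic, assuming validation runs once per epoch. The `EarlyStopper` class, its `patience` parameter, and the scores in the usage example are hypothetical and not part of the HTSAT-BART code; the `spider` helper uses the standard definition of SPIDEr as the mean of CIDEr and SPICE.

```python
from dataclasses import dataclass
from typing import Optional


def spider(cider: float, spice: float) -> float:
    """SPIDEr is the arithmetic mean of CIDEr and SPICE."""
    return (cider + spice) / 2.0


@dataclass
class EarlyStopper:
    """Tracks the best validation score and signals when to stop.

    `patience` (hypothetical setting) is the number of consecutive epochs
    without improvement tolerated before training stops.
    """
    patience: int = 3
    best_score: float = float("-inf")
    best_epoch: Optional[int] = None
    epochs_without_improvement: int = 0

    def step(self, epoch: int, score: float) -> bool:
        """Record one epoch's validation score; return True if training should stop."""
        if score > self.best_score:
            self.best_score = score
            self.best_epoch = epoch
            self.epochs_without_improvement = 0
            # In a real training loop, save the model checkpoint here.
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Illustrative validation SPIDEr values (made up, not taken from the attached log).
stopper = EarlyStopper(patience=3)
for epoch, score in enumerate([0.30, 0.38, 0.42, 0.44, 0.43, 0.41, 0.40], start=1):
    if stopper.step(epoch, score):
        break
print(stopper.best_epoch, stopper.best_score)  # 4 0.44
```

Here training stops at epoch 7 after three epochs without improvement, and the epoch 4 checkpoint is the one to report on the test set. Selecting on validation and reporting on test also keeps the two splits separate, which matters when comparing against numbers computed on the test set.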