AudioCaps

Description

The csv file has four columns:

audiocap_id: The unique ID of each audio clip and its corresponding caption.
youtube_id: The ID of the YouTube video that the audio clip is taken from. You can use it to look up the clip's VGGish embedding in AudioSet.
start_time: The start time of the clip within the YouTube video, in seconds.
caption: The audio caption.
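
As a rough illustration of the column layout above, the following Python sketch reads rows into typed records using only the standard library. It assumes the csv file starts with a header row naming the four columns and that start_time parses as a number; the sample rows, the `Caption` type, and the `read_captions` helper are made up for this example and are not part of the dataset.

```python
import csv
import io
from typing import List, NamedTuple, TextIO

# Column names as listed in the description above.
EXPECTED_COLUMNS = ["audiocap_id", "youtube_id", "start_time", "caption"]


class Caption(NamedTuple):
    """One csv row (hypothetical helper type for this sketch)."""
    audiocap_id: str
    youtube_id: str
    start_time: float  # assumption: a numeric offset into the video
    caption: str


def read_captions(handle: TextIO) -> List[Caption]:
    """Parse csv text with a header row into Caption records."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    return [
        Caption(
            audiocap_id=row["audiocap_id"],
            youtube_id=row["youtube_id"],
            start_time=float(row["start_time"]),
            caption=row["caption"],
        )
        for row in reader
    ]


# Made-up rows for illustration only; real IDs and captions will differ.
SAMPLE = (
    "audiocap_id,youtube_id,start_time,caption\n"
    "1,abc123XYZ_0,30,A dog barks\n"
    '2,def456UVW_1,0,"Rain falls, then thunder"\n'
)

rows = read_captions(io.StringIO(SAMPLE))
for row in rows:
    print(row.youtube_id, row.start_time, row.caption)
```

To read a real file, pass an open file object instead of the in-memory sample, for example `open(path, newline="", encoding="utf-8")`. The csv module handles captions that contain commas as long as they are quoted.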

Statistics

Split	Count
Train	49,838
Validation	495
Test	975
Total	51,308
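
The total can be checked against the per-split counts with a few lines of Python. This is only a sanity check on the numbers in the table above; the `SPLIT_COUNTS` name is chosen for this sketch.

```python
# Per-split counts copied from the table above.
SPLIT_COUNTS = {"Train": 49838, "Validation": 495, "Test": 975}

total = sum(SPLIT_COUNTS.values())

# Print each split with its share of the whole dataset.
for name, count in SPLIT_COUNTS.items():
    print(f"{name}\t{count:,}\t{count / total:.1%}")
print(f"Total\t{total:,}")
```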

Last edit: May 30, 2019