The CSV file has four columns:
- `audiocap_id`: A unique ID for each audio clip and its corresponding caption.
- `youtube_id`: The ID of the YouTube video the audio clip was taken from. You can use it to obtain the clip's VGGish embedding from AudioSet.
- `start_time`: The start time of the clip within the YouTube video, in seconds.
- caption: The audio caption.
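A minimal Python sketch for loading the file with the standard `csv` module. It assumes the CSV has a header row naming the four columns above and that `start_time` is numeric (parsed here as a float, assumed to be seconds). The `AudioCap` class and `read_audiocaps` function are hypothetical names, not part of the dataset release.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class AudioCap:
    """One row of the caption CSV (hypothetical container class)."""
    audiocap_id: int   # unique ID for the clip/caption pair
    youtube_id: str    # YouTube video ID, usable as a key into AudioSet
    start_time: float  # clip start within the video (assumed to be seconds)
    caption: str       # the audio caption


def read_audiocaps(lines: Iterable[str]) -> List[AudioCap]:
    """Parse CSV lines (e.g. an open file), assuming a header row with the four column names."""
    reader = csv.DictReader(lines)
    return [
        AudioCap(
            audiocap_id=int(row["audiocap_id"]),
            youtube_id=row["youtube_id"],
            start_time=float(row["start_time"]),
            caption=row["caption"],
        )
        for row in reader
    ]
```

For example, `with open(path, newline="", encoding="utf-8") as f: rows = read_audiocaps(f)`, where `path` points to the CSV file. `csv.DictReader` handles quoted captions that contain commas, so no extra splitting logic is needed.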

| Split | Count |
|---|---:|
| Train | 49,838 |
| Validation | 495 |
| Test | 975 |
| Total | 51,308 |

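As a quick consistency check, the split counts in the table sum to the stated total. The dictionary below simply restates the table; the split keys are illustrative names, not identifiers from the dataset files.

```python
# Split counts restated from the table.
SPLIT_COUNTS = {"train": 49_838, "validation": 495, "test": 975}

total = sum(SPLIT_COUNTS.values())
assert total == 51_308  # matches the "Total" row

# Share of the dataset in each split.
for split, count in SPLIT_COUNTS.items():
    print(f"{split}: {count:,} ({count / total:.1%})")
```
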
Last edit: May 30, 2019