```
python infer_encodec.py --data_path data/wavs --save_path data/encodec_embeddings
```
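For orientation, here is a minimal sketch of what this kind of EnCodec feature extraction looks like when done with the Hugging Face `EncodecModel`; the model ID, the mono downmix, and the choice to keep the discrete codes are assumptions, and `infer_encodec.py` may do things differently.

```python
# Sketch only: assumes the Hugging Face EncodecModel; infer_encodec.py may use a
# different EnCodec interface or save a different representation of the audio.
import torchaudio
from transformers import AutoProcessor, EncodecModel

model = EncodecModel.from_pretrained("facebook/encodec_24khz")
processor = AutoProcessor.from_pretrained("facebook/encodec_24khz")

wav, sr = torchaudio.load("data/wavs/example.wav")                   # placeholder path
wav = torchaudio.functional.resample(wav, sr, processor.sampling_rate)
wav = wav.mean(dim=0)                                                # mix down to mono

inputs = processor(raw_audio=wav.numpy(),
                   sampling_rate=processor.sampling_rate,
                   return_tensors="pt")
encoded = model.encode(inputs["input_values"], inputs["padding_mask"])
codes = encoded.audio_codes                                          # discrete codebook indices
```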
```
python infer_clap.py --data_path data/wavs --save_path data/clap_embeddings --clap_ckpt clap_ckpt_path
```
You may give different options to the `--enable_fusion` and `--audio_encoder` flags based on the CLAP checkpoint you are using.
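These two flags matter because the CLAP model has to be instantiated with the same fusion setting and audio encoder that the checkpoint was trained with. As a rough sketch of that dependency, the snippet below embeds audio with the `laion_clap` package; whether `infer_clap.py` uses this package, and the specific encoder value shown, are assumptions.

```python
# Sketch only: assumes the laion_clap package; the fusion setting and audio
# encoder below must match your checkpoint, and the values shown are examples.
import laion_clap

model = laion_clap.CLAP_Module(enable_fusion=False, amodel="HTSAT-base")
model.load_ckpt("clap_ckpt_path")                    # path to your CLAP checkpoint

# One embedding vector per input wav file (placeholder path below)
audio_embed = model.get_audio_embedding_from_filelist(
    x=["data/wavs/example.wav"], use_tensor=False
)
print(audio_embed.shape)                             # (num_files, embedding_dim)
```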
You should make CSV files corresponding to the train, validation, and test datasets. Examples for the AudioCaps and Clotho datasets are given in `csv`.
- Train CSV files should have the columns `file_path` and `caption`. If an audio file is labeled with multiple captions, they should be listed as separate entries.
- Validation and test CSV files should have the columns `file_path`, `caption_1`, `caption_2`, ..., `caption_n` if `n` captions are given for a single audio file (see the example below the list).
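As a concrete illustration of the expected layout, the sketch below writes a minimal train and validation CSV with pandas; the file paths, captions, and output locations are made-up placeholders, and only the column names follow the description above.

```python
# Illustrative only: file paths and captions are made up; only the column
# names follow the format described above.
from pathlib import Path

import pandas as pd

Path("csv").mkdir(exist_ok=True)

# Train CSV: one row per (audio, caption) pair, so an audio file with
# multiple captions appears in multiple rows.
train = pd.DataFrame({
    "file_path": ["data/wavs/dog_bark.wav", "data/wavs/dog_bark.wav"],
    "caption": ["A dog barks loudly.", "A dog is barking in the distance."],
})
train.to_csv("csv/train.csv", index=False)

# Validation/test CSV: one row per audio file, with its n captions spread
# across caption_1 ... caption_n columns.
valid = pd.DataFrame({
    "file_path": ["data/wavs/dog_bark.wav"],
    "caption_1": ["A dog barks loudly."],
    "caption_2": ["A dog is barking in the distance."],
})
valid.to_csv("csv/valid.csv", index=False)
```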