Implementation of Speech Foundation Model Ensembles for the Controlled Singing Voice Deepfake Detection (CtrSVDD) Challenge 2024
Prepare the Dataset:
- Follow the instructions in the SVDD Challenge's CtrSVDD_Utils repository to download and prepare the dataset.
Organize Dataset Structure:
- Ensure the dataset folder (the one you pass as --base_dir) has the following structure:
Datasets/
├── dev/
├── train/
├── eval/
├── dev.txt
└── eval.txt
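Before starting a long training run, it can help to confirm the layout above is in place. The following is a minimal sketch with a hypothetical helper, check_dataset_layout (it is not part of this repository); it only checks that the listed folders and text files exist and does not validate their contents.

```python
from pathlib import Path

# Expected entries, taken from the directory tree shown in this README.
EXPECTED_DIRS = ("dev", "train", "eval")
EXPECTED_FILES = ("dev.txt", "eval.txt")


def check_dataset_layout(base_dir):
    """Hypothetical helper: return missing entries under base_dir (empty list if complete)."""
    base = Path(base_dir)
    missing = []
    for name in EXPECTED_DIRS:
        if not (base / name).is_dir():
            missing.append(name + "/")
    for name in EXPECTED_FILES:
        if not (base / name).is_file():
            missing.append(name)
    return missing
```

For example, check_dataset_layout("/path/to/Datasets") returns an empty list when every folder and file listed in the tree is present.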
Set Up the Environment:
- Create a conda environment from the provided requirements.txt file, then activate it:
conda create --name your_env_name --file requirements.txt
conda activate your_env_name
Run Training:
- Execute the training script by specifying the base directory of the dataset:
python train.py --base_dir {path_to_Datasets_folder}
- Additional arguments can be passed, such as --algo, which selects the RawBoost data-augmentation algorithm:
python train.py --base_dir {path_to_Datasets_folder} --algo {algorithm_choice}
- To change the model, modify the model selection in the header of train.py.
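RawBoost augments raw waveforms with three kinds of distortion: linear and non-linear convolutive noise, impulsive signal-dependent additive noise, and stationary signal-independent additive noise. Which --algo value maps to which combination is defined in this repository's code and is not stated here. To give a feel for what the augmentation does, here is a simplified, illustrative sketch of the impulsive signal-dependent component only; the function name isd_additive_noise and the default values of max_percent and g_sd are assumptions, not the repository's settings.

```python
import numpy as np


def isd_additive_noise(x, max_percent=10.0, g_sd=2.0, rng=None):
    """Simplified RawBoost-style impulsive signal-dependent noise (illustrative).

    A random fraction (up to max_percent %) of samples is perturbed by a term
    proportional to the sample value, then the waveform is peak-normalised.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=np.float64)
    y = x.copy()
    beta = rng.uniform(0.0, max_percent)        # percentage of samples to corrupt
    n = int(len(x) * beta / 100.0)
    idx = rng.permutation(len(x))[:n]           # positions of the impulses
    # Random factor in [-1, 1]: product of two uniform(-1, 1) draws.
    f_r = (2 * rng.random(n) - 1) * (2 * rng.random(n) - 1)
    y[idx] = x[idx] + g_sd * x[idx] * f_r       # noise scales with the signal itself
    peak = np.max(np.abs(y))
    return y / peak if peak > 0 else y          # avoid dividing a silent clip by zero
```

Because the noise is proportional to the sample value, silent regions stay silent, which is what distinguishes this component from the signal-independent additive noise.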
Run Evaluation:
- Execute the evaluation script by specifying the base directory of the dataset:
python eval.py --base_dir {path_to_Datasets_folder}
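CtrSVDD systems are ranked by equal error rate (EER), the operating point where the rate of rejected bona fide singing equals the rate of accepted deepfakes. If you want to compute it yourself from per-utterance scores, a minimal sketch follows. It assumes you already have the scores as two arrays and that higher scores mean "more likely bona fide"; check the sign convention of the scores eval.py writes before relying on it. The function name compute_eer is hypothetical.

```python
import numpy as np


def compute_eer(bonafide_scores, spoof_scores):
    """Return (eer, threshold), assuming higher score = more likely bona fide."""
    bona = np.sort(np.asarray(bonafide_scores, dtype=np.float64))
    spoof = np.sort(np.asarray(spoof_scores, dtype=np.float64))
    # Candidate thresholds: every observed score.
    thresholds = np.sort(np.concatenate([bona, spoof]))
    # Miss rate: fraction of bona fide scores strictly below the threshold.
    frr = np.searchsorted(bona, thresholds, side="left") / len(bona)
    # False-alarm rate: fraction of spoof scores at or above the threshold.
    far = 1.0 - np.searchsorted(spoof, thresholds, side="left") / len(spoof)
    i = np.argmin(np.abs(frr - far))            # closest point to FRR == FAR
    return (frr[i] + far[i]) / 2.0, thresholds[i]
```

Using np.searchsorted on sorted scores keeps this O(n log n), which matters once the score lists reach tens of thousands of utterances.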
Custom Arguments:
- Other parameters, such as the batch size and the number of epochs, can also be set through command-line arguments; check train.py for the available options.
- Example:
python train.py --base_dir {path_to_Datasets_folder} --batch_size 64 --epochs 50
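The flags used in this README fit a standard argparse setup. The sketch below is an illustrative stand-in, not the parser in train.py: build_parser is hypothetical, treating --algo as an integer is an assumption, the default values are placeholders, and train.py may define more options under different names.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the flags shown in this README.
    # Defaults below are placeholders, NOT the repository's actual defaults.
    parser = argparse.ArgumentParser(description="CtrSVDD training (illustrative sketch)")
    parser.add_argument("--base_dir", required=True, help="path to the Datasets folder")
    parser.add_argument("--algo", type=int, default=0, help="RawBoost algorithm choice (assumed integer)")
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--epochs", type=int, default=100)
    return parser


# Equivalent to: python train.py --base_dir Datasets --batch_size 64 --epochs 50
args = build_parser().parse_args(["--base_dir", "Datasets", "--batch_size", "64", "--epochs", "50"])
print(args.batch_size, args.epochs)  # prints: 64 50
```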
Changing the Model:
- To use a different model, edit the model import and instantiation in train.py.
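One way to reduce switching models to a one-line edit is a small name-to-class registry. This is only a sketch of that pattern: BackboneA, BackboneB, MODEL_REGISTRY, MODEL_NAME, and build_model are hypothetical placeholders for whatever model classes train.py actually imports, which this README does not list.

```python
# Hypothetical stand-ins for the repository's real model classes.
class BackboneA:
    def __init__(self, **kwargs):
        self.kwargs = kwargs


class BackboneB:
    def __init__(self, **kwargs):
        self.kwargs = kwargs


# Map a short name to a constructor so switching models is a one-line change.
MODEL_REGISTRY = {
    "backbone_a": BackboneA,
    "backbone_b": BackboneB,
}

MODEL_NAME = "backbone_a"  # edit this line to switch models


def build_model(name=MODEL_NAME, **kwargs):
    """Instantiate the registered model called `name`."""
    try:
        model_cls = MODEL_REGISTRY[name]
    except KeyError:
        raise ValueError(f"Unknown model {name!r}; choose from {sorted(MODEL_REGISTRY)}") from None
    return model_cls(**kwargs)
```

Keeping the imports and the registry at the top of the script matches the README's advice to make the change in the script header.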
For further details, refer to the code comments within the scripts.
If you would like to use or reference this work in your own research or project, please cite it as follows:
@article{guragain2024speech,
title={Speech Foundation Model Ensembles for the Controlled Singing Voice Deepfake Detection (CtrSVDD) Challenge 2024},
author={Guragain, Anmol and Liu, Tianchi and Pan, Zihan and Sailor, Hardik B and Wang, Qiongqiong},
journal={arXiv preprint arXiv:2409.02302},
year={2024}
}