
UTMOSv2: UTokyo-SaruLab MOS Prediction System

🎤✨ Official implementation of ✨🎤
The T05 System for The VoiceMOS Challenge 2024:
Transfer Learning from Deep Image Classifier to Naturalness MOS Prediction of High-Quality Synthetic Speech
🏅🎉 accepted at the IEEE Spoken Language Technology Workshop (SLT) 2024. 🎉🏅

ꔫ・-・ꔫ・-・ꔫ・-・ꔫ・-・ꔫ・-・ꔫ・-・ꔫ・-・ꔫ

✨  UTMOSv2 achieved 1st place in 7 out of 16 metrics  ✨
✨🏆    and 2nd place in the remaining 9 metrics    🏆✨
✨    in the VoiceMOS Challenge 2024 Track1!    ✨


🚀 Quick Prediction

✨ You can easily use the pretrained UTMOSv2 model!

Note

To clone the repository and use the pretrained UTMOSv2 model, make sure you have Git LFS installed. If it is not installed, follow the instructions at https://git-lfs.github.com/ to install it.

🛠️ Using in your Python code 🛠️

✨⚡️ The UTMOSv2 library integrates easily into your Python code, ⚡️✨
✨ allowing you to quickly create models and make predictions with minimal effort! ✨

If you want to make predictions using the UTMOSv2 library, follow these steps:

  1. Install the UTMOSv2 library from GitHub

    # Prevent LFS files from being downloaded during installation
    GIT_LFS_SKIP_SMUDGE=1 pip install git+https://github.com/sarulab-speech/UTMOSv2.git
  2. Make predictions

    • To predict the MOS of a single .wav file:

      import utmosv2
      model = utmosv2.create_model(pretrained=True)
      mos = model.predict(input_path="/path/to/wav/file.wav")
    • To predict the MOS of all .wav files in a folder:

      import utmosv2
      model = utmosv2.create_model(pretrained=True)
      mos = model.predict(input_dir="/path/to/wav/dir/")

Note

Either input_path or input_dir must be specified, but not both.
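This mutual-exclusion rule can be sketched as a small wrapper; the `predict_mos` helper below is hypothetical and not part of the UTMOSv2 API, it just makes the "exactly one of the two" constraint explicit before calling `model.predict`:

```python
def predict_mos(model, input_path=None, input_dir=None):
    """Call model.predict with exactly one of input_path / input_dir.

    Raises ValueError if both or neither are given, mirroring the
    library's documented constraint.
    """
    if (input_path is None) == (input_dir is None):
        raise ValueError("Specify exactly one of input_path or input_dir.")
    if input_path is not None:
        return model.predict(input_path=input_path)
    return model.predict(input_dir=input_dir)
```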

📜 Using the inference script 📜

If you want to make predictions using the inference script, follow these steps:

  1. Clone this repository and navigate to the UTMOSv2 folder

    git clone https://github.com/sarulab-speech/UTMOSv2.git
    cd UTMOSv2
  2. Install Package

    pip install --upgrade pip  # enable PEP 660 support
    pip install -e .[optional] # install with optional dependencies
  3. Make predictions

    • To predict the MOS of a single .wav file:

      python inference.py --input_path /path/to/wav/file.wav --out_path /path/to/output/file.csv
    • To predict the MOS of all .wav files in a folder:

      python inference.py --input_dir /path/to/wav/dir/ --out_path /path/to/output/file.csv

Note

If you are using zsh, make sure to escape the square brackets like this:

pip install -e '.[optional]'

Tip

If --out_path is not specified, the prediction results are printed to standard output. This is particularly useful when only a few files need to be predicted.
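When you do write the results to a CSV file, they can be post-processed with a few lines of Python. This is a minimal sketch: the column names (`file_path`, `predicted_mos`) and the sample values are assumptions for illustration, so check the header of your own output file first:

```python
import csv
import io

# Hypothetical example of the output CSV's contents; the actual
# column names may differ, so inspect your own out_path file first.
sample_csv = (
    "file_path,predicted_mos\n"
    "/path/a.wav,3.87\n"
    "/path/b.wav,4.12\n"
)

# Parse rows into a {file_path: score} mapping.
rows = csv.DictReader(io.StringIO(sample_csv))
scores = {row["file_path"]: float(row["predicted_mos"]) for row in rows}
print(scores)
```

To read a real file, replace `io.StringIO(sample_csv)` with `open("/path/to/output/file.csv")`.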

Note

Either --input_path or --input_dir must be specified, but not both.


Note

These methods provide quick and simple predictions. For more accurate predictions and detailed usage of the inference script, please refer to the inference guide.

🤗 You can try a simple demonstration on Hugging Face Spaces.

⚒️ Train UTMOSv2 Yourself

If you want to train UTMOSv2 yourself, please refer to the training guide. To reproduce the training as described in the paper or used in the competition, please refer to this document.

📂 Used Datasets

Details of the datasets used in this project can be found in the datasets documentation.

🔖 Citation

If you find UTMOSv2 useful in your research, please cite the following paper:

@inproceedings{baba2024utmosv2,
  title     = {The T05 System for The {V}oice{MOS} {C}hallenge 2024: Transfer Learning from Deep Image Classifier to Naturalness {MOS} Prediction of High-Quality Synthetic Speech},
  author    = {Baba, Kaito and Nakata, Wataru and Saito, Yuki and Saruwatari, Hiroshi},
  booktitle = {IEEE Spoken Language Technology Workshop (SLT)},
  year      = {2024},
}