
Speech Representation Disentanglement with Adversarial Mutual Information Learning for One-shot Voice Conversion

1. Dependencies

Install the required Python packages:

pip install -r requirements.txt
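
For example, in a fresh virtual environment (the environment name below is arbitrary):

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt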

2. Quick Start

Download the pre-trained model from TsinghuaCloud, GoogleDrive or BaiduNetDisk and put it into My_model/my_demo.

Download the SpeechSplit pre-trained models (pitch decoder 640000-P.ckpt and vocoder checkpoint_step001000000_ema.pth) from here.

Then cd My_model and modify the paths in demo.py to your own.

Run python demo.py and you will get the converted audio (.wav) in /my_demo, similar to test_result.

You can also choose the conditions in demo.py.
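
Putting the quick start together (a minimal sketch; the checkpoints can live anywhere, as long as the paths in demo.py point to them):

cd My_model
python demo.py   # converted .wav files are written to my_demo/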

3. Preparation, Training and Inference

Download the VCTK dataset.

Extract spectrograms and F0: make_spect_f0.py.

Modify the paths to your own and split the dataset: data_split.py.

Generate training metadata: make_metadata.py.

Run the training script: main.py.

Generate testing metadata: make_test_metadata.py.

Run the inference script: inference.py.
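
The whole pipeline then looks roughly as follows, assuming the paths inside each script have been edited first (this sketch assumes the scripts are run without command-line arguments):

python make_spect_f0.py       # extract spectrograms and F0 from VCTK
python data_split.py          # split the dataset
python make_metadata.py       # training metadata
python main.py                # training
python make_test_metadata.py  # testing metadata
python inference.py           # inference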

4. Evaluation

You may refer to the following scripts: WER.py (word error rate), mcd.py (mel-cepstral distortion), f0_pcc.py (F0 Pearson correlation coefficient), draw_f0_distributions.py, and draw_speaker_embedding.py.
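
As an illustration of one of these metrics, below is a minimal F0-PCC sketch between a source and a converted utterance, assuming pyworld for F0 extraction and scipy for the correlation; it only indicates what f0_pcc.py measures and is not the repository's implementation.

# Minimal F0-PCC sketch (illustrative; the repository's f0_pcc.py may differ).
import numpy as np
import librosa
import pyworld
from scipy.stats import pearsonr

def extract_f0(wav_path, sr=16000, frame_period_ms=5.0):
    """F0 contour with WORLD (DIO + StoneMask refinement)."""
    wav, _ = librosa.load(wav_path, sr=sr)
    wav = wav.astype(np.float64)
    f0, t = pyworld.dio(wav, sr, frame_period=frame_period_ms)
    return pyworld.stonemask(wav, f0, t, sr)

def f0_pcc(source_wav, converted_wav):
    """Pearson correlation of F0 over frames voiced in both utterances."""
    f0_src, f0_cvt = extract_f0(source_wav), extract_f0(converted_wav)
    n = min(len(f0_src), len(f0_cvt))        # crude length alignment
    f0_src, f0_cvt = f0_src[:n], f0_cvt[:n]
    voiced = (f0_src > 0) & (f0_cvt > 0)     # drop unvoiced frames
    corr, _ = pearsonr(f0_src[voiced], f0_cvt[voiced])
    return corr

For example, f0_pcc('source.wav', 'converted.wav') returns a value in [-1, 1]; higher values indicate that the converted speech better preserves the source F0 variation.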

5. Acknowledgement and References

This work is supported by the National Natural Science Foundation of China (NSFC) (62076144), the National Social Science Foundation of China (NSSF) (13&ZD189), and the Shenzhen Key Laboratory of Next Generation Interactive Media Innovative Technology (ZDSYS20210623092001004).

Our work is mainly inspired by:

(1) SpeechSplit:

K. Qian, Y. Zhang, S. Chang, M. Hasegawa-Johnson, and D. Cox, “Unsupervised speech decomposition via triple information bottleneck,” in International Conference on Machine Learning. PMLR, 2020, pp. 7836–7846.

(2) VQMIVC:

D. Wang, L. Deng, Y. T. Yeung, X. Chen, X. Liu, and H. Meng, “VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-Shot Voice Conversion,” in Interspeech, 2021, pp. 1344–1348.

(3) ClsVC:

H. Tang, X. Zhang, J. Wang, N. Cheng, and J. Xiao, “ClsVC: Learning speech representations with two different classification tasks,” OpenReview, 2021, https://openreview.net/forum?id=xp2D-1PtLc5.

6. Citation

If you find our work useful in your research, please consider citing:

@inproceedings{yang22f_interspeech,
  author={SiCheng Yang and Methawee Tantrawenith and Haolin Zhuang and Zhiyong Wu and Aolan Sun and Jianzong Wang and Ning Cheng and Huaizhen Tang and Xintao Zhao and Jie Wang and Helen Meng},
  title={{Speech Representation Disentanglement with Adversarial Mutual Information Learning for One-shot Voice Conversion}},
  year=2022,
  booktitle={Proc. Interspeech 2022},
  pages={2553--2557},
  doi={10.21437/Interspeech.2022-571}
}

Some results can be found here. Please feel free to contact us (yangsc21@mails.tsinghua.edu.cn) with any questions or concerns.
