PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation
The official implementation of PeriodWave and PeriodWave-Turbo
Sang-Hoon Lee1,2, Ha-Yeong Choi3, Seong-Whan Lee4
1 Department of Software and Computer Engineering, Ajou University, Suwon, Korea
2 Department of Artificial Intelligence, Ajou University, Suwon, Korea
3 AI Tech Lab, KT Corp., Seoul, Korea
4 Department of Artificial Intelligence, Korea University, Seoul, Korea
This repository contains:
- 🪐 A PyTorch implementation of PeriodWave and PeriodWave-Turbo
- ⚡️ Pre-trained PeriodWave models trained on LibriTTS (24,000 Hz, 100 bins, hop size of 256)
- 💥 Pre-trained PeriodWave models trained on LJSpeech (22,050 Hz, 80 bins, hop size of 256)
- 🛸 A PeriodWave training script
In this repository, we provide a new paradigm and architecture for neural vocoders that enables notably fast training and achieves SOTA performance. With roughly 10 times less training time, we achieved state-of-the-art performance on LJSpeech and LibriTTS.
First, train PeriodWave with conditional flow matching.
- PeriodWave: The first successful conditional flow matching waveform generator that outperforms GAN-based Neural Vocoders
Second, accelerate PeriodWave with adversarial flow matching optimization.
- PeriodWave-Turbo: SOTA Few-step Generator tuned from PeriodWave
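The first stage trains the vector field with a conditional flow matching objective. As a hedged, minimal sketch (not the repository's exact implementation; the interpolation path, time sampling, and model signature `model(x_t, t, mel)` are assumptions), one training step looks like:

```python
import torch

def cfm_loss(model, x1, mel):
    """Hypothetical conditional flow matching step with a straight-line
    (OT) path: interpolate noise x0 toward data x1 and regress the
    constant target velocity x1 - x0."""
    x0 = torch.randn_like(x1)                        # noise sample
    t = torch.rand(x1.size(0), 1, device=x1.device)  # uniform time in [0, 1]
    xt = (1 - t) * x0 + t * x1                       # point on the path
    v_target = x1 - x0                               # target velocity field
    v_pred = model(xt, t.squeeze(1), mel)            # predicted velocity
    return torch.nn.functional.mse_loss(v_pred, v_target)
```

At inference, integrating the learned velocity field from noise to data with a few ODE steps recovers the waveform.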
- PeriodWave (Trained with LJSpeech, 22.05 kHz, 80 bins)
- PeriodWave (Trained with LibriTTS-train-960, 24 kHz, 100 bins)
- Training Code
- Inference
- PeriodWave with FreeU (Only Inference)
- Evaluation (M-STFT, PESQ, Periodicity, V/UV F1, Pitch, UTMOS)
- PeriodWave-Small (Trained with LibriTTS-train-960, 24 kHz, 100 bins)
- PeriodWave-Large (Trained with LibriTTS-train-960, 24 kHz, 100 bins)
- Paper (PeriodWave-Turbo paper was released, https://arxiv.org/abs/2408.08019.)
- PeriodWave-Turbo (4 Steps ODE, Euler Method)
- PeriodWave-Turbo-Small (4 Steps ODE, Euler Method)
- PeriodWave-Turbo-Large (4 Steps ODE, Euler Method)
We have compared several methods for PeriodWave-Turbo, including different reconstruction losses, distillation methods, and GANs. Fine-tuning the PeriodWave models with a fixed number of steps could significantly improve performance! PeriodWave-Turbo utilizes the multi-scale mel-spectrogram loss and adversarial training (MPD, CQT-D) following BigVGAN-v2. We highly appreciate the authors of BigVGAN for their dedication to the open-source implementation. Thanks to their efforts, we were able to experiment quickly and reduce trial and error.
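A multi-resolution spectrogram reconstruction loss can be sketched as follows. This is a hedged, torch-only stand-in for the multi-scale mel loss above (log-magnitude L1 across several FFT sizes; the FFT sizes, hop lengths, and clamp floor are assumptions, not the repository's settings):

```python
import torch

def multiscale_stft_loss(fake, real, n_ffts=(512, 1024, 2048)):
    """Hypothetical multi-resolution spectrogram L1 loss: compare
    log-magnitude STFTs of generated and reference audio at several
    resolutions and average the per-resolution L1 distances."""
    loss = 0.0
    for n_fft in n_ffts:
        win = torch.hann_window(n_fft, device=fake.device)
        spec_f = torch.stft(fake, n_fft, hop_length=n_fft // 4,
                            window=win, return_complex=True).abs()
        spec_r = torch.stft(real, n_fft, hop_length=n_fft // 4,
                            window=win, return_complex=True).abs()
        loss = loss + torch.nn.functional.l1_loss(
            torch.log(spec_f.clamp(min=1e-5)),
            torch.log(spec_r.clamp(min=1e-5)))
    return loss / len(n_ffts)
```

In practice this reconstruction term is combined with the adversarial discriminator losses during fine-tuning.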
- PeriodWave-Turbo (2 Steps ODE, Euler Method)
- PeriodWave-Turbo (4 Steps ODE, Euler Method)
We will update the PeriodWave-Turbo paper soon, and release the PeriodWave-Turbo models that generate waveforms from EnCodec tokens. While we trained these models with EnCodec tokens at Q=8, we found that they show robust and powerful performance across all bitrates: 1.5 kbps (Q=2), 3 kbps (Q=4), 6 kbps (Q=8), 12 kbps (Q=16), and 24 kbps (Q=32).
- PeriodWave with TTS (24 kHz, 100 bins)
The era of mel-spectrograms is returning with advancements in models like P-Flow, VoiceBox, E2-TTS, DiTTo-TTS, ARDiT-TTS, and MELLE. PeriodWave can enhance the audio quality of your TTS models, eliminating the need to rely on codec models. Mel-spectrograms combined with powerful generative models have the potential to surpass neural codec language models in performance.
- Pytorch >=1.13 and torchaudio >= 0.13
- Install requirements
pip install -r requirements.txt
- Prepare your own dataset (We utilized the LibriTTS dataset without any preprocessing)
- Extract Energy Min/Max
python extract_energy.py
- Set `energy_max` and `energy_min` in Config.json
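The energy extraction step scans the training set once and records the dataset-wide frame-energy extrema for the config. As a hedged sketch (the actual `extract_energy.py` may define energy differently; the per-frame L2 norm over mel bins is an assumption):

```python
import torch

def energy_stats(mels):
    """Hypothetical pass over precomputed mel-spectrograms, each shaped
    [n_mels, frames], collecting the dataset-wide min/max of per-frame
    energy (L2 norm over mel bins) for energy_min/energy_max."""
    e_min, e_max = float("inf"), float("-inf")
    for mel in mels:
        energy = mel.norm(dim=0)              # per-frame L2 norm
        e_min = min(e_min, energy.min().item())
        e_max = max(e_max, energy.max().item())
    return e_min, e_max
```

The resulting pair is what gets written into the `energy_min`/`energy_max` fields of the config.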
CUDA_VISIBLE_DEVICES=0,1,2,3 python train_periodwave.py -c configs/periodwave.json -m periodwave
- Fine-tuning PeriodWave with a fixed number of steps can improve overall performance and accelerate inference (NFE 32 --> 2 or 4)
CUDA_VISIBLE_DEVICES=0,1,2,3 python train_periodwave_turbo.py -c configs/periodwave_turbo.json -m periodwave_turbo
# PeriodWave
CUDA_VISIBLE_DEVICES=0 python inference.py --ckpt "logs/periodwave_base_libritts/G_1000000.pth" --iter 16 --noise_scale 0.667 --solver 'midpoint'
# PeriodWave with FreeU (--s_w 0.9 --b_w 1.1)
# Decreasing skip features could reduce the high-frequency noise of generated samples
# We only recommend using FreeU with PeriodWave. Note that PeriodWave-Turbo behaves differently with FreeU, so we do not use FreeU with PeriodWave-Turbo.
CUDA_VISIBLE_DEVICES=0 python inference_with_FreeU.py --ckpt "logs/periodwave_libritts/G_1000000.pth" --iter 16 --noise_scale 0.667 --solver 'midpoint' --s_w 0.9 --b_w 1.1
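The `--s_w`/`--b_w` flags correspond to FreeU-style scaling of a U-Net block's features at merge time. As a simplified, hypothetical sketch (the real model may apply the scaling per-channel or in the frequency domain; this only illustrates the idea that attenuating skip features with `s_w < 1` can reduce high-frequency noise):

```python
import torch

def freeu_merge(backbone, skip, b_w=1.1, s_w=0.9):
    """Hypothetical FreeU-style merge: amplify backbone features (b_w)
    and attenuate skip features (s_w) before concatenating them along
    the channel dimension."""
    return torch.cat([backbone * b_w, skip * s_w], dim=1)
```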
# PeriodWave-Turbo-4steps (Highly Recommended)
CUDA_VISIBLE_DEVICES=0 python inference.py --ckpt "logs/periodwave_turbo_base_step4_libritts_24000hz/G_274000.pth" --iter 4 --noise_scale 1 --solver 'euler'
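The `--iter` and `--solver` flags select the number of ODE steps and the integrator. A hedged sketch of such a fixed-step sampler (the model signature `model(x, t, mel)` and the uniform time grid are assumptions):

```python
import torch

def sample_ode(model, mel, x0, steps=4, solver="euler"):
    """Hypothetical fixed-step ODE sampler for a flow matching vocoder:
    integrate the predicted velocity field from t=0 (noise x0) to t=1.
    `steps` plays the role of --iter, `solver` of --solver."""
    x = x0
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((x.size(0),), i * dt, device=x.device)
        v = model(x, t, mel)
        if solver == "euler":
            x = x + dt * v                        # first-order step
        elif solver == "midpoint":
            x_mid = x + 0.5 * dt * v              # half step
            v_mid = model(x_mid, t + 0.5 * dt, mel)
            x = x + dt * v_mid                    # second-order step
        else:
            raise ValueError(f"unknown solver: {solver}")
    return x
```

This is why PeriodWave-Turbo pairs `--iter 4` with the cheaper Euler solver: the fine-tuned velocity field is accurate enough that higher-order integration is unnecessary.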
- FM: https://openreview.net/forum?id=PqvMRDCJT9t
- VoiceBox (Mel-spectrogram Generation): https://openreview.net/forum?id=gzCS252hCO&noteId=e2GZZfeO9g
- P-Flow (Mel-spectrogram Generation): https://openreview.net/forum?id=zNA7u7wtIN
- RF-Wave (Waveform Generation): https://github.com/bfs18/rfwave (After paper submission, we found that RF-Wave also utilized FM for waveform generation. They used it on the complex spectrogram domain for efficient waveform generation. It is a cool idea!)
Inspired by the multi-period discriminator of HiFi-GAN, we are the first to distill the multi-periodic property into the generator
- Fre-GAN 2: https://github.com/prml-lab-speech-team/demo/tree/master/FreGAN2/code
- MBD (Multi-band Diffusion): https://github.com/facebookresearch/audiocraft
- FreGrad: https://github.com/kaistmm/fregrad
- Vocos: https://github.com/gemelo-ai/vocos
- ConvNeXt-V2: https://github.com/facebookresearch/ConvNeXt-V2
- BigVGAN: https://arxiv.org/abs/2206.04658
- BigVSAN: https://github.com/sony/bigvsan