
[New-Model] HiFi-GAN implementation #661

Closed
erogol opened this issue Feb 19, 2021 · 9 comments

erogol (Contributor) commented Feb 19, 2021

@rishikksh20 has kindly offered to integrate his own work into TTS.

For more details: https://github.com/rishikksh20/HiFi-GAN

thorstenMueller (Contributor) commented
Great, @rishikksh20 👍. Looking forward to it, because I'm interested in training HiFi-GAN for my (Mozilla) Tacotron2 DCA model. If it's helpful, you could use my public German dataset for testing.

rishikksh20 commented Feb 20, 2021

@erogol @thorstenMueller Sure, I'll check on the German dataset.
But I'd like to share some points regarding my implementation of HiFi-GAN. I implemented this repo https://github.com/rishikksh20/HiFi-GAN/tree/d044dbcdf799f0fdfbfc1920e57e95ac6a05f91b just after reading the HiFi-GAN paper, and I never went through the original HiFi-GAN repo while coding my implementation. Now that I compare it with the official HiFi-GAN repo, I've noticed that my implementation is a little different from the official one.

And I guess I did something terribly right, because my implementation trains 30% faster (1.9 steps/sec vs 1.4 steps/sec for the official repo on a V100, batch size 16) and is 3x smaller (approx. 350 MB vs 920 MB). Not only that, it converges really fast: I trained my model for only 12 hours (80k steps) and its quality is better than the official repo's samples after 1 week (1 million steps) of training on a V100, and that too without fine-tuning with GTA and deep feature matching loss. I checked this hypothesis on 3 different datasets, and the results were the same.

You can listen for yourself:
Original: https://soundcloud.com/rishikesh-kumar-1/original
Generated: https://soundcloud.com/rishikesh-kumar-1/generated

I am still training my repo on different datasets. I did modify my code a bit, which made the quality worse; so far the commit tree https://github.com/rishikksh20/HiFi-GAN/tree/d044dbcdf799f0fdfbfc1920e57e95ac6a05f91b gives the best quality, and that is the version I will integrate into the TTS repo.
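For context on the fine-tuning terms above: the deep feature-matching loss compares discriminator feature maps of real and generated audio. A minimal generic PyTorch sketch (illustrative only, not copied from either repo):

```python
import torch

def feature_matching_loss(fmaps_real, fmaps_fake):
    """L1 distance between discriminator feature maps of real and
    generated audio, summed over all layers of all sub-discriminators.
    Generic sketch; not the exact loss code from either repo."""
    loss = 0.0
    for layers_real, layers_fake in zip(fmaps_real, fmaps_fake):
        for fr, ff in zip(layers_real, layers_fake):
            loss = loss + torch.mean(torch.abs(fr - ff))
    return loss
```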

m-toman (Contributor) commented Feb 22, 2021

@rishikksh20 sounds great, did you also try the "V2" version? That is, setting this https://github.com/jik876/hifi-gan/blob/4769534d45265d52a904b850da5a622601885777/config_v1.json#L13 to 128; as far as I can see, that's the only difference.

I've been training the official HiFi-GAN repo for ages on one GPU but never really got close to the official models, and it's definitely worse than my current MelGAN setup. I think on one 11 GB GPU I'd probably have to train it for 2 months :)
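For reference, a sketch of the V1-vs-V2 generator settings being discussed, based on `config_v1.json` / `config_v2.json` in jik876/hifi-gan (values reproduced from those files from memory; double-check against the repo):

```python
# HiFi-GAN generator settings, as in jik876/hifi-gan config_v1.json
# (double-check against the repo).
v1 = {
    "resblock": "1",
    "upsample_rates": [8, 8, 2, 2],
    "upsample_kernel_sizes": [16, 16, 4, 4],
    "upsample_initial_channel": 512,
    "resblock_kernel_sizes": [3, 7, 11],
    "resblock_dilation_sizes": [[1, 3, 5], [1, 3, 5], [1, 3, 5]],
}
# V2 keeps everything else and only shrinks the channel width:
v2 = {**v1, "upsample_initial_channel": 128}
```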

rishikksh20 commented

@m-toman yes, HiFi-GAN is too slow to train. Although I think that after 1.5M steps (12 days on a V100) of training the quality is more or less similar for the V1 version, 12 days on a V100 is still a huge amount of time. I tried the V2 version of the official HiFi-GAN repo: the convergence time is about the same, but the quality is much worse, and it's a similar story for V3, because V1, V2, and V3 all share the same discriminators, and HiFi-GAN's discriminators are too slow to train.
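One quick way to see the point about shared discriminators is to count parameters. A rough sketch, assuming the official jik876/hifi-gan repo is cloned and on the Python path (class and module names as in its `models.py` and `env.py`):

```python
import json

import torch
from env import AttrDict  # helper from jik876/hifi-gan
from models import Generator, MultiPeriodDiscriminator, MultiScaleDiscriminator

# Load a generator config; swap in config_v2.json / config_v3.json to compare.
with open("config_v1.json") as f:
    h = AttrDict(json.load(f))

def n_params(module: torch.nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())

print("generator:", n_params(Generator(h)))
print("multi-period disc:", n_params(MultiPeriodDiscriminator()))
print("multi-scale disc:", n_params(MultiScaleDiscriminator()))
# Only the generator changes between V1/V2/V3; the two discriminators are
# identical across variants, so they set a floor on training cost.
```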

m-toman (Contributor) commented Feb 22, 2021

Thanks, I meant whether you had tried a V2-style setting with your implementation. V1 seems to be much slower than V2 on CPU.

nukes commented Feb 24, 2021

> [quotes @rishikksh20's comment above in full]

Interesting! Why is your model so much smaller than the official one? Do you have smaller discriminators? The official model is indeed very large.
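A back-of-the-envelope way to read checkpoint sizes (illustrative numbers, not measured from either repo): fp32 weights cost 4 bytes per parameter, and an Adam-style optimizer adds roughly two more tensors of the same shape if its state is saved in the checkpoint.

```python
BYTES_PER_PARAM = 4  # fp32

def checkpoint_mb(n_params: int, with_adam_state: bool = False) -> float:
    """Approximate on-disk checkpoint size in MB: raw weights, plus the
    two Adam moment tensors per parameter if optimizer state is saved.
    Illustrative estimate only."""
    factor = 3 if with_adam_state else 1
    return n_params * BYTES_PER_PARAM * factor / 1e6

# e.g. a ~14M-parameter generator (roughly HiFi-GAN V1's size per the paper):
print(checkpoint_mb(14_000_000))                        # ~56 MB, weights only
print(checkpoint_mb(14_000_000, with_adam_state=True))  # ~168 MB with optimizer
```

So a large size gap between checkpoints can reflect what is bundled (discriminators, optimizer state) as much as the generator architecture itself.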

ghost commented Feb 25, 2021

Which voice corpus can be heard here?
Generated: https://soundcloud.com/rishikesh-kumar-1/generated

rishikksh20 commented

My custom dataset

erogol (Contributor, Author) commented Mar 15, 2021

Continues at coqui-ai/TTS#16.

erogol closed this as completed Mar 15, 2021