Fine-tuning for hindi #525

hetpandya · 2020-09-11T06:50:07Z

Hi @blue-fish , I am trying to fine-tune the model to clone voices of hindi speakers. I wanted to know the steps to follow for the same and also the amount of data I'd need for the model to work well.

Edit - I shall use google colab for fine-tuning

ghost · 2020-09-11T08:17:05Z

Hi @thehetpandya , please start by reading this: #431 (comment)

It is not possible to finetune the English model to another language. A new model needs to be trained from scratch. This is because the model relates the letters of the alphabet to their associated sounds, so what the model knows about English does not transfer over to Hindi. At a minimum, you will need a total of 50 hours of transcribed speech from at least 100 speakers. For a better model get 10 times this number.

This is what you need to do. Good luck and have fun!

Replicate the training of the English synthesizer to learn how to use the data processing and training scripts.
- I have no idea how to do this with Google colab, but it should be possible.
Assemble and preprocess your dataset
Train a synthesizer model
Troubleshoot problems with the model.
Repeat steps 3 and 4 until satisfied

hetpandya · 2020-09-11T09:00:59Z

@blue-fish Thanks a lot for the response! Yes, I have begun exploring the issues for now for a better understanding of the workflow before beginning with the training process.

I also read on #492 (comment) that beginning with training the synthesizer would be a good start and then only if the encoder doesn't seem to give proper results, one can proceed with training/fine-tuning the encoder. Does that same apply in case for a totally different language too, like in my case i.e. hindi?

ghost · 2020-09-11T15:45:48Z

I agree with that suggestion. Encoder training requires a lot of data, time and effort. You can see #126 and #458 to get an idea. If your results are good enough without it, best to avoid that hassle.

lawrence124 · 2020-09-12T15:05:53Z

@thehetpandya

I'm working on a forked version of sv2tts to train local dialect of chinese. Using the dataset from Common Voice (about 22k of utterances) , i couldn't get the data to converge. But if I add the local dialect on top of a pre-trained model (the main dialect of chinese), seems like the result is actually quite good. Fyi, the local dialect and the main dialect have different, but similar alphabet romanization system (for example, the main has 4 tones, but the local dialect has 8)

using common voice data only:

using pre-trained and then add local dataset:

@blue-fish
not sure if i'm abusing the model, but at least it works :)

ghost · 2020-09-12T22:28:25Z

@lawrence124 Interesting, thanks for sharing that result! Occasionally the model fails to learn attention, you might try restarting the training from scratch with a different random seed. It might also help to trim the starting and ending silences. If your data is at 16 kHz then webrtcvad can do that for you (see the trim_long_silences function in encoder/audio.py).

hetpandya · 2020-09-14T04:57:45Z

Thanks @blue-fish I went through the issues you mentioned. You gave me a good amount of resources for a start. Much appreciated!

hetpandya · 2020-09-14T05:00:28Z

@lawrence124 Glad to see your results! Did you have to train the encoder from scratch? Or using the pre-trained decoder/synthesizer worked for you?

lawrence124 · 2020-09-14T07:49:08Z

i'm using the pretrained encoder from Kuangdd, but according to the file size and date...seems like it is the same as the pretrained encoder from here

hetpandya · 2020-09-14T08:05:36Z

Okay, thanks @lawrence124 ! Seems like using the pretrained encoder is good to go for now.

lawrence124 · 2020-09-14T14:44:56Z

btw, i modified a script from adueck a bit. This script will convert video/audio with srt to audio with script for training. I'm not quite sure the format for sv2tts though, but i think u may find it useful if u are trying to get some more date set to train on.

https://github.com/adueck/split-video-by-srt

srt-split.zip

lawrence124 · 2020-09-15T07:59:28Z

@blue-fish

would like to ask a rather random question...have u tried using the demo TTS from https://www.readspeaker.com/ ??

from my point of view, the result in Chinese/Cantonese is pretty good and i would like to discuss...is that their proprietary algorithm is simply superior ?? or they simply has the resources to build a better dataset to train on ??

based on the job description, what they are doing is not too different from tacotron / sv2tts

https://www.isca-speech.org/iscapad/iscapad.php?module=article&id=17363&back=p,250

ghost · 2020-09-15T11:06:00Z

@lawrence124 That website demo uses an different algorithm that probably does not involve machine learning. It sounds like a concatenative method of synthesis where prerecorded sounds are joined together. Listening closely, it is unnatural and obviously computer-generated. To their credit, they do use high-quality audio samples to build the output.

Here's a wav of the demo text synthesized by zhrtvc, using Griffin-Lim as the vocoder. Tacotron speech flows a lot more smoothly than their demo. zhrtvc could sound better than the demo TTS if 1) it is trained on higher quality audio, and 2) a properly configured vocoder is available.

lawrence124 · 2020-09-15T13:00:23Z

@blue-fish yea, as with other data analysis....getting the good/clean dataset is always difficult. (the prelim result of adding youtube clips is not good)

20200915-204053_melgan_10240ms.zip

This is an example of using "mandarin + cantonese" as synthesizer, along with Melgan vocoder. I dont know if it is my ear or not, i dont really like the Griffin-Lim from zhrtvc, it has the "robotic" noise in the background.

btw, seems like u are updating the synthesizer of sv2tts ?? the backbone is still tacotron ??

hetpandya · 2020-09-16T07:56:01Z

btw, i modified a script from adueck a bit. This script will convert video/audio with srt to audio with script for training. I'm not quite sure the format for sv2tts though, but i think u may find it useful if u are trying to get some more date set to train on.

https://github.com/adueck/split-video-by-srt

srt-split.zip

@lawrence124 thanks I shall take a look at it since I might need more data if I cannot find any public dataset

GauriDhande · 2020-09-18T09:00:40Z

@thehetpandya were you able to generate the model for cloning hindi sentences?

hetpandya · 2020-09-18T12:48:41Z

@GauriDhande I'm still looking for a good hindi speech dataset. Do you have any sources?

GauriDhande · 2020-09-22T07:09:14Z

Was going to ask the same thing. I didn't find the Hindi open speech dataset on the internet yet.

ghost · 2020-09-23T15:10:34Z

You might be able the combine the two sources below. First train a single-speaker model on source 1, then tune the voice cloning aspect on source 2. Some effort and experimentation will be required.

Source 1 (24 hours single-speaker): https://cvit.iiit.ac.in/research/projects/cvit-projects/text-to-speech-dataset-for-indian-languages
Source 2 (100 voices, 6 utterances each, untranscribed): https://github.com/shivam-shukla/Speech-Dataset-in-Hindi-Language

hetpandya · 2020-09-24T10:32:35Z

Thanks @blue-fish, I've already applied for Source 1. Will also check out the second one. Your efforts on this project are much appreciated!

ghost · 2020-10-06T21:34:15Z

Hi @thehetpandya , have you made any progress on this recently?

hetpandya · 2020-10-10T14:54:51Z

Hi @blue-fish , no I coudn't find progress on this one. I tried fine-tuning https://github.com/Kyubyong/dc_tts instead, which gave clearer pronunciation of hindi words.
Edit - I tried fine-tuning https://github.com/Kyubyong/dc_tts on Source 1 i.e. https://cvit.iiit.ac.in/research/projects/cvit-projects/text-to-speech-dataset-for-indian-languages

ghost · 2020-10-12T08:45:37Z

Thanks for trying @thehetpandya . If you decide to work on this later please reopen the issue and I'll try to help.

amrahsmaytas · 2020-12-10T08:26:01Z

Greetings @thehetpandya

Are you able to do a real time voice cloning for the given english text, with your experiment in indian accent?

Could you please help/guide me with Voice cloning of english Text In My voice with indian accent?

Thanks

hetpandya · 2020-12-10T12:58:01Z

Hi @amrahsmaytas no I couldn't land to good results. And then I had to shift to another task. Still I'd be glad if I cloud be of any help.

amrahsmaytas · 2020-12-10T13:08:20Z

Hi @amrahsmaytas no I couldn't land to good results. And then I had to shift to another task. Still I'd be glad if I cloud be of any help.

Thanks for the reply,het!
I need your help in training, could you please check your mail (send from greetsatyamsharma@gmail.com) and connect me over there for further discussions!

Thanks ✌,
Awaiting for your response, dude
Satyam.

rajuc110 · 2021-05-13T05:19:18Z

@GauriDhande and @thehetpandya were you guys able to generate the model for cloning Hindi sentences? Please reply.

Thanks.

hetpandya · 2021-06-07T13:08:17Z

Hi @rajuc110 , sorry for the delayed response. No, I couldn't reproduce the results in hindi and had to shift to another task meanwhile.

SayaliNagwkar17 · 2022-02-14T07:56:16Z

Hi @blue-fish , no I coudn't find progress on this one. I tried fine-tuning https://github.com/Kyubyong/dc_tts instead, which gave clearer pronunciation of hindi words. Edit - I tried fine-tuning https://github.com/Kyubyong/dc_tts on Source 1 i.e. https://cvit.iiit.ac.in/research/projects/cvit-projects/text-to-speech-dataset-for-indian-languages

Can you share your work?

SohumKaliaCoder · 2023-03-30T07:19:51Z

I am also facing this issue has anyone have update on this issue

divyendrajadoun · 2023-04-10T14:06:32Z

Hey guys, has anyone found a solution for hindi voice cloning? Thanks

Harsh-Holy9 · 2024-04-15T09:39:37Z

Anybody has already trained model for Hindi language?

Chetan-5ehgal · 2024-05-13T13:01:08Z

any progress done on training real time voice cloning on hindi data set ?

ghost closed this as completed Oct 12, 2020

ghost mentioned this issue Oct 8, 2021

Support for other languages #30

Open

Chetan-5ehgal mentioned this issue May 13, 2024

any progress done on training real time voice cloning on hindi data set ? #1297

Open

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fine-tuning for hindi #525

Fine-tuning for hindi #525

hetpandya commented Sep 11, 2020 •

edited

Loading

ghost commented Sep 11, 2020

hetpandya commented Sep 11, 2020 •

edited

Loading

ghost commented Sep 11, 2020

lawrence124 commented Sep 12, 2020

ghost commented Sep 12, 2020

hetpandya commented Sep 14, 2020 •

edited

Loading

hetpandya commented Sep 14, 2020

lawrence124 commented Sep 14, 2020 •

edited

Loading

hetpandya commented Sep 14, 2020

lawrence124 commented Sep 14, 2020 •

edited

Loading

lawrence124 commented Sep 15, 2020

ghost commented Sep 15, 2020

lawrence124 commented Sep 15, 2020 •

edited

Loading

hetpandya commented Sep 16, 2020

GauriDhande commented Sep 18, 2020

hetpandya commented Sep 18, 2020

GauriDhande commented Sep 22, 2020

ghost commented Sep 23, 2020

hetpandya commented Sep 24, 2020

ghost commented Oct 6, 2020

hetpandya commented Oct 10, 2020 •

edited

Loading

ghost commented Oct 12, 2020

amrahsmaytas commented Dec 10, 2020

hetpandya commented Dec 10, 2020

amrahsmaytas commented Dec 10, 2020

rajuc110 commented May 13, 2021

hetpandya commented Jun 7, 2021

SayaliNagwkar17 commented Feb 14, 2022

SohumKaliaCoder commented Mar 30, 2023

divyendrajadoun commented Apr 10, 2023

Harsh-Holy9 commented Apr 15, 2024

Chetan-5ehgal commented May 13, 2024

Fine-tuning for hindi #525

Fine-tuning for hindi #525

Comments

hetpandya commented Sep 11, 2020 • edited Loading

ghost commented Sep 11, 2020

hetpandya commented Sep 11, 2020 • edited Loading

ghost commented Sep 11, 2020

lawrence124 commented Sep 12, 2020

ghost commented Sep 12, 2020

hetpandya commented Sep 14, 2020 • edited Loading

hetpandya commented Sep 14, 2020

lawrence124 commented Sep 14, 2020 • edited Loading

hetpandya commented Sep 14, 2020

lawrence124 commented Sep 14, 2020 • edited Loading

lawrence124 commented Sep 15, 2020

ghost commented Sep 15, 2020

lawrence124 commented Sep 15, 2020 • edited Loading

hetpandya commented Sep 16, 2020

GauriDhande commented Sep 18, 2020

hetpandya commented Sep 18, 2020

GauriDhande commented Sep 22, 2020

ghost commented Sep 23, 2020

hetpandya commented Sep 24, 2020

ghost commented Oct 6, 2020

hetpandya commented Oct 10, 2020 • edited Loading

ghost commented Oct 12, 2020

amrahsmaytas commented Dec 10, 2020

hetpandya commented Dec 10, 2020

amrahsmaytas commented Dec 10, 2020

rajuc110 commented May 13, 2021

hetpandya commented Jun 7, 2021

SayaliNagwkar17 commented Feb 14, 2022

SohumKaliaCoder commented Mar 30, 2023

divyendrajadoun commented Apr 10, 2023

Harsh-Holy9 commented Apr 15, 2024

Chetan-5ehgal commented May 13, 2024

hetpandya commented Sep 11, 2020 •

edited

Loading

hetpandya commented Sep 11, 2020 •

edited

Loading

hetpandya commented Sep 14, 2020 •

edited

Loading

lawrence124 commented Sep 14, 2020 •

edited

Loading

lawrence124 commented Sep 14, 2020 •

edited

Loading

lawrence124 commented Sep 15, 2020 •

edited

Loading

hetpandya commented Oct 10, 2020 •

edited

Loading