How can I train my audio files to use an Indian accent? #429

Closed
ash1407 opened this issue Jul 18, 2020 · 9 comments

Comments


ash1407 commented Jul 18, 2020

How can I train my own audio files as data for the encoder and vocoder, so that the output uses an Indian accent? The Indian accent is quite different, so the result does not feel like my own voice when I listen to it.


ghost commented Jul 18, 2020

This is not an easy undertaking so before you start, make sure you satisfy the prerequisites. You must be able to answer "yes" to all questions below:

  • Does your computer have an NVIDIA GPU?
  • Do you have coding experience?
  • Are you willing to devote at least 20 hours to the task?

I have not gone through the process myself, but I'll try to outline it since we don't have a good explanation. What you need to do is to fine-tune the pretrained synthesizer and vocoder models on a suitable dataset.

  1. Find a suitable dataset. Freely available resources include AccentDB (https://accentdb.org/) for Indian accents and VCTK (https://datashare.is.ed.ac.uk/handle/10283/3443) for other English accents. For best results on your own voice, record your own dataset, though this will take many hours.
  2. Follow the steps in README.md to enable GPU support.
  3. Go to the training wiki page and follow the steps for the synthesizer and vocoder training on the LibriSpeech dataset.
    • Review the preprocessing code and understand what it is doing.
    • Understand the format of the files in the <datasets_root>/SV2TTS folder.
  4. Preprocess your dataset from step 1 to generate training data for the synthesizer.
    • At a minimum, this requires editing the preprocessing scripts.
    • You will likely need to write your own code to process the data into a suitable format for the toolbox.
    • We do not have a tutorial for this. You are on your own here! (A rough sketch of the idea is shown right after this list.)
  5. Continue training the pretrained synthesizer model on your dataset until it has converged.
  6. Using your new synthesizer model, preprocess your dataset to generate training data for the vocoder.
  7. Continue training the pretrained vocoder model on your dataset until the output is satisfactory.
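
To make step 4 more concrete, here is a rough sketch of the data-preparation idea. It assumes librosa and soundfile are installed; the paths, speaker/session names, and transcript format are only placeholders that mimic LibriSpeech, so check the toolbox's preprocessing code for the exact layout it expects.

```python
# Sketch only: reorganize raw recordings into a LibriSpeech-like layout
# (speaker/session/utterance) with 16 kHz mono WAVs plus a transcript file.
# All names and paths here are assumptions, not the toolbox's specification.
from pathlib import Path

import librosa
import soundfile as sf

RAW_DIR = Path("my_recordings")  # raw .wav files with matching .txt transcripts (assumed)
OUT_DIR = Path("datasets_root/MyAccent/speaker-001/session-001")
OUT_DIR.mkdir(parents=True, exist_ok=True)

transcript_lines = []
for i, wav_path in enumerate(sorted(RAW_DIR.glob("*.wav"))):
    # Resample to 16 kHz mono, the sample rate used by the pretrained models.
    audio, _ = librosa.load(wav_path, sr=16000, mono=True)
    utt_id = f"speaker-001-session-001-{i:04d}"
    sf.write(OUT_DIR / f"{utt_id}.wav", audio, 16000)

    # Read the matching transcript (assumed to sit next to the wav as <name>.txt).
    text = wav_path.with_suffix(".txt").read_text(encoding="utf-8").strip().upper()
    transcript_lines.append(f"{utt_id} {text}")

# LibriSpeech-style combined transcript file for this session.
(OUT_DIR / "speaker-001-session-001.trans.txt").write_text(
    "\n".join(transcript_lines) + "\n", encoding="utf-8"
)
```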

With luck, your trained models will now generalize to your voice and impart the desired accent. There are no guarantees this will work.

If you succeed, please share your models and I will add them to the list in #400.
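
For steps 5 to 7, "continue training" simply means resuming from the pretrained checkpoint on your new data, usually with a smaller learning rate than training from scratch. The toolbox's own training scripts handle this when pointed at the pretrained models; the snippet below is only a generic PyTorch illustration of the idea, with a tiny stand-in model and random data so it runs on its own.

```python
# Conceptual fine-tuning sketch in plain PyTorch (not the toolbox's training code).
import torch
import torch.nn as nn

# Stand-in "model"; the real synthesizer is a Tacotron-style network.
model = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 80))

# Fine-tuning idea: load the pretrained weights instead of starting from scratch.
# (The checkpoint path is a placeholder.)
# model.load_state_dict(torch.load("pretrained_synthesizer.pt", map_location="cpu"))

# Use a lower learning rate than when training from scratch.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
loss_fn = nn.MSELoss()

model.train()
for step in range(100):                  # keep going until the loss plateaus
    mel_in = torch.randn(16, 80)         # stand-in for a batch from your dataset
    mel_target = torch.randn(16, 80)
    optimizer.zero_grad()
    loss = loss_fn(model(mel_in), mel_target)
    loss.backward()
    optimizer.step()

torch.save(model.state_dict(), "finetuned_synthesizer.pt")  # save checkpoints as you go
```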


ash1407 commented Jul 18, 2020

(quotes the fine-tuning guidance above)

I will give it a try. Thanks for the guidance, friend.


ghost commented Jul 23, 2020

@ash1407 Are you still trying? When you get to step 4 (synthesizer preprocessing on new dataset), pull the latest master. The #441 changes should make this step a lot easier.

If using AccentDB, will you fine-tune a single accent or just throw them all into the mix? It would be interesting to find out whether this is enough voices to generalize well for cloning. Also see my latest reply in #437; it is a promising result to see the synthesizer acquire the accent after a small number of steps (with the caveat that I fine-tuned with data from a single speaker).

[screenshot of the fine-tuning result]


ash1407 commented Jul 23, 2020

(quotes the reply above)

I don't have an NVIDIA GPU. Any idea which GPU I should purchase for machine learning? (I have a budget of 4,000 INR.)


ghost commented Jul 23, 2020

@ash1407 My fine-tuning in #437 is done using CPU only, and the models are converging quickly enough. Do not get a GPU unless you find it to be much too slow.
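
If you want to confirm what PyTorch will use, a quick check looks like this (the toolbox already requires PyTorch, so nothing extra is needed); it falls back to the CPU when no CUDA device is found:

```python
# Check whether a CUDA-capable GPU is visible to PyTorch.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Training will run on: {device}")
```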


ghost commented Jul 26, 2020

So I've got some good news and bad news.


ghost commented Jul 29, 2020

@ash1407 If you're not working on this actively then I'll close the issue for now. Reopen it when you're ready to give it a try.

ghost closed this as completed Jul 29, 2020
ghost mentioned this issue Oct 8, 2021
@Vinotha638

(quotes the fine-tuning guidance above)

Has anyone got results for training an Indian accent? Please let me know.

@shah0eer

(quotes the fine-tuning guidance above)

Hi, I have looked through your comments. I need to clone my own voice with its accent so I can produce speech from text. Can you share step-by-step directions? I also opened issue #1228.
