Works in Spanish? #789
Comments
There's information on other people's attempts at this in existing issue Support for other languages #30.
There is no Spanish model yet.
How can I collaborate?
You can check the file diff in my repo for reference. Mine works for Chinese and I think you can do a similar modification.
For audios to be detected, the directory structure must match this exactly, including the "speaker" and "book_dir" levels. #437 (comment)
With the same names, too?
It is not necessary to use the same names. However, please try matching the names before reporting a problem, or when asking for help troubleshooting an issue like this.
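As a hedged illustration of the LibriSpeech-style layout discussed above, here is a minimal Python sketch. The dataset, speaker, and book names (`train-clean-100`, `speaker-001`, `book-001`) are hypothetical placeholders, not names the repo requires:

```python
from pathlib import Path
import tempfile

# Build a dataset/speaker/book_dir layout in a throwaway directory.
root = Path(tempfile.mkdtemp()) / "LibriSpeech" / "train-clean-100"
book_dir = root / "speaker-001" / "book-001"   # the "speaker" and "book_dir" levels
book_dir.mkdir(parents=True)
(book_dir / "audio-0.wav").touch()
(book_dir / "audio-0.txt").touch()

# Preprocessing that globs exactly two levels deep will only find audios
# placed at this depth; a wav dropped directly under root would be missed.
wavs = sorted(root.glob("*/*/*.wav"))
print([w.name for w in wavs])  # ['audio-0.wav']
```

If a glob like this returns an empty list against your dataset, the directory nesting is the first thing to check.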
I did it both with their own names and with the names you give as an example. However, I got some errors:
With cv-corpus it processed some of the files (1,625/271,010), but then stopped (after at least 30 minutes) and displayed that UnicodeDecodeError.
For the 1st issue: #841 (comment). For the 2nd issue, in
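The UnicodeDecodeError typically means a transcript file is not valid UTF-8. A minimal sketch of the failure and one way around it (the file name and Latin-1 encoding here are assumptions for illustration, not details from the thread):

```python
from pathlib import Path
import tempfile

# A transcript saved in Latin-1 raises UnicodeDecodeError when read as UTF-8.
p = Path(tempfile.mkdtemp()) / "utt-0.txt"
p.write_bytes("mañana".encode("latin-1"))

try:
    text = p.read_text(encoding="utf-8")
except UnicodeDecodeError:
    # Fall back to the file's actual encoding; alternatively,
    # errors="replace" lets a long preprocessing run continue instead of aborting.
    text = p.read_text(encoding="latin-1")
print(text)  # mañana
```

Re-encoding the offending transcripts to UTF-8 before preprocessing avoids the crash entirely.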
Thank you @pilnyjakub for the fast response; I turned my laptop on again as soon as I saw it. I figured out that the file names for the tux-100h dataset are just numbers. I changed the first txt and wav files from "0" to "audio-0" and now it is running. Hope it goes well until the end.
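The manual rename described above can be applied to the whole dataset at once. A hedged sketch, assuming bare-number names like `0.wav`/`0.txt` (the temp directory stands in for the real dataset path):

```python
from pathlib import Path
import tempfile

# Stand-in for the dataset directory containing bare-number file names.
d = Path(tempfile.mkdtemp())
for name in ("0.wav", "0.txt"):
    (d / name).touch()

# Prefix every file with "audio-", mirroring the manual "0" -> "audio-0" fix.
for f in sorted(d.iterdir()):
    f.rename(f.with_name(f"audio-{f.name}"))

print(sorted(p.name for p in d.iterdir()))  # ['audio-0.txt', 'audio-0.wav']
```

Run it once per dataset folder; renaming is idempotent only if you guard against double prefixes, so keep a backup first.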
I have been training the synthesizer model on the tux100h dataset since Monday, for approximately 32-50 h. It was saved at 50k steps, but I stopped the training at 57k steps, with a loss of approximately 0.21-0.24. However, when I tried to clone my voice using that model in demo_cli.py, the output does sound like a human, but it sounds like a voice from the dataset, not like my voice, which is the input. Any recommendations? I am using the encoder and vocoder pre-trained models given in the repo.
I found out tux100h has the same voice, or a very similar voice, in all audios. That may be the problem. Then I started preprocessing the cv-corpus dataset, which has multiple speakers, but I got the following error:
Please, I need some help here. With the cv-corpus dataset I tried using less data and I could make it preprocess and train, but the results were very bad.
I want to collaborate. I have two 3060 GPUs; please send me all the steps and I'll bring you compute power. Stay tuned.
On Wed., Dec. 1, 2021, 7:50 p.m., AlexSteveChungAlvarez <***@***.***> wrote:
… Please, I need some help here. With the cv-corpus dataset I tried using less data and I could make it preprocess and train, but the results were very bad. By the way, the whole process, from preprocessing through training, took me about a week each time I tried a different dataset or dataset subsample. I am using a laptop with an NVIDIA GeForce RTX 2060 GPU, so I think it shouldn't be that slow. Any help? I am considering using a different dataset, since neither cv-corpus nor tux100h gave me results comparable to the original release of the project, but I don't know which one to choose now and I only have one month to finish this.
Thank you very much @johnfelipe, I sent you an email.
I trained the synthesizer with this dataset: http://openslr.org/73/
Hi, any update in here? Thanks.
https://github.com/AlexSteveChungAlvarez/Real-Time-Voice-Cloning-Spanish |
Thanks, @AlexSteveChungAlvarez. I have some questions: 1) Where should I put my datasets? 2) Is the synthesis phase included when running demo_toolbox.py? 3) Is there a maximum length of text to synthesize? If you have a tutorial or documentation besides the paper, it is welcome. Thank you again.
If you want to train your own model, you should follow the instructions given in this repo. The toolbox does the synthesis when you click its generate-voice button (there's a video in this repo explaining that). There isn't a maximum length of text to synthesize, but it's recommended to synthesize an audio of a length similar to the original; if not, the output sometimes has silences or noise. If you train your own model, please share it with me, since my college team is now working on a web interface to calculate the MOS of the Spanish models shared by the community!
Sure! I'll share my model with you. One question: I would like to synthesize audios of approximately 10 min. Hence, do you recommend training with audio files of 10 min length? Or what is, in your experience, the most efficient length for excellent quality?
No need to train with audio files of the desired output length; just use a reference (target) audio of that length to clone, and it will work.
Please share your results here so I can try it later.
Another question: I don't have an NVIDIA GPU. Can I use the CPU instead?
Yes, you can use your CPU.
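Since the repo is PyTorch-based, CPU fallback amounts to selecting the device at startup. A minimal sketch, assuming `torch` is installed (the guard keeps it from crashing where it isn't):

```python
# Device selection: use CUDA when a GPU is available, otherwise fall back to CPU.
try:
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:  # torch not installed in this environment
    device = "cpu"
print(device)
```

Models and tensors are then moved with `.to(device)`; on CPU, expect training and synthesis to be substantially slower than on a GPU.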
I've followed this video and could run the program. However, I'm attaching a screenshot of the program: I just added the wav file and clicked on "Synthesize and vocode". Is this the procedure so far?
I also get the same error when I use one model first and then change to the other. I just downloaded everything again and made sure to run the correct one from the beginning to overcome it, so whenever I want to use any model, I do it from its respective folder (I have 2-3 different folders to run different models). I don't know what the correct way to fix that error is.
Thanks, @AlexSteveChungAlvarez, I solved it by changing the type of Synthesizer. I'm confused right now: I've used the pretrained models with an audio file of 23 minutes; however, I cannot synthesize my voice correctly using simple sentences in Spanish. What am I doing wrong? Should I train a specific model with my own voice?
I haven't tried with an audio of that length, since my objective in using this code was to clone voices from short speech audios and few target audios of the people I wanted to clone. When you have access to more audios of the person you want to clone, you will get better results by fine-tuning the model (there is a guide somewhere in the repo on how to do this). I haven't done this myself, but yes, you need to train the model with your own voice, applying transfer learning to the pre-trained model. I think that would be the better solution in your case (if you have many audios of the target voice). If not, try sharing the process you follow to clone with your 23-min audio. Maybe you are writing periods (".") at the end of each sentence, which can make the synthesis go wrong; try using commas instead of periods!
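The periods-to-commas workaround mentioned above can be automated before passing text to the synthesizer. A small sketch of that text preprocessing (the function name is made up for illustration; this is a user-reported trick, not a documented feature of the repo):

```python
def soften_sentence_breaks(text: str) -> str:
    """Replace sentence-final periods with commas before synthesis,
    which some users report avoids spurious silences/noise."""
    out = text.replace(". ", ", ")
    if out.endswith("."):
        out = out[:-1] + ","
    return out

print(soften_sentence_breaks("Hola. Como estas."))  # Hola, Como estas,
```

Note this naive replacement will also mangle abbreviations and decimals, so inspect the transformed text before synthesizing.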