Dev pr2 : handle multi-speaker and GST in synthetizer class #5

Closed
wants to merge 3 commits into from


kirianguiller
Contributor

@kirianguiller kirianguiller commented Feb 26, 2021

Hi guys,

Here is the second split of the PR I opened earlier this week.

This new content handles multi-speaker and GST inference in the Synthesizer class (which is used in server.py and in the Google Colab for Chinese that I mentioned in the first point). You can now pass the following two optional parameters to the Synthesizer.tts() method:
speaker_json_key and style_wav. speaker_json_key is the key of one of the speakers in the provided speakers.json. style_wav is either a path to a wav file for GST style transfer, or a dict of token weights such as {"token1": 0.25, "token2": -0.1, etc.}. The next step is to also let the user directly provide an optional speaker_embedding parameter (as a numpy array or a list?) that will be passed to Tacotron at inference time.
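
To make the two optional inputs concrete, here is a minimal, self-contained sketch of how they could be interpreted before synthesis. It is not the code from this PR: the helper names `lookup_speaker_embedding` and `classify_style` are hypothetical, and the layout of speakers.json (each key mapping to an entry with an "embedding" list) is an assumption made for illustration.

```python
import json
from typing import Dict, List, Optional, Union

# style_wav may be a path to a wav file, a dict of GST token weights, or None.
StyleInput = Union[str, Dict[str, float], None]


def lookup_speaker_embedding(
    speakers: Dict[str, dict], speaker_json_key: Optional[str]
) -> Optional[List[float]]:
    """Return the embedding stored under speaker_json_key, or None if no key is given."""
    if speaker_json_key is None:
        return None
    if speaker_json_key not in speakers:
        raise KeyError(f"speaker key {speaker_json_key!r} not found in speakers.json")
    # Assumption: each entry keeps its vector under an "embedding" field.
    return speakers[speaker_json_key]["embedding"]


def classify_style(style_wav: StyleInput) -> str:
    """Tell apart the two accepted forms of style_wav (hypothetical helper)."""
    if style_wav is None:
        return "none"
    if isinstance(style_wav, dict):
        return "token_weights"  # e.g. {"token1": 0.25, "token2": -0.1}
    if isinstance(style_wav, str):
        return "reference_wav"  # path to a wav file used for GST style transfer
    raise TypeError(f"unsupported style_wav type: {type(style_wav).__name__}")


# Inline stand-in for the contents of a speakers.json file (made-up values).
speakers = json.loads('{"spk_a.wav": {"name": "spk_a", "embedding": [0.1, 0.2]}}')

print(lookup_speaker_embedding(speakers, "spk_a.wav"))
print(classify_style({"token1": 0.25, "token2": -0.1}))
print(classify_style("path/to/style.wav"))
```

Resolving both inputs up front, and failing early on an unknown speaker key or an unsupported style type, keeps errors close to the call site rather than deep inside model inference.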

The Synthesizer class is now simpler to use, and this Google Colab shows that it reduces the number of lines needed to generate working samples.

Thanks :)

@kirianguiller kirianguiller changed the title from "Dev pr2" to "Dev pr2 : handle multi-speaker and GST in synthetizer class" Feb 26, 2021
@erogol erogol added the enhancement General library enhancement. label Mar 6, 2021
@erogol
Member

erogol commented Mar 22, 2021

Sorry for being slow. I'll definitely check the PR by tomorrow at the latest.

@kirianguiller
Contributor Author

No problem! Just let me know if you have any requests for modification :)

@erogol
Member

erogol commented Mar 23, 2021

I think the only immediate requirement is writing some testing code for the synthesizer. I'll write one for the current synthesizer in the dev branch then you can rebase and add more for multi-speaker and GST changes you made.

@erogol
Member

erogol commented Mar 23, 2021

ok we already have test_synthesizer.py :)

Can you implement test cases for your changes - GST and multi-speaker?

@kirianguiller
Contributor Author

Yes! I will implement this and push the changes at the beginning of next week :)

@erogol
Member

erogol commented Apr 16, 2021

@kirianguiller any updates?

@kirianguiller
Contributor Author

> @kirianguiller any updates?

Yes, sorry, I have had some quite busy weeks here. Thanks for the reminder, though. I will implement the tests for the code I added and resolve the new conflicts.

@erogol
Member

erogol commented Apr 16, 2021

@kirianguiller I am also working on this PR. It might be better to wait for me to push my updates. I have also rebased onto the latest dev.

I'll ping you.

@kirianguiller
Contributor Author

Oh cool! Thank you. I will wait for your changes then :)

@erogol
Member

erogol commented Apr 28, 2021

I'm closing this in favor of #441.

@erogol erogol closed this Apr 28, 2021
erogol added a commit that referenced this pull request May 6, 2021
erogol added a commit that referenced this pull request May 11, 2021
eginhard referenced this pull request in idiap/coqui-ai-TTS Apr 2, 2024
gravityrail pushed a commit to gravityrail/TTS that referenced this pull request Jul 8, 2024
Add tokenizer logging, update version for release 0.23.0