Character AI - New TTS system #167

Closed
sivertheisholt opened this issue Apr 20, 2024 · 21 comments

@sivertheisholt
Contributor

They have added a new TTS system, where you can create custom voices or use other voices/predefined voices.
Would be cool if we could implement this. The current implementation is only able to fetch and use the old voices.

https://blog.character.ai/character-voice-for-everyone/

Any thoughts on this? I will take a look at it and see how it works.

@realcoloride
Owner

realcoloride commented Apr 20, 2024

This is honestly super uncanny and scary, but how does the format work? Is it a file? Is it streamed audio?

@sivertheisholt
Contributor Author

> This is honestly super uncanny and scary, but how does the format work? Is it a file? Is it streamed audio?

From what I have tried, you can upload an audio file with 10-15 seconds of talking. It is scarily accurate...
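
Programmatically, that kind of voice-sample upload would presumably be a multipart form POST. Below is a rough sketch under assumed names; the endpoint URL, form field names, and auth header are placeholders, not confirmed parts of Character AI's API.

```ts
// Rough sketch of uploading a short voice sample as a multipart form POST.
// The endpoint URL, form field names, and auth header are assumptions for
// illustration only, not confirmed Character AI API details.
import { readFile } from "node:fs/promises";

async function uploadVoiceSample(token: string, filePath: string, voiceName: string): Promise<void> {
  // Copy the file contents into a plain Uint8Array so it can be wrapped in a Blob.
  const audio = new Uint8Array(await readFile(filePath));

  const form = new FormData(); // global in Node 18+
  form.append("name", voiceName);
  form.append("file", new Blob([audio], { type: "audio/mpeg" }), "sample.mp3");

  // Placeholder endpoint for illustration; not a real Character AI URL.
  const response = await fetch("https://voice-upload.example.com/api/voices", {
    method: "POST",
    headers: { Authorization: `Token ${token}` }, // assumed auth scheme
    body: form,
  });

  if (!response.ok) throw new Error(`Voice upload failed: ${response.status}`);
}
```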

@realcoloride
Owner

> This is honestly super uncanny and scary, but how does the format work? Is it a file? Is it streamed audio?
>
> From what I have tried, you can upload an audio file with 10-15 seconds of talking. It is scarily accurate...

The Google funding must have been crazy, but hell no, I am not uploading any audio 😭

@IRON-M4N

So, I cloned the voice of Furina (a character from Genshin Impact) using this audio sample: https://i.imgur.com/Wb9XuG3.mp4

I tried c.ai and play.ht for comparison. The Character AI cloning was way faster and sometimes sounded identical to the real one.

Play.ht cloning is really good, but sometimes it glitches the voice.

I think it would be a cool feature to use the cloned voices in the module.

@realcoloride
Owner

Will look into it

@realcoloride
Owner

[screenshot]
OK, so I have no idea what the format of the data is, but the domain is used in joinOrCreateSession to create an RTC session.

[screenshot]
Now, I'd actually be surprised if it REALLY opened a WebRTC session to do this, but I'll investigate more.

As for fetching the voices, it should be easy; they have a domain for it.
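
For reference, fetching a voice list over plain HTTPS could look roughly like the sketch below. The endpoint URL, `Authorization` header format, and response shape are assumptions for illustration, not confirmed details of that domain.

```ts
// Minimal sketch of fetching the available voices over HTTPS.
// The base URL, path, auth header, and response shape are assumptions
// for illustration only, not confirmed parts of the API.

interface Voice {
  id: string;
  name: string;
}

async function listVoices(token: string): Promise<Voice[]> {
  // Placeholder endpoint; substitute whatever the real domain turns out to be.
  const response = await fetch("https://voices.example.com/api/v1/voices", {
    headers: { Authorization: `Token ${token}` }, // assumed auth scheme
  });

  if (!response.ok) throw new Error(`Voice list request failed: ${response.status}`);

  const body = (await response.json()) as { voices?: Voice[] };
  return body.voices ?? [];
}

// Usage (Node 18+ ships a global fetch):
// listVoices(process.env.CAI_TOKEN!).then((voices) => console.log(voices.length));
```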

@Jatinverma0786

Any update on the voice system, brother?

@realcoloride
Owner

Hello,

Not really. I haven't looked into it since, but if there is interest and this is a feature a lot of developers would like to see in the package, feel free to let me know.

Cheers

@Jatinverma0786

> Hello,
>
> Not really. I haven't looked into it since, but if there is interest and this is a feature a lot of developers would like to see in the package, feel free to let me know.
>
> Cheers

I think this will be the most in-demand feature. I am also a developer and I need this feature badly, so I request you to work on it, please.

@wolfboss356

I also request that you work on it. I'm an animatronic developer, and using premade Character AI TTS would save me a lot of time wrangling Mozilla TTS and installing a custom dataset on it.

@realcoloride
Owner

After investigation, implementing this feature would mean reworking all the endpoints to switch to the new (neo) endpoints, switching to a websocket, etc.
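
For anyone wondering what "switching to a websocket" would mean on the client side, here is a bare-bones connection sketch. The URL, the token header, and the message format are placeholders; the actual neo protocol would still need to be reverse engineered.

```ts
// Bare-bones websocket connection sketch using the "ws" package (npm i ws).
// The URL and the token header are placeholders; the real neo protocol and
// message schema would still need to be reverse engineered.
import WebSocket from "ws";

function connectNeoSocket(token: string): WebSocket {
  // Placeholder websocket endpoint for illustration only.
  const socket = new WebSocket("wss://neo.example.com/ws/", {
    headers: { Authorization: `Token ${token}` }, // assumed auth scheme
  });

  socket.on("open", () => {
    console.log("Connected; chat/TTS commands would be sent here");
  });

  socket.on("message", (data) => {
    // Messages are assumed to be JSON; adjust once the real format is known.
    console.log("Received:", data.toString());
  });

  socket.on("close", (code) => console.log(`Socket closed with code ${code}`));
  return socket;
}
```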

@Jatinverma0786

> After investigation, implementing this feature would mean reworking all the endpoints to switch to the new (neo) endpoints, switching to a websocket, etc.

So will you work on it or not, brother? Or do you need help with the coding part?

@realcoloride
Owner

I think I might have to consider both. I am considering rewriting and upgrading to the new API. Will let you know; I am very busy right now.

Cheers :)

@keyserjaya

OMG, I didn't expect it to be using LiveKit. It's like we are really chatting with someone :D
Maybe you have some clues to share on how to make it work?

Maybe some of us can help.

@realcoloride
Owner

realcoloride commented Jul 2, 2024

Hey there, this should help

However, switching to the new endpoints requires a total rewrite of the client (which I have started, to support all the new endpoints), but it is much more work than I thought, and my attempts at handling the websocket part confused me a lot.

@Jatinverma0786

I have found out that Character.AI is using EdgeTTS, LOL.

realcoloride mentioned this issue Jul 23, 2024
@yukiarimo

Any updates?

@realcoloride
Owner

Check out #180. I will get to it soon; I am currently busy working on another project.

@realcoloride
Owner

Hey there, sorry for the lack of updates.

Could using the LiveKit client be a good idea to handle the actual TTS part? Also, upon investigating, the way it authorizes access to the actual LiveKit session is weird.
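
If the LiveKit route turned out to be viable, joining a room with the browser-side livekit-client SDK looks roughly like the sketch below. The server URL and, crucially, how the access token is obtained are exactly the unknown parts; the sketch simply assumes a valid token is already available.

```ts
// Rough browser-side sketch using the livekit-client SDK (npm i livekit-client).
// The server URL and how Character AI hands out the access token are the
// unknown parts; this assumes a valid token has already been obtained.
import { Room, RoomEvent, RemoteTrack, Track } from "livekit-client";

async function joinTtsRoom(serverUrl: string, accessToken: string): Promise<Room> {
  const room = new Room();

  // Play any remote audio track (presumably the synthesized speech) as it arrives.
  room.on(RoomEvent.TrackSubscribed, (track: RemoteTrack) => {
    if (track.kind === Track.Kind.Audio) {
      document.body.appendChild(track.attach()); // attach() returns an <audio> element
    }
  });

  await room.connect(serverUrl, accessToken);
  return room;
}
```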

@Jatinverma0786

> Hey there, sorry for the lack of updates.
>
> Could using the LiveKit client be a good idea to handle the actual TTS part? Also, upon investigating, the way it authorizes access to the actual LiveKit session is weird.

As far as I know, it's not using LiveKit anymore; they are using their own system now. As of now, they are fetching an MP3 file from their server.
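
If that is accurate, consuming the TTS output would be an ordinary HTTP request plus an MP3 download. A hypothetical sketch follows; the endpoint, request body, and response field names are guesses used only to illustrate the flow.

```ts
// Hypothetical sketch of requesting TTS output and downloading the MP3.
// The endpoint, request body, and response field names are guesses used
// only to illustrate the flow described above.
import { writeFile } from "node:fs/promises";

async function generateSpeech(token: string, voiceId: string, text: string): Promise<void> {
  // Placeholder endpoint and payload; not confirmed API details.
  const response = await fetch("https://tts.example.com/api/generate", {
    method: "POST",
    headers: {
      Authorization: `Token ${token}`, // assumed auth scheme
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ voiceId, text }),
  });

  if (!response.ok) throw new Error(`TTS request failed: ${response.status}`);

  // Assume the server responds with a JSON body containing a URL to the MP3.
  const { replayUrl } = (await response.json()) as { replayUrl: string };
  const audio = await fetch(replayUrl);
  await writeFile("speech.mp3", Buffer.from(await audio.arrayBuffer()));
}
```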

@realcoloride
Owner

Hello. This is being worked on, so all future discussion will be redirected to #180.
If you have any problems, feel free to open a new issue.
