cloning accuracy #404

Closed
brcisna opened this issue Jul 6, 2020 · 3 comments

Comments

brcisna commented Jul 6, 2020

Hello All,
This is not an issue, but seeing as there are no forums for this software, I just wanted anyone's thoughts on the cloning accuracy they are getting.

I personally use this in kind of an odd manner, in that I use it to clone voices to be used as narration in historic videos. I do old auto racing history, so my routine is:

    1- Find only 60 seconds of a past racer's voice on YouTube.
    2- Save this audio and split it into 10-15 second audio clips.
    3- Feed four 10-second clips to the toolbox, synthesizing each clip.
    4- Type the narration I want to use into the text box, then synthesize and vocode.
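
Step 2 of the routine above (splitting the saved audio into short clips) can be sketched with Python's standard `wave` module. This is only an illustration, assuming the saved audio is an uncompressed WAV file; the function name `split_wav` and the `clip_NN.wav` output names are made up for the example and are not part of the toolbox.

```python
import wave
from pathlib import Path


def split_wav(src_path, out_dir, clip_seconds=10.0):
    """Split a WAV file into consecutive clips of at most clip_seconds each.

    Returns the list of clip paths written, in order. The last clip may be
    shorter than clip_seconds.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(str(src_path), "rb") as src:
        params = src.getparams()
        frames_per_clip = int(params.framerate * clip_seconds)
        index = 0
        while True:
            # readframes returns fewer frames (or none) near the end of the file
            frames = src.readframes(frames_per_clip)
            if not frames:
                break
            clip_path = out_dir / f"clip_{index:02d}.wav"
            with wave.open(str(clip_path), "wb") as dst:
                # Copy the source format so each clip plays back unchanged
                dst.setnchannels(params.nchannels)
                dst.setsampwidth(params.sampwidth)
                dst.setframerate(params.framerate)
                dst.writeframes(frames)
            written.append(clip_path)
            index += 1
    return written
```

With `clip_seconds=10.0`, a 60-second recording would give six clips. Audio downloaded in a compressed format would need converting to WAV first.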

The results are surprisingly accurate, other than that no 'emotion' is possible. Of course, I am just using this at a hobby level. This probably wouldn't be acceptable for someone trying to do professional presentations, maybe? I am not sure my procedure is even correct, but it works for me. Sometimes the result ends up with a slight muffling effect, what I would call 'wind in the microphone', at either the start or the finish of the generated audio.
Also, I am not sure how to interpret the lower-left box where the projections are plotted: the points generated from the same voice are all over the place. I am pretty sure these points should be almost directly on top of one another. I am very green at how this is supposed to work.

Anyone, please comment on your routine.

Sorry for the long post.

Admin: if this is not acceptable here, delete the post.

Thanks.

ghost commented Jul 8, 2020

3- feed (4) 10 second clips to the toolbox, synthesizing each clip.

I believe the embedding depends only on the last file loaded. In other words, the toolbox has no memory and does not learn as it is used. So you can experiment to see which of the clips results in the best cloned voice. (This will be a lot easier with the toolbox once #402 is merged.)
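
One possible way to narrow down which clip to try first is to rank the clips by how close each clip's embedding is to the average of all of them. This is a rough heuristic sketch, not something the toolbox does, and it is an assumption that the most "typical" clip gives the best clone; listening to the output is still the real test. The function names and the plain-list embeddings are hypothetical; in practice the vectors would come from the speaker encoder.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_clips_by_centroid(embeddings):
    """Rank clip names from most to least similar to the mean embedding.

    embeddings: dict mapping a clip name to its embedding (list of floats).
    """
    names = list(embeddings)
    dim = len(embeddings[names[0]])
    # Mean embedding across all clips of the same speaker
    centroid = [
        sum(embeddings[name][i] for name in names) / len(names)
        for i in range(dim)
    ]
    return sorted(
        names,
        key=lambda name: cosine_similarity(embeddings[name], centroid),
        reverse=True,
    )
```

The clip at the front of the returned list is the one to load last before synthesizing, since only the last loaded file determines the embedding.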

Also I am not sure how to interpret the lower left box where your points are generated; projections are all over from the same voice. Am pretty sure these points should be almost directly on top of one another.

You are correct: if the speaker encoder is good, then all the points from a single speaker should form a distinct cluster away from other speakers. However, if it is only plotting data from a single speaker, then I think the autoscaling will make those points appear farther apart than they are in reality.
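
The autoscaling effect can be illustrated with a toy example. This sketch only mimics what an auto-scaled plot axis does (stretching the data range to fill the plot area); it is not the toolbox's projection code, and the point coordinates are invented for the example.

```python
import math


def autoscale(points):
    """Min-max scale 2D points to the unit square, like auto-scaled axes."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]

    def scale(value, lo, hi):
        # A zero-width range is drawn in the middle of the axis
        return 0.5 if hi == lo else (value - lo) / (hi - lo)

    return [
        (scale(x, min(xs), max(xs)), scale(y, min(ys), max(ys)))
        for x, y in points
    ]


def max_spread(points):
    """Largest pairwise Euclidean distance among the points."""
    return max(math.dist(p, q) for p in points for q in points)


# Four projections from one speaker, very close together in reality
same_speaker = [(0.50, 0.50), (0.51, 0.50), (0.50, 0.51), (0.51, 0.51)]
# A second speaker far away gives the axes something to scale against
other_speaker = [(5.0, 5.0)]

alone = autoscale(same_speaker)
together = autoscale(same_speaker + other_speaker)[:4]
```

Plotted alone, the four points are pushed out to the corners of the plot (`max_spread(alone)` is larger than 1), even though they differ by only 0.01 per axis. Plotted together with the second speaker, the same four points collapse into a tight cluster (`max_spread(together)` is well under 0.01).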

ghost commented Jul 12, 2020

Sometimes the result ends up with a slight, what I would call 'wind in the microphone', muffling effect at either start or finish of generated audio.

@brcisna Can you try this vocoder model and let me know whether you still get the "wind in the microphone" effect? #126 (comment)

Because the synthesizer is not deterministic, you will need a few attempts to conclude whether there is a difference. If it still occurs with the new vocoder, it is likely an artifact of the synthesizer.

I personally use this in kind of an odd manner, in that I use it to clone voices to be used in historic videos as a narrator type scenario. Do old auto racing history, so my routine is.

If you plan on distributing your work, please be mindful of the legal implications of using someone else's voice, and make sure you have secured rights if necessary.

ghost commented Jul 18, 2020

Closed due to inactivity. @brcisna please reopen the issue if you have more to discuss.

ghost closed this as completed Jul 18, 2020