Allow deterministic generations #175
very cool. I am doing this ad-hoc, so hopefully this idea gets implemented.
thanks for this, keeping it open so it's on our radar
I think this PR isn't necessary anymore. I found a package that does the same thing, and I am using it now.
from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav
import numpy as np
import pytorch_seed

preload_models()
prompt = "I have a silky smooth voice, and today I will tell you about the exercise regimen of the common sloth."

# Generate the same prompt twice under the same seed.
with pytorch_seed.SavedRNG(123):
    audio_array_1 = generate_audio(prompt)
write_wav("/path/to/audio_1.wav", SAMPLE_RATE, audio_array_1)

with pytorch_seed.SavedRNG(123):
    audio_array_2 = generate_audio(prompt)
write_wav("/path/to/audio_2.wav", SAMPLE_RATE, audio_array_2)

# Both runs should produce identical audio.
assert np.array_equal(audio_array_1, audio_array_2)
neat thanks! out of curiosity, what do you need the seed for? it shouldn't really help with consistency, right? like, if you change the text prompt the results will be completely different regardless of seed, no?
It helps to get the same voice and intonation when the history_prompt is used together with the seed that created it. I use the seed and history_prompt as a pair because, even with the same history_prompt, a different seed gives a voice that is not exactly the same and sometimes sounds quite different; the seed + history_prompt pair fixes that. The seed also improves consistency across a list of prompts (long text): with many prompts in the list, the voice starts to drift after a few of them. The same seed + prompt pair always produces the same voice; changing either one changes the voice.
So, to find a voice, I choose a prompt that fits the voice I want, generate several audios while changing the seed, save each seed + history_prompt, and pick the best one. To generate further prompts in sequence (long text), I use the saved seed + history_prompt for the first prompt in the list; for the remaining prompts I use the saved seed + the history_prompt returned by the first prompt with output_full=True, because that helps keep consistency. With this process the voice sounds the same and keeps the same intonation for the whole audio.
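The long-text workflow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `generate_long_text`, `generate_fn` and `seed_fn` are hypothetical names that are not part of bark. `generate_fn` stands in for a wrapper around bark's `generate_audio` called with `output_full=True` (assumed to return the full generation plus the audio array), and `seed_fn` stands in for whatever sets the seed.

```python
def generate_long_text(prompts, seed, history_prompt, generate_fn, seed_fn):
    """Hypothetical sketch: generate several prompts with one seed and one voice.

    generate_fn(text, history_prompt) must return (full_generation, audio).
    seed_fn(seed) must reset the random number generators.
    """
    audio_parts = []
    current_history = history_prompt
    for index, text in enumerate(prompts):
        # Re-apply the same saved seed before every prompt.
        seed_fn(seed)
        full_generation, audio = generate_fn(text, current_history)
        audio_parts.append(audio)
        if index == 0:
            # Later prompts reuse the history returned by the first prompt.
            current_history = full_generation
    return audio_parts
```

With bark, `generate_fn` could plausibly be `lambda text, hist: generate_audio(text, history_prompt=hist, output_full=True)` and `seed_fn` could be `torch.manual_seed`; both are assumptions rather than something shown in this thread.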
Sorry, I'm a bit confused: in the end, how do you use deterministic generation here? Do you need pytorch_seed? Is there a way to do it with just pytorch?
Yes, you can do it with just pytorch; pytorch_seed is only a helper to set and manage the seed. The reason for using the same seed is simple: the random numbers dictate the generations, so using the same random numbers (seed) for every generation in a long text gives more similar results. Just to make that clear, I think this PR isn't necessary anymore; it is still open as a reference while the suno team researches the topic. While this is an option for achieving better consistency, I now think it is better if bark stays as simple as possible and away from specialized changes. Many projects use bark, and they can adopt this approach if they want to without it being a built-in feature.
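A minimal sketch of the "just pytorch" route, assuming all that is needed is to seed the Python, NumPy and PyTorch generators before a generation and put their previous state back afterwards. The name `fixed_seed` is hypothetical (it is not part of bark or pytorch_seed), and NumPy and PyTorch are only touched when they are installed. Seeding alone may still not give bit-identical output on every GPU or backend.

```python
import contextlib
import random

try:
    import numpy as np
except ImportError:  # NumPy is optional in this sketch
    np = None

try:
    import torch
except ImportError:  # PyTorch is optional in this sketch
    torch = None


@contextlib.contextmanager
def fixed_seed(seed):
    """Hypothetical helper: seed the RNGs on entry, restore their state on exit."""
    # Save the current generator states so other code is not affected afterwards.
    py_state = random.getstate()
    np_state = np.random.get_state() if np is not None else None
    torch_state = torch.get_rng_state() if torch is not None else None
    cuda_states = (
        torch.cuda.get_rng_state_all()
        if torch is not None and torch.cuda.is_available()
        else None
    )

    random.seed(seed)
    if np is not None:
        np.random.seed(seed)
    if torch is not None:
        torch.manual_seed(seed)  # seeds the CPU and CUDA generators

    try:
        yield
    finally:
        random.setstate(py_state)
        if np is not None:
            np.random.set_state(np_state)
        if torch is not None:
            torch.set_rng_state(torch_state)
            if cuda_states is not None:
                torch.cuda.set_rng_state_all(cuda_states)
```

With bark, the call to `generate_audio(prompt)` would go inside a `with fixed_seed(123):` block, in the same way as the `pytorch_seed.SavedRNG(123)` example earlier in the thread.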
Edit: some changes were implemented in another commit, so I updated the description to represent only the current changes.
This PR adds set_seed(seed) to allow deterministic generations:
- seed = set_seed() or seed = set_seed(0) to generate and set a random seed; the seed is returned.
- set_seed(seed) to set a specific seed number.
- set_seed(-1) to disable the deterministic process and go back to fully non-deterministic.
BE AWARE: the seed affects torch, numpy and python, so if you are running other software that requires non-deterministic random values, remember to call set_seed(-1) after you generate the audio.
Example: