ENH: Support fish speech reference audio #2542

Merged: 4 commits, Nov 19, 2024

codingl2k1
Contributor

@XprobeBot added the enhancement (New feature or request) label on Nov 11, 2024
@XprobeBot added this to the v0.16 milestone on Nov 11, 2024
@qinxuye
Contributor

qinxuye commented Nov 15, 2024

Is reference_audio in fish equivalent to prompt_speech in cosyvoice? I wonder if it's possible to reuse the option?

@codingl2k1
Contributor Author

We can reuse the option; users would then need to pass prompt_speech instead of reference_audio. I am not sure which name is better.

@qinxuye
Contributor

qinxuye commented Nov 15, 2024

Yeah, we can unify the APIs since prompt_speech already exists. We can note in the docs that it serves as reference_audio for fish speech.
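
To make the unification idea concrete, here is a minimal sketch, not the code from this PR. It assumes a single user-facing `prompt_speech` option that gets translated into whatever name each backend expects. The function `route_prompt_speech`, the `_BACKEND_PARAM` table, and the backend keys are all hypothetical names chosen for illustration.

```python
from typing import Any, Dict, Optional

# Hypothetical table: which backend-specific parameter receives the shared
# user-facing `prompt_speech` bytes. The names are illustrative only.
_BACKEND_PARAM = {
    "cosyvoice": "prompt_speech",
    "fish_speech": "reference_audio",
}


def route_prompt_speech(backend: str, prompt_speech: Optional[bytes]) -> Dict[str, Any]:
    """Translate the shared `prompt_speech` option into backend kwargs."""
    if prompt_speech is None:
        # No prompt audio supplied: pass nothing extra to the backend.
        return {}
    try:
        param = _BACKEND_PARAM[backend]
    except KeyError:
        raise ValueError(f"unknown backend: {backend!r}") from None
    return {param: prompt_speech}
```

With this shape, callers only ever learn one option name, and the docs can state that `prompt_speech` plays the role of reference audio for fish speech.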

@codingl2k1
Contributor Author

I will fix it.

@qinxuye
Contributor

qinxuye commented Nov 18, 2024

I had some comments:

  1. We may set enable_reference_audio to True if prompt_speech is specified.
  2. We can support passing compile=True for model loading, which provides a significant performance improvement.
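
The two suggestions above could be sketched roughly as follows. This is an illustration under assumptions, not the merged implementation: `build_fish_speech_options` and the shape of the returned dict are hypothetical, and only the names `enable_reference_audio`, `prompt_speech`, and `compile` come from the discussion.

```python
from typing import Any, Dict, Optional


def build_fish_speech_options(
    prompt_speech: Optional[bytes] = None,
    compile: bool = False,  # mirrors the compile=True flag from the comment
) -> Dict[str, Any]:
    """Derive backend options from the user-facing arguments (hypothetical)."""
    options: Dict[str, Any] = {
        # Suggestion 1: enable reference audio whenever prompt_speech is given.
        "enable_reference_audio": prompt_speech is not None,
        # Suggestion 2: forward the compile flag to model loading.
        "compile": compile,
    }
    if prompt_speech is not None:
        options["reference_audio"] = prompt_speech
    return options
```

Deriving `enable_reference_audio` from the presence of `prompt_speech` means users do not have to set two options that must always agree.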

@qinxuye
Contributor

qinxuye commented Nov 19, 2024

Looks like our CI cannot run with compile=True.

@qinxuye force-pushed the enh/fish_speech_reference_audio branch from 6c6bda2 to 66f77ca on November 19, 2024 12:45

@qinxuye
Contributor

qinxuye left a comment

LGTM

@qinxuye marked this pull request as ready for review on November 19, 2024 13:24
@qinxuye merged commit 0cdfb43 into xorbitsai:main on Nov 19, 2024
12 of 13 checks passed
Labels: enhancement (New feature or request)
Projects: None yet
Development

Successfully merging this pull request may close these issues.

Enable reference-audio in Fish-Speech
3 participants