f5-tts produces garbled dubbing in pyvideotrans (see the provided audio and SRT), but works fine in the WebUI. Why can't pyvideotrans create the audio properly? #636
Comments
Numbers in English text are not yet normalized. Will change it later.
Still not working after the update to 3.20.
Still not working in pyvideotrans version 3.21, as you can hear in the audio linked below. When will it be fixed? Audio created: https://drive.google.com/file/d/1uDIe3hgjYU2vt1XKkFIid3-C_jKKeXou/view?usp=sharing
Please use plain text or valid SRT subtitles for dubbing. Do not add other characters before the subtitles, otherwise the timestamps will be dubbed as well. Use the import function to import a valid local SRT file for dubbing.
Not working. I am doing everything correctly; I have uploaded a video so you can see. Please do something. You can hear the audio it created at 2:34. Recording.2024-11-26.172353.mp4
Explain in words what the problem is. Is it reading out the line numbers and the timelines as well?
Delete <b> and other HTML tags from the SRT file.
No. I have shown the audio created both with the <b> tag and without it, using a plain SRT, but it generates weird sounds instead of reading the SRT.
Make sure the SRT is valid and contains no HTML tags etc., then rename the subtitle file to exp-01.srt and test it again!
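For readers who want to check this before importing, here is a minimal Python sketch of the two steps suggested above: stripping HTML-style tags and checking that each block has an index line, a timing line, and text. It is not part of pyvideotrans; the function names `clean_srt` and `check_srt` are hypothetical, and the check is deliberately simple.

```python
import re

# Matches simple HTML-style tags such as <b>, </b>, < b >, <font color="...">.
# Caveat: subtitle text like "a < b and c > d" would also be matched.
TAG_RE = re.compile(r"<\s*/?\s*[A-Za-z][^>]*>")
# Matches an SRT timing line, e.g. 00:00:02,500 --> 00:00:05,716
TIMING_RE = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$")


def clean_srt(text: str) -> str:
    """Drop a leading BOM, normalise line endings, and remove HTML-style tags."""
    text = text.lstrip("\ufeff").replace("\r\n", "\n")
    return TAG_RE.sub("", text)


def check_srt(text: str) -> list[str]:
    """Return a list of problems; an empty list means every block looks well formed."""
    blocks = [b for b in re.split(r"\n\s*\n", text.strip()) if b.strip()]
    if not blocks:
        return ["no subtitle blocks found"]
    problems = []
    for n, block in enumerate(blocks, start=1):
        lines = block.split("\n")
        if len(lines) < 3:
            problems.append(f"block {n}: expected index, timing and text lines")
            continue
        if not lines[0].strip().isdigit():
            problems.append(f"block {n}: first line is not a numeric index")
        if not TIMING_RE.match(lines[1].strip()):
            problems.append(f"block {n}: second line is not a valid timing line")
    return problems
```

Running `check_srt(clean_srt(raw_text))` on the file contents and getting an empty list suggests the file is at least structurally sound; it says nothing about whether the dubbing engine will pronounce the text correctly.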
I tried it again as you said; you can listen to the sound it created at 02:58. Is there any format other than SRT that it supports? video2_2.mp4
It always sounds like a mix of English and French.
(( Or you can open the api.py file under f5-tts-api and refer to the source
code to modify it )) I don't understand what you are talking about, because
I don't know much about coding.
…On Tue, 26 Nov, 2024, 9:27 pm okmyworld, ***@***.***> wrote:
You could have just typed the text in like this.
image.png (view on web)
<https://github.com/user-attachments/assets/c5ad67a9-3020-4e4e-bb2b-34b0fe9c126c>
If it's not a formatting problem, but just a pronunciation problem, that
won't solve it.
Or you can open the api.py file under f5-tts-api and refer to the source
code to modify it.
I did that, but it still doesn't work and produces audio that sounds weird. Does it sound all right on your PC? audio.mp4
I tested it; no problem.
I tested the cloned voice using f5-tts in pyvideotrans 3.25, but the issue of the voice being unrecognizable is still unresolved. Interestingly, the same voice works perfectly in WEBUI, but not in pyvideotrans. To rule out system-specific issues, I also tested it on my friend's PC. Unfortunately, it didn’t work in pyvideotrans there either, though it still worked fine in WEBUI.
In the webui, recognition uses openai-whisper's large-v3-turbo model, and the audio is split with VAD before recognition. With the API, recognition happens in pyvideotrans using the model you specify, and the audio is split differently. Differences between the two are normal: the models are different and the splitting parameters are different, so the results cannot be the same.
When will this issue be solved? I can't clone a voice in pyvideotrans.
I don't understand what you mean. If you mean it works well in the webui and poorly through the API, that is normal. If you mean it works fine in the webui but the voice cloned through the API doesn't correspond at all to the actual text, then I haven't tested that!
No, f5-tts voice cloning does not work in pyvideotrans. The
generated voice is unrecognisable; it is like aliens speaking in an
unknown language.
If it were merely bad, I would have managed somehow and used it after enhancing it.
But it is totally unrecognisable; I can't understand a single word of what it is
trying to say.
I gave it only a 22-second SRT, but it created audio made up of repeated nonsense lasting 2 minutes 26 seconds.
[Attached SRT and transcription of the generated cloned voice not preserved]
en.mp4
I tested it without any problem. Please make sure that f5-tts-api has the patch package installed and that pyvideotrans is upgraded to 3.26, and please make sure the reference audio and reference text are correct. It is normal for the subtitle duration to differ from the dubbing duration.
The f5-tts-api patch update: https://github.com/jianchang512/f5-tts-api/releases/tag/v0.1 https://github.com/jianchang512/f5-tts-api/releases/download/v0.1/2024-1127-buding.7z
It's solved! I realized I was making one fatal mistake, which is why the cloned audio was pronouncing words beyond recognition. After the #, I was typing whatever I wanted, because I thought that field was only for testing whether the API worked. From the solution you just provided, I realized that the text after the # must correspond to what is spoken in the reference audio.
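To illustrate the fix, here is a minimal Python sketch that splits a reference entry at the # and rejects entries with no reference text. It assumes the entry has the form `audio file#text spoken in that audio`; only the part after the # is confirmed by this thread, so the part before it, the example file name, and the helper name `parse_reference` are assumptions for illustration, not pyvideotrans code.

```python
def parse_reference(line: str) -> tuple[str, str]:
    """Split 'audio#text' into (audio, text); raise ValueError if a part is missing.

    Assumed format: everything before the first '#' names the reference audio,
    everything after it is the transcript of that audio.
    """
    audio, sep, text = line.strip().partition("#")
    audio, text = audio.strip(), text.strip()
    if not sep:
        raise ValueError("missing '#' separator")
    if not audio:
        raise ValueError("missing reference audio before '#'")
    if not text:
        raise ValueError("missing reference text after '#'")
    return audio, text


# Hypothetical entry: the text must be what is actually spoken in ref.wav.
audio, text = parse_reference("ref.wav#In a world where everyone has awakened.")
```

A check like this only catches an empty or missing transcript. It cannot tell whether the text really matches the recording, which was the actual cause of the garbled output here.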
Error message
f5-tts produces garbled dubbing; you can hear how bad it is in the provided audio and SRT. It works fine in the WebUI, so why can't pyvideotrans create the audio properly?
SRT:
1
00:00:00,000 --> 00:00:02,366
In a world where everyone has awakened, a world of advanced talents,
2
00:00:02,500 --> 00:00:05,716
a man chooses to become a jobless wanderer. His classmates mercilessly mock him,
3
00:00:05,783 --> 00:00:08,366
saying this talent can't compare to the advanced skills gained after a job change.
4
00:00:08,433 --> 00:00:09,933
They tell Yun Chen to quickly find a place to work.
5
00:00:09,933 --> 00:00:12,783
Even Teacher Rose advises Yun Chen to choose a professional talent soon,
6
00:00:12,833 --> 00:00:14,816
because the benefits after changing jobs are much greater.
Link to the audio created by f5-tts:
https://drive.google.com/file/d/1ZRgKFunyf-LQiLfpvqs5fhCC3kNj2v-x/view?usp=sharing
Steps to reproduce
Operating system
It works fine in the WebUI interface and creates the audio properly.
Audio created in the WebUI: https://drive.google.com/file/d/1cr6RbZR7rwc9G7NS74KNWKoUP0Qgpo8n/view?usp=sharing