I want to make LRC files #8

ClaireCJS · 2023-07-10T09:59:16Z


Hi,

I wanted to use the same 2 packages you are using to make LRC files.

But apparently your 2 packages are in Python and your player is in Java, so is Java the only place where you make LRC files?

I was wondering if you could provide a snippet of how to make an LRC file using your Python libraries.

Currently it takes me 20-30 seconds to produce an LRC with other packages by using pre-compiled EXE versions of Whisper... but yours looks like it makes much better LRC files.

Answered by EtienneAb3d

Jul 11, 2023


EtienneAb3d · 2023-07-10T10:51:57Z

Maintainer

@ClaireCJS

I'm not sure I understand your question. Can you be more precise?

In principle, to produce good SRT files:

1. Use WhisperHallu to produce 2 files: one with a good transcription (using sound cuts/filtering), and one with good timestamps (possibly with some hallucinations).
2. Then use WhisperTimeSync to put the good timestamps over the good transcription.

WhisperHallu is an experiment built around Whisper, and is therefore written in Python.

The WhisperTimeSync aligner is written in Java. It is currently not available in Python.
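Since the thread is about LRC files while this pipeline ends with an SRT, here is a minimal Python sketch of that last step: converting an SRT (for example, one synced by WhisperTimeSync) into line-level LRC. It is not part of WhisperHallu or WhisperTimeSync; the helper names `parse_srt`, `lrc_timestamp` and `srt_to_lrc` are hypothetical, and it assumes a well-formed SRT with `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing lines.

```python
import re
from typing import List, Tuple

# An SRT timing line, e.g. "00:01:02,345 --> 00:01:04,000" (a "." separator is also accepted).
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)


def _millis(h: str, m: str, s: str, ms: str) -> int:
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(srt_text: str) -> List[Tuple[int, int, str]]:
    """Return (start_ms, end_ms, text) for each SRT block, in file order."""
    entries = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):  # blocks are blank-line separated
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                # Multi-line subtitle text is joined with spaces.
                text = " ".join(t.strip() for t in lines[i + 1:] if t.strip())
                entries.append((_millis(*g[:4]), _millis(*g[4:]), text))
                break
    return entries


def lrc_timestamp(ms: int) -> str:
    """Format milliseconds as an LRC [mm:ss.xx] tag, rounded to hundredths."""
    centis = (ms + 5) // 10
    minutes, centis = divmod(centis, 6000)
    seconds, centis = divmod(centis, 100)
    return f"[{minutes:02d}:{seconds:02d}.{centis:02d}]"


def srt_to_lrc(srt_text: str) -> str:
    """One LRC line per SRT entry, stamped with the entry's start time."""
    return "\n".join(f"{lrc_timestamp(start)}{text}" for start, _end, text in parse_srt(srt_text))
```

This only produces line-level `[mm:ss.xx]` tags; word-level LRC variants are not covered.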


ClaireCJS · 2023-07-10T10:59:00Z

Author

I see. Thanks for the info.

I'm already creating LRC files with an all-EXE (demucs.exe + whisper-faster.exe) solution at about 30 seconds per song, but the LRC contents are purely AI-generated and thus have a lot of errors.

Sometimes I also have a .TXT file of the lyrics (downloaded automatically beforehand).

So I wanted to align my generated LRC with any pre-existing TXT, basically to fix the AI-generated errors in the LRC by referring to the TXT file of the lyrics.

I was hoping to stay in all-EXE territory because it gives about a 15X speedup, and the job I plan to run will take about 4 weeks. Without an EXE, it would be about 15X slower, i.e. over a year to run.
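For reference, the arithmetic behind "over a year", applying the 15X factor to the 4-week estimate:

```math
4\ \text{weeks} \times 15 = 60\ \text{weeks} \approx 1.15\ \text{years}
```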


ClaireCJS · 2023-07-10T10:59:46Z

Author

Also, do you know how long it takes to make an LRC for a typical song with your product?

(With all-EXE it's about 30 seconds, about 60 seconds if you split the vocal track apart first with demucs.)


EtienneAb3d Jul 10, 2023
Maintainer

@ClaireCJS

With WhisperTimeSync, you can do what you want: put the good timestamps over the good text.

WhisperTimeSync does not re-process your sound files; it only processes your SRT/TXT files. It's written in Java, but it's fast (depending on the length of your texts).

ClaireCJS Jul 10, 2023
Author

That's great. I can definitely bake WhisperTimeSync into my own custom-made solution to improve my results.

But I still wonder whether I would get better results (fewer transcription errors) if I used your package end-to-end, y'know?

And whether it would just be less hassle to do that and abandon mine...

EtienneAb3d Jul 10, 2023
Maintainer

@ClaireCJS

Give it a try...

ClaireCJS Jul 11, 2023
Author

Thanks! I'm always scared of getting hours deep into something only to realize it's the wrong something, so I try to ask questions first!

Just to clarify one last time (sorry!):

So the ONLY input is a TXT and a SRT, and it fixes the mis-heard/mis-spelled/wrong words in the SRT to match the correct words in the TXT?

EtienneAb3d · 2023-07-11T14:08:10Z

Maintainer

> Thanks! I'm always scared of getting hours deep into something only to realize it's the wrong something, so I try to ask questions first!

OK, but I don't know your exact conditions well enough to give you a pertinent answer. Since I haven't tried your processing myself, it's up to you to test and compare.

> So the ONLY input is a TXT and a SRT, and it fixes the mis-heard/mis-spelled/wrong words in the SRT to match the correct words in the TXT?

To be more precise: WhisperTimeSync aligns all the words between your SRT and your TXT, i.e. it tries to find the words of your SRT that match the words of your TXT. Knowing all the most-likely word pairs, it then puts the timestamps of the SRT at the right places in the TXT, keeping the text of the TXT unchanged.
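To make the alignment idea concrete, here is a minimal Python sketch. It is not WhisperTimeSync's Java implementation (this thread does not describe its matching details); it swaps in the standard library's `difflib.SequenceMatcher` as the word matcher. Each reference word is assigned to the SRT entry of the word it is paired with, so every SRT timestamp is kept while the text comes only from the TXT. The names `normalize` and `align_reference`, and the (start_ms, end_ms, text) entry format, are assumptions for illustration.

```python
import re
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

Entry = Tuple[int, int, str]  # (start_ms, end_ms, text)


def normalize(word: str) -> str:
    """Lowercase and strip punctuation so 'Love,' matches 'love'."""
    return re.sub(r"[^\w']", "", word.lower())


def align_reference(srt_entries: List[Entry], reference_text: str) -> List[Entry]:
    # Flatten the SRT into words, remembering which entry each word came from.
    srt_words: List[str] = []
    owner: List[int] = []
    for idx, (_start, _end, text) in enumerate(srt_entries):
        for word in text.split():
            srt_words.append(normalize(word))
            owner.append(idx)
    ref_words = reference_text.split()
    ref_norm = [normalize(w) for w in ref_words]

    target: List[Optional[int]] = [None] * len(ref_words)  # entry index per reference word
    matcher = SequenceMatcher(a=srt_words, b=ref_norm, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Matched pairs: the reference word takes its SRT word's entry.
            for k in range(j2 - j1):
                target[j1 + k] = owner[i1 + k]
        elif tag == "replace":
            # Mis-heard span: spread the reference words over the SRT words they replace.
            for k in range(j2 - j1):
                target[j1 + k] = owner[i1 + k * (i2 - i1) // (j2 - j1)]
        elif tag == "insert" and owner:
            # Words only in the reference: attach them to the neighbouring SRT word's entry.
            anchor = owner[i1 - 1] if i1 > 0 else owner[0]
            for k in range(j1, j2):
                target[k] = anchor
        # "delete": SRT words with no reference counterpart are simply dropped.

    texts: List[List[str]] = [[] for _ in srt_entries]
    for word, idx in zip(ref_words, target):
        if idx is not None:
            texts[idx].append(word)
    # Every SRT timestamp is kept; entries that received no reference words become empty.
    return [(s, e, " ".join(t)) for (s, e, _), t in zip(srt_entries, texts)]
```

In this sketch, an SRT entry whose words have no counterpart in the TXT comes back with empty text, which is consistent with what happens later in this thread when the reference lyrics stop early.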

See the SYNC part of the Colab output:
https://colab.research.google.com/drive/10r4m_GaTwU-JQMkRe9T0cvgrfQH1le31?usp=sharing#scrollTo=rMJLFqrzPVjd


ClaireCJS Oct 30, 2024
Author

> To be more precise: WhisperTimeSync aligns all the words between your SRT and your TXT, i.e. it tries to find the words of your SRT that match the words of your TXT. Knowing all the most-likely word pairs, it then puts the timestamps of the SRT at the right places in the TXT, keeping the text of the TXT unchanged.
>
> See the SYNC part of the Colab output: https://colab.research.google.com/drive/10r4m_GaTwU-JQMkRe9T0cvgrfQH1le31?usp=sharing#scrollTo=rMJLFqrzPVjd

Hello again! It is 1.25 years later (😂), and I have a followup question :)

Say one has:

Input 1: SRT with good timestamps and bad-quality text
Input 2: good text-only, but incomplete - for example, the person who wrote up the lyrics for the song missed a verse, or didn't write the chorus down except for the first time

100% of the timestamps in Input 1 are still preserved, right?

Input 2 is just used as a dictionary, right?

I just wanted to be 100% sure before I really plunge into this again :) I'm super thankful for this.

EtienneAb3d Oct 30, 2024
Maintainer

@ClaireCJS
Input 2 is not a dictionary; it is the reference text, which WhisperTimeSync will output together with the timestamps of Input 1.
For your use case, as described today on the Whisper forum, you will then have to extract the words from this output, chunk by chunk, to put them in the prompts of your next step.

ClaireCJS Oct 30, 2024
Author

Didn't realize you were on that thread too.

Yeah. It just seems that if either they could accept enough tokens, or you could keep unmatched lines instead of destroying them and fix them based on the TXT file, all of this could be done in 1 pass at ½ the energy cost. I plan on scaling it to enough files that it will be a nontrivial amount of billed electricity.

🌳🌲🎄 think of the trees lol 🌳🌲🎄

but also ½ the OpenAI API fees too...

EtienneAb3d Oct 30, 2024
Maintainer

I'm on this thread because I'm the author of WhisperTimeSync.
;-)

ClaireCJS Oct 30, 2024
Author

> I'm on this thread because I'm the author of WhisperTimeSync. ;-)

Lol 🤣🤣🤣 I'm oblivious sometimes

ClaireCJS · 2024-10-30T16:53:28Z

Author

Actually, in my tests, WhisperTimeSync does remove spoken lines from the SRT. Since it removes valid transcription lines, I would consider this destructive behavior a bug.

Perhaps this is by design, in which case, can we pleeeeeeeeeeeeeeeease have an option to suppress that behavior?

The screenshot shows before/after/reference, one per column. Specifically, subtitles 32 through 38 are words that were actually sung, but WhisperTimeSync has removed them completely from the SRT, seemingly leaving blank subtitles for #32-38.
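Until such an option exists, one possible workaround is a small post-processing pass that keeps the synced text where there is some and falls back to the original transcription where the synced subtitle came back blank. This is a hypothetical sketch, not a WhisperTimeSync feature: `restore_empty_entries` is an illustrative name, and it assumes both SRTs have already been parsed into (start_ms, end_ms, text) entries that correspond one-to-one, as the matching numbering in the screenshot suggests.

```python
from typing import List, Tuple

Entry = Tuple[int, int, str]  # (start_ms, end_ms, text)


def restore_empty_entries(original: List[Entry], synced: List[Entry]) -> List[Entry]:
    """Keep the synced (reference) text, but fall back to the original transcription where it is blank."""
    if len(original) != len(synced):
        raise ValueError("expected the same number of subtitles in both files")
    merged = []
    for (start, end, heard), (_s, _e, fixed) in zip(original, synced):
        # Timestamps come from the original SRT; text prefers the synced reference text.
        merged.append((start, end, fixed if fixed.strip() else heard))
    return merged
```

The restored lines still carry the AI transcription's errors, so they may be worth flagging for review.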



EtienneAb3d Oct 30, 2024
Maintainer

I do not understand why the match is so bad.
Can you share your input files?

ClaireCJS Oct 30, 2024
Author

> I do not understand why the match is so bad. Can you share your input files?

Sure!
wts.zip

EtienneAb3d Oct 31, 2024
Maintainer

@ClaireCJS
OK! I didn't look at your screenshot properly: looking at the numbering, everything is in fact OK.
Your reference text is missing the end, from number 32 onward, so there is no reference text to put inside the transcribed timestamps.
This is the normal behaviour for WhisperTimeSync.
In your project, you should simply use the prompt, which can take up to 224 tokens, to add, for example, the last 150 words seen when transcribing a given chunk.
In this example, I would cut the file so that each chunk contains several lines of text (about 20 s of audio per chunk), and put up to the previous 150 words seen into the prompt. This increases the probability that the correct words are actually in it, even when the whole lyrics are longer than the maximum prompt size.
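The chunking and prompting strategy described above could be sketched roughly as follows. This is one interpretation, not code from WhisperTimeSync or WhisperHallu: it groups synced subtitle entries (start_ms, end_ms, text) into chunks of about 20 s without splitting a line, and for each chunk builds a prompt from at most the last 150 reference words seen up to the end of that chunk. The function names `group_into_chunks` and `build_prompts` are hypothetical, and cutting the audio and calling Whisper are left out.

```python
from typing import List, Tuple

Entry = Tuple[int, int, str]  # (start_ms, end_ms, text)


def group_into_chunks(entries: List[Entry], target_ms: int = 20_000) -> List[List[Entry]]:
    """Group consecutive entries into chunks spanning roughly target_ms, never splitting a line."""
    chunks: List[List[Entry]] = []
    current: List[Entry] = []
    for entry in entries:
        # Start a new chunk if adding this entry would stretch the current one past the target.
        if current and entry[1] - current[0][0] > target_ms:
            chunks.append(current)
            current = []
        current.append(entry)
    if current:
        chunks.append(current)
    return chunks


def build_prompts(chunks: List[List[Entry]], max_words: int = 150) -> List[Tuple[int, int, str]]:
    """For each chunk, return (start_ms, end_ms, prompt), where the prompt holds the
    last max_words reference words seen up to the end of that chunk."""
    seen: List[str] = []
    prompts = []
    for chunk in chunks:
        for _start, _end, text in chunk:
            seen.extend(text.split())
        prompts.append((chunk[0][0], chunk[-1][1], " ".join(seen[-max_words:])))
    return prompts
```

The prompt limit is counted in tokens rather than words, so a word cap like 150 is only an approximation of staying under it.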


Replies: 5 comments 13 replies
