-
Hello, did you compare this to
-
Yeah, I have some examples, but since they're in Finnish they probably won't help much. I have to watch in slow motion to see the difference. I will soon experiment with the English language. Instruments are suppressed.
word, lines: lines.mp4
word, segment: segment.mp4
-
That's interesting. The problem with generalizing this is that not all text comes with newline separators, and even when they exist, they might not be in a semantically meaningful position. An alternative is to use both word and sentence splitting, and inserting
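One way to sketch that alternative: split the text into sentences with a naive regex (real sentence segmentation is harder, and whatever the library ends up doing may differ) and join the pieces with `<star>` so the aligner gets a boundary token at each approximate semantic break. The helper name here is made up for illustration:

```python
import re

def star_at_sentences(text: str) -> str:
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    # A real implementation would want proper sentence segmentation.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Join sentences with <star> so each (approximate) semantic break
    # becomes a boundary token for the aligner.
    return "<star>".join(s for s in sentences if s)

print(star_at_sentences("Hello there. How are you? Fine!"))
# Hello there.<star>How are you?<star>Fine!
```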
-
Hmm... In my use case the verses are always separated by a newline, and it makes the most sense to just follow the lyrics provider. I think a good solution would be giving the user an option to insert the
What do you mean by "Btw, do you run the aligner on the audio before instrument suppression or after it?"
-
It's already doable; you can break down
-
By applying the commit one can set `star_frequency="custom"`. For example, I wanted to have them at every newline:

```python
import re

import torch
from ctc_forced_aligner import (
    load_audio,
    load_alignment_model,
    generate_emissions,
    preprocess_text,
    get_alignments,
    get_spans,
    postprocess_results,
)

audio_path = "test.wav"
text_path = "test.txt"
language = "fi"
device = "cuda" if torch.cuda.is_available() else "cpu"
batch_size = 16

alignment_model, alignment_tokenizer = load_alignment_model(
    device,
    dtype=torch.float16 if device == "cuda" else torch.float32,
)
audio_waveform = load_audio(audio_path, alignment_model.dtype, alignment_model.device)

with open(text_path, "r") as f:
    text = f.read()

# Replace every newline (or run of consecutive newlines) with <star>
text = re.sub(r"\n+", "<star>", text)
print(text)

emissions, stride = generate_emissions(
    alignment_model, audio_waveform, batch_size=batch_size,
)
tokens_starred, text_starred = preprocess_text(
    text,
    romanize=True,
    language=language,
    star_frequency="custom",  # this must be set to "custom"
)
segments, scores, blank_id = get_alignments(
    emissions,
    tokens_starred,
    alignment_tokenizer,
)
spans = get_spans(tokens_starred, segments, blank_id)
word_timestamps = postprocess_results(text_starred, spans, stride, scores)
print(word_timestamps)
```
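For reference, the `re.sub(r"\n+", "<star>", text)` step collapses each run of one or more consecutive newlines into a single `<star>` token, so blank lines between verses don't produce duplicate stars:

```python
import re

lyrics = "first verse line\n\nsecond verse line\nthird line"
# A run of newlines (the blank line between verses) and a single
# newline both become exactly one <star> token.
starred = re.sub(r"\n+", "<star>", lyrics)
print(starred)
# first verse line<star>second verse line<star>third line
```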
-
Hey, I got better accuracy of word-level timestamps for my purposes by adding <star> tokens per line (\n). My implementation is kinda incoherent, so I didn't bother opening a PR. A better implementation would just replace \n with <star> and somehow escape the <star> so that romanization doesn't strip them, in text_utils.py:
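The escaping idea could be sketched like this. This is a hypothetical approach (the helper names and the placeholder string are made up), not the actual text_utils.py change: protect each `<star>` with a plain-ASCII placeholder before romanization, then restore it afterwards.

```python
# Placeholder made of plain ASCII letters, chosen so a romanizer is
# likely to pass it through unchanged (hypothetical; pick something
# that cannot occur in real lyrics).
PLACEHOLDER = "zqstarqz"

def protect_stars(text: str) -> str:
    # Swap <star> for the placeholder before romanization.
    return text.replace("<star>", PLACEHOLDER)

def restore_stars(text: str) -> str:
    # Swap the placeholder back after romanization.
    return text.replace(PLACEHOLDER, "<star>")

text = "verse one<star>verse two"
protected = protect_stars(text)   # safe to romanize now
restored = restore_stars(protected)
print(restored)
# verse one<star>verse two
```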