Hi, I have audio that includes numbers (e.g. 16, 29, 32). whisperx loads it and transcribes it perfectly, but when I run the word alignment I hit an issue: the numbers are split out as separate words, and those words end up with empty start and end time values. For the wav2vec models I tried, the metadata only includes non-numerical characters [a-z], so digits cannot be aligned.
Has anyone had a similar issue, and does anyone know of an English wav2vec model (from huggingface) that would solve it?
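One workaround I could imagine, in case no such model exists, is to spell the numbers out as words before the alignment step so that every token consists only of [a-z] characters. The snippet below is only a rough sketch under that assumption: `number_to_words` and `spell_out_numbers` are hypothetical helper names of my own, not part of whisperx, and it only handles standalone integers from 0 to 999 (decimals, ordinals, and longer numbers would need extra handling).

```python
import re

# Word tables for a minimal integer-to-English conversion (0..999 only).
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Hypothetical helper: spell out an integer in the range 0..999."""
    if not 0 <= n <= 999:
        raise ValueError("only 0..999 is supported in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        # A space instead of a hyphen keeps the output within [a-z] plus spaces.
        return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"
    hundreds, rest = divmod(n, 100)
    head = f"{ONES[hundreds]} hundred"
    return head if rest == 0 else f"{head} {number_to_words(rest)}"


def spell_out_numbers(text: str) -> str:
    """Replace standalone 1-3 digit numbers in text with English words."""
    return re.sub(
        r"\b\d{1,3}\b",
        lambda match: number_to_words(int(match.group())),
        text,
    )


print(spell_out_numbers("rooms 16, 29 and 32"))
```

The idea would be to apply `spell_out_numbers` to each segment's text after transcription and before alignment. The downside is that the aligned output then contains the spelled-out words rather than the original digits, so they would have to be mapped back afterwards if digits are needed.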
Thanks for help in advance,