Random strings and wrong special characters in edited subtitle text #941

Closed
Piriya-VM opened this issue Dec 21, 2022 · 6 comments
Labels
type:bug Something isn't working

Comments

@Piriya-VM

We use Vosk to create subtitles and edit them in the subtitle editor.
After saving and publishing, the subtitle file includes strange strings, and special characters such as ä, ü, ö are not displayed correctly.
See the attached vtt file, screenshot and partial-publish Workflow.

WEBVTT

3lPSCXPUvSk3QfzzsrXDj
00:00.420 --> 00:05.930
Apfel

iOmyuq7YoRVkBdHrDHj8m
00:06.930 --> 00:10.530
Hallo ich bin Vosk und versuche nur 
zu Ueberleben irgendwas irgendwas.

3bW92IwxDIrqu0k_HdIeu
00:10.530 --> 00:14.100
orange

zPaQJaT9U0iBY2JI5sqFS
00:14.100 --> 00:18.750
kiwi 
irgendwas irgendwas irgendwas

lBh9Dx0GfstMMqgq-R1XG
00:19.920 --> 00:22.860
ananas

Screenshot: editor_bug

Attachment: partial-publish.txt
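
The report does not say what causes the broken umlauts. One common cause of this symptom, offered here only as an assumption, is UTF-8 bytes being decoded as Latin-1 somewhere in the pipeline (WebVTT files are required to be UTF-8). The sketch below shows that effect and a rough heuristic for spotting it; `looks_like_mojibake` is a hypothetical helper name, not an Opencast function.

```python
def looks_like_mojibake(text):
    """Heuristic: True if re-encoding as Latin-1 and decoding as UTF-8
    succeeds and changes the text, which hints at an earlier wrong decode."""
    try:
        repaired = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable in Latin-1, or not valid UTF-8 afterwards:
        # the text was probably decoded correctly in the first place.
        return False
    return repaired != text


# Assumed example word; simulate the suspected wrong decode step.
original = "Überleben"
garbled = original.encode("utf-8").decode("latin-1")

print(looks_like_mojibake(garbled))   # garbled text is flagged
print(looks_like_mojibake(original))  # correctly decoded text is not
```

If the published file is flagged by a check like this, the encoding used when the editor's output is written or read would be the first thing to inspect.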

Piriya-VM added the type:bug (Something isn't working) label Dec 21, 2022

DZenker commented Jan 5, 2023

I can confirm this bug; it also happens in our installation (OC 12.6).


fsufrre commented Jan 31, 2023

I can confirm this bug too (OC 12.4).
Any idea when this bug will be fixed?

Member

Arnei commented Feb 2, 2023

Interestingly, I cannot confirm this bug (OC 0fb1c14, also newer versions). The characters ä, ü, ö render normally.

The "strange strings" like 3lPSCXPUvSk3QfzzsrXDj are subtitle cue ids generated by Vosk. I do not know why Vosk generates them, but looking at the downloaded subtitle file, they are perfectly legal WebVTT. Maybe Paella Player doesn't know how to handle them?
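
For reference, a WebVTT cue may start with an optional identifier line placed directly before the timing line. The following is a minimal Python sketch of how a parser can tell identifier, timing, and text apart. `parse_cues` and the `TIMING` pattern are illustrative names, not code from Opencast, Vosk, or Paella, and the sketch does not interpret cue settings or NOTE/STYLE blocks.

```python
import re

# Matches the start of a WebVTT timing line such as
# "00:00.420 --> 00:05.930" (the hours field is optional).
TIMING = re.compile(
    r"^(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}\s+-->\s+(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}"
)


def parse_cues(vtt_text):
    """Return a list of (cue_id, timing, text) tuples from WebVTT text."""
    cues = []
    # Blocks are separated by one or more blank lines.
    blocks = re.split(r"\n\s*\n", vtt_text.replace("\r\n", "\n").strip())
    for block in blocks:
        lines = [line.rstrip() for line in block.split("\n")]
        cue_id = None
        # An identifier is a non-timing line directly above a timing line.
        if len(lines) > 1 and not TIMING.match(lines[0]) and TIMING.match(lines[1]):
            cue_id = lines[0]
            lines = lines[1:]
        if not TIMING.match(lines[0]):
            continue  # file header or another non-cue block
        cues.append((cue_id, lines[0], "\n".join(lines[1:])))
    return cues


sample = """WEBVTT

3lPSCXPUvSk3QfzzsrXDj
00:00.420 --> 00:05.930
Apfel

00:06.930 --> 00:10.530
Hallo
"""

for cue_id, timing, text in parse_cues(sample):
    print(repr(cue_id), timing, repr(text))
```

A player that treats the identifier as metadata, as in this sketch, would display only the text part of each cue.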


fsufrre commented Feb 2, 2023

We are using Amberscript, but I think the way of delivery is not the point. When saving the edited subtitles and republishing the transcription, the *.vtt file gets some extra strings. This is visible in Paella and also after downloading the *.vtt file. I think the magic happens in the "saving" process.

Member

Arnei commented Feb 2, 2023

True, it seems blaming the strange strings on Vosk was hasty on my part. They are still legitimate WebVTT cue ids though, so maybe that's worth a separate issue in the main Opencast repo on how Paella handles cue ids.
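
As long as a player shows cue ids as text, one possible workaround is to drop the identifier lines from the file before publishing. This is only a rough sketch under that assumption; `strip_cue_ids` is a hypothetical helper and not an Opencast or Paella function.

```python
import re

# Start of a WebVTT timing line, e.g. "00:00.420 --> 00:05.930".
TIMING = re.compile(r"^(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}\s+-->\s+")


def strip_cue_ids(vtt_text):
    """Remove identifier lines that sit directly above a timing line."""
    lines = vtt_text.replace("\r\n", "\n").split("\n")
    kept = []
    for i, line in enumerate(lines):
        next_is_timing = i + 1 < len(lines) and TIMING.match(lines[i + 1])
        prev_is_blank = i == 0 or lines[i - 1].strip() == ""
        is_id = (
            line.strip() != ""
            and not line.startswith("WEBVTT")  # never drop the file header
            and not TIMING.match(line)
            and next_is_timing
            and prev_is_blank
        )
        if not is_id:
            kept.append(line)
    return "\n".join(kept)


sample = "WEBVTT\n\n3lPSCXPUvSk3QfzzsrXDj\n00:00.420 --> 00:05.930\nApfel\n"
print(strip_cue_ids(sample))
```

Removing the ids loses nothing that is shown to viewers, but any tooling that addresses cues by id would no longer work on the stripped file.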

Member

lkiesow commented Mar 31, 2023

This should be fixed in recent versions of Opencast by now.
For more details, see opencast/opencast#4764

If the problem persists regardless, please re-open.

lkiesow closed this as completed Mar 31, 2023