RTL display for subtitles #12978

Open
ShlomoCode opened this issue Nov 27, 2023 · 13 comments · May be fixed by #12985

Comments

@ShlomoCode

Before requesting a new feature make sure it hasn't been requested yet.
meta:feature-request

Expected behavior of the wanted feature

An option to display RTL (right-to-left) subtitles correctly.
Either automatic (detection of the subtitle language) or a command line option.
This is needed for translated subtitles: the translation always leaves certain words in English, and those words then escape to the right side of the display.

I opened this issue here following one I filed with IINA, which uses mpv behind the scenes: iina/iina#4698

Log file

I'm using iina which uses mpv behind the scenes so I have no idea how to get a log file.

@llyyr
Contributor

llyyr commented Nov 27, 2023

diff --git a/sub/sd_ass.c b/sub/sd_ass.c
index 6742f6f658..1b139b1a06 100644
--- a/sub/sd_ass.c
+++ b/sub/sd_ass.c
@@ -448,8 +448,10 @@ static void configure_ass(struct sd *sd, struct mp_osd_res *dim,
     ass_set_hinting(priv, set_hinting);
     ass_set_line_spacing(priv, set_line_spacing);
 #if LIBASS_VERSION >= 0x01600010
-    if (converted)
+    if (converted) {
         ass_track_set_feature(track, ASS_FEATURE_WRAP_UNICODE, 1);
+        ass_track_set_feature(track, ASS_FEATURE_WHOLE_TEXT_LAYOUT, 1);
+    }
 #endif
     if (converted) {
         bool override_playres = true;

Can you try this diff, or alternatively provide sample subtitles? If it works, then hooking it up to an option should be trivial.

@ShlomoCode
Author

ShlomoCode commented Nov 27, 2023

Or alternatively provide sample subtitles

video
subtitles
For example at position 0:20 in the video there is a sentence in Hebrew with one word in English

@llyyr
Contributor

llyyr commented Nov 27, 2023

subtitles

url doesn't work for me, can you just upload the file to github?

@ShlomoCode
Author

ShlomoCode commented Nov 27, 2023

GitHub does not allow uploading .srt files...

We don’t support that file type.

Try again with GIF, JPEG, JPG, MOV, MP4, PNG, SVG, WEBM, CPUPROFILE, CSV, DMP, DOCX, FODG, FODP, FODS, FODT, GZ, JSON, JSONC, LOG, MD, ODF, ODG, ODP, ODS, ODT, PATCH, PDF, PPTX, TGZ, TXT, XLS, XLSX or ZIP.

Here it is in a zip file:
[Hebrew] TypeScript vs JavaScript _ Guido van Rossum and Lex Fridman [DownSub.com].srt.zip

@llyyr
Contributor

llyyr commented Nov 27, 2023

I couldn't reproduce the "English words escape to the right" issue. It renders how it is in the sub file.

image

What version of iina are you using?

@ShlomoCode
Author

ShlomoCode commented Nov 27, 2023

I couldn't reproduce the "english words escape to the right" issue. It renders how it is in the sub file

CleanShot 2023-11-27 at 13 54 12@2x

With correct RTL it should look like this:
CleanShot 2023-11-27 at 13 58 07@2x
As for "It renders how it is in the sub file": that depends on how you opened the subtitle file, since not every editor properly supports RTL. For example, VSCode supports RTL very poorly, while Microsoft Word (a word processor rather than a code editor) supports it very well.

What version of iina are you using?

1.3.3 Build 138
mpv 0.35.0-419-gf79 FFmpeg 6.0

@llyyr llyyr linked a pull request Nov 27, 2023 that will close this issue
@ShlomoCode
Author

@llyyr Thanks for this quick fix! Much appreciated :)
I downloaded the artifact from the PR (the mpv-i686-w64-mingw32 file) and I can confirm that the specific case I demonstrated (position 00:00:20) is displayed properly from right to left. However, there seem to be cases where the direction still gets confused, for example at position 00:00:34 in the video:

CleanShot 2023-11-27 at 16 01 46@2x

The word "JavaScript" should be on the right side (the beginning of the sentence in RTL), and "JavaScript EES" should be on the left side (the end of the sentence in RTL), like in Microsoft Word for example:

CleanShot 2023-11-27 at 16 07 21@2x

Or just HTML with the CSS direction: rtl:
CleanShot 2023-11-27 at 16 06 00@2x

Same thing for example at position 00:00:39:
CleanShot 2023-11-27 at 16 06 50@2x

The word "Transpilers" is the beginning of the sentence and therefore should be displayed on the right side in a right-to-left language.

I checked and the problem also exists when using a single subtitle track.

This is the command I used:

./mpv.exe video.mp4 --sub-file=sub_he.srt --sub-file=sub_en.srt --sub-detect-rtl --sid=1 --secondary-sid=2

I am also attaching the LTR subtitles, in case you need them:
subs.zip

@ShlomoCode
Author

Now I'm thinking, maybe the correction only helped in cases where the first letter in the current sentence is in Hebrew and the English word is in the center of the sentence, but if the first word in the sentence is in English, the entire sentence is defined as LTR?

@avih
Member

avih commented Nov 28, 2023

285902001-c2f20e0d-617a-445e-9c9e-121a5bb9e807

For reference, this subs file is YouTube's auto-translation to Hebrew.

Is there any player which shows this correctly? As far as I can tell, both MPC and VLC also show it broken like this. Even Firefox on the YouTube page shows it the same way (broken).

As far as I can tell, it only shows correctly in Chrome(-based) browsers.

Maybe Chrome does some magic where it knows to show it primarily as RTL, which bypasses the brokenness of the subs?

@llyyr
Contributor

llyyr commented Nov 28, 2023

Maybe Chrome does some magic where it knows to show it primarily as RTL, which bypasses the brokenness of the subs?

Kodi displays it correctly but violates specifications while doing so xbmc/xbmc@d807316

edit:

The original WebVTT generated by YouTube itself doesn't contain any RTL markings, so it's not really possible to auto-detect this information. I'd consider these subtitles to be broken, but there's still some merit in libass providing an API for forcing RTL rendering.

@avih
Member

avih commented Nov 28, 2023

Kodi displays it correctly but violates specifications while doing so xbmc/xbmc@d807316

As far as I can tell, this patch replaces LTR/RTL marks encoded as HTML literals &lrm;, &rlm; (should translate to U+200E, U+200F, respectively) with the equivalent embedded marks (U+202A, U+202B), because, according to the patch, libass only interprets the latter?
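
To make the substitution described above concrete, here is a minimal Python sketch. It is only an illustration of the mapping as read in this comment, not the Kodi patch itself, and it assumes the HTML literals in question are the entities &lrm; and &rlm;. The function name `replace_direction_entities` is hypothetical.

```python
# Mapping as described in the comment above: directional marks written as
# HTML entities are swapped for the embedding controls U+202A / U+202B.
# This is a reading of the Kodi patch, not a copy of its code.
ENTITY_TO_EMBEDDING = {
    "&lrm;": "\u202a",  # LEFT-TO-RIGHT EMBEDDING, used in place of U+200E
    "&rlm;": "\u202b",  # RIGHT-TO-LEFT EMBEDDING, used in place of U+200F
}


def replace_direction_entities(text: str) -> str:
    """Replace &lrm;/&rlm; entities with embedding control characters."""
    for entity, control in ENTITY_TO_EMBEDDING.items():
        text = text.replace(entity, control)
    return text
```

A subtitle line with neither entity passes through unchanged, which matches the situation in this thread: the files contain no marks at all, so a substitution like this has nothing to act on.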

However, neither the SRT nor the original VTT from which it was converted have any RTL marks (HTML/direct/embedded), so basically whoever renders it simply can't know that it's primarily RTL.

TL;DR: these subs are broken.

Chrome renders it correctly because the page has CSS which overrides the direction to RTL, but that info is not conveyed in the VTT/SRT files.

libass (via fribidi) can guess most of the lines correctly from the first word of the line, but this only works for lines which begin in Hebrew (this VTT/SRT also has lines which should be RTL but begin in English, so those would still be broken with autodetection).
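
The limitation described above can be sketched in a few lines of Python. This is a simplified first-strong-character heuristic in the spirit of the Unicode bidirectional algorithm's paragraph-level rule; it is not fribidi's or libass's actual code, it ignores isolate handling, and the function name `first_strong_direction` is hypothetical.

```python
import unicodedata


def first_strong_direction(line: str, default: str = "ltr") -> str:
    """Guess a line's base direction from its first strongly directional character."""
    for ch in line:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":            # strong left-to-right (e.g. Latin letters)
            return "ltr"
        if bidi in ("R", "AL"):    # strong right-to-left (Hebrew, Arabic letters)
            return "rtl"
    # Only digits, punctuation, or whitespace: nothing to decide on.
    return default


# Hypothetical sample lines, written with escapes to avoid editor bidi issues.
hebrew_first = "\u05e9\u05dc\u05d5\u05dd TypeScript"           # begins in Hebrew
english_first = "JavaScript \u05d4\u05d9\u05d0 \u05e9\u05e4\u05d4"  # begins in English

print(first_strong_direction(hebrew_first))
print(first_strong_direction(english_first))
```

The second sample is mostly Hebrew but begins with a Latin word, so a heuristic of this kind classifies it as LTR, which is the failure mode reported at positions 00:00:34 and 00:00:39.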

Secondly, libass doesn't enable autodetection in fribidi by default, for compatibility with VSFilter.

To enable autodetection (which would still be broken with lines which begin in English), we'd need #12985 .

To allow the user to force RTL primarily for all the lines (in fribidi, instead of autodetect), libass will need to add some support which currently doesn't exist.
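
Until such support exists, one file-side workaround is to add the missing direction information to the subtitles themselves. The Python sketch below wraps every text line of an SRT file in U+202B (RIGHT-TO-LEFT EMBEDDING) and U+202C (POP DIRECTIONAL FORMATTING). It assumes a simple, well-formed SRT layout (index line, timing line containing "-->", then text lines), and it assumes the renderer honors embedding controls, which this thread only suggests indirectly via the Kodi patch. The function name `force_rtl_srt` is hypothetical.

```python
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def force_rtl_srt(srt_text: str) -> str:
    """Wrap each subtitle text line in RLE...PDF, leaving index and timing lines alone."""
    blocks = srt_text.replace("\r\n", "\n").strip("\n").split("\n\n")
    out_blocks = []
    for block in blocks:
        out_lines = []
        seen_timing = False
        for line in block.split("\n"):
            if not seen_timing:
                # Index line or timing line: copy through unchanged.
                out_lines.append(line)
                seen_timing = "-->" in line
            elif line.strip():
                # Text line: mark it explicitly as right-to-left.
                out_lines.append(f"{RLE}{line}{PDF}")
            else:
                out_lines.append(line)
        out_blocks.append("\n".join(out_lines))
    return "\n\n".join(out_blocks) + "\n"
```

This forces every line to RTL regardless of content, so it only makes sense for a track known to be primarily RTL, and it remains a workaround for subtitles that lack direction marks.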

However, the bottom line is that the subs are broken. The RTL info is conveyed through a side channel in the browser, and it's not part of the subs themselves.

Any auto detection, or force options, would be ugly workarounds for broken subs.

@ShlomoCode
Author

ShlomoCode commented Nov 28, 2023

I think automatic detection based on the start of the paragraph is common behavior. This is actually the default for browsers, meaning that when RTL/LTR is not explicitly set, it will be auto:
https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes#dir

@0xifarouk

Any updates on this issue please?
