[POC] Naïve right-to-left labeling #6057

1ec5 · 2016-08-17T19:13:51Z

This PR is a proof of concept for right-to-left support in core. Labels are mirrored based on the presence of strongly right-to-left characters.

Compare the Hebrew label to the text in the popover. The popover is displaying the same string but rendered by Core Text, which fully supports right-to-left text and complex text shaping.

For bilingual or diglossic labels, which are common in some countries on OpenStreetMap, runs of right-to-left text inside predominantly left-to-right text and vice versa are reversed back to logical order.

As you can see in the above screenshot, complex text shaping remains unimplemented. It’s a much larger technical challenge that affects not only layout but also font glyph selection (mapbox/DEPRECATED-mapbox-gl#4). Regardless, can a native speaker of Arabic, Persian, Urdu, etc. comment on whether mirroring alone at least improves readability compared to what we have currently? I realize that a lack of contextual forms and ligatures would continue to make labels unpresentable, but I’m hopeful we can still make small progress while we work towards full support for these languages.

In practice, mirroring alone does appear to get us complete Hebrew support in styles derived from Mapbox Streets source. Final letter forms are encoded separately in Unicode, rather like in Greek, so they’re already being displayed correctly in master. Meanwhile, niqqud are nearly nonexistent in name tags in OpenStreetMap: two short road segments and a building in all of Israel.

On this branch, strong right-to-left characters are defined as those characters that belong to the Arabic, Arabic Supplement, Arabic Extended-A, Hebrew, Syriac, and Thaana code blocks in Unicode. There is currently no support for weak directionality, so spaces and generic punctuation are treated as left-to-right text.
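A minimal sketch of a block-based check like the one described above. The function name `isStrongRTL`, the `CodeBlock` struct, and the table layout are hypothetical and are not taken from the branch; the ranges are the standard Unicode block boundaries for the six blocks listed.

```cpp
#include <array>

// Inclusive code point range of one Unicode block.
struct CodeBlock {
    char32_t first;
    char32_t last;
};

// The six blocks named in the description (hypothetical table, not the branch's code).
constexpr std::array<CodeBlock, 6> kStrongRTLBlocks = {{
    {0x0590, 0x05FF}, // Hebrew
    {0x0600, 0x06FF}, // Arabic
    {0x0700, 0x074F}, // Syriac
    {0x0750, 0x077F}, // Arabic Supplement
    {0x0780, 0x07BF}, // Thaana
    {0x08A0, 0x08FF}, // Arabic Extended-A
}};

// Returns true if the code point falls inside any of the blocks above.
// Spaces and generic punctuation fall outside every block, so they are
// treated as left-to-right, matching the limitation described above.
bool isStrongRTL(char32_t c) {
    for (const auto& block : kStrongRTLBlocks) {
        if (c >= block.first && c <= block.last) {
            return true;
        }
    }
    return false;
}
```

With this table, `isStrongRTL(0x05D0)` (Hebrew alef) is true, while `isStrongRTL(U' ')` is false.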

We could address directionality limitations by replacing my hand-rolled algorithm with minimal usage of ICU, which is available on Android, iOS, and macOS. ICU is unavailable on the Web, which is why mapbox/mapbox-gl-js#1841 would require a large dependency. Even if we can’t get intelligent directionality support into GL JS, I don’t think that should block support in the native codebase, because no one is ever going to complain that Hebrew text looks “too correct” on one platform. 😉

This is merely a proof of concept, but the following work would need to take place in order for it to land as a stopgap solution:

Exclude Eastern Arabic numerals from strong right-to-left blocks
Mirror labels that have line placement
Use ICU to determine (bi)directionality
Add rendering tests

/cc @mikemorris @kkaefer @planemad

mention-bot · 2016-08-17T19:13:53Z

@1ec5, thanks for your PR! By analyzing this pull request, we identified @ansis and @jfirebaugh to be potential reviewers.

planemad · 2016-08-17T19:30:30Z

@1ec5 ❤️ this in itself is a huge improvement and makes the labels more readily readable. Can confirm the Arabic text is much easier to read. It looks like it's spelled out as individual letters instead of being written as a word.

Source: Learnt Arabic in school for 6 years

mb12 · 2016-08-17T19:46:31Z

@1ec5 Can the same transformation (mirroring) be done on ft.label instead (here)? Does mirroring have to be done at the glyph level for correctness?

mikemorris · 2016-08-17T19:55:48Z

> Does mirroring have to be done at the glyph level for correctness?

Yes, only the display order should be flipped - the logical order should be preserved in the underlying string, for (as a contrived off-the-top-of-my-head example) something like text-to-speech reading of labels.

mb12 · 2016-08-17T20:11:41Z

@mikemorris and @1ec5 The u32string I am referring to is a private copy of the data structure that is used for glyph placement only. It contains all the code points to which the mirroring transformation can be applied.
Applying the transformation here would mean it works for both point placement and line placement.

1ec5 · 2016-08-17T20:51:40Z

@mb12, thank you for the tip. Indeed, we’ll need to handle line placement as well as point placement. That was an oversight on my part. I’ve edited the original PR description to list that as a to-do item.

1ec5 · 2016-08-17T20:58:13Z

One problem with mirroring inside the string is that it’ll interfere with line wrapping. Consider the label “גן הסלעים, גן הקקטוסים והגן הגזום”, which needs to be wrapped onto multiple lines:

If we mirror the data rather than the layout, we’ll end up with a bottom-to-top, right-to-left label, which would only be appropriate for the Hanunó’o alphabet (which incidentally is bottom-to-top, left-to-right). Something like this:

הקקטוסים והגן הגזום
גן הסלעים, גן

Bidirectional text also requires multiple layout passes, one for the string overall and one for each nested text run.
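A minimal sketch of the wrapping problem, using an ASCII string as a stand-in for right-to-left text. The helper names (`wrap`, `mirrored`, `mirrorThenWrap`, `wrapThenMirror`) and the greedy width-based wrapping are assumptions for illustration, not the code on this branch.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Greedy word wrap: start a new line when the next word would exceed maxWidth.
std::vector<std::string> wrap(const std::string& text, std::size_t maxWidth) {
    std::vector<std::string> lines;
    std::istringstream words(text);
    std::string word;
    std::string line;
    while (words >> word) {
        if (!line.empty() && line.size() + 1 + word.size() > maxWidth) {
            lines.push_back(line);
            line.clear();
        }
        if (!line.empty()) {
            line += ' ';
        }
        line += word;
    }
    if (!line.empty()) {
        lines.push_back(line);
    }
    return lines;
}

// Reverses the characters of a string (stand-in for mirroring RTL text).
std::string mirrored(std::string s) {
    std::reverse(s.begin(), s.end());
    return s;
}

// Mirroring the data first: the END of the logical string lands on the first line.
std::vector<std::string> mirrorThenWrap(const std::string& logical, std::size_t maxWidth) {
    return wrap(mirrored(logical), maxWidth);
}

// Mirroring the layout: wrap in logical order, then mirror each line separately.
std::vector<std::string> wrapThenMirror(const std::string& logical, std::size_t maxWidth) {
    std::vector<std::string> lines = wrap(logical, maxWidth);
    for (auto& line : lines) {
        line = mirrored(line);
    }
    return lines;
}
```

For the stand-in string "quite a long string" and a width of 11, `wrapThenMirror` keeps the first words on the first line ("a etiuq", then "gnirts gnol"), whereas `mirrorThenWrap` puts the last words first ("gnirts gnol", then "a etiuq"), which is the bottom-to-top ordering described above.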

mb12 · 2016-08-17T21:12:44Z

@1ec5 Would the following work for text justification?

Split the u32string into words, mirror each word and concatenate them.

1ec5 · 2016-08-17T21:22:50Z

> Split the u32string into words, mirror each word and concatenate them.

Would the words be concatenated in logical or visual order? If in logical order, you’d wind up with something like this – legible but incomprehensible. If in visual order, the problem described in #6057 (comment) would remain. If labels with line placement can’t be wrapped, then yes, mirroring the string might be fine for just line placement, but we’d still need the approach I took for symbol placement.

mb12 · 2016-08-18T00:55:18Z

For RTL languages, what is the expected behavior? If the pbf file contains string "like this", does it need to be shown as "ekil siht" or does it need to be shown like "siht ekil"? Irrespective of whether this transformation happens in u32string or during glyph placement, the transform needs to ensure visual correctness.

1ec5 · 2016-08-18T02:01:04Z

> If the pbf file contains string "like this", does it need to be shown as "ekil siht" or does it need to be shown like "siht ekil"?

"siht ekil". And a wrapped string that's stored in the pbf as "quite a long string" would be displayed as:

a etiuq
gnirts gnol

A bilingual string stored as "left sinister / right dexter" would be displayed as "left sinister / retxed thgir".

Other than this last case (where this branch tends to put the right-to-left run to the left of the left-to-right run) and some edge cases involving weak-directional characters like punctuation, the layout and placement of each glyph is correct on this branch.
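The expected results above can be sketched with a toy reordering function. This is an illustration under stated assumptions, not the branch's algorithm and not the Unicode Bidirectional Algorithm: uppercase ASCII letters stand in for strong right-to-left characters, lowercase letters for strong left-to-right characters, the base direction is assumed to be left-to-right, and neutral characters (spaces, punctuation) join a right-to-left run only when they sit between two right-to-left characters. All names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Stand-in assumption: uppercase ASCII plays the role of strong RTL characters.
bool isRTLStandIn(char c) { return c >= 'A' && c <= 'Z'; }

// Stand-in assumption: lowercase ASCII plays the role of strong LTR characters.
bool isLTRStandIn(char c) { return c >= 'a' && c <= 'z'; }

// Converts logical order to visual order for an LTR base direction by
// reversing each right-to-left run in place.
std::string toVisualOrder(std::string text) {
    std::size_t i = 0;
    while (i < text.size()) {
        if (!isRTLStandIn(text[i])) {
            ++i;
            continue;
        }
        // Extend the run up to the last RTL character that precedes the next
        // strong LTR character; trailing neutrals stay outside the run.
        std::size_t lastRTL = i;
        for (std::size_t j = i + 1; j < text.size() && !isLTRStandIn(text[j]); ++j) {
            if (isRTLStandIn(text[j])) {
                lastRTL = j;
            }
        }
        std::reverse(text.begin() + i, text.begin() + lastRTL + 1);
        i = lastRTL + 1;
    }
    return text;
}
```

Under these assumptions, `toVisualOrder("left sinister / RIGHT DEXTER")` yields "left sinister / RETXED THGIR", and `toVisualOrder("LIKE THIS")` yields "SIHT EKIL", mirroring the two cases described above.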

Is there a particular reason to avoid doing this layout work during glyph placement?

kkaefer · 2016-08-18T08:43:29Z

Also note that even in this naïve approach, we need to detect direction for text runs so that we don't reverse blocks of Arabic numerals.

1ec5 · 2016-08-18T09:37:10Z

> Also note that even in this naïve approach, we need to detect direction for text runs so that we don't reverse blocks of Arabic numerals.

Western Arabic numerals (0–9) aren’t strongly right-to-left, so they’re correctly laid out from left to right. Eastern Arabic numerals (U+0660–U+0669, U+066B, U+066C, U+06F0–U+06F9) are being detected as strongly right-to-left but shouldn’t be, since they’re stored and displayed in “big endian” order. We should exclude them from the isRTL() check.
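A minimal sketch of that exclusion. Only the name `isRTL()` comes from the discussion; its signature, the helper names, and the block check are assumptions for illustration. The numeral ranges are the code points listed above.

```cpp
// Hypothetical block check: Hebrew through Thaana (U+0590–U+07BF covers the
// Hebrew, Arabic, Syriac, Arabic Supplement, and Thaana blocks contiguously),
// plus Arabic Extended-A.
bool isInRTLBlock(char32_t c) {
    return (c >= 0x0590 && c <= 0x07BF) || (c >= 0x08A0 && c <= 0x08FF);
}

// Eastern Arabic digits and numeric separators, stored and displayed
// most-significant digit first.
bool isEasternArabicNumeral(char32_t c) {
    return (c >= 0x0660 && c <= 0x0669) // Arabic-Indic digits
        || c == 0x066B                  // Arabic decimal separator
        || c == 0x066C                  // Arabic thousands separator
        || (c >= 0x06F0 && c <= 0x06F9); // Extended Arabic-Indic digits
}

// Strongly right-to-left only if inside an RTL block and not a numeral.
bool isRTL(char32_t c) {
    return isInRTLBlock(c) && !isEasternArabicNumeral(c);
}
```

With this check, an Arabic letter such as U+0627 still counts as right-to-left, while a digit such as U+0661 does not, so a run of Eastern Arabic digits would keep its stored order.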

We can make directionality detection much more robust without much trouble, either by linking ICU or by bringing in the relevant data, which shouldn’t be all that large.

mb12 · 2016-08-18T16:23:17Z

@1ec5 The only advantage of doing it before glyph placement (if possible) is that it would work for all cases (both line and point placement), would be easier to debug, and would help avoid special isRtl cases in the glyph placement code. In addition, no code changes would be needed for any optimizations (e.g., when placement is line but alignment is viewport, the rectangles for all the glyphs are merged into a single rectangle).

1ec5 · 2016-08-22T05:42:46Z

I haven’t dug into why this is the case, but line-placed labels are also mirrored correctly with just the changes to mbgl::GlyphSet::lineWrap():

Reverse text that begins with a character in the Arabic, Hebrew, Syriac, or Thaana Unicode blocks. This change does not include complex text layout (bidirectional text, context-sensitive shaping, ligatures, or ordering).

1ec5 · 2016-12-01T04:36:28Z

Closing this proof-of-concept PR now that #6984 and #7123 have landed in master.

1ec5 added the labels feature, ⚠️ DO NOT MERGE (work in progress, proof of concept, or on hold), text rendering, Core (the cross-platform C++ core, aka mbgl), and needs discussion on Aug 17, 2016

1ec5 self-assigned this Aug 17, 2016

1ec5 mentioned this pull request Aug 17, 2016

Complex Text Rendering mapbox/DEPRECATED-mapbox-gl#4

Closed

1ec5 added the MapKit parity label (for feature parity with MapKit on iOS or macOS) on Aug 22, 2016

1ec5 force-pushed the 1ec5-rtl-4 branch from eb54f75 to 55b3d92 on August 22, 2016 05:20

1ec5 force-pushed the 1ec5-rtl-4 branch from 9665c7a to 9d5a927 on October 17, 2016 08:16

1ec5 added 3 commits October 17, 2016 04:08

[core] Rudimentary right-to-left text support

4298980

Reverse text that begins with a character in the Arabic, Hebrew, Syriac, or Thaana Unicode blocks. This change does not include complex text layout (bidirectional text, context-sensitive shaping, ligatures, or ordering).

[core] Naïve bidirectional text support

ce8fccd

[core] Eastern Arabic numerals are strong LTR

d823b3e

1ec5 force-pushed the 1ec5-rtl-4 branch from 9d5a927 to d823b3e on October 17, 2016 11:08

1ec5 mentioned this pull request Nov 10, 2016

[core] Use ICU for bidirectional text layout and Arabic text shaping #6984

Merged

1ec5 closed this Dec 1, 2016

1ec5 deleted the 1ec5-rtl-4 branch December 1, 2016 04:36

1ec5 mentioned this pull request Mar 21, 2018

is-supported-script expression mapbox/mapbox-gl-js#6260

Merged
