Skip to content
This repository has been archived by the owner on Aug 8, 2023. It is now read-only.

[POC] Naïve right-to-left labeling #6057

Closed
wants to merge 3 commits into from
Closed

[POC] Naïve right-to-left labeling #6057

wants to merge 3 commits into from

Conversation

1ec5
Copy link
Contributor

@1ec5 1ec5 commented Aug 17, 2016

This PR is a proof of concept for right-to-left support in core. Labels are mirrored based on the presence of strongly right-to-left characters.

tel-aviv
    Compare the Hebrew label to the text in the popover. The popover is displaying the same string but rendered by Core Text, which fully supports right-to-left text and complex text shaping.

For bilingual or diglossic labels, which are common in some countries on OpenStreetMap, runs of right-to-left text inside predominantly left-to-right text and vice versa are reversed back to logical order.

karachi

As you can see in the above screenshot, complex text shaping remains unimplemented. It’s a much larger technical challenge that affects not only layout but also font glyph selection (mapbox/DEPRECATED-mapbox-gl#4). Regardless, can a native speaker of Arabic, Persian, Urdu, etc. comment on whether mirroring alone at least improves readability compared to what we have currently? I realize that a lack of contextual forms and ligatures would continue to make labels unpresentable, but I’m hopeful we can still make small progress while we work towards full support for these languages.

In practice, mirroring alone does appear to get us complete Hebrew support in styles derived from Mapbox Streets source. Final letter forms are encoded separately in Unicode, rather like in Greek, so they’re already being displayed correctly in master. Meanwhile, niqqud are nearly nonexistent in name tags in OpenStreetMap: two short road segments and a building in all of Israel.

On this branch, strong right-to-left characters are defined as those characters that belong to the Arabic, Arabic Supplement, Arabic Extended-A, Hebrew, Syriac, and Thaana code blocks in Unicode. There is currently no support for weak directionality, so spaces and generic punctuation are treated as left-to-right text.

We could address directionality limitations by replacing my hand-rolled algorithm with minimal usage of ICU, which is available on Android, iOS, and macOS. ICU is unavailable on the Web, which is why mapbox/mapbox-gl-js#1841 would require a large dependency. Even if we can’t get intelligent directionality support into GL JS, I don’t think that should block support in the native codebase, because no one is ever going to complain that Hebrew text looks “too correct” on one platform. 😉

This is merely a proof of concept, but the following work would need to take place in order for it to land as a stopgap solution:

  • Exclude Eastern Arabic numerals from strong right-to-left blocks
  • Mirror labels that have line placement
  • Use ICU to determine (bi)directionality
  • Add rendering tests

/cc @mikemorris @kkaefer @planemad

@1ec5 1ec5 added feature ⚠️ DO NOT MERGE Work in progress, proof of concept, or on hold text rendering Core The cross-platform C++ core, aka mbgl needs discussion labels Aug 17, 2016
@1ec5 1ec5 self-assigned this Aug 17, 2016
@mention-bot
Copy link

@1ec5, thanks for your PR! By analyzing this pull request, we identified @ansis and @jfirebaugh to be potential reviewers.

@planemad
Copy link

planemad commented Aug 17, 2016

@1ec5 ❤️ this in itself is a huge improvement and makes the labels more readily readable. Can confirm the arabic text is much easier to read. It looks like its spelled as individual alphabets instead of being written as a word.

Source: Learnt arabic in school for 6 years

@mb12
Copy link

mb12 commented Aug 17, 2016

@1ec5 Can the same transformation (mirroring) be done on ft.label instead (here)? Does mirroring have to be done at the glyph level for correctness?

@mikemorris
Copy link
Contributor

Does mirroring have to be done at the glyph level for correctness?

Yes, only the display order should be flipped - the logical order should be preserved in the underlying string, for (as a contrived off-the-top-of-my-head example) something like text-to-speech reading of labels.

@mb12
Copy link

mb12 commented Aug 17, 2016

@mikemorris and @1ec5 The u32string I am referring to is a private copy of the data structure that is used for glyph placement only. It contains all the code points on which the mirroring transformation can be applied.
Applying transformation here would mean it would work for both point placement as well as line placement.

@1ec5
Copy link
Contributor Author

1ec5 commented Aug 17, 2016

@mb12, thank you for the tip. Indeed, we’ll need to handle line placement as well as point placement. That was an oversight on my part. I’ve edited the original PR description to list that as a to-do item.

@1ec5
Copy link
Contributor Author

1ec5 commented Aug 17, 2016

One problem with mirroring inside the string is that it’ll interfere with line wrapping. Consider the label “גן הסלעים, גן הקקטוסים והגן הגזום”, which needs to be wrapped onto multiple lines:

Sla’im Garden

If we mirror the data rather than the layout, we’ll end up with a bottom-to-top, right-to-left label, which would only be appropriate for the Hanunó’o alphabet (which incidentally is bottom-to-top, left-to-right). Something like this:

הקקטוסים והגן הגזום
גן הסלעים, גן

Bidirectional text also requires multiple layout passes, one for the string overall and one for each nested text run.

@mb12
Copy link

mb12 commented Aug 17, 2016

@1ec5 Would the following work for text justification?

Split the u32string into words, mirror each word and concatenate them.

@1ec5
Copy link
Contributor Author

1ec5 commented Aug 17, 2016

Split the u32string into words, mirror each word and concatenate them.

Would the words be concatenated in logical or visual order? If in logical order, you’d wind up with something this like – legible but incomprehensible. If in visual order, the problem described in #6057 (comment) would remain. If labels with line placement can’t be wrapped, then yes, mirroring the string might be fine for just line placement, but we’d still need the approach I took for symbol placement.

@mb12
Copy link

mb12 commented Aug 18, 2016

For RTL languages, what is the expected behavior? If the pbf file contains string "like this", does it need to be shown as "ekil siht" or does it need to be shown like "siht ekil"? Irrespective of whether this transformation happens in u32string or during glyph placement, the transform needs to ensure visual correctness.

@1ec5
Copy link
Contributor Author

1ec5 commented Aug 18, 2016

If the pbf file contains string "like this", does it need to be shown as "ekil siht" or does it need to be shown like "siht ekil"?

"siht ekil". And a wrapped string that's stored in the pbf as "quite a long string" would be displayed as:

a etiuq
gnirts gnol

A bilingual string stored as "left sinister / right dexter" would be displayed as "left sinister / retxed thgir".

Other than this last case (where this branch tends to put the right-to-left run to the left of the left-to-right run) and some edge cases involving weak-directional characters like punctuation, the layout and placement of each glyph is correct on this branch.

Is there a particular reason to avoid doing this layout work during glyph placement?

@kkaefer
Copy link
Contributor

kkaefer commented Aug 18, 2016

Also note that even in this naïve approach, we need to detect direction for text runs so that we don't reverse blocks of arabic numbers.

@1ec5
Copy link
Contributor Author

1ec5 commented Aug 18, 2016

Also note that even in this naïve approach, we need to detect direction for text runs so that we don't reverse blocks of arabic numbers.

Western Arabic numerals (0–9) aren’t strongly right-to-left, so they’re correctly laid out from left to right. Eastern Arabic numerals (U+0660–U+0669, U+066B, U+066C, U+06F0–U+06F9) are being detected as strongly right-to-left but shouldn’t be, since they’re stored and displayed in “big endian” order. We should exclude them from the isRTL() check.

We can make directionality detection much more robust without much trouble, either by linking ICU or by bringing in the relevant data, which shouldn’t be all that large.

@mb12
Copy link

mb12 commented Aug 18, 2016

@1ec5 The only advantage of doing it pre glyph placement (if possible) would be that it would work for all the cases (both line and point placement), would be easier to debug and help avoid special isRtl cases in Glyph placement code. Plus no code modification would be needed for any optimizations (For e.g. when placement is line but alignment is viewport rectangles for all the glyphs are merged into a single rectangle).

@1ec5 1ec5 added the MapKit parity For feature parity with MapKit on iOS or macOS label Aug 22, 2016
@1ec5
Copy link
Contributor Author

1ec5 commented Aug 22, 2016

I haven’t dug into why this is the case, but line-placed labels are also mirrored correctly with just the changes to mbgl::GlyphSet::lineWrap():

haneviim

Reverse text that begins with a character in the Arabic, Hebrew, Syriac, or Thaana Unicode blocks. This change does not include complex text layout (bidirectional text, context-sensitive shaping, ligatures, or ordering).
@1ec5
Copy link
Contributor Author

1ec5 commented Dec 1, 2016

Closing this proof-of-concept PR now that #6984 and #7123 have landed in master.

@1ec5 1ec5 closed this Dec 1, 2016
@1ec5 1ec5 deleted the 1ec5-rtl-4 branch December 1, 2016 04:36
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Core The cross-platform C++ core, aka mbgl ⚠️ DO NOT MERGE Work in progress, proof of concept, or on hold feature MapKit parity For feature parity with MapKit on iOS or macOS needs discussion text rendering
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants