Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

is-supported-script expression #6260

Merged
merged 5 commits into from
Apr 18, 2018
Merged

is-supported-script expression #6260

merged 5 commits into from
Apr 18, 2018

Conversation

ChrisLoer
Copy link
Contributor

@ChrisLoer ChrisLoer commented Mar 1, 2018

This is a potential partial fix to address #5807. The idea is to expose an expression that can do a reasonable job of evaluating whether a given string will render "correctly", so that style authors can express ideas like "show local language label if possible, otherwise fall back to transliterated or other value".

Determining what we can render correctly is necessarily guesswork -- even if we had access to the font metadata (which we don't), we'd have to make individual judgment calls about what constituted "semantically significant" shaping (e.g. even in latin scripts, we can't render ligatures, but we don't consider it broken). There are cases where "legibility" may very greatly between fonts: for example, in Arial Unicode, Lao diacritic characters don't have built-in negative metrics, so without shaping support they'll float in empty space, but in Noto Lao diacritics have negative metrics that make them appear in mostly the right place.

I wasn't expecting it to be so complicated to pass a single boolean value into the evaluation context for expressions, but I'm hoping this PR hooks up the plumbing (via EvaluationParameters) to make it straightforward to add more parameters in the future.

TODO:

  • Implement tests
  • Performance evaluation (currently benchmark code doesn't use EvaluationParameters, so it won't show any overhead there is in creating that object vs. using a bare GlobalProperties). Checking the plugin availability should be cheap...
  • Figure out right name, agonize over adding hard-to-define heuristic behavior to style spec

/cc @jfirebaugh @anandthakker @1ec5 @nickidlugash @lucaswoj

constructor(zoom: number, options?: *) {
this.zoom = zoom;
this.isRenderable = isRenderable;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isRenderable can just be defined directly as a method on EvaluationParameters (instead of declaring the isRenderable: property above and assigning to this.isRenderable)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, changed.

value.expression.evaluate({zoom: parameters.zoom + 1.0}),
value.expression.evaluate(new EvaluationParameters(parameters.zoom - 1.0, parameters)),
value.expression.evaluate(parameters),
value.expression.evaluate(new EvaluationParameters(parameters.zoom + 1.0, parameters)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second argument to EvaluationParameters here (and two lines above) isn't strictly necessary, because expression evaluation doesn't make use of any properties of EvaluationParameters other than zoom.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then again, if this fact were to change in the future, omitting the argument would cause a bug wherein whatever new important property would get stripped. 👍 LGTM

@ChrisLoer
Copy link
Contributor Author

Copy link
Contributor

@anandthakker anandthakker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChrisLoer out of curiosity (and for future reference), how did you decide on the renderable / non-renderable content in the test fixtures?

@ChrisLoer
Copy link
Contributor Author

The "Salaam 39" Arabic text is a nice little test string because it mixes RTL and LTR (for the numerals), and testing that we report "true" for Arabic exercises the part of the logic that checks whether the plugin is loaded (in the test suite, the rtl-text-plugin is always loaded, although it's loaded a little differently from live maps). As for the word, well who doesn't like "peace"?

For the non-renderable, I just grabbed a string that I believe says "devanagari" in devanagari script. Nothing special, I just chose something from the ranges I decided to mark "not renderable".

@ChrisLoer
Copy link
Contributor Author

I renamed is-renderable to is-supported-script after discussion with @lucaswoj @1ec5 @anandthakker and @jfirebaugh.

@jfirebaugh is concerned that "supported script" is not descriptive enough, and proposes as an alternative script-rendering-fidelity which returns an enum (full or none). He also thinks using an enum may allow extensibility in the future. I played around with that syntax in a few expressions, and it felt cumbersome to me -- and also like it over-specified (i.e. "full rendering fidelity" isn't quite accurate for plenty of scripts that we'd still say we "support"). Also, I don't see a clear extensibility use case -- would adding an intermediate state like partial actually be useful for style designers?

In a harfbuzz/complex text shaping future, I think the equivalent to this operator would want to take a fontstack as an argument so it could tell you if the fonts themselves had the information necessary to render the string.

@nickidlugash
Copy link

@ChrisLoer thanks for working on this 😻

I'm hoping this PR hooks up the plumbing (via EvaluationParameters) to make it straightforward to add more parameters in the future.

Can you clarify what kinds of additional parameters could be added?

The char ranges excluded in charInRenderableScript don't seem to be inclusive of all scripts that need complex shaping, e.g. Khmer. Should we exlcude all the blocks that are part of scripts that have Shaping required: YES here? http://www.unicode.org/repos/cldr/trunk/common/properties/scriptMetadata.txt

(Some scripts have a MIN value for shaping, including Latin and Thai – there may be exceptions but we should not exclude these from being rendered for now, I think).

I'm working right now on reviewing certain scripts (like Thai and Lao) that are most relevant to our core style needs, to get a better practical understanding of how legible these scripts are to native reader (taking into account, as you pointed out, the possible variation caused by different fonts). Would we feel comfortable manually adjusting the char ranges in charInRenderableScript based on these reviews?

would adding an intermediate state like partial actually be useful for style designers?

Perhaps we could use something like partial for scripts that can be rendered correctly/almost correctly in certain fonts? It would be up to users discretion whether they think their font can render these correctly. For our core styles, I think we would strive to support this subset of scripts.

@ChrisLoer
Copy link
Contributor Author

Can you clarify what kinds of additional parameters could be added?

I wasn't thinking of anything specifically related to this functionality. But before this change there wasn't really a way to connect an expression operator to shared code in the rest of the map.

Perhaps we could use something like partial for scripts that can be rendered correctly/almost correctly in certain fonts?

🤔 So maybe I should wait for you to finish reviewing Thai/Lao/etc to see if we classify them as "partial"? If you think we have a clear use for it, I'm happy to go that route. Can I outsource the entire set of classifications to you? I think starting from CLDR like you suggested makes sense -- ideally we can condense that into just a handful of broad ranges that require shaping. For the RTL languages (which are all marked Shaping: YES), we'd need a case-by-case judgment (I know Hebrew and Arabic are "good enough", but actually don't know how legible Syriac is for instance)

@nickidlugash
Copy link

Can I outsource the entire set of classifications to you?

Sure, I can take a stab.

For clarification, if this operator was an enum rather than a boolean, what would the most efficient syntax look like? Would you need to wrap the operator in an equality operator?

Perhaps we could use something like partial for scripts that can be rendered correctly/almost correctly in certain fonts?

Alternatively, we could also consider just including those scripts as is-supported-script: true if we want to stick to a boolean. If we can create font stacks to correctly display these scripts in our core styles, then we'll also be able to provide users with style guidelines and fonts to do the same.

@1ec5
Copy link
Contributor

1ec5 commented Mar 21, 2018

I know Hebrew and Arabic are "good enough", but actually don't know how legible Syriac is for instance

These expressions would be used most commonly on map data, so it might be possible to simplify the question of renderability based on OpenStreetMap tagging practices. For example, in mapbox/mapbox-gl-native#6057, I found that a lack of support for Hebrew niqqud marks would’ve been no problem because it would’ve affected only a handful of street-level features.

@nickidlugash
Copy link

@ChrisLoer per chat, I've updated the unicode ranges to flag, though they are still subject to slight adjustments, since I'm still doing some legibility checks with native readers.

Regarding whether it would be useful to have partial or other more descriptive values, I'm not sure, but I think perhaps a boolean is fine. 🤔 From the standpoint of our default map styling, we only need two conditions: Display the text if the script is "supported", and hide the text if the script is "unsupported" (adding more conditionals could be helpful but would probably not be worth the added complexity to the expression values for text-field assignment. They're already going to be pretty complex).

If we only have boolean values in this operator, then I presume we'll make the judgement call of where that somewhat fuzzy line is based on what we've deemed best for our default styles. If we have more nuanced values, customers would have more flexibility (both with adjusting the display in our default styles, and with custom styles/data), but my sense is that there is not a demand for this. We've always baked certain language display decisions in our vector tiles and default styles, and so far we haven't received any feedback to indicate that customers want the degree of language display control enabled by this operator. I think it will mainly be used in our default styles.

@ChrisLoer ChrisLoer changed the title Implement is-renderable expression is-supported-script expression Mar 28, 2018
@ChrisLoer
Copy link
Contributor Author

Thanks for those changes @nickidlugash! Sorry the rebase-on-top-of-rollup got a bit messy and I ended up squashing/re-writing you out of the commit history.

@jfirebaugh or @anandthakker, can I get another round of review? @jfirebaugh I know I never convinced you on is-supported-script vs script-rendering-fidelity but from further discussion with @nickidlugash I think a plain boolean is appropriate here. If it helps, I'm happy to grant "I told you so" rights for when this decision comes back to bite us. 😜

[StringType],
// At parse time this will always return true, so we need to exclude this expression with isGlobalPropertyConstant
(ctx, [s]) => {
if (ctx.globals && ctx.globals.isSupportedScript) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const isSupportedScript = ctx.globals && ctx.globals.isSupportedScript;
if (isSupportedScript) {
    return isSupportedScript(s.evaluate(ctx));
}

@@ -263,6 +263,44 @@ export function charHasNeutralVerticalOrientation(char: number) {
* @private
*/
export function charHasRotatedVerticalOrientation(char: number) {
return !(charHasUprightVerticalOrientation(char) ||
charHasNeutralVerticalOrientation(char));
return !(exports.charHasUprightVerticalOrientation(char) ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/exports./ (x3)

Did this accidentally work because it gets transpiled to something where exports.charInSupportedScript actually works? Or are the tests incomplete?

Is there a lint we can turn on to flag the use of exports/module.exports in ES6 modules? (cc @anandthakker)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops! 😅 I just ran the render tests in the debugger and can verify the functions are getting called. There's an "exports" object which appears to contain the exported functions:

 265 function charHasRotatedVerticalOrientation(char        ) {
>266     debugger;
 267     return !(exports.charHasUprightVerticalOrientation(char) ||
 268              exports.charHasNeutralVerticalOrientation(char));
debug> repl
Press Ctrl + C to leave debug repl
> exports
{ allowsIdeographicBreaking: undefined,
  allowsVerticalWritingMode: undefined,
  allowsLetterSpacing: undefined,
  charAllowsLetterSpacing: undefined,
  charAllowsIdeographicBreaking: undefined,
  ... }
> exports.charHasUprightVerticalOrientation
[Function: charHasUprightVerticalOrientation]
> exports.charHasUprightVerticalOrientation(char)
true
> exports.charHasNeutralVerticalOrientation(char)
false
> exports.charInSupportedScript(char)
true

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, good point. We should set env.node to false in our root .eslintrc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nickidlugash
Copy link

@ChrisLoer I made an update to the unicode ranges in this check, based on native reader reviews. I'm still working on some reviews but I feel pretty confident about our current ranges.

ChrisLoer and others added 5 commits April 13, 2018 16:37
- Pass completion callback back to foreground so it can answer whether plugin has loaded
- Add `isLoaded()` method that can be checked from foreground or background
- Serialize error message w/ `toString()` before passing to callback, since we don't have a serialization for `Error#message`
Heuristically evaluates string as "likely to render correctly" based on an audit of current rendering support.
The expression determines if a string is expected to render legibly, based on the Unicode blocks used by the string and the availability of the RTLTextPlugin.
This commit standardizes 'EvaluationParameters` as the way to provide global context (in this case, the 'isSupportedString' function) to expression evaluation.
@ChrisLoer
Copy link
Contributor Author

@anandthakker I know you already approved this once, but it ended up getting touched a fair bit after your review. Would you mind taking another quick look before I merge?

@@ -132,20 +132,20 @@ export default class Worker {
this.self.importScripts(params.url);
callback();
} catch (e) {
callback(e);
callback(e.toString());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this e.toString() because Errors don't serialize properly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's been a while since I wrote this, but yeah, what I remember is that some Errors that get generated deep in the importScripts code don't serialize.

}
callback(globalRTLTextPlugin.isLoaded() ?
null :
new Error(`RTL Text Plugin failed to import scripts from ${pluginURL}`));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're doing e.toString() below, should we also be doing so here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, but web_worker_transfer's serialize works just fine on an Error constructed out of a plain string.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants