Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use TinySDF to render CJK glyphs locally #4895

Merged
merged 5 commits into from
Jul 5, 2017
Merged

Use TinySDF to render CJK glyphs locally #4895

merged 5 commits into from
Jul 5, 2017

Conversation

ChrisLoer
Copy link
Contributor

This is an experimental branch for playing around with the idea of using @mourner's TinySDF to speed up map load times for Chinese-language users (see #3466). Under our current glyph-loading architecture, Chinese-language users have to load several megabytes of SDFs from the server before they can display a map. The idea here is to give users of the SDK an option to sidestep that network cost by generating the glyphs locally using a browser canvas element (via TinySDF).

The current implementation just overrides the glyph request mechanism for glyphs in the CJK Unified Ideographs range, and synchronously draws the glyph with TinySDF if it's not already locally stored. Performance feels snappy to me -- there may not be any need to make it asynchronous.

Caveat 1: We're guessing glyph sizes

We're counting on the fact that most ideographs have the same shape to get away with not having access to font metrics. This implementation hard-wires:

        width: 22,
        height: 22,
        left: 1,
        top: -6,
        advance: 24

However, that's only an approximation based on the most common glyph size in Arial Unicode MS. In that font, 8295 characters have exactly those metrics, and the bulk of the rest are within a couple points of those metrics. This approach is tolerant of small deviations -- they make the spacing or baseline alignment slightly wrong. @nickidlugash, it'd be great to get your feedback on how the map looks.

There are a few dozen glyphs in the range that definitely have different metrics. We might want to just make those ranges exceptions and fall back to asking the server. A more exact approach we might use would be to allow SDK users to provide "font metrics" for the font they think their users' browser will be using -- but that's fraught since we don't actually have control over what font the browser will use.

Caveat 2: We lose control over fonts

The current implementation just uses whatever san-serif font the browser defaults to. We could easily allow the SDK user to pass in a preferred font family, but we wouldn't have any way to guarantee that the font would be available locally. Either way, the map style designer loses control over the font used to display CJK glyphs -- opening up a lot of room for confusion.

Caveat 3: Loss in glyph fidelity

As @mourner mentioned at #3466 (comment), one other caveat is that this approach is "vector -> raster -> raster" instead of "vector -> raster", so we expect some loss of fidelity. @nickidlugash, this is another place where could use your judgment -- the labels look pretty good to me.

Trade-offs

My overall impression is that map load time in China is a significant pain point, and all these caveats may be a worth accepting in order to cut down on bandwidth requirements.

The alternative approach of modifying the server infrastructure to support per-glyph (instead of per-range) SDF requests would avoid all of these caveats and would probably not require a prohibitive amount of bandwidth -- however, the scope of that change is much broader and we're unlikely to be able to deliver it soon.

@ChrisLoer
Copy link
Contributor Author

@nickidlugash I set it up so that if you run the debug server, debug/debug.html will show local (i.e. Chinese) labels.

@mourner
Copy link
Member

mourner commented Jun 26, 2017

This is awesome, thanks so much for taking on this! I knew this was not too far from working in GL JS but no one seemed interested much in that work at the time. Very happy to see you helping out!

@ChrisLoer
Copy link
Contributor Author

Here's a live demo if anyone wants to take a look without getting the branch running locally: https://chrisloer.github.io/tiny-sdf/index.html

@ChrisLoer
Copy link
Contributor Author

ChrisLoer commented Jun 28, 2017

I've updated the demo with a new version that bases font weights on the font stack being used (the build is dependent on mapbox/tiny-sdf#3, so it won't work unless you link that PR locally).

I spent more time digging into font metrics (aka "Caveat 1"), and I'm now much less worried about that issue. My approach was to locally modify node-fontnik to process all the glyphs in the CJK Ideographs range and print out info on all the glyphs that deviated from the baseline metrics. I did this for both Arial Unicode MS and Noto Sans CJK SC. advance is the most important metric, and I only found a handful of cases of it being off by one pixel. There are thousands of deviations from the baseline width and height, but the important test is that the change in width/left and height/top keeps the entirety of the glyph within the 24-by-24 box TinySDF is drawing in. They all pass that test: so what happens is that their original offsets get "baked in" to the bitmap produced by TinySDF.

As an example, consider "一" (aka "1"). In both fonts, its glyph is much shorter than the standard size. But TinySDF just draws it in the middle of a standard size glyph, and the visual effect is the same (note in the overdraw view in the right, you can see that the glyph has become bigger, but without any visual difference):

screenshot 2017-06-28 11 57 46

screenshot 2017-06-28 11 57 55

I ended up hard-wiring in a top value of -8, which corresponds to the baseline of Noto Sans CJK SC. This is to get closer to pixel-perfect alignment for 'Noto Sans', and means we're a little bit off for 'Arial Unicode', which would have a value of -6. There's a long-standing (#191!) issue that we ignore baselines: but the visual effect is pretty minor. If we wanted to make sure baselines agreed between the "TinySDF" and the "Original" versions of the map, we could pass a baseline in as an argument, but I doubt it's worth it.

I did some testing on Korean and Japanese maps. Japanese maps get a lot of the saving, and they seamlessly use the extra non-ideograph characters they need. Korean maps make heavy use of the precomposed Hangul Syllables block, so to take advantage of this change we'd have to add that block. I pushed a change to enable the block, and it seems to be working well.

Feedback so far is that glyph fidelity (aka "Caveat 3") hasn't been visually noticeable. That leaves "Caveat 2" (loss of font control) as the one big remaining concern (/cc @chriswu42).

Overall, this is looking pretty workable to me.

@1ec5
Copy link
Contributor

1ec5 commented Jun 28, 2017

A couple questions for folks more attuned to CJK typography than I am:

  • Do you perceive a loss in crispness in large, bold labels (such as labels of countries or major cities)? I’d imagine styles designed specifically for CJK might tend to have larger labels in general.
  • Where the map uses Traditional Chinese characters, are there any common characters that become ambiguous?

@mourner
Copy link
Member

mourner commented Jun 29, 2017

This is looking great! To address the caveat no.2 (loss of font control), should we simply introduce a Map option such as useSystemCJKFont which defaults to true?

@ChrisLoer ChrisLoer changed the title Proof of concept: use TinySDF to render CJK glyphs locally [DO NOT MERGE] Use TinySDF to render CJK glyphs locally [DO NOT MERGE] Jun 29, 2017
let remaining = 0;

const getGlyph = (glyphID) => {
const range = Math.floor(glyphID / 256);
if (this.localIdeographFontFamily &&
(isChar['CJK Unified Ideographs'](glyphID) ||
isChar['Hangul Syllables'](glyphID))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we be able to support characters in the supplementary planes (#4001), now that we’re sourcing the glyphs from system fonts? Or are we still tied to the BMP due to how the glyphs are stored?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an interesting question. On the JS side, I suspect it would mostly just work to have > maxuint16 glyph IDs. My bigger concern would be about glyph shape (especially glyph advances). Even within the BMP I've been conservative with what ranges to include -- looking for the biggest performance boost with the least risk of introducing rendering errors. Do you have a particular range you think would be a good candidate for trying out?

If we implement a FreeType analogue of this on the iOS/Android side, we'd be able to extract glyph metrics directly from local fonts, so there'd be a much more direct path to supporting the supplementary planes (note: I don't actually know how to get a "system font" in Android, but I think it should be possible).

Copy link
Contributor

@1ec5 1ec5 Jun 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any of the CJK Unified Ideographs Extension blocks would be good candidates. We don’t support them at all right now, yet they could theoretically appear in OpenStreetMap data. Overpass turbo doesn’t seem to be capable of searching for supplementary characters in character classes, but apparently CJK B characters do occur in Hong Kong place names: mapbox/DEPRECATED-mapbox-gl#29 (comment).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChrisLoer made a proof of concept of this:

screenshot 2017-06-30 14 55 25

it was pretty easy to render http://3.decodeunicode.org/en/u+2B776 just by find/replacing “charCode” with “codePoint” and using TinySDF for every glyph over 65535.

Incidentally, I did some additional research about the prevalence of affected characters in the Mapbox Streets source: #4001 (comment).

@mb12
Copy link

mb12 commented Jul 2, 2017

@ChrisLoer I have included below an alternative approach to addressing the Glyph loading latency problem. Can you please kindly also review this as well before coming up with a final solution?

The general idea is to do incremental glyphs downloads in chunks of 256 code points on a separate thread (separate thread part is already down for native), and re-layout the Symbol buckets as new code points get loaded.

Instead of loading all glyphs for a tile, we should enqueue requests for loading not more than 256 code points at a time on a separate thread. Some additional book keeping would need to be done when preparing the list of 'SymbolFeatures' to layout. Symbol buckets would also need to be laid out as glyphs are loaded incrementally.

In a sense this is equivalent to generalizing how symbol buckets are created asyncrhonously. At one point in mapbox GL, all line, fill and symbol buckets had to be available before a tile was ready for rendering. This was a latency problem because symbol layouts are slower compared to lines and fills and have dependency on resources loaded over the network. This was changed to allow layouts of symbol buckets independent of lines and fills. Once this was changed, line and fills started appearing before symbols on the map. Making symbol buckets layout a function of loaded glyphs and loading them incrementally is the next logical step to address this latency problem.

@ChrisLoer
Copy link
Contributor Author

@mb12 Eventually, it's likely that we'll change the glyph request mechanism so that it's not necessary to request glyphs in 256-glyph chunks. That will dramatically reduce the amount of data we need to bring down CJK glyphs (which are sparsely distributed those chunks). Until we make a change like that, however, there's no getting around the amount of data that needs to be loaded for a Chinese-language map. Incremental loading along the lines you're suggesting might somewhat improve the time-to-first-legible-label, but fundamentally the map still wouldn't be ready until all the glyphs were downloaded. I think the visual effect of labels popping in one at a time might look weird too, although it'd be hard to know without experimenting.

@ChrisLoer ChrisLoer requested a review from mourner July 3, 2017 18:52
@ChrisLoer ChrisLoer changed the title Use TinySDF to render CJK glyphs locally [DO NOT MERGE] Use TinySDF to render CJK glyphs locally Jul 3, 2017
@ChrisLoer
Copy link
Contributor Author

@1ec5 @mourner @chriswu42 I think it makes sense to merge these changes as they are: not enabled by default, but available as a solution for users feeling the pain from #3466. It solves a problem for them now, and gives us a way to gain broader experience with how the font loading mechanism will work across a broad range of devices and labels. That experience could help us put together a future PR along the lines of @1ec5's suggestion in #3466 (comment) to automatically deduce appropriate local fonts based on the map style and locally available browser properties. We could also do support for supplementary plane characters in a separate PR. I don't think we'll introduce too much of a backwards-compatibility concern here -- even if we do this "automatically" in the future, I don't think there'd be a problem with continuing to allow a manual override.

If there are no objections to that strategy, I can merge pending a code review from @mourner.

@mourner
Copy link
Member

mourner commented Jul 3, 2017

I agree that this should be merged. I'll review tomorrow.

@chriswu42
Copy link

I agree as well - let's err on the side of caution for now and provide the option to turn this on.

Copy link
Member

@mourner mourner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me, pending minor remarks.

container: 'originalMap',
zoom: 8.8,
center: [121.574, 31.1489],
style: '/debug/streets-zh-global-v1.json',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use a hosted style and just programmatically replace the necessary layers with the ZH version so that we don't have to commit another huge file (that will eventually go out of sync with the main streets style) to the repo?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

zoom: 8.8,
center: [121.574, 31.1489],
style: '/debug/streets-zh-global-v1.json',
localIdeographFontFamily: '"Noto Sans", "Noto Sans CJK SC", sans-serif',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a commonly understood term, or could we also name it something like localCJKFontFamily?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@1ec5, do you have an opinion on the naming here? Here was my reasoning for changing the original "CJK" to "Ideograph":

  • "CJK" is an acronym that requires some contextual awareness (although in our context it's probably not too mysterious), while "ideograph" is a plain (?) English word.
  • "CJK" implies that it's based on language, when it's really based on whether fixed-sized ideographs are being displayed (for instance, the Hiragana block for Japanese isn't included, as well as latin letters, etc.). Deciding to include the precomposed 'Hangul Syllables' block made this name a bit less appropriate, because they're not really ideographs, they just behave like ideographs.

Copy link
Contributor

@1ec5 1ec5 Jul 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either is fine; the terms are often used interchangeably.

@@ -5,6 +5,9 @@ const verticalizePunctuation = require('../util/verticalize_punctuation');
const Glyphs = require('../util/glyphs');
const GlyphAtlas = require('../symbol/glyph_atlas');
const Protobuf = require('pbf');
const TinySDF = require('@mapbox/tiny-sdf');
const isChar = require('../util/is_char_in_unicode_block');
/* eslint-disable new-cap */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this disabling line is no longer necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some recent change should make this unnecessary? It's still used in script_detection, and using the .eslintrc that's currently checked in, I still get warnings using is_char_in_unicode_block.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I mean, that line is supposed to disable the new-cap rule for the current file (which doesn't have any occurences of new lowercaseThing(). So should this move to specific lines where it is used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, moved the disabling to the two lines where it's necessary. It's not actually a new lowercaseThing() but the converse that's triggering the warning (a function starting with an uppercase letter on account of the function name being based on the Unicode block name).

fontWeight = '500';
} else if (/light/i.test(fontstack)) {
fontWeight = '200';
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move the fontWeight checks up to getSimpleGlyphs so that they're not repeated unnecessarily. Also, you could move out regexps into top-level constants (const boldRe = /bold/i) so that they're not recreated on every check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These checks are only done once per unique fontstack, so probably only a handful of times.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, you're right. Missed that.

fontWeight = '500';
} else if (/light/i.test(fontstack)) {
fontWeight = '200';
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, you're right. Missed that.

@@ -5,6 +5,9 @@ const verticalizePunctuation = require('../util/verticalize_punctuation');
const Glyphs = require('../util/glyphs');
const GlyphAtlas = require('../symbol/glyph_atlas');
const Protobuf = require('pbf');
const TinySDF = require('@mapbox/tiny-sdf');
const isChar = require('../util/is_char_in_unicode_block');
/* eslint-disable new-cap */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I mean, that line is supposed to disable the new-cap rule for the current file (which doesn't have any occurences of new lowercaseThing(). So should this move to specific lines where it is used?

ChrisLoer added 5 commits July 4, 2017 10:59
…ading for CJK Ideographs and precomposed Hangul Syllables by generating glyphs locally using TinySDF.
Distinguishes glyphs generated locally from glyphs requested from server.
@ChrisLoer ChrisLoer merged commit eba9de7 into master Jul 5, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants