-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use TinySDF to render CJK glyphs locally #4895
Conversation
@nickidlugash I set it up so that if you run the debug server, |
This is awesome, thanks so much for taking on this! I knew this was not too far from working in GL JS but no one seemed interested much in that work at the time. Very happy to see you helping out! |
Here's a live demo if anyone wants to take a look without getting the branch running locally: https://chrisloer.github.io/tiny-sdf/index.html |
I've updated the demo with a new version that bases font weights on the font stack being used (the build is dependent on mapbox/tiny-sdf#3, so it won't work unless you link that PR locally). I spent more time digging into font metrics (aka "Caveat 1"), and I'm now much less worried about that issue. My approach was to locally modify node-fontnik to process all the glyphs in the CJK Ideographs range and print out info on all the glyphs that deviated from the baseline metrics. I did this for both As an example, consider "一" (aka "1"). In both fonts, its glyph is much shorter than the standard size. But TinySDF just draws it in the middle of a standard size glyph, and the visual effect is the same (note in the overdraw view in the right, you can see that the glyph has become bigger, but without any visual difference): I ended up hard-wiring in a I did some testing on Korean and Japanese maps. Japanese maps get a lot of the saving, and they seamlessly use the extra non-ideograph characters they need. Korean maps make heavy use of the precomposed Hangul Syllables block, so to take advantage of this change we'd have to add that block. I pushed a change to enable the block, and it seems to be working well. Feedback so far is that glyph fidelity (aka "Caveat 3") hasn't been visually noticeable. That leaves "Caveat 2" (loss of font control) as the one big remaining concern (/cc @chriswu42). Overall, this is looking pretty workable to me. |
A couple questions for folks more attuned to CJK typography than I am:
|
This is looking great! To address the caveat no.2 (loss of font control), should we simply introduce a |
src/symbol/glyph_source.js
Outdated
let remaining = 0; | ||
|
||
const getGlyph = (glyphID) => { | ||
const range = Math.floor(glyphID / 256); | ||
if (this.localIdeographFontFamily && | ||
(isChar['CJK Unified Ideographs'](glyphID) || | ||
isChar['Hangul Syllables'](glyphID))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would we be able to support characters in the supplementary planes (#4001), now that we’re sourcing the glyphs from system fonts? Or are we still tied to the BMP due to how the glyphs are stored?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an interesting question. On the JS side, I suspect it would mostly just work to have > maxuint16
glyph IDs. My bigger concern would be about glyph shape (especially glyph advances). Even within the BMP I've been conservative with what ranges to include -- looking for the biggest performance boost with the least risk of introducing rendering errors. Do you have a particular range you think would be a good candidate for trying out?
If we implement a FreeType analogue of this on the iOS/Android side, we'd be able to extract glyph metrics directly from local fonts, so there'd be a much more direct path to supporting the supplementary planes (note: I don't actually know how to get a "system font" in Android, but I think it should be possible).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any of the CJK Unified Ideographs Extension blocks would be good candidates. We don’t support them at all right now, yet they could theoretically appear in OpenStreetMap data. Overpass turbo doesn’t seem to be capable of searching for supplementary characters in character classes, but apparently CJK B characters do occur in Hong Kong place names: mapbox/DEPRECATED-mapbox-gl#29 (comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ChrisLoer made a proof of concept of this:
it was pretty easy to render http://3.decodeunicode.org/en/u+2B776 just by find/replacing “charCode” with “codePoint” and using TinySDF for every glyph over 65535.
Incidentally, I did some additional research about the prevalence of affected characters in the Mapbox Streets source: #4001 (comment).
@ChrisLoer I have included below an alternative approach to addressing the Glyph loading latency problem. Can you please kindly also review this as well before coming up with a final solution? The general idea is to do incremental glyphs downloads in chunks of 256 code points on a separate thread (separate thread part is already down for native), and re-layout the Symbol buckets as new code points get loaded. Instead of loading all glyphs for a tile, we should enqueue requests for loading not more than 256 code points at a time on a separate thread. Some additional book keeping would need to be done when preparing the list of 'SymbolFeatures' to layout. Symbol buckets would also need to be laid out as glyphs are loaded incrementally. In a sense this is equivalent to generalizing how symbol buckets are created asyncrhonously. At one point in mapbox GL, all line, fill and symbol buckets had to be available before a tile was ready for rendering. This was a latency problem because symbol layouts are slower compared to lines and fills and have dependency on resources loaded over the network. This was changed to allow layouts of symbol buckets independent of lines and fills. Once this was changed, line and fills started appearing before symbols on the map. Making symbol buckets layout a function of loaded glyphs and loading them incrementally is the next logical step to address this latency problem. |
@mb12 Eventually, it's likely that we'll change the glyph request mechanism so that it's not necessary to request glyphs in 256-glyph chunks. That will dramatically reduce the amount of data we need to bring down CJK glyphs (which are sparsely distributed those chunks). Until we make a change like that, however, there's no getting around the amount of data that needs to be loaded for a Chinese-language map. Incremental loading along the lines you're suggesting might somewhat improve the time-to-first-legible-label, but fundamentally the map still wouldn't be ready until all the glyphs were downloaded. I think the visual effect of labels popping in one at a time might look weird too, although it'd be hard to know without experimenting. |
@1ec5 @mourner @chriswu42 I think it makes sense to merge these changes as they are: not enabled by default, but available as a solution for users feeling the pain from #3466. It solves a problem for them now, and gives us a way to gain broader experience with how the font loading mechanism will work across a broad range of devices and labels. That experience could help us put together a future PR along the lines of @1ec5's suggestion in #3466 (comment) to automatically deduce appropriate local fonts based on the map style and locally available browser properties. We could also do support for supplementary plane characters in a separate PR. I don't think we'll introduce too much of a backwards-compatibility concern here -- even if we do this "automatically" in the future, I don't think there'd be a problem with continuing to allow a manual override. If there are no objections to that strategy, I can merge pending a code review from @mourner. |
I agree that this should be merged. I'll review tomorrow. |
I agree as well - let's err on the side of caution for now and provide the option to turn this on. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me, pending minor remarks.
debug/tinysdf.html
Outdated
container: 'originalMap', | ||
zoom: 8.8, | ||
center: [121.574, 31.1489], | ||
style: '/debug/streets-zh-global-v1.json', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use a hosted style and just programmatically replace the necessary layers with the ZH version so that we don't have to commit another huge file (that will eventually go out of sync with the main streets style) to the repo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
debug/tinysdf.html
Outdated
zoom: 8.8, | ||
center: [121.574, 31.1489], | ||
style: '/debug/streets-zh-global-v1.json', | ||
localIdeographFontFamily: '"Noto Sans", "Noto Sans CJK SC", sans-serif', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a commonly understood term, or could we also name it something like localCJKFontFamily
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@1ec5, do you have an opinion on the naming here? Here was my reasoning for changing the original "CJK" to "Ideograph":
- "CJK" is an acronym that requires some contextual awareness (although in our context it's probably not too mysterious), while "ideograph" is a plain (?) English word.
- "CJK" implies that it's based on language, when it's really based on whether fixed-sized ideographs are being displayed (for instance, the Hiragana block for Japanese isn't included, as well as latin letters, etc.). Deciding to include the precomposed 'Hangul Syllables' block made this name a bit less appropriate, because they're not really ideographs, they just behave like ideographs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either is fine; the terms are often used interchangeably.
src/symbol/glyph_source.js
Outdated
@@ -5,6 +5,9 @@ const verticalizePunctuation = require('../util/verticalize_punctuation'); | |||
const Glyphs = require('../util/glyphs'); | |||
const GlyphAtlas = require('../symbol/glyph_atlas'); | |||
const Protobuf = require('pbf'); | |||
const TinySDF = require('@mapbox/tiny-sdf'); | |||
const isChar = require('../util/is_char_in_unicode_block'); | |||
/* eslint-disable new-cap */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this disabling line is no longer necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some recent change should make this unnecessary? It's still used in script_detection
, and using the .eslintrc that's currently checked in, I still get warnings using is_char_in_unicode_block
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I mean, that line is supposed to disable the new-cap rule for the current file (which doesn't have any occurences of new lowercaseThing()
. So should this move to specific lines where it is used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, moved the disabling to the two lines where it's necessary. It's not actually a new lowercaseThing()
but the converse that's triggering the warning (a function starting with an uppercase letter on account of the function name being based on the Unicode block name).
fontWeight = '500'; | ||
} else if (/light/i.test(fontstack)) { | ||
fontWeight = '200'; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move the fontWeight
checks up to getSimpleGlyphs
so that they're not repeated unnecessarily. Also, you could move out regexps into top-level constants (const boldRe = /bold/i
) so that they're not recreated on every check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These checks are only done once per unique fontstack, so probably only a handful of times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, you're right. Missed that.
fontWeight = '500'; | ||
} else if (/light/i.test(fontstack)) { | ||
fontWeight = '200'; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, you're right. Missed that.
src/symbol/glyph_source.js
Outdated
@@ -5,6 +5,9 @@ const verticalizePunctuation = require('../util/verticalize_punctuation'); | |||
const Glyphs = require('../util/glyphs'); | |||
const GlyphAtlas = require('../symbol/glyph_atlas'); | |||
const Protobuf = require('pbf'); | |||
const TinySDF = require('@mapbox/tiny-sdf'); | |||
const isChar = require('../util/is_char_in_unicode_block'); | |||
/* eslint-disable new-cap */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I mean, that line is supposed to disable the new-cap rule for the current file (which doesn't have any occurences of new lowercaseThing()
. So should this move to specific lines where it is used?
…ading for CJK Ideographs and precomposed Hangul Syllables by generating glyphs locally using TinySDF.
Distinguishes glyphs generated locally from glyphs requested from server.
…idden to display local names.
This is an experimental branch for playing around with the idea of using @mourner's TinySDF to speed up map load times for Chinese-language users (see #3466). Under our current glyph-loading architecture, Chinese-language users have to load several megabytes of SDFs from the server before they can display a map. The idea here is to give users of the SDK an option to sidestep that network cost by generating the glyphs locally using a browser canvas element (via TinySDF).
The current implementation just overrides the glyph request mechanism for glyphs in the
CJK Unified Ideographs
range, and synchronously draws the glyph with TinySDF if it's not already locally stored. Performance feels snappy to me -- there may not be any need to make it asynchronous.Caveat 1: We're guessing glyph sizes
We're counting on the fact that most ideographs have the same shape to get away with not having access to font metrics. This implementation hard-wires:
However, that's only an approximation based on the most common glyph size in
Arial Unicode MS
. In that font, 8295 characters have exactly those metrics, and the bulk of the rest are within a couple points of those metrics. This approach is tolerant of small deviations -- they make the spacing or baseline alignment slightly wrong. @nickidlugash, it'd be great to get your feedback on how the map looks.There are a few dozen glyphs in the range that definitely have different metrics. We might want to just make those ranges exceptions and fall back to asking the server. A more exact approach we might use would be to allow SDK users to provide "font metrics" for the font they think their users' browser will be using -- but that's fraught since we don't actually have control over what font the browser will use.
Caveat 2: We lose control over fonts
The current implementation just uses whatever san-serif font the browser defaults to. We could easily allow the SDK user to pass in a preferred font family, but we wouldn't have any way to guarantee that the font would be available locally. Either way, the map style designer loses control over the font used to display CJK glyphs -- opening up a lot of room for confusion.
Caveat 3: Loss in glyph fidelity
As @mourner mentioned at #3466 (comment), one other caveat is that this approach is "vector -> raster -> raster" instead of "vector -> raster", so we expect some loss of fidelity. @nickidlugash, this is another place where could use your judgment -- the labels look pretty good to me.
Trade-offs
My overall impression is that map load time in China is a significant pain point, and all these caveats may be a worth accepting in order to cut down on bandwidth requirements.
The alternative approach of modifying the server infrastructure to support per-glyph (instead of per-range) SDF requests would avoid all of these caveats and would probably not require a prohibitive amount of bandwidth -- however, the scope of that change is much broader and we're unlikely to be able to deliver it soon.