-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Multiple text formats for symbols via "format" expression (native port) #12624
Conversation
@asheemmamoowala if you want to start looking I just got an initial port of the expression code itself in... |
Requires changing `generate-style-code` to treat 'formatted' as being the same as 'string' until gl-native gets 'formatted' support with #12624. To make nitpick happy, PropertyFunction.java uses the latest "text-field" description from v8.json. It's technically correct, just kind of pointless since the "If a plain `string` is provided" clause will always be true.
Requires changing `generate-style-code` to treat 'formatted' as being the same as 'string' until gl-native gets 'formatted' support with #12624. To make nitpick happy, PropertyFunction.java uses the latest "text-field" description from v8.json. It's technically correct, just kind of pointless since the "If a plain `string` is provided" clause will always be true.
The iOS/macOS runtime styling API will support this expression operator automatically via the ugly Since this expression operator will be used to implement bilingual or multilingual labels, #12738 tracks enhancing the iOS/macOS expression localization functionality to avoid making such labels repeat the same name multiple times. (The same considerations apply to the Mapbox GL Language plugin and the Android map SDK’s localization plugin.) /cc @nickidlugash |
ccba131
to
cdccaaf
Compare
Status update: the initial version of this is working 🎉 ! There's a lot of cleanup that remains to be done and it appears I'm occasionally doing some sort of undefined/unexpected behavior when I run the BiDi algorithm on styled text (note the random incorrect e -- I just sometimes see garbage like this show up): |
5153852
to
b036ccd
Compare
b54e51f
to
dafaa89
Compare
83ceaa4
to
41284a7
Compare
XCTAssertEqual(rawLayer->getTextField(), propertyValue, | ||
@"Setting text to a constant value expression should update text-field."); | ||
XCTAssertEqualObjects(layer.text, constantExpression, | ||
@"text should round-trip constant value expressions."); | ||
|
||
constantExpression = [NSExpression expressionWithFormat:@"'Text Field'"]; | ||
constantExpression = [NSExpression expressionWithFormat:@"MGL_FUNCTION('format', 'Text Field', %@)", @{}]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it still possible to set the textField
property to a string as opposed to a formatted string? If so, we should continue to test that capability alongside the tests for formatted strings. It might require some refactoring of the EJS code that generates this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that's what the first layer.text = constantExpression
on line 1100 does? (see NSExpression *constantExpression = [NSExpression expressionWithFormat:@"'Text Field'"];
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I placed my comment on the wrong diff. Specifically, I was concerned that setting the property to a plain string would fail to round trip – it would be upgraded to a formatted string. If that’s the case, it’s going to be fairly difficult for developers to work with even after we implement special format expression support. Perhaps there could be logic that downgrades a formatted string to a plain string if there’s nothing special about it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I'm pretty unsure about all the pathways that can be followed here, but I think what you're talking about is something like this?
MGLStyleLayer *countryLayer = [self.mapView.style layerWithIdentifier:@"country-label"];
if ([countryLayer respondsToSelector:@selector(text)]) {
[countryLayer setValue:[NSExpression expressionWithFormat:@"'Fooistan'"] forKey:@"text"];
NSExpression* roundTrip = [countryLayer performSelector:@selector(text)];
NSLog(@"%@", roundTrip);
}
In which case the results of the logging is just plain "Fooistan". (Also the map looks much cleaner after the change)
This round tripping works because of the logic in toPropertyValue
:
mapbox-gl-native/platform/darwin/src/MGLStyleValue_Private.h
Lines 63 to 67 in 1ce31af
if (expression.expressionType == NSConstantValueExpressionType) { | |
MBGLType mbglValue; | |
getMBGLValue(expression.constantValue, mbglValue); | |
return mbglValue; | |
} |
Are you specifically concerned about constant strings vs. expressions that return a value of type string
? (these are the ones that will get wrapped in a coercion by the parsing, although I think this should be analogous to other implicit coercions based on property values)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The workflow I’m concerned about is existing code like this:
layer.text = [NSExpression expressionForConstantValue:@"Foo"];
// later
if ([layer.text.constantValue isEqualToString:@"Foo"]) { … } // crash
// or
if ([layer.text isEqual:[NSExpression expressionForConstantValue:@"Foo"]]) { … } // false
One always takes a risk by directly accessing NSExpression.constantValue
without first checking the expression type, but if the developer set the text
property in the first place, they may not expect to have to introduce that check when upgrading to a minor release of the SDK.
In other words, this implicit coercion is unusual in that it’s new. It’s also unusual in that formatted strings are a fairly foreign concept to iOS/macOS development, since we didn’t go the route of using NSAttributedString to represent them. Whereas coercions between strings and numbers are understood as invocations of the CAST()
operator, there’s no CAST()
operator support for converting between NSString and whatever class we’d represent a formatted string with. Introducing that CAST()
support would be an elegant way of at least explaining away the surprise that some developers might experience.
Even so, we know from the v4.0.0 release that any changes to how coercions work can catch developers by surprise. In that case, we’re still fielding questions from developers who don’t expect to have to introduce CAST()
into their inequality predicates. In this case, on the other hand, developers would be surprised at suddenly having to downcast from formatted strings to strings.
So my suggestion would be to automatically downcast from formatted strings to strings when there’s nothing but a ["formatted", "foo", {}]
. Extra CAST()
support would be cool too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now the getter for any constant expression does the downcast, so the cases you provided all return true
. This is a little different from doing an automatic downcast based on whether there are options. Even if you add an explicit format expression, you'll still always get a string constant value back. Basically, format
is almost invisible from the point of view of the iOS SDK with this PR, except that you can set it via MGL_FUNCTION
.
Test code I used:
MGLSymbolStyleLayer *layer = (MGLSymbolStyleLayer*)[self.mapView.style layerWithIdentifier:@"country-label"];
layer.text = [NSExpression expressionForConstantValue:@"Foo"];
// later
if ([layer.text.constantValue isEqualToString:@"Foo"]) {
NSLog(@"constantValue comparison worked");
}
// or
if ([layer.text isEqual:[NSExpression expressionForConstantValue:@"Foo"]]) {
NSLog(@"constant expression comparison worked");
}
layer.text = [NSExpression expressionWithFormat:@"MGL_FUNCTION('format', 'Bar', %@)", @{}];
// later
if ([layer.text.constantValue isEqualToString:@"Bar"]) {
NSLog(@"constantValue bar comparison worked");
}
// or
if ([layer.text isEqual:[NSExpression expressionForConstantValue:@"Bar"]]) {
NSLog(@"constant bar expression comparison worked");
}
OK, I think this is ready for review! @1ec5, could you look at the darwin bindings? This is not meant to be a full set of bindings -- just enough to let existing usage keep working and enable use of @LukasPaczos, could you look at the Android bindings? I just hacked in an undocumented @kkaefer Can I interest you in reviewing the core part? I'm not really sure who's best qualified to dig into all the expression serialization/conversion code (which I'm most iffy about). The actual shaping code @ansis might be well placed to review, but that part's not too complicated and it's a pretty direct port from JS. |
@@ -165,7 +165,12 @@ public class <%- camelize(type) %>LayerTest extends BaseActivityTest { | |||
<% if (property.tokens) { -%> | |||
|
|||
layer.setProperties(<%- camelizeWithLeadingLowercase(property.name) %>("{token}")); | |||
<% if (property.type === 'formatted') { -%> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to re-iterate concerns from #12624 (review) for Android as well. Will plain text, or other than format
expressions accepted for text-field
property? If yes, we need to rework this .ejs
file to generate tests for both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah plain text is accepted but may be transformed... 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Transformed to an expression? This would change the behavior of PropertyValue#isExpression
when the text-field
property is retrieved by SymbolLayer#getTextField
which would return true
even though it should return false
for plain text. That's maybe a minor inconsistency but worth taking into consideration.
@@ -3224,6 +3224,11 @@ public static Expression collator(Expression caseSensitive, Expression diacritic | |||
return new Expression("collator", new ExpressionMap(map)); | |||
} | |||
|
|||
public static Expression format(Expression input) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the plan is to introduce the full format
expression implementation in a follow-up PR, I'd rather skip the public API + tests part here entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, like instead of modifying the test to use this, just omit the test entirely for the text-field
case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly, and remove the public API entry from Expression
class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense.
46aecc8
to
5712099
Compare
…multiple overloads.
Always clear errors before trying automatic coercion for a new compound expression overload.
Sounds like two votes for |
// the styling options to use for rendering that code point | ||
// The data structure is intended to accomodate the reordering/interleaving | ||
// of formatting that can happen when BiDi rearranges inputs | ||
using StyledText = std::pair<std::u16string, std::vector<uint8_t>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ChrisLoer Since all symbol labels will be coerced to formatted text, are you concerned about memory consumption per-label increasing with each label being represented by a StyledText? The worst-case scenario is for a string to have a different style per-character, but I imagine that for the bulk of strings (those coerced) there will only be a single style.
Would it be possible to maintain the styling options in a block format instead of per-code-point? the size of the vector would equal the number of sections and each entry would be the end index of the section with the corresponding index in the sections
array.
Text | hello | hola | こんにちは |
---|---|---|---|
Section end index | 4 | 8 | 13? |
Section | 0 | 1 | 2 |
TaggedString
would need to perform a search linear in sections length for getSectionAt()
but for the most common case that shouldn't be too bad.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the JS side I did a little experimentation with a data structure kind of along the lines of what you're suggesting. It is a more memory-compact representation for the common case. What I was doing was iterating over the blocks as a whole, which made a lot of the iteration logic in the shaping code significantly more complicated. I think you're suggesting maintaining the same interface to TaggedString
but just calculating the section index on-the-fly, so a CPU vs. memory trade off. That seems reasonable, but I don't know which would win without actually doing the experiment.
FWIW, on the JS side I looked at overall memory consumption in a full-screen map before and after this change, and couldn't detect any difference. It should be 1 byte-per-character, and I'd guess a full screen map has maybe on the order of 10k characters worth of text?
0fc0825
to
0b1466e
Compare
OK, I think I've addressed all of the current round of review comments -- there were a lot of them, please let me know if I've left yours unresolved! @asheemmamoowala I tried doing some memory profiling, but it's pretty hard to get reproducible numbers. There's at least no huge change. I think we just have to rely on the back-of-the-envelope estimate that says it shouldn't be too bad. The thing I worry about more in this PR is that it will lengthen tile layout/preparation time because of the extra format-related work. The good news is foregound/rendering costs should be pretty much unaffected since we're not adding any new GPU data or any extra foreground CPU processing. @1ec5 Besides the existing tests, I played around a little with the @LukasPaczos I went back to the "no Android wrapper for now" approach, with the assumption that we'll want to re-visit that when it's time to do a semver-major release. The constant-value getter/setters are still |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried doing some memory profiling, but it's pretty hard to get reproducible numbers. There's at least no huge change. I think we just have to rely on the back-of-the-envelope estimate that says it shouldn't be too bad.
@ChrisLoer this sounds good. Good call on avoiding pre-mature optimizations
The thing I worry about more in this PR is that it will lengthen tile layout/preparation time because of the extra format-related work.
I would imagine that additional time is distributed across parsing/expression-creation, shaping, and memory allocation. You could add short-cut code paths in the shaping and bidi code to handle single section strings, but I doubt that it helps much by itself - while increasing code size.
I'm working on android bindings in parallel in #12985 and I think I'm able to exhaust all the paths and get expected results. All in all, LGTM 👍 |
Do these changes account for the exception when using |
I've looked at that issue, but I don't understand how to trigger the problem (I'm sure you said how to do it, I just don't understand the vocabulary around NSExpression). FWIW, in terms of that issue, I'd expect the behavior here to be the same as the behavior for |
@ChrisLoer this is an example of what makes
|
@@ -28,6 +28,13 @@ struct Converter<PropertyValue<T>> { | |||
? PropertyValue<T>(PropertyExpression<T>(convertTokenStringToExpression(t))) | |||
: PropertyValue<T>(t); | |||
} | |||
|
|||
PropertyValue<T> maybeConvertTokens(const expression::Formatted& t) const { | |||
std::string todoText = t.sections[0].text; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a const std::string&
to avoid the copy.
sed -i.bak 's/((index)|((int32_t)(level)<<31))/((index)|((uint32_t)(level)<<31))/' "src/ubidiimp.h" | ||
sed -i.bak 's/((x)|=((int32_t)(level)<<31)/((x)|=((uint32_t)(level)<<31)/' "src/ubidiimp.h" | ||
rm -rf src/ubidiimp.h.bak | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this works, I think that using an actual patch conveys the intention of the command better. Additionally, we'll be alerted if the upstream file changes in an incompatible way when we're updating the ICU version.
patch -p0 << PATCH
--- src/ubidiimp.h
+++ src/ubidiimp.h
@@ -198,8 +198,8 @@
/* in a Run, logicalStart will get this bit set if the run level is odd */
#define INDEX_ODD_BIT (1UL<<31)
-#define MAKE_INDEX_ODD_PAIR(index, level) ((index)|((int32_t)(level)<<31))
+#define MAKE_INDEX_ODD_PAIR(index, level) ((index)|((uint32_t)(level)<<31))
-#define ADD_ODD_BIT_FROM_LEVEL(x, level) ((x)|=((int32_t)(level)<<31))
+#define ADD_ODD_BIT_FROM_LEVEL(x, level) ((x)|=((uint32_t)(level)<<31))
#define REMOVE_ODD_BIT(x) ((x)&=~INDEX_ODD_BIT)
#define GET_INDEX(x) ((x)&~INDEX_ODD_BIT)
PATCH
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, definitely better!
@@ -100,6 +100,108 @@ std::vector<std::u16string> BiDi::processText(const std::u16string& input, | |||
|
|||
return applyLineBreaking(lineBreakPoints); | |||
} | |||
|
|||
std::vector<StyledText> BiDi::processStyledText(const StyledText& input, std::set<std::size_t> lineBreakPoints) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this fix looks good; let's apply it until there's an upstream fix.
} | ||
|
||
void TaggedString::verticalizePunctuation() { | ||
// Relies on verticalization changing characters in place so that style indices don't need updating |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine to rely on, but we could add a comment to
mapbox-gl-native/src/mbgl/util/i18n.cpp
Lines 303 to 321 in d21c126
const std::map<char16_t, char16_t> verticalPunctuation = { | |
{ u'!', u'︕' }, { u'#', u'#' }, { u'$', u'$' }, { u'%', u'%' }, { u'&', u'&' }, | |
{ u'(', u'︵' }, { u')', u'︶' }, { u'*', u'*' }, { u'+', u'+' }, { u',', u'︐' }, | |
{ u'-', u'︲' }, { u'.', u'・' }, { u'/', u'/' }, { u':', u'︓' }, { u';', u'︔' }, | |
{ u'<', u'︿' }, { u'=', u'=' }, { u'>', u'﹀' }, { u'?', u'︖' }, { u'@', u'@' }, | |
{ u'[', u'﹇' }, { u'\\', u'\' }, { u']', u'﹈' }, { u'^', u'^' }, { u'_', u'︳' }, | |
{ u'`', u'`' }, { u'{', u'︷' }, { u'|', u'―' }, { u'}', u'︸' }, { u'~', u'~' }, | |
{ u'¢', u'¢' }, { u'£', u'£' }, { u'¥', u'¥' }, { u'¦', u'¦' }, { u'¬', u'¬' }, | |
{ u'¯', u' ̄' }, { u'–', u'︲' }, { u'—', u'︱' }, { u'‘', u'﹃' }, { u'’', u'﹄' }, | |
{ u'“', u'﹁' }, { u'”', u'﹂' }, { u'…', u'︙' }, { u'‧', u'・' }, { u'₩', u'₩' }, | |
{ u'、', u'︑' }, { u'。', u'︒' }, { u'〈', u'︿' }, { u'〉', u'﹀' }, { u'《', u'︽' }, | |
{ u'》', u'︾' }, { u'「', u'﹁' }, { u'」', u'﹂' }, { u'『', u'﹃' }, { u'』', u'﹄' }, | |
{ u'【', u'︻' }, { u'】', u'︼' }, { u'〔', u'︹' }, { u'〕', u'︺' }, { u'〖', u'︗' }, | |
{ u'〗', u'︘' }, { u'!', u'︕' }, { u'(', u'︵' }, { u')', u'︶' }, { u',', u'︐' }, | |
{ u'-', u'︲' }, { u'.', u'・' }, { u':', u'︓' }, { u';', u'︔' }, { u'<', u'︿' }, | |
{ u'>', u'﹀' }, { u'?', u'︖' }, { u'[', u'﹇' }, { u']', u'﹈' }, { u'_', u'︳' }, | |
{ u'{', u'︷' }, { u'|', u'―' }, { u'}', u'︸' }, { u'⦅', u'︵' }, { u'⦆', u'︶' }, | |
{ u'。', u'︒' }, { u'「', u'﹁' }, { u'」', u'﹂' }, | |
}; |
@@ -55,17 +55,21 @@ class Glyph { | |||
}; | |||
|
|||
using Glyphs = std::map<GlyphID, optional<Immutable<Glyph>>>; | |||
using GlyphMap = std::map<FontStack, Glyphs>; | |||
using GlyphMap = std::map<FontStackHash, Glyphs>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious, why change this to a hash instead of using std::unordered_map
with a custom Hash
template argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we look up the glyphs in the shaping code, all we have is the FontStackHash
(this is to keep PositionedGlyphs
and TaggedString
relatively compact). Is there a way to do a find-by-hash on an unordered_map
with a custom hasher?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah hm, maybe we could store an iterator instead? Iterators in unordered_map
aren't invalidated, except when the item is removed. I was just noticing that there are a lot of changes where we explicitly invoke the Hasher which feels a bit awkward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's true, the explicit hashing does feel awkward. I think the part that feels worst is when we have to do the reverse from hash->[string] for dependency lookup (i.e. GeometryTileWorker::onGlyphsAvailable
).
I'm interested in the "store an iterator" idea, but I'm concerned that it will make the code harder to audit because you have to make sure that if you store an iterator you're using the same map when you try to use the iterator later -- and GeometryTileWorker already merges GlyphMaps from multiple requests to GlyphManager over time...
b62a115
to
b032b38
Compare
Add support for `MGL_FUNCTION('format', <text>, <options dictionary>)`
- No dedicated support for creating format expressions - Java accessors for 'text-field' flatten back to String - 'text-field' setter implicitly creates a 'format' expression. For tests, use JsonArray to build an equivalent format expression by hand.
- Port of arabic.test.js from mapbox-gl-rtl-text - Modify BiDi::getLine to remove trailing nulls in the event UBIDI_REMOVE_BIDI_CONTROLS causes the string to shorten. - Patch vendored ICU to avoid undefined undefined bit shifting behavior (triggered sanitizer failure)
b032b38
to
fc71883
Compare
Port of mapbox/mapbox-gl-js#6994.
format
expression itself, but it was part of the JS PR and we want to keep native in sync)format
expression parsing codecollator
,format
uses the new object/dictionary argument format e.g.{ "font-scale": 1.5, "text-font": "awesome" }
. I think it should be just fine to defer binding work to later PRs.