Add split and join expressions #2064
Bundle size report: Size Change: +27 B
Why are we implementing string manipulation operations in the style spec rather than creating this data up front?
The vector data is not necessarily within the style developer's control, especially if working against a standard schema such as OpenMapTiles. Additionally, since MVT tile features are organized in key-value pairs rather than key-array pairs, it's often necessary to pack information into a feature value. We do A LOT of string manipulation in Americana to deal with language and fallback language support, and it's a big reason why our style.json is nearly 1MB. Beyond the discussion of where a particular piece of processing should go in the stack, I implemented this functionality because I thought it belonged as part of the library's general-purpose string-processing capability. It's one of the unimplemented features listed in mapbox/mapbox-gl-js#6484 and since I'd just figured out how to work with expressions, I thought I'd pitch in. I hope that as I become more familiar with the code base, I might pitch in on other functionality. If my energy is misdirected, please let me know.
I see your point, although I don't think this will help reduce the size of the style.json. With string manipulation you can start adding a lot of functions, and those need to be supported in both web and native, and maintained. This isn't cheap (starts with, contains, char at, ends with, trim, replace, is empty, substring, and these are just off the top of my head). So I would advise starting a discussion around which of those are absolutely necessary, high priority, or commonly used, and then go about implementing them.
I agree in principle. I had expected that the sensible way to handle semicolon-separated values from OSM tags would be to parse the values into an array at tile generation time so that the vector tile rendering library has a nice data structure to work with. Unfortunately, the Mapbox Vector Tile specification does not include arrays, so this would not be possible (barring a new vector tile specification). This being the case, some way of handling stringified pseudo-arrays is needed. For example, the MVT spec suggests an array might be encoded like this:
A raw OSM tag with multiple values would be formatted like this:
I'm sure there are other possible solutions to this problem besides general-purpose string manipulation functions, but it definitely seems like a clear need given the lack of true arrays in the MVT specification.
I think an expression to split strings is an interesting feature, but I would probably use it together with further logic, such as language switching, for which I would need some JavaScript anyway. I guess this is what people call runtime styling.
Can you share an example?
Yes. We intend to build a working demonstration of the utility of split/join functionality, with a methodology as discussed in osm-americana/openstreetmap-americana#763. My plan is to use osm-americana/openstreetmap-americana#747 for style size metrics and as-yet-undetermined mechanisms for client performance profiling. Unfortunately, the main performance profiler is presently broken (#2122), but we'll cross that bridge when we get there.
Other than replace and trim, the style specification already implements all of the functions you’ve named, or at least the building blocks to easily implement them by composing expressions.
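As a rough illustration of that composition, the sketch below builds a few of the named operations out of expressions that, to my knowledge, already exist in the style specification (`slice`, `length`, and `in`). The helper names (`startsWith`, `endsWith`, `contains`, `isEmpty`, `charAt`) are hypothetical and not part of any library; they only construct expression JSON and do not evaluate it.

```typescript
// Hypothetical helpers: they only BUILD style-expression JSON and are not
// part of MapLibre. They assume the existing `slice`, `length`, `in`, `-`,
// and `==` expressions.
type Expr = string | number | boolean | Expr[];

// "starts with": compare the leading slice of `s` against `prefix`.
const startsWith = (s: Expr, prefix: string): Expr =>
  ["==", ["slice", s, 0, ["length", prefix]], prefix];

// "ends with": compare the trailing slice of `s` against `suffix`.
const endsWith = (s: Expr, suffix: string): Expr =>
  ["==", ["slice", s, ["-", ["length", s], ["length", suffix]]], suffix];

// "contains": `in` tests whether a substring occurs in a string.
const contains = (s: Expr, sub: string): Expr => ["in", sub, s];

// "is empty": a string is empty when its length is zero.
const isEmpty = (s: Expr): Expr => ["==", ["length", s], 0];

// "char at": a slice of length one starting at `index`.
const charAt = (s: Expr, index: number): Expr =>
  ["slice", s, index, index + 1];

// Example: does the feature's `name` property start with "St"?
const nameStartsWithSt = startsWith(["get", "name"], "St");
```

Replace and trim have no such one-line composition, which is the gap this discussion is about.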
The most relevant example is that the style replaces each semicolon in a list with a more presentable delimiter: osm-americana/openstreetmap-americana#666. Theoretically, a tileset could come with this modification already in place. However, there would be serious tradeoffs in doing so, because these are largely stylistic modifications that are not inherent to the data. For one thing, we decided that the appropriate delimiter is a newline when the symbol is point-placed but a bullet character (•) when the symbol is line-placed. This is a purely stylistic choice, apart from the need to work around mapbox/mapbox-gl-js#8575. A different style could validly choose to separate the names by dashes or slashes according to the designer’s taste, but the designer should not need to generate their own global tileset to make this modification. The style specification’s expression language supports neither string replacement (#2059) nor splitting and joining (which are building blocks for string replacement), so we implemented our own factory function for recursively generating a string replacement expression. Recursing more than a few levels deep crashes Firefox due to the sheer level of nesting in the style JSON: osm-americana/openstreetmap-americana#680. This Rube Goldberg contraption soon gave way to another. In osm-americana/openstreetmap-americana#670, we collapsed space padding after the semicolon as a compromise with the OSM community. We also deduplicated list items that happened to match the main name in the user’s preferred language (another stylistic choice). This required us to parse the property value as a list so we could compare each item to the main name. So we tossed out the replacement function and implemented a custom tokenizer to make the logic more straightforward. It surely would’ve been easier to write a tokenizer in a more complete programming language on the server side. 
However, the user’s preferred language comes from a user preference, which we apply using runtime styling. It would be infeasible to regenerate a new tileset in response to the user selecting a different interface language. In osm-americana/openstreetmap-americana@b83ad88...1ec5-gljs-expr-split-join-680, I prototyped what a less hacky replacement for the tokenizer would look like in Americana, measuring the overall style JSON’s size along the way. The existing tokenizer implementation doesn’t dominate the style JSON by any means, but any savings allow us to focus on things that are more relevant to the style and also make it easier for others to develop sophisticated styles:
By the end of this experiment, what began as a browser stress test of linguistic gymnastics turned into a rather mundane string operation. After all, that was the advantage of expressions over the legacy style function syntax.
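To make the list handling described in this comment concrete, here is a minimal sketch in plain TypeScript (not in the expression language) of the logic that split and join would let a style express: split on semicolons, collapse padding, drop items matching the main name, and rejoin with a placement-dependent delimiter. The function name `formatNameList`, the `Placement` type, the spaces around the bullet, and the dropping of empty items are all assumptions made for illustration.

```typescript
// Illustration only: plain TypeScript, not a style expression.
// `formatNameList` and `Placement` are hypothetical names.
type Placement = "point" | "line";

function formatNameList(
  raw: string,
  mainName: string,
  placement: Placement
): string {
  // Newline for point-placed symbols, bullet for line-placed ones.
  // The spaces around the bullet are an assumption.
  const delimiter = placement === "point" ? "\n" : " • ";
  return raw
    .split(";")
    // Collapse padding around each item (simplified here to a full trim).
    .map((item) => item.trim())
    // Drop empty items (an assumption) and items equal to the main name.
    .filter((item) => item.length > 0 && item !== mainName)
    .join(delimiter);
}

const lineLabel = formatNameList("Foo; Bar;Baz", "Bar", "line");
const pointLabel = formatNameList("Foo; Bar;Baz", "Bar", "point");
```

Because `mainName` depends on the user's language preference, this step has to run on the client, which is why precomputing it in the tileset is not an option.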
By the way, I think
Closing per #2059 (comment)
This PR adds two expressions, split and join. This helps simplify style logic, for example in osm-americana/openstreetmap-americana#680:
split takes a string and separates it into an array, split by a delimiter character. So ["split", "/", "a/b/c"] results in ["a","b","c"].
join combines the elements of an array together with a delimiter, so ["join", ";", ["literal", ["a", "b", "c"]]] results in a;b;c.
Launch Checklist
CHANGELOG.md under the ## main section.
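For readers who want to check the split and join semantics described above, the following sketch models them as plain TypeScript functions with the same argument order as the examples (delimiter first). The names `splitExpr`, `joinExpr`, and `replaceAllVia` are hypothetical stand-ins, not the PR's implementation; the last one shows how the two compose into string replacement, as mentioned in the discussion.

```typescript
// Plain-TypeScript model of the two proposed expressions. These are
// hypothetical stand-ins, not code from the PR.

// ["split", delimiter, input] -> array of substrings
function splitExpr(delimiter: string, input: string): string[] {
  return input.split(delimiter);
}

// ["join", delimiter, items] -> single string
function joinExpr(delimiter: string, items: string[]): string {
  return items.join(delimiter);
}

// Replacement composed from the two building blocks:
// split on the old delimiter, then join with the new one.
function replaceAllVia(input: string, from: string, to: string): string {
  return joinExpr(to, splitExpr(from, input));
}

const parts = splitExpr("/", "a/b/c");
const joined = joinExpr(";", ["a", "b", "c"]);
const replaced = replaceAllVia("a;b;c", ";", " • ");
```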