-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Format semicolon-delimited value lists in labels #666
Conversation
src/constants/label.js
Outdated
/** | ||
* Maximum number of values in a semicolon-delimited list of values. | ||
* | ||
* Increasing this constant deepens recursion for replacing delimiters in the | ||
* list, potentially affecting style loading performance. | ||
*/ | ||
const maxValueListLength = 9; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please compare this branch to main and note any perceptible degradation in style loading speed, tile loading speed, panning and zooming performance, or memory or CPU usage. If it’s severe enough, we can tune this limit downward until we’re satisfied with the balance between handling enough features and maintaining reasonable performance.
// Replace the ;; escape sequence with a placeholder sequence unlikely to | ||
// legitimately occur inside a value or separator. | ||
const objReplacementChar = "\x91\ufffc\x92"; // https://overpass-turbo.eu/s/1pJx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It’s amazing how much junk there is in name
that can impersonate a strip marker, but this sequence does the trick for now. If someone opens a pub for text processing geeks and names it “The �� & ;”, I’ll kindly ask someone to offer up a prized password as sacrifice.
@@ -217,7 +338,7 @@ export const localizedNameWithLocalGloss = [ | |||
["var", "localizedCollator"], | |||
], | |||
// ...just pick one. | |||
["var", "localizedName"], | |||
["format", listValuesExpression(["var", "localizedName"], "\n")], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return value of this method is so complex that the style-spec package (used by the unit tests) is apparently unable to upcast it to a format
expression, leading to an error when it sees the direct format
expression in the else slot below. Interestingly, GL JS has no such problem at runtime. This explicit cast reassures the expression parser that each branch of the case
expression will resolve to a format
expression.
if (typeof evaluated === "string") { | ||
return [evaluated, null]; | ||
return [evaluated]; | ||
} | ||
return [evaluated.sections[0].text, evaluated.sections[4].text]; | ||
return [evaluated.sections[0].text, evaluated.sections[4]?.text]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a result of the manual cast above, this expression no longer returns a literal string, always a format
object.
Test location below. Performance measurement in Chrome on my laptop (multiple refreshes based on network tab in the Chrome debugger console Approximately 37% slowdown. Current/New: Cursory testing seems to indicate that these numbers are independent of location (high zoom on an ocean tile yields the same results). I'm curious to understand if the performance difference is due to execution or building the style.json. |
Yes, the expression is only evaluated at runtime when loading a tile or reevaluating a symbol layer, so the impact is uniform everywhere. That said, it would be possible to limit this semicolon replacement to certain layers that are more likely to have semicolons in them. Or we could create a separate copy of each layer prefiltered to just the features that contain semicolons in them, but at the cost of additional layers that participate in symbol collision.
The new code probably isn’t a significant contributor to the time it takes to build the JSON. (It happens in memory; there’s no file involved.) However, the step right after, loading the JSON, requires more memory than before. Under the hood, GL JS compiles the expression to a type-safe, object-oriented data structure, which becomes more expensive as the expression becomes more deeply nested. Whenever a new tile needs to be displayed, a Web Worker needs to evaluate more massive expressions. The With a built-in string replacement operator, the performance concern would essentially go away. The operator’s performance would be similar to the |
Pending anyone else's review of the code changes, I think it's a good idea to proceed with the proposed change to handle semicolon-delimited lists. I'm concerned about the 1/3 performance drop for processing up to 9 semi-colons, though I understand that future improvements to MapLibre may make it possible for use to be more efficient about this. Perhaps we could reduce the 9 down to a number that captures the vast majority of the sane delimiter use cases while ignoring the silly ones that are probably tagging errors. |
I believe that’s a tagging error. In fact, editors such as iD automatically remove the space when you touch the name. |
The wiki mentions a single space after a semicolon being valid: I'm sure handling either |
Editors strip out the space in general, but preserve it for certain syntaxes like |
Well iD does anyway. I didn't have any problem including a space here with JOSM: https://www.openstreetmap.org/way/971815925 |
For place nodes, stats on prevalence:
|
For reference, the semicolon delimiter was first documented (without a space) in 2007, when it replaced a recommendation to use a comma.1 There continued to be some uncertainty about the delimiter until 2008.
You can do anything you want in JOSM. 😉 But it too prefers a semicolon whenever it constructs a value list for you. The main reason space-padded semicolons appear in the database today is that Potlatch used to insert this delimiter when merging two ways. It was usually a mistake caused by its confusing shortcut for merging ways. Based on that behavior, mappers (including yours truly) used to manually insert the same delimiter for consistency, but it has been rare for a long time: openstreetmap/iD#941. Documentation about space padding is pretty stale, dating back to when the same page pointed out that characters other than a semicolon were being used as value delimiters. This page is the product of past edit-warring and also flamewarring on the tagging list, so people have probably been hesitant to do much cleanup on it since. (The controversy was not about the choice of delimiter but rather about multiple-value lists in the first place.) I updated the documentation to reflect longstanding reality. If there’s a strong need to support the space after the semicolon, then there will be an additional performance hit as long as we build upon this magnificent workaround for the lack of a string replacement operator. This extra padding is more noticeable with the inline treatment than with the newline treatment. We could replace the bullet with a series of spaces, which would make the imbalance imperceptible, but at the cost of making it even more likely for a dual-named label to get collided out. Given this complexity, I’d suggest tracking support for space-padded semicolons as tail work, which I would consider blocked by a string replacement operator. I’m not opposed to it in principle, since a value’s leading and trailing spaces are always extraneous, but the cost-benefit ratio isn’t superb.
Thanks, this argues strongly in favor of setting the limit to just one or two semicolons for now. I suppose it would be difficult to gather similar statistics about ad-hoc delimiters that would be candidates for conversion to semicolons – which is the point of using semicolons in the first place. But out of curiosity, is there a similar distribution on places if we naïvely assume all space-padded slashes and hyphens are intended to be value delimiters? Footnotes
|
Indeed 💪💪
Sounds good to me. Even without explicitly handling the extra space after a semicolon, the bullet delimiter is far superior to to displaying the raw value with the semicolon. I don't really find the extra space on one side of the bullet all that noticeable. I just didn't think we ought to necessarily consider a space after a semicolon to be a tagging error. At least not while the raw, semicolon delimited values are displayed on the osm.org standard layer. |
Query for all types of objects this time.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. I did some performance profiling and there's definitely an increase to JS execution time and jank on initial load and while panning/zooming around the map. However, performance is already not great and without profiling I'm not sure the increase from this changes is really all that noticeable (at least on my hardware). Given the stats on numbers of semicolons it seems like dropping maxValueListLength
to 4 would be perfectly reasonable if that would improve performance.
Added a function that returns an expression to find and replace a fixed number of occurrences of a substring. Added a function to format a list of semicolon-delimited tag values. Separate multiple names by newlines or bullets as necessary.
2091677
to
cf6c351
Compare
@@ -175,7 +175,7 @@ export function replaceExpression( | |||
* Increasing this constant deepens recursion for replacing delimiters in the | |||
* list, potentially affecting style loading performance. | |||
*/ | |||
const maxValueListLength = 9; | |||
const maxValueListLength = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Three values separated by two semicolons is plenty enough for the name
s currently using semicolons. In the future, with a more efficient string replacement implementation, we can add support for more semicolons, which will make this option more attractive to mappers currently using informal delimiters like spaces.
#670 collapses up to one space following a semicolon. If a mapper does intentionally use |
Whenever a feature’s name contains multiple values separated by semicolons, the label now puts each value on a separate line or separates them with an elegant bullet, for up to nine values, honoring the
;;
escape sequence.This implementation includes a general-purpose polyfill for a string replacement expression operator. We can use it for any purpose in an expression or filter but need to be careful about avoiding a combinatoric explosion in the size of an expression’s code. I’ve included a comprehensive battery of unit tests.
Kaser and New Square, New York
Machahe/Caiyuan/Liucandong/Houyukou/Wangshilou/Lichahe/Chahexin/Fukangxin/Qianyukou villages in Liangshan County, Jining City, Shandong, China
Viet Heritage Garden, San José, California
Columbus Cincinnati Road/Cincinnati–Columbus Road in West Chester Township, Butler County, Ohio
Southbound Interstate 275 at exits 63A/B in Union Township, Clermont County, Ohio
Fixes #665.