Deduplicate names in gloss; gloss towns #670

Merged on Jan 7, 2023 (4 commits)

@1ec5 1ec5 commented Jan 7, 2023

Replaced the general-purpose function for generating string replacement expressions with a function for generating an expression that specifically scans a string for semicolon delimiters, escaped semicolons, and space padding after a semicolon. Compared to #666, this change decreases the overall minified style JSON size[1] (a rough proxy for style complexity) from 954,393 characters down to just 932,343 characters.
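
As a rough illustration of what the scanner looks for, here is the same pass written as plain JavaScript. This is only a sketch, not the code in this PR: the PR generates an equivalent style expression, `splitSemicolonValues` is a hypothetical name, and the sketch assumes that a doubled semicolon (`;;`) is the escape for one literal semicolon.

```javascript
// Hypothetical helper for illustration only. The PR builds a style
// expression that performs a comparable scan at render time.
function splitSemicolonValues(raw) {
  const values = [];
  let current = "";
  let i = 0;
  while (i < raw.length) {
    if (raw[i] === ";" && raw[i + 1] === ";") {
      // Assumed escape convention: ";;" stands for one literal semicolon.
      current += ";";
      i += 2;
    } else if (raw[i] === ";") {
      // Unescaped semicolon: close the current value.
      values.push(current);
      current = "";
      i += 1;
      // Skip any space padding that follows the delimiter.
      while (raw[i] === " ") {
        i += 1;
      }
    } else {
      current += raw[i];
      i += 1;
    }
  }
  values.push(current);
  return values;
}
```

Delimiters, escapes, and padding are all handled in one left-to-right pass, which is the property that lets a special-purpose scanner replace several general-purpose find-and-replace passes.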

With this optimization in place, a city’s local-language name gloss now omits any name that matches the city’s name in the user’s preferred language to avoid repetition. This brings the size back up to 943,759 characters. Unfortunately, every multilingual place=city is currently tagged with a name that contains delimiters other than a semicolon, making this enhancement rather pointless. To make this deduplication worthwhile, this PR also enables a local-name gloss on towns too.
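
The deduplication rule can be sketched in plain JavaScript as follows. This is an illustration under assumptions, not the PR's implementation: `buildLabel` is a hypothetical name, duplicates are assumed to be exact string matches, and the gloss layout (one pair of parentheses around all local names, one name per line) is inferred from the screenshots discussed in this thread.

```javascript
// Hypothetical helper for illustration; the PR expresses this logic as a
// style expression rather than as JavaScript that runs per label.
function buildLabel(preferredName, localNames) {
  // Assumption: a local name counts as a duplicate only on an exact match.
  const gloss = localNames.filter((name) => name !== preferredName);
  if (gloss.length === 0) {
    // Nothing left to gloss, so the label is just the preferred name.
    return preferredName;
  }
  // Assumed layout: a single pair of parentheses around all gloss lines.
  return `${preferredName}\n(${gloss.join("\n")})`;
}
```

For example, with the illustrative local names `["Bruxelles", "Brussel"]`, a preferred name of `"Brussel"` would leave only `Bruxelles` in the gloss, while a place whose only local name equals the preferred name gets no gloss at all.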

Previously, I mentioned in #471 (comment) that American maps and atlases only gloss major city labels, but we’ve already been glossing every city label. Spot-checking shows that label density isn’t suffering too much from the more verbose town labels, and in any case we support zooming in. The one downside to enabling glosses on town labels is that the town label layer has to repeat the massive expression that we’ve been using for cities: we’re back up to 952,745 characters – still less than we started out with, but only slightly.

[Image: Kaser and New Square]

| Change | Style JSON size (characters) |
| --- | --- |
| replacer | 954,393 |
| − replacer + scanner | −22,050 |
| + deduplication | +11,416 |
| + town glosses | +8,986 |
| Net change | −1,648 |
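
For anyone who wants to check the arithmetic, the running totals can be reproduced with a few lines of JavaScript. The figures are copied from the table above; `styleSize` is a hypothetical wrapper around the measurement given in the footnote, and `map` is assumed to be any object exposing `getStyle()`, such as a GL JS map instance.

```javascript
// Figures copied from the PR description's table.
const baseline = 954393; // style size with the general-purpose replacer
const changes = [
  { step: "replacer -> scanner", delta: -22050 },
  { step: "deduplication", delta: 11416 },
  { step: "town glosses", delta: 8986 },
];

// Accumulate the style size after each step.
let size = baseline;
const ledger = changes.map(({ step, delta }) => {
  size += delta;
  return { step, delta, size };
});
const netChange = size - baseline;

// Hypothetical wrapper around the footnote's measurement. JSON.stringify
// without an indent argument produces minified output.
function styleSize(map) {
  return JSON.stringify(map.getStyle()).length;
}
```

The intermediate sizes in `ledger` should line up with the figures quoted in the paragraphs above (932,343, then 943,759, then 952,745).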

/ref #592 (comment) #666 (comment)

Footnotes

  1. JSON.stringify(map.getStyle()).length

Replaced the general-purpose find-and-replace expression-generating function with a special-purpose string scanner that can only look for semicolons but also accounts for escaping and space padding in the same pass.
@1ec5 added the enhancement and internationalization labels Jan 7, 2023
@1ec5 self-assigned this Jan 7, 2023
typeof needle === "object" ? ["length", needle] : needle.length;
let needleEnd = ["+", needleStart, needleLength];
let iteration = numReplacements;
let rawSeparator = ";";
@1ec5 1ec5 Jan 7, 2023


I spilled a lot of ink in this forum discussion about the need for a reliable delimiter – the semicolon. But as a thought experiment, what if we pretend that one of the popular ad-hoc delimiters is unambiguous enough to split on instead?

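| Sad reality | Simulation |
| --- | --- |
| [Image: Alicante before] | [Image: Alicante after] |
| [Image: Brussels before] | [Image: Brussels after] |
| [Image: Casablanca before] | [Image: Casablanca after] |
| [Image: Hong Kong before] | [Image: Hong Kong after] |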

Alas, we can’t just turn every hyphen or slash or space into a value delimiter:

| Sad reality | Sadder simulation |
| --- | --- |
| [Image: Tizi Ouzou and Bordj Bou Arreridj before] | [Image: Tizi Ouzou and Bordj Bou Arreridj after] |
| [Image: Reynoldsburg - New Albany Road before] | [Image: Reynoldsburg - New Albany Road after] |

For better or worse, places that use these other delimiters will simply have to endure suboptimal labeling.
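
To make the ambiguity concrete, here is a small sketch of a naive splitter that treats hyphens and slashes as value delimiters. `naiveSplit` is a hypothetical function written only for this thought experiment; nothing like it is part of the style. The last two inputs are names mentioned in this thread, while the first is an illustrative value of the kind that motivates the idea.

```javascript
// Hypothetical splitter for the thought experiment only: treats any hyphen
// or slash, with optional surrounding spaces, as a value delimiter.
function naiveSplit(name) {
  return name.split(/\s*[-\/]\s*/);
}

// Illustrative bilingual value where the split would be welcome.
const bilingual = naiveSplit("Bruxelles - Brussel");

// Single names from this thread that would be torn apart by the same rule.
const municipality = naiveSplit("Molenbeek-Saint-Jean");
const road = naiveSplit("Reynoldsburg - New Albany Road");
```

The splitter has no way to tell that the first hyphen separates two languages while the others belong to a single name, which is why only the semicolon is treated as a delimiter.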

@ZeLonewolf

Great work!

For places with multiple glosses, would it be better if each label has its own parentheses on a separate row? Just thinking that the open parenthesis on one line and the close parenthesis on the last line looks a touch uneven.


1ec5 commented Jan 7, 2023

> For places with multiple glosses, would it be better if each label has its own parentheses on a separate row? Just thinking that the open parenthesis on one line and the close parenthesis on the last line looks a touch uneven.

This is probably only hypothetical for now, since so few places with long names or many local languages are using the semicolon in their names. But you’re right that there’s an imbalance. There’s not much we can do about that without GL JS support for hanging punctuation or optical margin alignment. A separate pair of parentheses on each line would look busy (really) (no, really really), and we’d still have some long names get word-wrapped onto multiple lines without any opportunities to introduce parentheses.

[Image: Brussels]

The only practical way to resolve this imbalance would be to nix the parentheses and rely on font attributes to distinguish the gloss from the main text. Some maps do rely on italics or a horizontal bar to set off the gloss, but I’m not sure it would work as well in this style, especially since we allow the label to appear above the icon. If the main text and gloss have fonts that differ too much, then the main text gets visually separated from the icon, as if it’s a separate label. Moreover, italic is not a real font style in most writing systems; American print maps get away with it because they always use romanizations rather than the original writing system.

[Image: Molenbeek-Saint-Jean]

src/constants/label.js (outdated review thread, resolved)
@ZeLonewolf

When localized to Hebrew:
image

I assume this is a tagging error?


1ec5 commented Jan 7, 2023

According to both OSM and Wikidata, “שיכון סקווירא” is New Square’s name in Yiddish. According to Wikidata, the town has a different name in Hebrew, “ניו סקוור”, which is what you’re seeing outside the parentheses as the main text when setting the preferred language to Hebrew. #586 tracks the lack of a dedicated Yiddish name field in the vector tiles.

@ZeLonewolf ZeLonewolf left a comment

This looks awesome, thanks!
