Gloss city names in the local language #592

1ec5 · 2022-11-29T20:25:14Z

If the name of a city in the local language differs from its name in the user-preferred language (#578), the label now includes a gloss on a second line containing the local name in smaller type and in parentheses. The names are compared case-insensitively, following the user-preferred language’s collation rules, but without diacritic folding.
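
As a rough illustration of the comparison described above, the sketch below builds a style comparison expression that uses a `collator`, and mirrors it in plain JavaScript with `Intl.Collator`. The function names and the `name:<locale>` property key are assumptions made for this example, not the code in this PR.

```javascript
// Hypothetical helper: a style expression that is true when the local name
// differs from the name in the preferred language.
function namesDifferExpression(locale) {
  return [
    "!=",
    ["get", `name:${locale}`],
    ["get", "name"],
    // Ignore case, keep diacritics significant, collate in the user's language.
    ["collator", { "case-sensitive": false, "diacritic-sensitive": true, locale }],
  ];
}

// Plain-JavaScript counterpart: "accent" sensitivity ignores case differences
// but still treats diacritics as significant.
function namesDiffer(locale, localizedName, localName) {
  const collator = new Intl.Collator(locale, { sensitivity: "accent" });
  return collator.compare(localizedName, localName) !== 0;
}
```

Under these assumptions, a gloss would be shown only where `namesDiffer` reports a difference.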

The choice of cities, as opposed to other kinds of places, is mainly based on prior art, as seen in #471 (comment). This approach minimizes clutter in countries that use a different writing system than the user does, which matters because we don’t currently have reliable access to a transliterated name when a translated name is available in a given language. We could tighten up the filter further, so that only cities above a certain rank get a gloss, but I think the reason for glossing a label would become less intuitive to users.

English

Spanish

Korean

Arabic

Esperanto

This map has reached Peak Internationalization:

🤖

The gloss makes it easy to spot typos, vandalism, and labels needing import cleanup in Wikidata:

Under the hood, this PR replaces the metadata-based approach to localizing text-field in #581 with a top-level expression variable that can be replaced safely each time the language preference changes. This is a purer style specification–based solution that’s less likely to force a full style reload when the language preference changes. The code can easily be extended to vary arbitrary layout and paint properties and even filters based on any number of environment variables, not just the language preference. For example, if we label elevations in the future, we could dynamically update them to reflect changes in the preferred measurement system.
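
To make the idea of a replaceable top-level variable concrete, here is a minimal sketch that wraps a property value in a `let` expression and rebinds the variable when the language changes. The helper names (`withVariable`, `updateVariable`, `localizedNameExpression`) and the variable name `localizedName` are hypothetical and may not match what the PR actually does.

```javascript
// Hypothetical helpers; the names and structure are illustrative only.

// Wrap a property value in a top-level "let" that binds a single variable.
function withVariable(name, value, body) {
  return ["let", name, value, body];
}

// Return a copy of a wrapped expression with the variable rebound.
function updateVariable(expression, name, newValue) {
  const isWrapped =
    Array.isArray(expression) && expression[0] === "let" && expression[1] === name;
  if (!isWrapped) {
    return expression; // Not produced by withVariable(); leave it untouched.
  }
  return ["let", name, newValue, expression[3]];
}

// Coalesce the name in each preferred language, falling back to the local name.
function localizedNameExpression(locales) {
  return ["coalesce", ...locales.map((l) => ["get", `name:${l}`]), ["get", "name"]];
}

const textField = withVariable(
  "localizedName",
  localizedNameExpression(["en"]),
  ["to-string", ["var", "localizedName"]]
);

// When the preference changes, only the binding is swapped; the body is reused.
const spanishTextField = updateVariable(
  textField,
  "localizedName",
  localizedNameExpression(["es"])
);
```

The updated value could then be applied to a single layer with something like `map.setLayoutProperty(layerId, "text-field", spanishTextField)` rather than by reloading the whole style. The same wrapping would work for any other layout property, paint property, or filter that depends on an environment variable.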

Fixes #471.

1ec5 · 2022-11-30T05:12:42Z

src/layer/place.js

-  metadata: {
-    "americana:text-field-localized": true,
-  },


So far, I’ve limited the glosses to cities to avoid clutter at mid zoom levels, leaving other places labeled by the name in the preferred language only. This is already more generous than American print atlases, which limit the glosses to “world-class” cities with well-known English names. However, we could explore factoring out the giant expression below and reusing it on the place=town and place=village layers at much higher zoom levels. It might require some fine-tuning to maintain the desired label density. The Wikidata items on these smaller settlements are less likely to have gotten cleaned up since an import, so this would be a nice way to surface data in need of attention. I’m thinking we could save that for tail work once we’ve proven out glosses in the city layer.

1ec5

I think this PR is getting close in terms of the visual output. However, I still need to factor out some common parts of the massive text-field expression for readability. I may also look into adding unit tests, either in this PR or separately, because there are a lot of edge cases to juggle.

ZeLonewolf · 2022-11-30T15:43:38Z

I noticed this oddity in Italy:

1ec5 · 2022-11-30T18:09:38Z

> I noticed this oddity in Italy:

Someone changed the English and Italian labels on Wikidata to include a curly apostrophe instead of a straight one. The OSM node has a straight apostrophe. One way to look at it is that the gloss is really good at catching things like this. 😅

Wikidata’s label guidelines are silent about which kind of punctuation to use in labels. In some subject areas, Wikidata has bots going around switching straight apostrophes to curly apostrophes, but otherwise, most labels are based on Wikipedia article titles. The English Wikipedia’s house style, which has been influential in other Wikipedia language editions, insists on straight quotation marks. However, some language editions like the Russian Wikipedia standardize on curly apostrophes instead: osmlab/name-suggestion-index#2969 (comment). So far, I haven’t come across any discrepancies in Russian-speaking countries when setting language=ru.

In OSM, mappers have disagreed on whether to use straight apostrophes,¹ forcing data consumers like Mapbox Streets to replace straight apostrophes with curly apostrophes in postprocessing. Unfortunately, any postprocessing is potentially unreliable without certainty about which language is stored in name. A while back, @claysmalley caused a stir by correcting the punctuation on Amtrak stations, apparently because people were concerned about breaking Nominatim and bespoke Overpass queries. I personally prefer curly apostrophes and en dashes, but you’ll only find them on unbranded POIs, because the name-suggestion-index has standardized on straight apostrophes and hyphens.

Unfortunately, GL JS doesn’t support the ignorePunctuation collator option in collator option objects, so there’s no way to account for these discrepancies on the client side.
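
For reference, this is roughly what the option does in plain JavaScript, outside the style expression language. The place name below is made up for the example; only the two apostrophe characters matter.

```javascript
// Made-up place name spelled with a straight and a curly apostrophe.
const straight = "Sant'Esempio";
const curly = "Sant\u2019Esempio";

// Same case-insensitive, diacritic-sensitive comparison, with and without
// punctuation being ignored.
const strict = new Intl.Collator("en", { sensitivity: "accent" });
const lenient = new Intl.Collator("en", {
  sensitivity: "accent",
  ignorePunctuation: true,
});

// Without the option, the two spellings compare as different names.
const differsStrictly = strict.compare(straight, curly) !== 0;
// With the option, the apostrophe style no longer affects the comparison.
const matchesLeniently = lenient.compare(straight, curly) === 0;
```

Note that `ignorePunctuation` also ignores whitespace and other punctuation, so it would be a blunt instrument even if it were available in expressions.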

¹ On the bright side, in Hawaiʻi, we did replace GNIS- and TIGER-imported straight apostrophes with proper ʻokina and that has been uncontroversial so far.

ZeLonewolf · 2022-11-30T18:11:04Z

> One way to look at it is that the gloss is really good at catching things like this.

Good explanation. I'm cool with that :)

Overall I'm pretty happy with how this looks!

jleedev · 2022-12-01T00:39:20Z

> The English Wikipedia’s house style […] insists on straight quotation marks.

They also think en dashes in URLs is a cute idea, so I take them with a grain of salt.

1ec5 · 2022-12-01T20:59:40Z

> I may also look into adding unit tests, either in this PR or separately, because there are a lot of edge cases to juggle.

Once a test harness lands in #594, we can refactor the ginormous city name label text-field expression to be testable and add test cases for all the cities we examined above in code review.

1ec5

This feature is now ready for review.

1ec5 · 2022-12-03T12:51:50Z

test/spec/label.js

+      expectGloss("en", "Montreal", "Montréal", "Montréal", null);
+      expectGloss("en", "Quebec City", "Québec", "Québec City", null);
+      expectGloss("en", "Da Nang", "Đà Nẵng", "Đà Nẵng", null);
+      expectGloss("en", "Nūll Island", "Ñüłl Íşlåńđ", "Ñüłl Íşlåńđ", null);


I’ve implemented a compact syntax for testing that, given a name in the preferred language and a name in the local language both in the properties of a feature, the massive expression will evaluate to a format expression with the expected primary label and gloss. In fact, just about any expression in this style can now be tested in this manner, independently of the style, by providing the necessary bits of data.

zekefarwell

This looks good to me and ready to merge.

Here are some comments for future improvement, but as discussed at length on this PR, we don't have the necessary data in the tiles to do anything about them right now.

It's a bit cluttered having a gloss on every city name in areas where the local name uses a different writing system than the user's language. Hopefully we can find a way to use transliterated names in the future.

The diacritic folding when the language is set to English cuts down on redundancy very well. It would be nice if we could do the same for other languages, but I understand it's difficult since diacritics are much more important in languages other than English. I could be wrong, but I imagine a Spanish speaker would find this kind of thing somewhat silly looking:

Seems like this is the best we can do with the available data, but maybe a better solution will be possible with changes to the tiles.

St. vs Saint is an issue. I hope this can be resolved on the data side by standardizing on one form. It sure would feel dumb to have to add a special case to treat them as equivalent for gloss purposes.

1ec5 · 2022-12-05T06:26:23Z

> The diacritic folding when the language is set to English cuts down on redundancy very well. It would be nice if we could do the same for other languages, but I understand it's difficult since diacritics are much more important in languages other than English. I could be wrong, but I imagine a Spanish speaker would find this kind of thing somewhat silly looking:

This is specifically an issue when a Spanish speaker looks at an English-speaking country, as opposed to say a French-speaking country as in #592 (comment). Similarly, there was a request in OSMUS Slack to enable diacritic folding in Polish too, but it turned out to be counterproductive: Latvian diacritics overriding Polish diacritics on one city label (Jurmała → Jūrmala) right next to another label where the two names remained separate due to a base letter difference (Ryga + Rīga, Jełgawa + Jelgava).

What makes your Spanish example particularly ironic is that the English names are derived from Spanish to begin with, but we don’t have etymological data at our disposal. Perhaps we could conflate more aggressively if we had an expression operator that could search for a character class or regular expression. Then we could check if the local name matches name:en and is composed of only ASCII characters.
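
In plain JavaScript, the check described above might look like the sketch below. No such operator exists in the expression language today, so this only illustrates the intended logic; the function names are hypothetical.

```javascript
// True when the text contains only ASCII characters.
function isAsciiOnly(text) {
  return /^[\x00-\x7F]*$/.test(text);
}

// Hypothetical heuristic: the local name is "plain English" when it is
// identical to name:en and contains no non-ASCII characters. A label in this
// state could be conflated with the preferred-language name more aggressively.
function localNameIsPlainEnglish(localName, englishName) {
  return localName === englishName && isAsciiOnly(localName);
}
```

For example, a city whose `name` and `name:en` are both "Los Angeles" would pass the check, whereas "Montréal" would not, even if its `name:en` were identical.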

> St. vs Saint is an issue. I hope this can be resolved on the data side by standardizing on one form. It sure would feel dumb to have to add a special case to treat them as equivalent for gloss purposes.

This is a consequence of OpenStreetMap’s policy of expanding abbreviations in names,¹ even when a word would typically be abbreviated in prose. These particular cases will be fixed by onthegomap/planetiler#403. However, there may be features in OSM that inconsistently expand the word in name but abbreviate it in name:en.

¹ American English considers “St.” to be an abbreviation, whereas evidently British English does not. However, “St.” can be a non-abbreviation in American English too: the family name St. Clair comes from Sinclair, so it’s incorrect to spell it out as “Saint Clair”.

Replaced the metadata-based approach to localizing text-field with a top-level expression variable that can be replaced safely each time the language preference changes. This approach can easily be extended to support additional variables that depend on the environment.

When the name in the preferred language matches the name in the local language except for the addition of a prefix or suffix, which is likely to be an insignificant word like “City”, splice the local-language name into the preferred-language name to reduce redundancy.
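
The splicing rule can be modeled in plain JavaScript as follows. This is a simplified model of the behavior, not the style expression itself: `glossedLabel` is a hypothetical name, and the slicing assumes both names use precomposed characters so that matching parts have the same length.

```javascript
// Returns [primaryLabel, gloss]; gloss is null when no gloss is needed.
function glossedLabel(locale, localizedName, localName) {
  // Diacritics are folded only for English; other languages keep them significant.
  const sensitivity = locale === "en" ? "base" : "accent";
  const collator = new Intl.Collator(locale, { sensitivity });
  const same = (a, b) => collator.compare(a, b) === 0;

  // Names match: show the local spelling alone.
  if (same(localizedName, localName)) return [localName, null];

  const n = localName.length;
  if (localizedName.length > n) {
    // Local name matches the start: keep the extra suffix (e.g. " City").
    if (same(localizedName.slice(0, n), localName)) {
      return [localName + localizedName.slice(n), null];
    }
    // Local name matches the end: keep the extra prefix.
    if (same(localizedName.slice(-n), localName)) {
      return [localizedName.slice(0, -n) + localName, null];
    }
  }

  // Otherwise show the preferred-language name with the local name as a gloss.
  return [localizedName, localName];
}
```

Calling `glossedLabel("en", "Quebec City", "Québec")` follows the prefix branch, which corresponds to the "Québec City" case in test/spec/label.js.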

1ec5 added enhancement New feature or request internationalization labels Nov 29, 2022

1ec5 self-assigned this Nov 29, 2022

1ec5 commented Nov 29, 2022

1ec5 commented Nov 30, 2022

1ec5 force-pushed the 1ec5-place-gloss-471 branch from a62d7d1 to 8d4f092 Compare December 1, 2022 01:18

1ec5 mentioned this pull request Dec 1, 2022

Add unit tests #594

Merged

1ec5 force-pushed the 1ec5-place-gloss-471 branch 2 times, most recently from 5d76fd3 to 090b407 Compare December 3, 2022 12:43

1ec5 marked this pull request as ready for review December 3, 2022 12:43

1ec5 commented Dec 3, 2022

1ec5 force-pushed the 1ec5-place-gloss-471 branch from 090b407 to 16e7d5c Compare December 4, 2022 06:17

zekefarwell approved these changes Dec 5, 2022

1ec5 added 10 commits December 5, 2022 09:26

Gloss city names in local language

7cec299

Isolate glossed name from parentheses

68eb253

Fold diacritics in English to determine gloss visibility

98e9a41

Simplified label expression generation

64e6131

Refactored layer localization

d62da7e

Install style specification

cc8b28d

Test place label with gloss

c32b11a

Refactored glossy city label expression

8025a7d

1ec5 force-pushed the 1ec5-place-gloss-471 branch from fd7b02e to 8025a7d Compare December 5, 2022 17:27

1ec5 merged commit f9b08d0 into main Dec 5, 2022

1ec5 deleted the 1ec5-place-gloss-471 branch December 5, 2022 17:27

This was referenced Dec 10, 2022

Render CJK glyphs locally #613

Closed

OpenHistoricalMap font stack doesn't support comma-separated lists #612

Closed

This was referenced Jan 4, 2023

Pretty-print semicolon delimiters in compound names #665

Closed

Deduplicate names in gloss; gloss towns #670

Merged

1ec5 mentioned this pull request Jan 30, 2023

Document accessibility goals and features #743

Open

1ec5 mentioned this pull request Jul 31, 2023

Shield Libary: generate three output formats for broad compatibility #905

Draft

This was referenced Aug 12, 2023

Package label localization as a plugin #914

Open

Render names according to the language the user is using? OpenHistoricalMap/issues#481

Open

This was referenced Oct 2, 2023

import-wikidata should prefer name statements over labels openmaptiles/openmaptiles-tools#437

Open

Prefer name statements over Wikidata labels onthegomap/planetiler#679

Open

1ec5 mentioned this pull request Apr 30, 2024

Append secondary name glosses to labels depending on context #1053

Open

1ec5 mentioned this pull request Aug 21, 2024

Preview of MapLibre text rendering overhaul #1149

Draft
