Gloss city names in the local language #592
metadata: {
  "americana:text-field-localized": true,
},
So far, I’ve limited the glosses to cities to avoid clutter at mid zoom levels, leaving other places labeled by the name in the preferred language only. This is already more generous than American print atlases, which limit the glosses to “world-class” cities with well-known English names. However, we could explore factoring out the giant expression below and reusing it on the place=town and place=village layers at much higher zoom levels. It might require some fine-tuning to maintain the desired label density. The Wikidata items on these smaller settlements are less likely to have gotten cleaned up since an import, so this would be a nice way to surface data in need of attention. I’m thinking we could save that for tail work once we’ve proven out glosses in the city layer.
I think this PR is getting close in terms of the visual output. However, I still need to factor out some common parts of the massive text-field expression for readability. I may also look into adding unit tests, either in this PR or separately, because there are a lot of edge cases to juggle.
Someone changed the English and Italian labels on Wikidata to include a curly apostrophe instead of a straight one. The OSM node has a straight apostrophe. One way to look at it is that the gloss is really good at catching things like this. 😅
Wikidata’s label guidelines are silent about which kind of punctuation to use in labels. In some subject areas, Wikidata has bots going around switching straight apostrophes to curly apostrophes, but otherwise, most labels are based on Wikipedia article titles. The English Wikipedia’s house style, which has been influential in other Wikipedia language editions, insists on straight quotation marks. However, some language editions like the Russian Wikipedia standardize on curly apostrophes instead: osmlab/name-suggestion-index#2969 (comment). So far, I haven’t come across any discrepancies in Russian-speaking countries when setting […]
In OSM, mappers have disagreed on whether to use straight apostrophes, forcing data consumers like Mapbox Streets to replace straight apostrophes with curly apostrophes in postprocessing. Unfortunately, any postprocessing is potentially unreliable without certainty about which language is stored in […]
Unfortunately, GL JS doesn’t support the […]
Good explanation. I'm cool with that :) Overall I'm pretty happy with how this looks!
They also think en dashes in URLs are a cute idea, so I take them with a grain of salt.
a62d7d1
to
8d4f092
Compare
Once a test harness lands in #594, we can refactor the ginormous city name label […]
5d76fd3
to
090b407
Compare
This feature is now ready for review.
expectGloss("en", "Montreal", "Montréal", "Montréal", null);
expectGloss("en", "Quebec City", "Québec", "Québec City", null);
expectGloss("en", "Da Nang", "Đà Nẵng", "Đà Nẵng", null);
expectGloss("en", "Nūll Island", "Ñüłl Íşlåńđ", "Ñüłl Íşlåńđ", null);
I’ve implemented a compact syntax for testing that, given a name in the preferred language and a name in the local language both in the properties of a feature, the massive expression will evaluate to a format expression with the expected primary label and gloss. In fact, just about any expression in this style can now be tested in this manner, independently of the style, by providing the necessary bits of data.
090b407
to
16e7d5c
Compare
This looks good to me and ready to merge.
Here are some comments for future improvement, but as discussed at length on this PR, we don't have the necessary data in the tiles to do anything about them right now.
It's a bit cluttered having a gloss on every city name in areas where the local name uses a different writing system than the user's language. Hopefully we can find a way to use transliterated names in the future.
The diacritic folding when the language is set to English cuts down on redundancy very well. It would be nice if we could do the same for other languages, but I understand it's difficult since diacritics are much more important in languages other than English. I could be wrong, but I imagine a Spanish speaker would find this kind of thing somewhat silly looking:
Seems like this is the best we can do with the available data, but maybe a better solution will be possible with changes to the tiles.
St. vs Saint is an issue. I hope this can be resolved on the data side by standardizing on one form. It sure would feel dumb to have to add a special case to treat them as equivalent for gloss purposes.
This is specifically an issue when a Spanish speaker looks at an English-speaking country, as opposed to say a French-speaking country as in #592 (comment). Similarly, there was a request in OSMUS Slack to enable diacritic folding in Polish too, but it turned out to be counterproductive: Latvian diacritics overriding Polish diacritics on one city label (Jurmała → Jūrmala) right next to another label where the two names remained separate due to a base letter difference (Ryga + Rīga, Jełgawa + Jelgava). What makes your Spanish example particularly ironic is that the English names are derived from Spanish to begin with, but we don’t have etymological data at our disposal. Perhaps we could conflate more aggressively if we had an expression operator that could search for a character class or regular expression. Then we could check if the local name matches
This is a consequence of OpenStreetMap’s policy of expanding abbreviations in names, even when a word would typically be abbreviated in prose. These particular cases will be fixed by onthegomap/planetiler#403. However, there may be features in OSM that inconsistently expand the word in […]
Replaced the metadata-based approach to localizing text-field with a top-level expression variable that can be replaced safely each time the language preference changes. This approach can easily be extended to support additional variables that depend on the environment.
When the name in the preferred language matches the name in the local language except for the addition of a prefix or suffix, which is likely to be an insignificant word like “City”, splice the local-language name into the preferred-language name to reduce redundancy.
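The splicing rule in the commit message above can be illustrated outside the style with a rough plain-JavaScript sketch. This is not the expression used in the style: `fold` and `spliceLocalName` are hypothetical helpers, the folding strips combining marks via Unicode normalization rather than using a collator, and the requirement of a space at the seam is an added assumption. Letters that do not decompose (such as Đ or ł) are not folded by this simplified approach.

```javascript
// Hypothetical helper: lowercases and strips combining marks so that
// "Québec" and "quebec" compare equal. A stand-in for collator-based folding.
function fold(s) {
  return s.normalize("NFD").replace(/\p{M}/gu, "").toLowerCase();
}

// Hypothetical helper: if the local name matches the start or end of the
// localized name (ignoring case and diacritics), return the localized name
// with the local spelling spliced in; otherwise return null.
function spliceLocalName(localized, local) {
  const a = localized.normalize("NFC");
  const b = local.normalize("NFC");
  if (b.length === 0 || b.length >= a.length) return null;

  const head = a.slice(0, b.length);
  const tail = a.slice(a.length - b.length);

  // Assumption: require a space at the seam so that "Yorkshire" is not
  // treated as "York" plus an insignificant suffix.
  if (fold(head) === fold(b) && a[b.length] === " ") {
    return b + a.slice(b.length);
  }
  if (fold(tail) === fold(b) && a[a.length - b.length - 1] === " ") {
    return a.slice(0, a.length - b.length) + b;
  }
  return null;
}

console.log(spliceLocalName("Quebec City", "Québec"));
```

A `null` result means no splice applies, in which case a caller would fall back to showing the local name as a separate gloss.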
fd7b02e
to
8025a7d
Compare
If the name of a city in the local language differs from its name in the user-preferred language (#578), the label now includes a gloss on a second line containing the local name in smaller type and in parentheses. The names are compared case-insensitively but without diacritic folding based on the user-preferred language’s collation rules.
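As a rough illustration of the comparison described above, here is a plain-JavaScript sketch that uses `Intl.Collator` instead of the style's own expression. The function name `labelWithGloss` and the returned object shape are hypothetical; the `"accent"` sensitivity setting ignores case but keeps diacritics significant, which mirrors "case-insensitively but without diacritic folding". It assumes a JavaScript runtime with full ICU data.

```javascript
// Hypothetical helper: decides whether a label needs a gloss.
// localized: name in the user-preferred language
// local:     name in the local language
// locale:    user-preferred language code, e.g. "en"
function labelWithGloss(localized, local, locale) {
  // sensitivity "accent": a = A, but a ≠ á
  const collator = new Intl.Collator(locale, { sensitivity: "accent" });
  if (collator.compare(localized, local) === 0) {
    // Names match under the locale's collation rules: no gloss needed.
    return { label: localized, gloss: null };
  }
  // Names differ: show the local name in parentheses on a second line.
  return { label: localized, gloss: `(${local})` };
}

console.log(labelWithGloss("Munich", "München", "en"));
console.log(labelWithGloss("Paris", "PARIS", "en"));
```

Switching the sensitivity to `"base"` would additionally fold diacritics, which is closer to the English-only folding discussed in the review comments.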
The choice of cities, as opposed to other kinds of places, is mainly based on prior art, as seen in #471 (comment). This approach minimizes clutter in countries that use a different writing system than the user does, important because we don’t currently have reliable access to a transliterated name when a translated name is available in a given language. We could tighten up the filter further, so that only cities above a certain rank get a gloss, but I think the reason for glossing a label would become less intuitive to users.
[Screenshots with the preferred language set to English, Spanish, Korean, Arabic, and Esperanto]
This map has reached Peak Internationalization:
🤖
The gloss makes it easy to spot typos, vandalism, and labels needing import cleanup in Wikidata:
Under the hood, this PR replaces the metadata-based approach to localizing text-field in #581 with a top-level expression variable that can be replaced safely each time the language preference changes. This is a purer style specification–based solution that’s less likely to force a full style reload when the language preference changes. The code can easily be extended to vary arbitrary layout and paint properties and even filters based on any number of environment variables, not just the language preference. For example, if we label elevations in the future, we could dynamically update them to reflect changes in the preferred measurement system.
Fixes #471.
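For readers unfamiliar with the top-level variable approach described above, here is a minimal sketch of the idea using the style specification's `let` and `var` expressions. The helper names `withVariable` and `replaceVariable`, the variable name `localizedName`, and the `name:en`/`name:es` property keys are illustrative assumptions, not the code in this PR.

```javascript
// Hypothetical helper: wraps an expression in a top-level "let" that binds
// a single variable.
function withVariable(name, value, body) {
  return ["let", name, value, body];
}

// Hypothetical helper: returns a copy of a top-level "let" expression with
// one variable rebound, leaving the original expression untouched.
function replaceVariable(expression, name, newValue) {
  if (!Array.isArray(expression) || expression[0] !== "let") {
    throw new Error("expected a top-level let expression");
  }
  const copy = expression.slice();
  // Bindings are name/value pairs between "let" and the final body.
  for (let i = 1; i < copy.length - 1; i += 2) {
    if (copy[i] === name) {
      copy[i + 1] = newValue;
      return copy;
    }
  }
  throw new Error(`variable ${name} is not bound`);
}

// Assumed property keys; the real tiles may expose names differently.
const textField = withVariable(
  "localizedName",
  ["coalesce", ["get", "name:en"], ["get", "name"]],
  ["format", ["var", "localizedName"], {}]
);

// When the language preference changes, only the binding is swapped.
const spanishTextField = replaceVariable(textField, "localizedName", [
  "coalesce",
  ["get", "name:es"],
  ["get", "name"],
]);

console.log(JSON.stringify(spanishTextField));
```

The rebound expression could then be applied to a layer with something like `map.setLayoutProperty(layerId, "text-field", spanishTextField)`, so the rest of the style does not need to be reloaded.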