Feature Request - allow non-ASCII characters in geography #5417

Closed
dustymc opened this issue Dec 21, 2022 · 6 comments
Labels
Enhancement I think this would make Arctos even awesomer!

Comments

@dustymc
Contributor

dustymc commented Dec 21, 2022

Is your feature request related to a problem? Please describe.

Geography has traditionally disallowed non-ASCII characters for various reasons, but now:

  1. The PostgreSQL extension unaccent exists, so we can allow searches that ignore "special" characters (e.g. 'i' matching 'ī'), and
  2. The Plan mostly means we don't need to worry about duplicates since we now have global standardization.
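
Point 1 can be sketched roughly as follows. This is a hypothetical illustration, not Arctos code. It assumes that decomposing to Unicode NFD and dropping combining marks is close enough to what PostgreSQL's `unaccent()` does. In reality, `unaccent` uses its own rules file, so results can differ for some characters. The names `fold_accents` and `matches` are made up for this example.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Approximate accent folding: NFD-decompose, then drop combining marks.

    Only an approximation of PostgreSQL's unaccent, which is rules-file based.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches(search_term: str, stored_name: str) -> bool:
    """Case- and accent-insensitive equality on both sides of the comparison."""
    return fold_accents(search_term).casefold() == fold_accents(stored_name).casefold()


# 'ī' (U+012B) decomposes to 'i' + COMBINING MACRON, so the ASCII spelling matches.
print(matches("Ash-Sharqiyah", "Ash-Sharq\u012byah"))  # True
```

In the database itself, the corresponding pieces would be `CREATE EXTENSION unaccent;` and wrapping both sides of the comparison in `lower(unaccent(...))`.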

Describe what you're trying to accomplish

More closely "do what GADM does," which is increasingly important for creating new geography.

Describe the solution you'd like

  1. Update documentation
  2. Install unaccent and rebuild geog-things to use it
  3. Allow whatever characters GADM uses in geography

Describe alternatives you've considered

  1. https://github.com/ArctosDB/internal/issues/222 - given sufficient resources, I could preemptively create everything, which would just avoid much of the technical problem.
  2. We can continue without doing anything, but that leaves more room for mistakes, especially when GADM contains non-ASCII characters in placenames (and is perhaps not very respectful of the people who live in those places).

Additional context

I think unaccent would allow e.g. record bulkloading to continue to work as it does now, but there's some (small, I think) chance that this would somehow complicate SOMETHING for a few areas of the world.

Example

https://arctos.database.museum/place.cfm?action=detail&geog_auth_rec_id=10007445

I'd like to change Eastern Province to Ash-Sharqīyah, which contains 'LATIN SMALL LETTER I WITH MACRON' (U+012B).
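
The following hypothetical helper shows which characters an ASCII-only check would reject in a proposed name. It reports each character with its code point and Unicode name, matching the description above. The function name `non_ascii_chars` is invented for this example.

```python
import unicodedata


def non_ascii_chars(name: str) -> list[tuple[str, str, str]]:
    """List each non-ASCII character in a name as (char, code point, Unicode name)."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in name
        if ord(ch) > 127  # anything outside 7-bit ASCII
    ]


for ch, code, uname in non_ascii_chars("Ash-Sharq\u012byah"):
    print(ch, code, uname)
# ī U+012B LATIN SMALL LETTER I WITH MACRON
```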

Priority

Relatively low, but some geography creation requests could drastically change my outlook on this.

Does anyone have any reason not to do this?

@dustymc dustymc added the Enhancement I think this would make Arctos even awesomer! label Dec 21, 2022
@dustymc dustymc added this to the Needs Discussion milestone Dec 21, 2022
@szaborac

Some of the specimens I handle in Arctos have special characters in the locality (I often use Cyrillic and Norse characters, for instance). Allowing non-ASCII characters would be very helpful, especially where the English transliteration of a location isn't as precise or as clearly understood as the local alphabet would be.

@dustymc
Contributor Author

dustymc commented Jan 5, 2023

@szaborac is that geography or locality?

Locality has accepted Unicode characters for some time, but also see ArctosDB/documentation-wiki#291: one (primary) purpose of spec locality is to provide data that machines can understand, and machines mostly have English biases. (But I just fed GeoLocate, the machine we use most for this, 'Магадан' and it did what it should have, so maybe this isn't a concern? It still needs to be understood and documented.)

isn't as precise

This is why I suspect you're not talking about geography: All Arctos geography is spatial, there can be no ambiguity. (There are also strings and those are great at confusing people, but the shape is perfectly precise and what really defines geography.)

@szaborac

szaborac commented Jan 5, 2023 via email

@dustymc
Contributor Author

dustymc commented Jan 16, 2023

From #5486

Do GADM things not have identifiers?

Not that I can see in my data. I do try to do something with, e.g.,

[Screenshot, 2023-01-16]

and I can maybe-probably dig that out of remarks and match it up to gadm's data, but I don't expect anyone else to.

https://gadm.org/maps/ATF/ileseparses.html

Those aren't even always the same data that I can access, but that's a really nice identifier and I'd be happy to use it if you can convince them to!

@Jegelewicz
Member

Do GADM things not have identifiers?

I asked this in the contact form - let's see what we get back.

@dustymc
Contributor Author

dustymc commented Feb 2, 2023

AWG issues discussed: stick with ASCII

TODO: rebuild deasciiifier to use this, maybe do something special with unaltered search term?
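
The TODO above could look roughly like the sketch below. It is a hypothetical illustration only; the internals of the actual deasciiifier are not described in this issue. The sketch assumes that search terms are accent-folded (approximating `unaccent` with NFD decomposition) before matching. The "something special" for the unaltered term is assumed to be ranking names that contain the term exactly ahead of names that only match after folding. `fold` and `search_geography` are invented names.

```python
import unicodedata


def fold(text: str) -> str:
    """Approximate unaccent plus lowercasing: NFD-decompose, drop combining marks, casefold."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def search_geography(term: str, names: list[str]) -> list[str]:
    """Return names whose folded form contains the folded term.

    Names containing the unaltered term come first; sorted() is stable,
    so the original order is kept within each group.
    """
    folded_term = fold(term)
    hits = [n for n in names if folded_term in fold(n)]
    return sorted(hits, key=lambda n: term not in n)


names = ["Ash-Sharq\u012byah", "Ash-Sharqiyah"]  # hypothetical stored names
print(search_geography("Sharq\u012b", names))  # exact-accent match listed first
```

Ranking rather than filtering means an accented search term still finds ASCII-only geography, which fits the decision to stick with ASCII.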


3 participants