Alternate languages for admin names #264

Open
missinglink opened this issue Jun 4, 2019 · 2 comments

@missinglink
Member

It looks like English tokens are not available in the Elasticsearch admin indices for locations whose default language is not English.

For instance, the query `dionysiou areopagitou, athens` fails with strict admin matching, but the query `dionysiou areopagitou, Αθήνα` succeeds.

The English query only succeeds if the admin token is ignored completely.

Note: the language service runs afterwards and renders the English name in the results, which makes this a little confusing to debug.

I think it's important that we start importing more aliases (alternate-language names) for admin terms. These would have to be provided by wof-admin-lookup during the point-in-polygon (PIP) lookup.

Some examples:

  • dionysiou areopagitou, athens
  • عمرو إبن العاص, cairo, egypt
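To make the failure mode concrete, here is a minimal Python sketch of strict admin matching against the tokens indexed for one admin field. It assumes a hypothetical record layout with names keyed by language code; `admin_tokens` and `strict_admin_match` are illustrative helpers, not Pelias code. Indexing only the default-language name reproduces the `athens` vs `Αθήνα` behavior, and indexing every alias fixes it.

```python
import unicodedata

def normalize(token: str) -> str:
    # Unicode-normalize, casefold and trim so "Athens" and "athens" compare equal.
    return unicodedata.normalize("NFKC", token).casefold().strip()

# Hypothetical admin record: names keyed by (assumed) ISO 639-3 language codes.
ATHENS = {"ell": ["Αθήνα"], "eng": ["Athens"]}

def admin_tokens(names_by_lang, langs=None):
    """Collect the normalized tokens to index for one admin field.
    langs=None indexes every language; otherwise only the listed ones."""
    tokens = set()
    for lang, names in names_by_lang.items():
        if langs is None or lang in langs:
            tokens.update(normalize(n) for n in names)
    return tokens

def strict_admin_match(query_admin, indexed_tokens):
    # Strict matching: the query's admin token must be among the indexed tokens.
    return normalize(query_admin) in indexed_tokens

default_only = admin_tokens(ATHENS, langs={"ell"})  # current behavior
with_aliases = admin_tokens(ATHENS)                 # proposed: all aliases

print(strict_admin_match("athens", default_only))  # False
print(strict_admin_match("Αθήνα", default_only))   # True
print(strict_admin_match("athens", with_aliases))  # True
```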
@missinglink changed the title from "Admin aliases" to "Alternate languages for admin names" on Jun 4, 2019
@orangejulius
Member

That makes sense. I was thinking about pelias/schema#349 yesterday and wondering how much space we would save if admin fields were only indexed, but not stored, for autocomplete.

Since Elasticsearch stores a copy of the admin values in every document that references them, we aren't getting much "bang for our storage buck". As we have discovered, more data in Elasticsearch also has a performance cost: fewer records fit into the disk cache.
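A rough toy calculation of the duplication cost, using made-up documents rather than measurements from a real index: each document stores its own copy of every admin name, while a lookup service like Placeholder only needs each distinct name once. Adding English aliases widens the gap in proportion to the document count.

```python
# Hypothetical toy data: (locality, region, country) for each address document.
docs = [("Αθήνα", "Αττική", "Ελλάδα")] * 800 + [("Πειραιάς", "Αττική", "Ελλάδα")] * 200

# Hypothetical English aliases for each admin name.
ALIASES = {"Αθήνα": "Athens", "Πειραιάς": "Piraeus", "Αττική": "Attica", "Ελλάδα": "Greece"}

def utf8_len(s: str) -> int:
    return len(s.encode("utf-8"))

def duplicated_bytes(rows):
    # Denormalized: every document keeps its own copy of each admin name.
    return sum(utf8_len(name) for row in rows for name in row)

def stored_once_bytes(rows):
    # Lookup service: each distinct admin name is kept exactly once.
    return sum(utf8_len(name) for name in {name for row in rows for name in row})

# Append the English alias of every admin name to each document.
with_aliases = [row + tuple(ALIASES[n] for n in row) for row in docs]

# The English aliases add 18,200 bytes when duplicated per document,
# versus 25 bytes when each alias is stored once.
print(duplicated_bytes(with_aliases) - duplicated_bytes(docs))
print(stored_once_bytes(with_aliases) - stored_once_bytes(docs))
```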

If we started excluding all the admin fields from the `_source` field, I suspect our Elasticsearch index size would go down considerably. We already call Placeholder on essentially every request anyway, and Placeholder is very fast and can also store those admin fields efficiently, so it's a good model.
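For reference, Elasticsearch supports excluding fields from `_source` in the mapping while still indexing them. Below is a sketch of such a mapping body built in Python. The `parent.*` field layout is an assumption for illustration, and it uses the typeless 7.x mapping style (older versions nest mappings under a type name).

```python
import json

# Hypothetical mapping: admin names are assumed to live under a "parent" object.
# Fields matched by "_source.excludes" are still indexed (searchable), but are
# dropped from the stored _source document.
mapping = {
    "mappings": {
        "_source": {"excludes": ["parent.*"]},
        "properties": {
            "name": {"properties": {"default": {"type": "text"}}},
            "parent": {
                "properties": {
                    "locality": {"type": "text"},
                    "region": {"type": "text"},
                    "country": {"type": "text"},
                }
            },
        },
    }
}

# Serialized request body for creating the index.
body = json.dumps(mapping, ensure_ascii=False, indent=2)
print(body)
```

Fields excluded this way stay searchable but are not returned in hits, and they cannot be recovered by APIs that depend on `_source` (such as update and reindex), so the API would need to get display names from Placeholder instead.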

@orangejulius
Member

orangejulius commented Jun 4, 2019

One slight modification to our current behavior might be useful. Currently, the language middleware in the API defaults to English if no language is set via HTTP headers or the lang param. This effectively means we will ask Placeholder for the English name whenever no language is requested. Is it possible to calculate a default language value in Placeholder? This would let us retain the same behavior we currently get from Elasticsearch.
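One way a per-place default language could work, sketched in Python: an explicit language from the request wins, otherwise the place's own default language is used, and English is only a last resort. The `default_lang` field, the ISO 639-3 keys, and `display_name` are assumptions about how Placeholder data might be shaped, not its actual schema or API.

```python
from typing import Optional

# Hypothetical Placeholder-style record with an assumed "default_lang" field.
ATHENS = {
    "default_lang": "ell",
    "names": {"ell": ["Αθήνα"], "eng": ["Athens"], "fra": ["Athènes"]},
}

def display_name(place: dict, requested_lang: Optional[str]) -> str:
    names = place["names"]
    # 1. Honor an explicit language from the HTTP headers or the lang param.
    if requested_lang and names.get(requested_lang):
        return names[requested_lang][0]
    # 2. Otherwise use the place's own default language, mirroring the
    #    name Elasticsearch currently returns.
    default = place.get("default_lang")
    if default and names.get(default):
        return names[default][0]
    # 3. Last resort: English, then any available name.
    if names.get("eng"):
        return names["eng"][0]
    for values in names.values():
        if values:
            return values[0]
    return ""

print(display_name(ATHENS, None))   # Αθήνα
print(display_name(ATHENS, "eng"))  # Athens
print(display_name(ATHENS, "deu"))  # Αθήνα (no German name, falls back to default)
```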
