Rethinking identifiers #3995
Comments
When it comes to the name, I think it would be handy to have a way to mark this up as being "unique", after the brand has been added! For entries like JD Wetherspoon (#3009) and Harvester (#4034) where the brand is strong, but the pub / restaurant itself is often uniquely named, rather than removing JD Wetherspoon or Harvester, keep them as they are, because they are strong brands that I feel people will type in first, but allow the NSI to flag to iD that the name is likely not "JD Wetherspoon" and has its own name, and this should be checked by the mapper. I personally think it's better for a pub to be incorrectly named with the brand name, as that's still how many people might refer to the pub, than to have the brand not be suggested when someone searches, as they may then just add the brand themselves, but may add it wrong, if that makes sense. That way, even if the name is wrong because it's named after the brand, at least the incorrect naming would be consistent, and so easier to find and correct. For example, a flag on the NSI guide web site could pick up a branded entry with "JD Wetherspoon" as the name, whereas someone adding the name "Spoons" would be missed. |
FYI, there are quite a few Wikidata items now linking to NSI using the NSI identifier property. But the items can be bulk-updated (along with the property constraints) based on whatever decision we arrive at here. |
(re: #3995) - currently using the format `simplename-hash` (e.g. "starbucks-f83d44") - where hash is an MD5 fragment of `${tree} ${key} ${value} ${locationID}`. This should generate a reasonable identifier that stays stable until one of those changes. Also we can eliminate disambiguators as long as same-named brands differ in one of these.
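To make the `simplename-hash` scheme concrete, here is a minimal sketch of how such an identifier could be generated. It is not the project's actual code: the `simplify` helper, the `makeId` function name, the 6-character fragment length (inferred from the "starbucks-f83d44" example), and the sample argument values are all assumptions for illustration.

```javascript
const crypto = require('crypto');

// Hypothetical helper (an assumption, not NSI's real implementation):
// lowercase the name and keep only ASCII letters and digits.
function simplify(name) {
  return name.toLowerCase().replace(/[^a-z0-9]/g, '');
}

// Build an identifier of the form `simplename-hash`, where the hash is the
// first 6 hex characters of the MD5 of `${tree} ${key} ${value} ${locationID}`.
function makeId(name, tree, key, value, locationID) {
  const message = `${tree} ${key} ${value} ${locationID}`;
  const hash = crypto.createHash('md5').update(message).digest('hex').slice(0, 6);
  return `${simplify(name)}-${hash}`;
}

// Illustrative values only; 'example-location' stands in for a real locationID.
console.log(makeId('Starbucks', 'brands', 'amenity', 'cafe', 'example-location'));
```

Because the name is not part of the hashed string, two same-named brands get different identifiers only if they differ in tree, key, value, or location, which is the condition stated above for dropping disambiguators.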
Quick update on some identifiers work that I did last Friday:
{
  "brands/shop/craft": [
    {
      "displayName": "A.C. Moore",
      "id": "acmoore-286374",
      "locationSet": {"include": ["us"]},
      "oldid": "shop/craft|A.C. Moore",
      "tags": {
        "brand": "A.C. Moore",
        "brand:wikidata": "Q4647066",
        "brand:wikipedia": "en:A.C. Moore",
        "name": "A.C. Moore",
        "shop": "craft"
      }
    },
    {
      "displayName": "Hobby Lobby",
      "id": "hobbylobby-e90acf",
      "locationSet": {"include": ["in", "us"]},
      "oldid": "shop/craft|Hobby Lobby",
      "tags": {
        "brand": "Hobby Lobby",
        "brand:wikidata": "Q5874938",
        "brand:wikipedia": "en:Hobby Lobby",
        "name": "Hobby Lobby",
        "shop": "craft"
      }
    },
    {
      "displayName": "Hobbycraft",
      "id": "hobbycraft-ed2283",
      "locationSet": {"include": ["gb"]},
      "matchTags": ["shop/art"],
      "oldid": "shop/craft|Hobbycraft",
      "tags": {
        "brand": "Hobbycraft",
        "brand:wikidata": "Q16984508",
        "brand:wikipedia": "en:Hobbycraft",
        "name": "Hobbycraft",
        "shop": "craft"
      }
    },
... |
I'm almost finished with this work. 🎉 There will be a bunch of conflicts to resolve, so I've disabled merging to |
OK - the new files are merged in..
|
to avoid unicode or right-to-left surprises. Also split out the id generation into its own function. (re: #3995) This now exposed a bunch of brands that were duplicate in the index so the duplicates have been removed. (e.g. an item in English and an item in Thai for the same thing, but both with same `name:en` and same locationSet) The duplicate removal just happened to close #4106 also.
The new code seems to be working pretty ok! This unblocks a bunch of other things that I'll tackle soon. I'd like to leave this open until all those P8253 Name Suggestion Identifier properties on Wikidata have been updated. I'd ideally like to make this an automatic thing that the |
So not bulk uploading it now, to not prevent the testing of such an automation? |
I don't understand your question, sorry.. Mostly I just haven't bulk updated the P8253's yet because I ran out of time yesterday to implement it. |
I meant, that if I, or anybody else, would upload it in bulk now, it might interfere with your idea to update them automatically in the |
I don't think it would interfere - it's a one way update from NSI -> Wikidata. If you update them all now, the script would just have less to do later. |
I updated the |
Great work @bhousel 👍 Can the new updates integrate into iD easily? Can a new release of the NSI be pushed to it anytime soon? |
I don't know, sorry. I was removed from the iD project and no longer maintain it. |
Probably the best thing to do is just open an issue on the iD page and see what's the answer |
I'd like to make some changes in the index to better support more kinds of named POIs and more granular locations. This is just a brain dump of some thoughts on how we track brands and the limitations with our current approach.
a long time ago
When this project started many years ago, each entry in NSI was just a unique name that we picked out of the OSM planet file.
For example, here's what NSI looked like in 2017:
name-suggestion-index/name-suggestions.json
Lines 8621 to 8623 in 50993d4
This says that Target is an entry with the name tag set to `Target`, and it has been used about 1000 times and sits under the `shop/department_store` hierarchy. iD would turn this into a preset that assigns the tags `name=Target` and `shop=department_store`.
Limitations of this:
- `brand:wikidata`
- `countryCodes` or any concept of where the brand was valid
So we did a lot of work on the NSI in 2018-2019 to arrive at the current format.
I did a talk about it!
today
Each entry in NSI represents a unique "brand" identified by a string like:
`key/value|name~(disambiguator)`
The `disambiguator` part is optional and used in situations where distinct entities use the same literal name.
name-suggestion-index/brands/shop/department_store.json
Lines 730 to 758 in 50278e5
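As an illustration of the `key/value|name~(disambiguator)` format described above, the following sketch splits such a string into its parts. The `parseOldId` function and its regular expression are assumptions written for this example, not code from the NSI repository, and the second sample identifier is made up to show the optional disambiguator.

```javascript
// Split an identifier of the form `key/value|name~(disambiguator)`.
// Returns null when the string does not match that shape.
function parseOldId(oldid) {
  const match = /^([^/|]+)\/([^|]+)\|(.+?)(?:~\((.*)\))?$/.exec(oldid);
  if (!match) return null;
  const [, key, value, name, disambiguator] = match;
  return {
    key,
    value,
    name,
    // The disambiguator is optional, so report null when it is absent.
    disambiguator: disambiguator === undefined ? null : disambiguator
  };
}

// Taken from the `oldid` values quoted earlier in this thread.
console.log(parseOldId('shop/craft|A.C. Moore'));
// Made-up identifier, only to exercise the optional disambiguator part.
console.log(parseOldId('shop/craft|Example~(Somewhere)'));
```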
We solved the limitations from long ago, and NSI has really grown!
- `brand:wikidata` and fetch a wealth of related data from the Wikidata project (like logos)
- `countryCodes` for a while, now `locationSet` which is even more flexible
But now we have a new set of limitations:
- `brand` and `brand:wikidata` as the "key" even though other feature types might better be keyed off of `operator` or `network`
- `name` tag on everything, which causes issues for some brands (see Add Wetherspoons Pub #3009, but also this has caused issues with things like hotels and auto dealerships)
So I'd like to think through how to rework the NSI entries to solve these limitations. More to come later...