NIP-3166: Country code tag based on ISO-3166 #763

Open
wants to merge 4 commits into
base: master

Conversation

@steliosrammos steliosrammos commented Sep 5, 2023

Draft NIP for introducing a new country-code tag that can be added to any event. The tag makes use of the ISO-3166 standard country codes and would allow easier filtering and indexing of events based on their geographic location.

It differs from the current g tag presented in NIP-52 in that a country code is far less granular than a geohash, and a geohash is not ideal for indexing.

The same difference is true for the proposed Geospatial Types #136.

Possibly a third parameter could be added (e.g. ["g", "", "country"]), in conjunction with the existing g tag, as presented in #230 to support multiple g tag types. But I'm not sure that's preferable for indexing. (Removed to focus on country code only, as per @mattn's feedback.)

Open to feedback!

@steliosrammos changed the title from "first draft for NIP-3166" to "NIP-3166: Geo-location tag based on ISO-3166" on Sep 5, 2023
@mattn
Member

mattn commented Sep 5, 2023

What do you expect from this country code? Language? Or current location? Can lat/lng be specified?

@steliosrammos
Author

> What do you expect from this country code? Language? Or current location? Can lat/lng be specified?

Simple country location. Other NIPs can take care of more granular geo-location.

The direct use-case is for letting profiles show their country of origin, and allow them to be easily searched by it.

@mattn
Member

mattn commented Sep 5, 2023

Then it should just be "Country code", not "GEO Location"?

@steliosrammos
Author

> Then it should just be "Country code", not "GEO Location"?

Good feedback, updated the NIP and PR description.

@AsaiToshiya
Collaborator

What about the idea to merge with #632?

@steliosrammos changed the title from "NIP-3166: Geo-location tag based on ISO-3166" to "NIP-3166: Country code tag based on ISO-3166" on Sep 5, 2023
@steliosrammos
Author

#632 is more focused on language than location. The two don't always map, and they are different concerns imo.

@arthurfranca
Contributor

Maybe this PR should be converted to a "Using Labels to Locate Events" NIP-32 section, because if we start reserving letters for city, state and so on, we may have no letters left.

How about using NIP-32 like this:

  // ...
  "tags": [
    ["L", "country"],
    ["L", "state"],
    ["L", "city"],
    ["l", "US", "country"],
    ["l", "US-LA", "state"],
    ["l", "US-LA-New Orleans", "city"]
  ]
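
As a rough illustration of the tag layout suggested above, the following Python sketch builds the same L/l tags from a country, state and city. The helper name `location_label_tags` and the dash-joined value format are assumptions taken from this example only; they are not defined by NIP-32.

```python
from typing import List, Optional


def location_label_tags(
    country: str,
    state: Optional[str] = None,
    city: Optional[str] = None,
) -> List[List[str]]:
    """Hypothetical helper: build NIP-32 style namespace (L) and label (l) tags."""
    # Each level is qualified by its parent so that values stay unambiguous.
    levels = [("country", country)]
    if state:
        levels.append(("state", f"{country}-{state}"))
        if city:
            levels.append(("city", f"{country}-{state}-{city}"))
    namespaces = [["L", namespace] for namespace, _ in levels]
    labels = [["l", value, namespace] for namespace, value in levels]
    return namespaces + labels


tags = location_label_tags("US", "LA", "New Orleans")
```

Calling it with only a country yields a single L tag and a single l tag, which is the country-code-only case this PR started from.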

@staab
Member

staab commented Sep 5, 2023

> How about using NIP-32 like this:

This was one of the use cases considered in the original NIP 32 conversation, so yeah I would go that direction.

@steliosrammos
Author

> > How about using NIP-32 like this:
>
> This was one of the use cases considered in the original NIP 32 conversation, so yeah I would go that direction.

Just discovering NIP-32, seems to be a good fit at first sight.

My main concern is that it would come at the expense of better indexing, as the l tag index would include all sorts of namespaces and get worse over time. As far as I can tell, relays are not sub-indexing on the "L" tag, right?

That said, covering more location granularities like city, state, region etc. makes sense. Perhaps a similar scheme can be applied here with a "c" label and "C" namespace, dedicated to locations to keep good indexing.

The reason I'm suggesting that is because a location tag might be generic/popular enough to justify its own tag.

@staab
Member

staab commented Sep 5, 2023

L tags are single-letter, so based on the spec they would be indexed. In practice, they probably aren't, but if more people use them they will be.

@steliosrammos
Author

> L tags are single-letter, so based on the spec they would be indexed. In practice, they probably aren't, but if more people use them they will be.

Although they would be indexed, wouldn't it be less efficient than having an index dedicated to location? I could be misunderstanding how relays index values.

@staab
Member

staab commented Sep 5, 2023

It depends, but it probably would have an impact. You could also do something like [["L", "country"], ["l", "country:GB"]] and only filter based on the qualified l tag

@steliosrammos
Author

What's the best way forward here? Open a new PR that extends NIP-32 with a new section detailing the location use case?

@vitorpamplona
Collaborator

The best way forward is almost always implementing it inside your preferred use case to see if it is actually better.

I am not sure if any of you coded this, but I learned a lot when implementing the geohash tag. And I am not saying that that is better than this. It's just always good to test things out.

@Semisol
Collaborator

Semisol commented Sep 5, 2023

I think we should have a use case for this and/or an implementation before making this an official NIP. Otherwise, good idea.

@Semisol
Collaborator

Semisol commented Sep 5, 2023

> It depends, but it probably would have an impact. You could also do something like [["L", "country"], ["l", "country:GB"]] and only filter based on the qualified l tag

Anything a-zA-Z must be indexed

@steliosrammos
Author

For sure, not rushing for this to be merged, just getting feedback on the idea. The use case is displaying country/region on profiles such that they can be browsed by country/region. Client implementation will follow, still a few weeks away.

@steliosrammos
Author

> Anything a-zA-Z must be indexed

At this point I'm starting to think that the indexing strategy needs to be discussed separately. But my take-away thus far is:

  1. Should a location tag be indexable? -> yes
  2. Is the current NIP-32 l tag good for this use case? -> pretty much, yes
  3. Will the l tag indexes start to grow too large because they are generic, impacting the performance of query filtering on location? -> probably yes, but too soon to worry about (?)

@dskvr
Contributor

dskvr commented Sep 6, 2023

> Then it should just be "Country code", not "GEO Location"?

@mattn Why limit geocoordinates to just the country code? Why not allow developers to discern the precision of a geotag as opposed to creating a limitation from the start?

> Can lat/lng be specified?

geohash is much more reliable than lat/lon since lat/lon has varying precision.

From #230.

    ["g","ww8p1r4t8","geohash"],
    ["g","Amsterdam","city"],
    ["g","NL","countryCode"],
    ["g","EU","continent"],
    ["g","Earth","planet"],

As implemented above, index-case has no conflicts and the read-case doesn't require the client to discern what the datapoint represents, since it is expressed in tag[2]

@fiatjaf
Member

fiatjaf commented Sep 6, 2023

I agree with the sandwich.

@arthurfranca
Contributor

arthurfranca commented Sep 6, 2023

@dskvr Good idea. But imo tag[1] should be enough to filter out (relay-side) most of the things you don't want. The values are different enough from each other that it probably works without tag[2] (except for a continent code clashing with a country or state code). Also, geohashes may need several entries to allow for broader filtering, maybe from 4 digits (city level) up to 9 (building level), depending on what your app needs.

So I would turn your example into this:

  ["g","ww8p1r"],
  ["g","ww8p1"],
  ["g","ww8p"], // city precision
  ["g","ww8"], // small state precision
  ["g","ww"], // big state precision
  ["g","US-LA-New Orleans"], // city; or LA-New Orleans
  ["g","US-LA"], // state; or just LA because country used the 3-letter code
  ["g","US"], // country. or USA
  ["g","continent>NA"], // unusual
  ["g","Earth"] // unusual

edit: to cover city right (by using smaller pieces) when searching it probably needs higher precision than 4 digits
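
The geohash part of that list can be generated by truncating one geohash to progressively shorter prefixes. This is a minimal sketch of the idea only; the function name `geohash_prefix_tags` and the `min_len`/`max_len` parameters are hypothetical, and the right precision bounds depend on the application.

```python
from typing import List, Optional


def geohash_prefix_tags(
    geohash: str, min_len: int = 2, max_len: Optional[int] = None
) -> List[List[str]]:
    """Hypothetical helper: emit one g tag per geohash prefix, longest first."""
    # Never ask for more characters than the geohash actually has.
    longest = len(geohash) if max_len is None else min(max_len, len(geohash))
    return [["g", geohash[:n]] for n in range(longest, min_len - 1, -1)]


tags = geohash_prefix_tags("ww8p1r4t8", min_len=2, max_len=6)
```

Because relays only match tag values exactly, publishing every prefix is what lets a client query a coarser area with a shorter geohash.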

@arthurfranca
Contributor

@fernandolguevara why do you disagree? xD

If you search for the country Republic of Namibia (NA), the relay would send you all North America (NA continent) entries and probably entries from other countries that have an NA state =P.

@fiatjaf
Member

fiatjaf commented Sep 6, 2023

I just realized https://github.com/nostr-protocol/nips/blob/master/32.md could already be used for this.

@fernandolguevara
Contributor

fernandolguevara commented Sep 6, 2023

@arthurfranca imo relays should use the component descriptor geohash | city | state | countryCode | continent | planet on the g tag to match entries

@arthurfranca
Contributor

@fernandolguevara oh ok but right now relays can't use the tag array's third item (second value) on searches. The third item would be just something you could use to filter client-side, after receiving relay response.
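
To make the relay-side versus client-side split concrete, here is a small Python sketch of the NA collision discussed above. The event data is made up, and `relay_match` and `client_filter` are hypothetical names: the first mimics a relay that only compares the tag name and first value, the second is the extra pass a client would run on the third item.

```python
# Made-up events: Namibia (country NA), North America (continent NA), Netherlands.
events = [
    {"id": "a", "tags": [["g", "NA", "countryCode"]]},
    {"id": "b", "tags": [["g", "NA", "continent"]]},
    {"id": "c", "tags": [["g", "NL", "countryCode"]]},
]


def relay_match(event, tag_name, values):
    """Relay-side: only the tag name and the first value take part in matching."""
    return any(
        len(t) > 1 and t[0] == tag_name and t[1] in values for t in event["tags"]
    )


def client_filter(event, tag_name, value, descriptor):
    """Client-side: additionally require the third item (the descriptor) to match."""
    return any(
        len(t) > 2 and t[0] == tag_name and t[1] == value and t[2] == descriptor
        for t in event["tags"]
    )


from_relay = [e for e in events if relay_match(e, "g", {"NA"})]
namibia = [e for e in from_relay if client_filter(e, "g", "NA", "countryCode")]
```

The relay returns both NA events; only the client-side pass narrows the result to the country.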

@dskvr
Contributor

dskvr commented Sep 7, 2023

@fiatjaf That is true, may not need a new tag, but it would still be beneficial for geodata to be consistent.

@arthurfranca The third key doesn't need to be filtered against. If you are filtering against AN (Antarctica) you don't need to specify you are filtering a continent. There are no clashes between Continent Codes and Country Codes. (Correction: I was wrong about that; you mentioned the collision between the Republic of Namibia and North America.) To resolve this, the Alpha-3 code of ISO-3166 could be used instead of the Alpha-2 code, which could resolve stateRegion conflicts as well.

@arthurfranca
Contributor

@dskvr Ok, so if keeping the third item to account client-side for an unexpected clash you would want something like this:

  ["g","New Orleans", "city"],
  ["g","LA", "stateCode"],
  ["g","USA", "countryCode"],
  ["g","North America", "continent"],
  ["g","Earth", "planet"]

Turned continent code into the full name to not clash with any state code.
But if searching by city, instead of filtering like { "#g": ["New Orleans"] } it would have to be { "#g": ["USA", "LA", "New Orleans"] }, because cities with the same name exist in different states, whether in the same country or not, and the same state code exists in different countries.

BUT NIP-01 says: "At least one of the arrays' values must match the relevant field in an event for the condition to be considered a match." So the filter { "#g": ["USA", "LA", "New Orleans"] } is an OR clause, which wouldn't work.

So the correct form would probably be something like this:

  ["g","US-LA-New Orleans", "city"],
  ["g","US-LA", "stateCode"],
  ["g","US", "countryCode"],
  ["g","North America", "continent"],
  ["g","Earth", "planet"]

Or using three letter code for countries and codes for continents (though I prefer the previous one):

  ["g","USA-LA-New Orleans", "city"],
  ["g","USA-LA", "stateCode"],
  ["g","USA", "countryCode"],
  ["g","NA", "continentCode"],
  ["g","Earth", "planet"]

It ended up not being that simple, so @steliosrammos now indeed a NIP for g tag may be useful for interoperability.

Now the city filter works as expected: { "#g": ["US-LA-New Orleans"] }
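
The difference between the two filters can be checked with a simplified matcher. This is only a sketch of the NIP-01 rule quoted above (at least one listed value must match), not a relay implementation; the `matches` function and the two sample events are assumptions made for illustration.

```python
def matches(event, filter_):
    """Simplified tag matching: for each "#x" key, the event needs at least
    one x tag whose first value appears in the list (an OR within the key)."""
    for key, wanted in filter_.items():
        if not key.startswith("#"):
            continue  # other filter fields are ignored in this sketch
        name = key[1:]
        values = {t[1] for t in event["tags"] if len(t) > 1 and t[0] == name}
        if not values.intersection(wanted):
            return False
    return True


new_orleans = {"tags": [["g", "US-LA-New Orleans", "city"],
                        ["g", "US-LA", "stateCode"],
                        ["g", "US", "countryCode"]]}
los_angeles = {"tags": [["g", "US-CA-Los Angeles", "city"],
                        ["g", "US-CA", "stateCode"],
                        ["g", "US", "countryCode"]]}

separate_values = {"#g": ["US", "LA", "New Orleans"]}  # OR: any US event matches
composite_value = {"#g": ["US-LA-New Orleans"]}        # one fully qualified value
```

With `separate_values`, the Los Angeles event matches too, because "US" alone satisfies the filter; with `composite_value`, only the New Orleans event matches.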

@vitorpamplona
Collaborator

You might want to consider using colon (i.e. USA:LA:New Orleans) since there are city names with a dash like Port-au-Prince, Haiti.

@dskvr
Contributor

dskvr commented Sep 7, 2023

@arthurfranca I definitely overlooked the OR limitation of filters.

US-LA-New Orleans diverges from ISO-3166-2 subdivisions.

If using subdivisions, might as well represent the key correctly.

["g","US-LA", "ISO-3166-2"],
["g","US", "ISO-3166-1:Alpha-2"],
["g","USA", "ISO-3166-1:Alpha-3"],
["g","840", "ISO-3166-1:Numeric"],

But there is still the issue with cities, which fall outside ISO-3166, so some alternatives...

> You might want to consider using colon (i.e. USA:LA:New Orleans) since there are city names with a dash like Port-au-Prince, Haiti.

USA-LA:New Orleans would be better, because USA-LA is an ISO-3166-2 subdivision, whereas USA:LA:New Orleans would kind of require a brand new specification.

@arthurfranca
Contributor

arthurfranca commented Sep 7, 2023

@vitorpamplona Yeah HT-OU-Port-au-Prince would be a problem when splitting =o.
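
The splitting problem is easy to reproduce. The sketch below contrasts a naive dash split with the colon-before-city form proposed earlier in the thread; `parse_location` and the `<subdivision>:<city>` layout are hypothetical and not part of any NIP.

```python
# Naive approach: splitting on every dash breaks city names that contain one.
naive_parts = "HT-OU-Port-au-Prince".split("-")


def parse_location(value):
    """Hypothetical parser for '<country>-<state>:<city>' values."""
    # Split off the city at the first colon, then split the subdivision once.
    subdivision, _, city = value.partition(":")
    country, _, state = subdivision.partition("-")
    return {"country": country, "state": state or None, "city": city or None}


parsed = parse_location("HT-OU:Port-au-Prince")
```

Here the city keeps its dashes because only the first colon and the first dash are treated as separators.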

@dskvr Regarding Cities:
unlocode: According to a comment on stackoverflow, the DB is incomplete
iata: depends on the city having an airport, so incomplete too
geonames: good one with integer codes, free and widely used (the downsides are that people can input fake data and that it depends on the DB continuing to exist)

Geonames can also be used for country and state, though:
HT (3723988) - https://www.geonames.org/3723988/haiti.html
OU (3719432) - https://www.geonames.org/3719432/departement-de-l-ouest.html
Port-au-Prince (3718426) - https://www.geonames.org/3718426/port-au-prince.html

If we use geonames just for city and other smaller subdivisions it becomes this:

["g","4335045", "GeoNames"], // city
["g","US-LA", "ISO-3166-2"],
["g","US", "ISO-3166-1:Alpha-2"],
["g","North America", "continent"],
["g","Earth", "planet"]

So, USA-LA:New Orleans or 4335045 or both?

@s3x-jay

s3x-jay commented Sep 21, 2023

I would encourage you to use NIP-32. While you may want ISO-3166, others may want some other coding system. NIP-32 handles all of them. But to clarify some of the questions about it…

To put things simply (to make sure everyone is on the same page), only the first parameter after a single-letter tag is indexed. AND there's (currently) no support for "starts with" queries in Nostr. That means both l and L are needed so you can query on all values matching a particular string OR all the uses of a particular namespace / coding system.

When we were coming up with NIP-32 there was some argument about how values should be structured. I was mostly thinking in terms of structured vocabularies (like ISO-3166) and wanted the namespace / vocabulary / coding system as a prefix on the value. Others wanted to query for values across multiple coding systems and wanted the value to have no prefix.

As a result, most of the examples in the NIP follow the no-prefix example like this…

["L", "GeoNames"],
["L", "ISO-3166-2"],
["l", "3173435", "GeoNames", "{\"confidence\":1,\"quality\":1}"],
["l", "IT-MI", "ISO-3166-2", "{\"confidence\":1,\"quality\":1}"],

But to use that data you have to query on both L and l, and you may get results that aren't what you're looking for if another namespace used in the event has the value you're looking for.

If you notice the last paragraph of the NIP mentions…

> Vocabularies MAY choose to fully qualify all labels within a namespace (for example, ["l", "com.example.vocabulary:my-label"]). This may be preferred when defining more formal vocabularies that should not be confused with another namespace when querying without an L tag.

Using that approach you can query just on l and get a clean set of results returned to you. NIP-32 does not state what delimiter should be used between the namespace and the value. Personally I always favored using '>' since it has few conflicts and implies a parent-child relationship. [You should query on L to see what others are using when working with a defined coding system like ISO-whatever…] If you take this approach your tags would look like this…

["L", "GeoNames"],
["L", "ISO-3166-2"],
["l", "GeoNames>3173435", "GeoNames", "{\"confidence\":1,\"quality\":1}"],
["l", "ISO-3166-2>IT-MI", "ISO-3166-2", "{\"confidence\":1,\"quality\":1}"],
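
A short sketch of the fully qualified approach: build the tags with the namespace prefixed onto the value, then query on l alone. The '>' delimiter is the personal preference stated above rather than anything NIP-32 mandates, and `qualified_label_tags` and `qualified_label_filter` are hypothetical helper names.

```python
DELIMITER = ">"  # assumption: NIP-32 does not specify a delimiter


def qualified_label_tags(namespace, value):
    """Hypothetical helper: one L tag plus an l tag whose value carries the namespace."""
    return [
        ["L", namespace],
        ["l", f"{namespace}{DELIMITER}{value}", namespace],
    ]


def qualified_label_filter(namespace, value):
    """Filter on the l tag only; the prefix keeps other namespaces out of the results."""
    return {"#l": [f"{namespace}{DELIMITER}{value}"]}


tags = qualified_label_tags("ISO-3166-2", "IT-MI")
query = qualified_label_filter("ISO-3166-2", "IT-MI")
```

Since relays only support exact matches, the publisher and the querying client have to agree on the same delimiter for this to work.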

Let me know if any of that raises questions…

@dskvr
Contributor

dskvr commented Dec 13, 2023

@s3x-jay

> Let me know if any of that raises questions…

Not really any questions, but NIP-32 solution doesn't work well with filters. Since NIP-32 is another event kind and filters do not presently (and will probably never) support joins and the association is created separate from the event that needs to be labeled, it cannot be filtered on easily.

Edit: I missed this bit:

"Publishers can self-label by adding l tags to their own non-1985 events..."

@dskvr
Contributor

dskvr commented Dec 15, 2023

I had to unblock myself on NIP-66 so I made this https://github.com/sandwichfarm/nostr-geotags, might be useful for someone here.
