
NIP-XXX Internationalization and Localization #1127

Open
wants to merge 1 commit into
base: master

eznix86

@eznix86 eznix86 commented Mar 18, 2024

@vitorpamplona
Collaborator

vitorpamplona commented Mar 18, 2024

Two pieces of feedback:

  1. Not using a single letter for the language tag means that clients cannot filter by language on the relay. They will have to download everything and then filter by the preferred language locally.
  2. Profile fields:
    2.1. language should be an array to support bilingual users.
    2.2. Where translation services are available, two other fields are important: translateTo: en and dontTranslateFrom: [en, pt]. I suppose the allowed_languages tag could be close to dontTranslateFrom, but a user can allow the download of languages they don't speak and have the translator convert them into the languages they do speak. So the set of allowed_languages is larger than the set of dontTranslateFrom.

For instance, my setup is:

  • language: ['en', 'pt']
  • allowed_languages: null (everything)
  • dontTranslateFrom: ['en', 'pt', 'es']
  • translateTo: 'en'
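
A minimal sketch of how a client might act on the setup above. The field names follow the comment; the `LanguagePrefs` class and both helper functions are hypothetical and not part of any NIP or existing client.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LanguagePrefs:
    # Field names mirror the setup above; the class itself is an assumption.
    language: List[str]                     # languages the user speaks
    allowed_languages: Optional[List[str]]  # None means "everything"
    dont_translate_from: List[str]          # languages shown untranslated
    translate_to: str                       # target language for translation


def should_download(prefs: LanguagePrefs, note_lang: str) -> bool:
    """A note is fetched if no allow-list is set or its language is on the list."""
    return prefs.allowed_languages is None or note_lang in prefs.allowed_languages


def should_translate(prefs: LanguagePrefs, note_lang: str) -> bool:
    """Translate only notes that are downloaded and not in a language the user reads."""
    return should_download(prefs, note_lang) and note_lang not in prefs.dont_translate_from


prefs = LanguagePrefs(
    language=["en", "pt"],
    allowed_languages=None,
    dont_translate_from=["en", "pt", "es"],
    translate_to="en",
)
```

With these preferences a Japanese note would be downloaded and translated to `en`, while a Spanish note would be downloaded and shown as-is, which is why the allowed set ends up larger than the do-not-translate set.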

@eznix86
Author

eznix86 commented Mar 18, 2024

> Two pieces of feedback:
>
> 1. Not using a single letter for the `language` tag means that clients cannot filter by language on the relay. They will have to download everything and then filter by the preferred language locally.
> 2. Profile fields:
>    2.1. `language` should be an array to support bilingual users.
>    2.2. Where translation services are available, two other fields are important: `translateTo`: `en` and `dontTranslateFrom`: [`en`, `pt`]. I suppose the `allowed_languages` tag could be close to `dontTranslateFrom`, but a user can allow the download of languages they don't speak and have the translator convert them into the languages they do speak. So the set of `allowed_languages` is larger than the set of `dontTranslateFrom`.
>
> For instance, my setup is:
>
> * `language: ['en', 'pt']`
> * `allowed_languages: null` (everything)
> * `dontTranslateFrom: ['en', 'pt', 'es']`
> * `translateTo: 'en'`

That's great feedback! Thanks.

Here are some possible additions:

  • allowed_languages: can be null or left unspecified on a kind: 0.
  • Using allowed_languages: for example, with allowed_languages: ['es', 'fr'], es becomes your default language, so whichever comes first is your primary language. If it is null or not specified, it defaults to everything, and posts that reach you in another language are never translated.
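
The two rules above could be sketched roughly as follows. Both function names are hypothetical, and treating an empty list the same as null is an assumption the thread does not settle.

```python
from typing import List, Optional


def primary_language(allowed_languages: Optional[List[str]]) -> Optional[str]:
    """The first entry is the primary language; None when nothing is specified."""
    if not allowed_languages:
        return None
    return allowed_languages[0]


def accepts(allowed_languages: Optional[List[str]], note_lang: str) -> bool:
    """Null or unspecified means everything is accepted."""
    if not allowed_languages:
        return True
    return note_lang in allowed_languages
```

For `['es', 'fr']` this makes `es` the primary language and rejects a note tagged `ja`; with `None`, every note is accepted and there is no primary language.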

Additionally, I think language on a note (NIP-01) should be a single value, for example fr, because the tag marks that single note as "translatable"; a note cannot be multilingual.

But for kind: 0, I think we can of course have languages: ['pt', 'fr'].
As for dontTranslateFrom and translateTo: yes, allowed_languages is enough.

Let's see what others can chip in!

@tyiu
Contributor

tyiu commented Mar 18, 2024

This seems similar to this language tag proposal from @alexgleason last year but that PR looks stalled:
#632

@eznix86
Author

eznix86 commented Mar 18, 2024

> This seems similar to this language tag proposal from @alexgleason last year but that PR looks stalled: #632

I agree, it uses l as the tag! Here we are also adding a proposal for kind 0.

@tyiu
Contributor

tyiu commented Mar 19, 2024

I love the idea of incorporating more internationalization and localization concepts as a first-class citizen. Not everyone speaks the same language.

Having an l tag is great.

I do not think translation settings should live in the kind 0 profile. It's starting to bleed client implementation detail unnecessarily, and sets up the expectation that all clients should perform translations by default. I think those settings can live local to the client itself. Most modern browsers and operating systems have preferred language settings built in, but the client can either respect those settings and/or have local settings to override or augment the OS settings if it wants.

What I think would be useful is for people to state on their profile which languages they speak, so that it signals to others whether they should subscribe to their content. That is basically what you already proposed, and what Vitor mentioned about supporting multiple languages in an array.

["languages", "en", "uk"]

@eznix86
Author

eznix86 commented Mar 19, 2024

100% agree!

So, to summarize: we all agree that languages on kind 0 should be an array, and that kind 1 should carry a single value in an l or language tag.

@vitorpamplona
Collaborator

Isn't l used for labels already?

@staab
Member

staab commented Mar 19, 2024

> Isn't l used for labels already?

Yes. Which leads to the question: why not define a label for language? It would work exactly like we're discussing, but with an additional L tag.

@vitorpamplona
Collaborator

This doesn't seem that bad:

["L", "iso639-1"]
["l", "en", "iso639-1"]
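
A minimal sketch of attaching these two tags to a note, assuming the `iso639-1` namespace string used above. The helper names are hypothetical, and the event is left unsigned (`pubkey`, `id` and `sig` are omitted on purpose).

```python
import time
from typing import Dict, List


def language_label_tags(lang: str, namespace: str = "iso639-1") -> List[List[str]]:
    """Build the namespace (L) tag and the label (l) tag for one language code."""
    return [["L", namespace], ["l", lang, namespace]]


def unsigned_note(content: str, lang: str) -> Dict:
    """An unsigned kind 1 note carrying a self-reported language label."""
    return {
        "kind": 1,
        "created_at": int(time.time()),
        "tags": language_label_tags(lang),
        "content": content,
    }


note = unsigned_note("Hello, Nostr!", "en")
```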

@eznix86
Author

eznix86 commented Mar 19, 2024

> This doesn't seem that bad:
>
> ["L", "iso639-1"]
> ["l", "en", "iso639-1"]

Do you think it's important to include the standard in the array?

@vitorpamplona
Collaborator

> Do you think it's important to include the standard in the array?

It's required per NIP-32

@staab
Member

staab commented Mar 19, 2024

I just drafted a PR to relax the requirement for L tags. They're only necessary if you want to query something for namespace use, which doesn't really make sense in this example.

#1129

@eznix86
Author

eznix86 commented Mar 19, 2024

> > Do you think it's important to include the standard in the array?
>
> It's required per NIP-32

So for every note we write, we add a label?

What about a kind 0?

@vitorpamplona
Collaborator

> So for every note we write, we add a label?

Just the tags, not the label event.

@eznix86
Author

eznix86 commented Mar 20, 2024

So I think we agree on how to implement internationalization and localization. I will update the NIP to propose what we discussed. What do you think?

@tyiu
Contributor

tyiu commented Mar 21, 2024

> So I think we agree on how to implement internationalization and localization. I will update the NIP to propose what we discussed. What do you think?

Sounds reasonable to me. It seems we have consensus amongst the people who were involved in the discussion.

@eznix86
Author

eznix86 commented Mar 21, 2024

I will make the change, but first, to summarize:

  • The language label is a second-class citizen of a note, following NIP-32:
    • We define a ["l", "en-US"] only to label a note with a kind 1985 event.
    • A relay can take this label as truthful or may ignore it. (Example: a relay using https://fasttext.cc/docs/en/crawl-vectors.html can determine the language automatically.)
  • kind 0 may contain a first-class languages: ["uk", "fr"] field (using two-letter ISO 639-1 language codes) as metadata.
    • Possible uses:
      • Searching or showing notes based on their locale.
      • If not specified, the default behaviour applies, without filters.
      • Translation capabilities.
      • Software can adapt.
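
As a rough illustration of the kind 0 part of this summary, the sketch below puts a `languages` array into the JSON-encoded profile content. Placing the field in the content (rather than in a tag), the `profile_content` helper and the shape check are all assumptions; the check only confirms that a code looks like two lowercase letters, not that it is a registered ISO 639-1 code.

```python
import json
import re
from typing import List

# Shape check only: two lowercase ASCII letters.
TWO_LETTER_CODE = re.compile(r"[a-z]{2}")


def profile_content(name: str, languages: List[str]) -> str:
    """Serialize profile metadata with a hypothetical `languages` field."""
    bad = [code for code in languages if not TWO_LETTER_CODE.fullmatch(code)]
    if bad:
        raise ValueError(f"not two-letter language codes: {bad}")
    return json.dumps({"name": name, "languages": languages})


content = profile_content("alice", ["uk", "fr"])
```

A value such as `en-US` would be rejected by this check, which highlights that the note-level example above (`en-US`) and the profile-level codes (`uk`, `fr`) do not yet use the same format.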

Hope we are good!

@tyiu
Contributor

tyiu commented Mar 21, 2024

> I will do the change but before to summarize:
>
> * It is a second class citizen to a note by respecting [NIP-32](https://github.com/nostr-protocol/nips/blob/master/32.md):

I don't think it should be a second-class citizen. I think it should be first-class. See my comments below.

>   * We define a `["l", "en-US"]` **only** to label a note with `kind 1985` event.

I think we can just add the l tag onto the note itself. NIP-32 allows for it, per the Self-Reporting section. This way, it makes it really easy to filter for the languages that the client wants in the relay subscription. Others may choose to label the note with a separate kind 1985 as well if they wish, which works well if the original note didn't self-report the language.
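
To make the filtering point concrete, here is a minimal sketch of a relay subscription that asks only for notes self-labelled with certain languages. It relies on the NIP-01 convention that single-letter tags can be queried with a `#<letter>` filter key; the `language_req` helper, the subscription id and the limit are hypothetical.

```python
import json
from typing import Iterable


def language_req(sub_id: str, languages: Iterable[str], limit: int = 50) -> str:
    """Build a REQ message for kind 1 notes whose l tag matches any given language."""
    filt = {"kinds": [1], "#l": list(languages), "limit": limit}
    return json.dumps(["REQ", sub_id, filt])


message = language_req("lang-feed", ["en", "pt"])
```

Notes without a self-reported `l` tag would not match this filter, so a client would still need a separate query (for example, for kind 1985 labels) to find those.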

Everything else seems good.

@eznix86
Author

eznix86 commented Mar 21, 2024

Thanks for clarifying. I will make the update!

@erskingardner
Contributor

I like this idea in theory, but trying to think about this from a client dev POV. Clients are the ones that are adding this "l" tag to events, but they don't necessarily have a good way to be sure what language a note is written in. Assuming the language based on device or keyboard settings is not always accurate. To be clear, I don't know that there is anything we need to do or change here, I just wanted to bring this up as a hindrance to adoption of this by clients.

It would be good to see a more complete example of the fields that might be added to the kind:0 event.

@vitorpamplona
Collaborator

vitorpamplona commented Mar 28, 2024

> Assuming the language based on device or keyboard settings is not always accurate.

Yep. Most people don't change the keyboard language to match their spoken language. And bi-linguals (generally: local language + English) don't switch the keyboard when writing in each language. Since bi-linguals tend to be more prominent people, the posts of more prominent people might have wrong language tags.

Also, it's not uncommon to see posts in two languages: one paragraph in each. Amethyst's language identification and translation runs for each paragraph independently. We do a similar process to identify right-to-left paragraphs.

One solution would be to list all possible languages in each post and let the receiving client filter the post out when it is not actually in the tagged language. In effect, this accepts a large number of false positives from the signing side in order to minimize false negatives.

Meaning: if the goal is to query, having errors in the return is more acceptable than failing to return a post because the signer was too conservative in the language tag choices.
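
The trade-off above could be sketched as follows: the signing client tags every candidate language, and the receiving client re-checks the content before showing it. All names here are hypothetical, and `detect` stands in for whatever language-identification routine a client already has; nothing in the thread specifies one.

```python
from typing import Callable, Dict, Iterable, List


def candidate_language_tags(candidates: Iterable[str],
                            namespace: str = "iso639-1") -> List[List[str]]:
    """Signing side: tag every plausible language (deduplicated, order kept)."""
    unique = list(dict.fromkeys(candidates))
    return [["L", namespace]] + [["l", code, namespace] for code in unique]


def tagged_languages(note: Dict) -> List[str]:
    """Collect the language codes from a note's l tags."""
    return [t[1] for t in note["tags"] if len(t) >= 2 and t[0] == "l"]


def keep_note(note: Dict, wanted: Iterable[str],
              detect: Callable[[str], str]) -> bool:
    """Receiving side: drop false positives by re-detecting the language locally."""
    wanted_set = set(wanted)
    if not wanted_set & set(tagged_languages(note)):
        return False
    return detect(note["content"]) in wanted_set
```

For a note tagged with both `en` and `pt` but written in Portuguese, a relay query for `en` would return it, and `keep_note` would then discard it on a client that only wants English.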

5 participants