
Language Codes conflicting #104

Closed
pedro-mendonca opened this issue Mar 26, 2015 · 5 comments

Comments

@pedro-mendonca
Contributor

Since we now have both Portuguese and Portuguese (Brazil), I've had difficulty understanding the Language Codes.

The two-letter Language Code is far from good.
There are several possible systems, and a broader, WordPress-compatible one should be chosen.

As a simple example, the pre-installed list contains:
pt - Portuguese
br - Breton
pt-br(?) - Portuguese (from Brazil)

I'm sure many languages derive from the same base, such as the en_US/UK or es_ES/MX/AR variants.
To identify these correctly, the two-letter code is not enough.

There is currently no Breton localization team for WordPress, but the problem still exists for several other languages. See the current list of available WordPress languages here:
http://wpcentral.io/internationalization/
and here
https://make.wordpress.org/polyglots/teams/

I've been searching for the optimal code system to solve this issue, and there are several possible choices.
As this is a WordPress plugin, the best option would be to use the same system as its parent framework.
I suggest looking at this list I found in the core of GlotPress. It is the most complete source of information on all the languages, with all the codes needed to identify them correctly, both those that already have translations and those that might in the future.
https://github.com/GlotPress/GlotPress/blob/master/locales/locales.php

Below I've pasted the entries for the three languages I found conflicting, to make it easier to choose an appropriate, precise system that won't generate any conflict.

```php
$pt->english_name = 'Portuguese (Portugal)';
$pt->lang_code_iso_639_1 = 'pt';
$pt->country_code = 'pt';
$pt->wp_locale = 'pt_PT';
$pt->slug = 'pt';
$pt->google_code = 'pt-PT';
$pt->facebook_locale = 'pt_PT';

$pt_br->english_name = 'Portuguese (Brazil)';
$pt_br->lang_code_iso_639_1 = 'pt';
$pt_br->lang_code_iso_639_2 = 'por';
$pt_br->country_code = 'br';
$pt_br->wp_locale = 'pt_BR';
$pt_br->slug = 'pt-br';
$pt_br->google_code = 'pt-PT';
$pt_br->facebook_locale = 'pt_BR';

$br->english_name = 'Breton';
$br->lang_code_iso_639_1 = 'br';
$br->lang_code_iso_639_2 = 'bre';
$br->country_code = 'fr';
$br->slug = 'br';
```

Using the country code is incorrect, as many languages belong to the same country (Breton's country_code, for example, is 'fr').
Using the two-letter ISO 639-1 code is incomplete and generates conflicts: Portuguese (Portugal) and Portuguese (Brazil) both have 'pt'.
I find the most convenient ones to be the slug (aka Locale Code) and the wp_locale (aka WordPress Locale) chosen by WordPress, which identify the language perfectly with no conflict.
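To make the conflict concrete, here is a small Python sketch (not code from qTranslate-X or GlotPress) that copies the three entries above into dictionaries and reports which field values are shared by more than one language. The function name `find_collisions` is hypothetical.

```python
from collections import defaultdict

# Entries copied from the GlotPress excerpt quoted in this issue.
# Breton has no wp_locale in that excerpt, so the key is simply absent.
LOCALES = {
    "pt": {"lang_code_iso_639_1": "pt", "country_code": "pt",
           "wp_locale": "pt_PT", "slug": "pt"},
    "pt_br": {"lang_code_iso_639_1": "pt", "country_code": "br",
              "wp_locale": "pt_BR", "slug": "pt-br"},
    "br": {"lang_code_iso_639_1": "br", "country_code": "fr", "slug": "br"},
}


def find_collisions(locales, field):
    """Map each value of `field` shared by two or more entries to those entries."""
    owners = defaultdict(list)
    for name, info in locales.items():
        if field in info:
            owners[info[field]].append(name)
    return {value: names for value, names in owners.items() if len(names) > 1}


if __name__ == "__main__":
    for field in ("lang_code_iso_639_1", "country_code", "slug", "wp_locale"):
        print(field, find_collisions(LOCALES, field) or "no collisions")
```

Only the ISO 639-1 field collides among these three entries ('pt'); the country code happens not to collide here only because the three languages are tied to different countries. There is also a cross-field clash the script does not check: Brazil's country code 'br' is Breton's language code, which is why abbreviating pt-br to 'br' is ambiguous.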

In the end, this is similar to the system you've implemented in qTranslate-X, but the list the form points users to for choosing their Language Code (http://www.w3.org/WAI/ER/IG/ert/iso639.htm#2letter) is far from good: it is not complete enough to avoid conflicts, and the user cannot enter more than two letters in the field.

I suggest allowing more letters in the form field and using the corresponding fields adopted by WordPress, as listed at http://wpcentral.io/internationalization/:
[QTX] Language Code -> Locale Code
[QTX] Locale -> WordPress Locale
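A rough sketch of what relaxed form validation for those two fields could look like. The patterns are an assumption modelled on values seen in the GlotPress list ('pt', 'pt-br', 'pt_BR', 'de_DE_formal'), not rules published by WordPress, and the function names are hypothetical.

```python
import re

# Assumed patterns, not official WordPress rules:
#   Locale Code (slug): 2-3 lowercase letters, optional '-' plus 2-4 letters, e.g. 'pt-br'
#   WordPress Locale:   2-3 lowercase letters, optional '_REGION', optional '_variant'
LOCALE_CODE = re.compile(r"[a-z]{2,3}(-[a-z]{2,4})?")
WP_LOCALE = re.compile(r"[a-z]{2,3}(_[A-Z]{2}(_[a-z]+)?)?")


def is_valid_locale_code(code):
    """Accept a lowercase slug that may be longer than two letters."""
    return LOCALE_CODE.fullmatch(code) is not None


def is_valid_wp_locale(locale):
    """Accept a WordPress-style locale: language, optional region, optional variant."""
    return WP_LOCALE.fullmatch(locale) is not None


if __name__ == "__main__":
    for value in ("pt", "pt-br", "pt_BR", "de_DE_formal"):
        print(value, is_valid_locale_code(value), is_valid_wp_locale(value))
```

Keeping the two fields separate means the URL slug can stay short and readable while the locale still carries the region, so 'pt-br' and 'br' (Breton) no longer compete for the same value.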

@johnclause
Member

Reading this made me realize that 'pt-br' will not currently work correctly in q-X.
Many sites use two-letter codes for languages and get by, because they never use many languages. Yes, two letters are not enough, but changing that would be too big a change right now. You can use 'br' instead of 'pt-br' if you wish to have both simultaneously. We need to change the default code to a two-letter code.

I guess the current strategy is:

[QTX] Language Code = a two-letter code of the admin's choice, which defines '[QTX] Locale'.
[QTX] Locale -> WordPress Locale, Browser '' attribute.

and that should work, as long as we do not use all the languages in the world at the same time. Most of the time, it works just fine. In the end, users do not care how it works, as long as the site shows everything in their language and keeps that language while they browse. Ordinary users do not even notice the two-letter code; a URL is just a cryptic string to them, which they know how to paste into a browser or an email, and that is all that counts.
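The current strategy can be sketched as a per-site table keyed by the admin's two-letter code. This is a hypothetical Python illustration, not qTranslate-X's actual code: `SITE_LANGUAGES`, `add_language` and `locale_for` are invented names, and 'br' is used for Brazilian Portuguese as suggested above.

```python
import re

# Hypothetical per-site settings: two-letter code chosen by the admin -> WordPress locale.
# Here 'br' stands in for Brazilian Portuguese, so it cannot also mean Breton.
SITE_LANGUAGES = {
    "pt": "pt_PT",
    "br": "pt_BR",
}

TWO_LETTER = re.compile(r"[a-z]{2}")


def add_language(config, code, wp_locale):
    """Register a language, enforcing the two-letter, unique-code rule."""
    if not TWO_LETTER.fullmatch(code):
        raise ValueError(f"language code must be two lowercase letters: {code!r}")
    if code in config:
        raise ValueError(f"language code already in use: {code!r}")
    config[code] = wp_locale


def locale_for(config, code):
    """Resolve the code taken from a URL (e.g. /br/...) to the WordPress locale."""
    return config[code]


if __name__ == "__main__":
    print(locale_for(SITE_LANGUAGES, "br"))
```

The trade-off is visible in the table itself: the scheme works as long as each site picks codes that are unique among its own languages, which is the "not all languages at the same time" condition described above.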

We will probably have to live with this for quite a while longer. There are other, more important issues to resolve. The number of unresolved issues in the support thread is growing, slowly, but still growing. People still find bugs. The list of Known Issues is impressive. All this keeps this topic off the top of the list, although we will eventually need to attend to it somehow. What about WPML? I had the impression they also use two-letter codes. Maybe it is just fine for now and for a few more years ahead?

@johnclause
Member

Pedro, do you mind if we change 'pt-br' to 'pb', for example? It needs to be a two-letter code that does not match any other two-letter code.

@pedro-mendonca
Contributor Author

I don't know how WPML works.
I think that although it's great to have many translations like pt_PT and pt_BR, or the English and Spanish variants, the probability of a site having similar languages active at the same time is very small.
The owner will probably choose only their native language and use its main simple slug, like pt/en/es, etc.

So, having pt-br as just pb is fine to differentiate it; an admin of a Brazilian site will probably use code pt with locale pt_BR, redefining the settings accordingly.
So, in the end it's not really important.
I was having difficulty testing and changing the pt-br I had added to differentiate it from pt, and that is why I opened this issue.
Now that this situation is identified, and since I believe hardly anyone will have two similar versions of a language on the same site, I consider the two-letter code enough; there is no need to keep a direct, unique and precise relation between the chosen slug and the language files.
I think this can be closed now.

@johnclause
Member

I will have to explain this in the startup guide, which q-X does not have yet, but it is coming...

Indeed, let us close it for now; we will open another issue when we are ready for a major redesign of this part.

@felipesetlik

A message from Google Search Console: 'pb' - unknown language code
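That warning presumably comes from hreflang annotations: Google expects an ISO 639-1 language code, optionally followed by an ISO 3166-1 alpha-2 region, so an internal code like 'pb' is rejected. One possible workaround, sketched below under the assumption that each language keeps its WordPress locale (the '[QTX] Locale -> WordPress Locale' mapping discussed above), is to derive the hreflang value from the locale instead of the URL code. Function names and the example.com URL are hypothetical.

```python
def hreflang_from_wp_locale(wp_locale):
    """Convert a WordPress locale such as 'pt_BR' into an hreflang value 'pt-BR'.

    Only the language and a two-letter region are kept; a variant suffix
    such as '_formal' has no hreflang equivalent and is dropped.
    """
    parts = wp_locale.split("_")
    language = parts[0].lower()
    if len(parts) > 1 and len(parts[1]) == 2:
        return f"{language}-{parts[1].upper()}"
    return language


def alternate_link(url, wp_locale):
    """Build the <link rel="alternate"> tag for one translated URL."""
    return f'<link rel="alternate" hreflang="{hreflang_from_wp_locale(wp_locale)}" href="{url}" />'


if __name__ == "__main__":
    # The URL keeps the internal code 'pb'; the annotation uses the real locale.
    print(alternate_link("https://example.com/pb/", "pt_BR"))
```

This sketch assumes the language part of the locale is already a valid ISO 639-1 code; WordPress locales with three-letter language parts would need an extra mapping.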
