Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable Lao localization #15021

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Conversation

janbrasna
Copy link
Contributor

One-line summary

Adds lotg to config to stop 404ing in prod.

Significant changes and points to review

Why are these being offered in prod locale selection (at / root) at all when not in the config is another issue, but this at least fixes the broken links. One would still default to en-US due to low %% of translated content, but the other should at least offer home-old with pretty decent amount of the most important stuff incl. about, manifesto etc. already translated.

Issue / Bugzilla link

#15010

Testing

(in prod mode, with Debug=False & Dev=False)
http://localhost:8000/ (with Accept-Locale unset)
http://localhost:8000/locales/
http://localhost:8000/lo/
http://localhost:8000/tg/

Copy link

codecov bot commented Aug 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.70%. Comparing base (83f043a) to head (c9bbf90).
Report is 18 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #15021      +/-   ##
==========================================
+ Coverage   77.64%   77.70%   +0.06%     
==========================================
  Files         162      161       -1     
  Lines        8341     8380      +39     
==========================================
+ Hits         6476     6512      +36     
- Misses       1865     1868       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@alexgibson
Copy link
Member

I'd say rather than simply enable these we should instead look at why these locales are listed incorrectly. If they are not translated correctly then we may do more harm than good by enabling them here.

@alexgibson
Copy link
Member

Added a do-not-merge label for now. It might be fine but we should double check things first, thanks for flagging though!

@janbrasna
Copy link
Contributor Author

Tajik is at 20% strings (https://pontoon.mozilla.org/tg/mozillaorg/) no different than say Afrikaans or Acholi and will 302 to English anyways all the time in prod not meeting the threshold, not showing up in pickers, metas and sitemaps. (Well, besides the / root and its hreflang now, where it shouldn't be in the first place #15010)

  • However! While Afrikaans doesn't get listed in / language selection (i.e. for bots), Acholi does, as well as Tajik now (which isn't even enabled in the locales now to start with), but is definitely not active for home, so this needs improving in Only offer active homepage locales on / site root #15010 nonetheless.
  • This only changes 404 to 302 for now on prod / root, no other change elsewhere.

Lao at 80% has (https://pontoon.mozilla.org/lo/mozillaorg/) rather complete translation from some time ago (i.e. just not including the more current home-new, cookie-consent etc.) that would otherwise be active, looking pretty good: https://www-dev.allizom.org/lo/?xv=legacy (not that I'd be able to verify it's not some bogus content, but according to G translation it makes sense in all the key areas I've checked…)

  • This not only fixes the current 404, but also enables the locale in places that check for it properly, so this will recognise it as active:
  • with alternate hreflangs metas elsewhere, exposed to people & bots in pickers and sitemaps etc.

(Not sure how it is different from any other language that's not intelligible by the team members and only has a handful of contributors so lack a broader exposure or consensus across the language community… I only see @peiying2 as the PM on both, so not sure who else might be able to confirm the content is correct and not harmful to publish. Tajik won't activate on prod, Lao would… and given its coverage it probably deserves it.)

Nonetheless these two languages are being consumed via www-l10n and being active_locales for some fluent files anyways, so this only enables the l10n machinery to acknowledge these codes are a valid locale path for the render, with them being either shown or not handled by the usual logic. As a bonus this adds them to the language picker for dev/demos for visibility (where it is otherwise active by default anyways, as DEV just blindly reads in all the ftl files from data folders, not restricting them to active_locales or PROD_LANGUAGES).

I'm investigating in the meantime how a locale otherwise not making the cut for activation percentage or completely absent from translations allowed in settings makes it to the languages context in #15010 as we speak. (Found the culprit, will write it up, but will need @stevejalim to untangle the magic anyways;)…)

@alexgibson
Copy link
Member

I'm investigating in the meantime how a locale otherwise not making the cut for activation percentage or completely absent from translations allowed in settings makes it to the languages context in #15010 as we speak. (Found the culprit, will write it up, but will need @stevejalim to untangle the magic anyways;)…)

Yeah, I think understanding this is probably my main question. If a locale does not have adequate coverage, then it probably shouldn't be showing up here.

@alexgibson
Copy link
Member

So if i understand things correctly (?) #15010 seems like a bug, and the locales listed are from a different FTL file to the home page. That seems like something we should fix, but whether we should enable both of these two locales in production or not needs to come from our L10n team who I can talk to and ask.

@alexgibson
Copy link
Member

@janbrasna OK I had a chat with @peiying2. These two locales are not super active, so we want to make sure those communities are both willing and capable to keep up with translations before enabling them in production. This is the reason why they are not in the list yet. I will keep the DNM label on this PR until we hear back. Perhaps then we can at least enable Lao.

In the meantime, the bug you uncovered in #15010 can certainly be fixed i think.

Copy link
Member

@alexgibson alexgibson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@janbrasna we have the go ahead to enable Lao, so if you can remove Tajik for now we can merge this. Thanks

@janbrasna
Copy link
Contributor Author

So if i understand things correctly (?) #15010 seems like a bug, and the locales listed are from a different FTL file to the home page.

Yes, definitely, after further debugging that is one issue (and would rule out /tg from the root when fixed), the other being there are active_locales in the metadata that list more languages that are allowed in this list (that also needs improving otherwise /lo would still show up linked but end up 404 not found).

Now that the culprit is known and a PoC available it's perhaps more important closing that gap properly than just shipping a band-aid.

These two locales are not super active, so we want to make sure those communities are both willing and capable to keep up with translations before enabling them in production. This is the reason why they are not in the list yet.

Ah, yes, I was only checking the % amount of translated content and the number of authors or recent activity to compare these to some other already enabled locales (that mostly have less content translated), not factoring in that even when the other locales are now outdated and not keeping up (often barely having some headings and page title translated), they might have gotten a formal approval at some point in the past and these would need the same before being added.

(This only basically adds what is elsewhere in the configs: languages in pontoon.toml, or the folders pulled in from www-l10n — that seemingly needed to be in sync. Without these defined also here, the localisers e.g. won't have their language available in the dropdown on www-dev, that's referred from pontoon to check for their translations etc. — however as long as they don't meet the FLUENT_DEFAULT_PERCENT_REQUIRED they will only stay on www-dev anyways.)

@janbrasna
Copy link
Contributor Author

@alexgibson That's lovely, I think the locale deserves it for the work they've done on it. Could you keep the DNM for now though, as /lo would be (the only) one half of the test case for #15025, so I'd prefer to see the filtering in prod (as dev enables all by default) before this addition is merged? Thanks.

@alexgibson
Copy link
Member

alexgibson commented Aug 30, 2024

@janbrasna the only wrinkle is i don't know how long it might be until Tajik catches up. The Lao community manager has asked if we could enable their locale, so I'd like to get their translations live soon if possible. I think we can still verify your other bug adequately enough without this?

@janbrasna
Copy link
Contributor Author

(I meant it rather the other way around — if there are other sub-20%-complete translations, whether they should not also be removed from the enabled locales here in the config if they don't have basically any meaningful content translated besides one or two headings etc… /think: lv, tl, af, ach, xh etc. that are in the same list as enabled currently/…?)

@alexgibson With the current data being pulled from www-l10n, the only cases where the metadata don't match would be lo after the correct ftl file #15025 is used for active_locales, so 50% of that is getting the right metadata — and 50% is even with the right metadata additional filtering is needed. I think the reproduction was enough to outline the issue and the fix, it would just be nice to see the fix resolving both the culprits before the only repro path for 50% of its surface disappears. (You can always simulate that locally tho, messing up the enabled locales.)

@janbrasna janbrasna changed the title Fix Tajik and Lao languages Enable Lao localization Aug 30, 2024
@alexgibson
Copy link
Member

(I meant it rather the other way around — if there are other sub-20%-complete translations, whether they should not also be removed from the enabled locales here in the config if they don't have basically any meaningful content translated besides one or two headings etc… /think: lv, tl, af, ach, xh etc. that are in the same list as enabled currently/…?)

This is a much larger discussion, and not one that can be likely be agreed upon through GitHub I'm afraid. I think we're getting a bit too far from the scope of this bug.

@alexgibson With the current data being pulled from www-l10n, the only cases where the metadata don't match would be lo after the correct ftl file #15025 is used for active_locales, so 50% of that is getting the right metadata — and 50% is even with the right metadata additional filtering is needed. I think the reproduction was enough to outline the issue and the fix, it would just be nice to see the fix resolving both the culprits before the only repro path for 50% of its surface disappears. (You can always simulate that locally tho, messing up the enabled locales.)

We can leave this for a little while, but I think the fixes we make there can likely be made confident through added tests.

@janbrasna
Copy link
Contributor Author

Right, the whole activation threshold for a locale / its home ftl will need properly covering with tests within #15025 — and the intersection with allowed languages in settings (which is why I wanted to wait with this) is even easier to test at that point. So this should not be blocked on #15010 improvements 🤷 and the locales that have put the effort into making it more approachable to their communities can be published to prod once approved.

@janbrasna
Copy link
Contributor Author

In the meantime, perhaps after the inquiry into the commitment of keeping Tajik version of wwwmo up to date, the community got very active the following months, and went from ~20% to currently 60%+ of translated content, with the usual key pages as home, products, browsers, vpn etc. above the threshold for enabling the locale. So the question is, whether along with the approved Lao language, the Tajik locale wouldn't be also a worthwhile addition, given its recent activity.

@alexgibson
Copy link
Member

alexgibson commented Nov 15, 2024

@peiying2 can you please advise if we should enable Tajik here as well now, or if we should just go ahead an enable Lao? Thanks

(I think we can safely put the other bugs mentioned here aside from the discussion on whether these locales should be enabled in production)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants