
Fix handling for pt_BR and pt_PT #125

Closed
MasterOdin opened this issue Jun 21, 2020 · 5 comments

@MasterOdin (Collaborator)
While most languages use only the ll portion of the ll_CC locale specification, Portuguese uses the full code to differentiate Brazilian (pt_BR) from European (pt_PT) Portuguese, which the client does not currently support.

@zlatanvasovic (Contributor)
It doesn't accept pt_BR as the specified language?

@MasterOdin (Collaborator, Author)
Nope, we split on _ and take element [0]. Once tldr-pages/tldr#4101 is merged, I will work on a fix for this.
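That split is why the regional part of the code is lost today; a minimal illustration of the current behavior:

```python
# Current behavior: everything after the first '_' is discarded,
# so pt_BR and pt_PT both collapse to plain 'pt'.
locale = 'pt_BR'
print(locale.split('_')[0])  # → 'pt'
```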

@MasterOdin MasterOdin added this to the 1.1.0 milestone Aug 1, 2020
@columbarius (Contributor) commented Aug 3, 2020
@MasterOdin
Have you started working on this issue? I tried it myself on master...columbarius:portuguese but couldn't find an elegant solution for building a list with both long and short language codes and squashing it to remove duplicates.
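One way to squash such a list while keeping priority order is an ordered dedupe; a minimal sketch (the code list here is hypothetical, and it relies on dict preserving insertion order, which is guaranteed since Python 3.7):

```python
# Hypothetical mixed list of long and short language codes,
# in priority order, with duplicates.
codes = ['pt_PT', 'pt', 'de', 'pt', 'en', 'de']

# dict.fromkeys keeps only the first occurrence of each code,
# preserving the original order.
deduped = list(dict.fromkeys(codes))
print(deduped)  # → ['pt_PT', 'pt', 'de', 'en']
```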

@MasterOdin (Collaborator, Author)
There are two parts to this. First, we need to rewrite the language-setting logic so that we do not trim down to just the two-character code in all cases (so we retain pt_PT or pt_BR if someone set it). Second, we want to make the lookup efficient, minimizing the amount of searching we do, especially since each candidate page may be a network fetch until we find the one we want.

For both, I think the best solution would be to fetch the index.json file (https://tldr.sh/assets/index.json) and use it to determine the list of platforms and languages to search through. However, for now we can bypass that step by operating on the simple assumption that, apart from pt_PT, pt_BR, and zh_TW, there are no other languages in the ll_LL form, and that only the shorthand pt should expand into pt_PT. I would expect the tldr-pages project to gain only a handful of additional ll_LL codes in the future.
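The index.json approach could be sketched roughly as follows. Note the schema shown here (a top-level "commands" list whose entries carry a "language" list) is an assumption for illustration only, not a confirmed description of the real file, and the fetch step is omitted:

```python
def languages_from_index(index):
    """Collect the set of language codes present in a fetched index.

    `index` is the already-parsed JSON; the 'commands'/'language'
    layout used here is a hypothetical schema for illustration.
    """
    languages = set()
    for command in index.get('commands', []):
        languages.update(command.get('language', []))
    return languages

# Hypothetical sample of what the parsed index might look like:
sample = {
    'commands': [
        {'name': 'tar', 'language': ['en', 'pt_BR']},
        {'name': 'ls', 'language': ['en', 'pt_PT', 'zh_TW']},
    ],
}
print(sorted(languages_from_index(sample)))
# → ['en', 'pt_BR', 'pt_PT', 'zh_TW']
```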

As such, I would probably change the get_language_list() function to something like:

import os

def get_language_list():
    lang = os.environ.get('LANG', '')
    languages = os.environ.get('LANGUAGE', '').split(':')

    # Per the gettext spec, LANG must be set to something other than
    # 'C' or empty to enable languages.
    if lang in ['C', '']:
        return ['en']

    languages.append(lang)
    languages.append('en')

    final_set = []
    for lang in languages:
        # Strip any encoding suffix, e.g. 'pt_BR.UTF-8' -> 'pt_BR'
        lang = lang.split('.')[0]
        if lang in ['POSIX', 'C', '']:
            continue
        if lang == 'pt':
            lang = 'pt_PT'
        if lang not in ['pt_PT', 'pt_BR', 'zh_TW']:
            lang = lang.split('_')[0]
        # Keep priority order but drop duplicates
        if lang not in final_set:
            final_set.append(lang)
    return final_set

@columbarius (Contributor)
I guess for these three cases simple special-case handling is sufficient.

I would also prefer your solution, but to parse the JSON I would prefer a jsonpath library. The only widely available one I could find is jsonpath-rw, which is packaged on almost all mainstream distros except Arch, where it is only in the AUR.

Your sketch would be the most viable option.
