
Fix handling for pt_BR and pt_PT #125

Closed
MasterOdin opened this issue Jun 21, 2020 · 5 comments

@MasterOdin (Collaborator)
While most languages use only the ll portion of the ll_CC locale specification, Portuguese uses the full code to differentiate Brazilian (pt_BR) from European (pt_PT) Portuguese, which the client does not currently support.

@zlatanvasovic (Contributor)
It doesn't accept pt_BR as the specified language?

@MasterOdin (Collaborator, Author)
Nope, we split on _ and take element [0]. Once tldr-pages/tldr#4101 is merged, I will work on a fix for this.
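That split is why the regional part of the code is lost today; a minimal illustration of the current behavior:

```python
# Current behavior: everything after the first '_' is discarded,
# so pt_BR and pt_PT both collapse to plain 'pt'.
locale = 'pt_BR'
print(locale.split('_')[0])  # → 'pt'
```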

@MasterOdin MasterOdin added this to the 1.1.0 milestone Aug 1, 2020
@columbarius (Contributor) commented Aug 3, 2020
@MasterOdin
Have you started working on this issue? I tried it myself on master...columbarius:portuguese but couldn't find an elegant solution for building a list with both long and short language codes and squashing it to remove duplicates.
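One way to squash such a list while keeping priority order is an ordered dedupe; a minimal sketch (the code list here is hypothetical, and it relies on dict preserving insertion order, which is guaranteed since Python 3.7):

```python
# Hypothetical mixed list of long and short language codes,
# in priority order, with duplicates.
codes = ['pt_PT', 'pt', 'de', 'pt', 'en', 'de']

# dict.fromkeys keeps only the first occurrence of each code,
# preserving the original order.
deduped = list(dict.fromkeys(codes))
print(deduped)  # → ['pt_PT', 'pt', 'de', 'en']
```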

@MasterOdin (Collaborator, Author)
There are two parts to this. First, we need to rewrite the language-setting logic so that we do not trim down to just the two-character code in all cases (so we retain pt_PT or pt_BR if someone set it). Second, we want to make the lookup efficient, minimizing the amount of searching we do, especially since each candidate page may be a network fetch until we find the one we want.

For both, I think the best solution would be to fetch the index.json file (https://tldr.sh/assets/index.json) and use it to determine the list of platforms and languages to search through. However, for now we can bypass that step by operating on the simple assumption that, apart from pt_PT, pt_BR, and zh_TW, there are no other languages in the ll_LL form, and that only the shorthand pt should expand into pt_PT. I would expect the tldr-pages project to gain only a handful of additional ll_LL codes in the future.
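The index.json approach could be sketched roughly as follows. Note the schema shown here (a top-level "commands" list whose entries carry a "language" list) is an assumption for illustration only, not a confirmed description of the real file, and the fetch step is omitted:

```python
def languages_from_index(index):
    """Collect the set of language codes present in a fetched index.

    `index` is the already-parsed JSON; the 'commands'/'language'
    layout used here is a hypothetical schema for illustration.
    """
    languages = set()
    for command in index.get('commands', []):
        languages.update(command.get('language', []))
    return languages

# Hypothetical sample of what the parsed index might look like:
sample = {
    'commands': [
        {'name': 'tar', 'language': ['en', 'pt_BR']},
        {'name': 'ls', 'language': ['en', 'pt_PT', 'zh_TW']},
    ],
}
print(sorted(languages_from_index(sample)))
# → ['en', 'pt_BR', 'pt_PT', 'zh_TW']
```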

As such, I would probably change the get_language_list() function to something like:

import os

def get_language_list():
    lang = os.environ.get('LANG', '')
    languages = os.environ.get('LANGUAGE', '').split(':')

    # Per the gettext spec, LANG must be set to something other than
    # 'C' or empty to enable languages.
    if lang in ['C', '']:
        return ['en']

    languages.append(lang)
    languages.append('en')

    final_set = []
    for lang in languages:
        # Strip any encoding suffix, e.g. 'pt_BR.UTF-8' -> 'pt_BR'
        lang = lang.split('.')[0]
        if lang in ['POSIX', 'C', '']:
            continue
        if lang == 'pt':
            lang = 'pt_PT'
        if lang not in ['pt_PT', 'pt_BR', 'zh_TW']:
            lang = lang.split('_')[0]
        # Keep priority order but drop duplicates
        if lang not in final_set:
            final_set.append(lang)
    return final_set

@columbarius (Contributor)
I guess for these three cases simple special-case handling is sufficient.

I would also prefer your solution, but to parse the JSON I would prefer a jsonpath library. The only widely available one I could find is jsonpath-rw, which is packaged on almost all mainstream distros except Arch, where it is only in the AUR.

Your sketch would be the most viable option.
