locale.nl_langinfo(locale.ALT_DIGITS) does not work #124969

Closed
serhiy-storchaka opened this issue Oct 4, 2024 · 12 comments
Labels
type-bug An unexpected behavior, bug, or error

Comments

@serhiy-storchaka
Member

serhiy-storchaka commented Oct 4, 2024

ALT_DIGITS is a glibc-specific item. It is supported by Python (there is an explicit list of supported items), but the result is not correct.

From The GNU C Library Reference Manual:

ALT_DIGITS

The return value is a representation of up to 100 values used to represent the values 0 to 99. As for ERA this value is not intended to be used directly, but instead indirectly through the strftime function. When the modifier O is used in a format which would otherwise use numerals to represent hours, minutes, seconds, weekdays, months, or weeks, the appropriate value for the locale is used instead.
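As a rough illustration of the indirect use described above, the sketch below formats a month number with the `O` modifier (`%Om`). The helper name `format_month_alt` and the locale name `ja_JP.UTF-8` are assumptions made for this example. Whether `%O` is honoured at all depends on the platform's C library, so the helper returns None when the locale is missing or the format is rejected.

```python
import locale
import time

def format_month_alt(t, loc=None):
    """Format the month of struct_time `t` with '%Om'.

    `loc` is an optional locale name (an assumption of this sketch).
    Returns None if the locale is unavailable or the platform rejects '%Om'.
    """
    saved = locale.setlocale(locale.LC_ALL)  # query the current setting
    try:
        if loc is not None:
            locale.setlocale(locale.LC_ALL, loc)
        return time.strftime("%Om", t)
    except (locale.Error, ValueError):
        return None
    finally:
        locale.setlocale(locale.LC_ALL, saved)  # always restore

if __name__ == "__main__":
    # The result depends on the installed locales and on the C library.
    print(format_month_alt(time.localtime(), "ja_JP.UTF-8"))
```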

This value is only defined in a few locales: az_IR, fa_IR, ja_JP, lzh_TW, my_MM, or_IN, shn_MM.

But Python returns only one digit.

>>> import locale
>>> locale.setlocale(locale.LC_TIME, 'ja_JP')
'ja_JP'
>>> locale.setlocale(locale.LC_CTYPE, 'ja_JP')
'ja_JP'
>>> locale.nl_langinfo(locale.ALT_DIGITS)
'〇'

This is because nl_langinfo(ALT_DIGITS) in C returns a string with embedded null characters.
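A minimal sketch of why only one digit comes back, assuming the glibc buffer is laid out as consecutive NUL-terminated items followed by an empty item. The helper names and the three-item buffer are hypothetical and only model the behavior; this is not the CPython code.

```python
def read_c_string(buf: bytes) -> bytes:
    """Mimic a plain C string read: stop at the first NUL byte."""
    end = buf.find(b"\0")
    return buf if end == -1 else buf[:end]

def read_nul_separated(buf: bytes, limit: int = 100) -> list:
    """Walk consecutive NUL-terminated items until an empty item or `limit`."""
    items = []
    pos = 0
    while len(items) < limit and pos < len(buf):
        end = buf.find(b"\0", pos)
        if end == -1:
            end = len(buf)
        if end == pos:  # an empty item marks the end of the list
            break
        items.append(buf[pos:end])
        pos = end + 1
    return items

# Hypothetical buffer: the first three ja_JP items, then a terminator.
raw = "〇\0一\0二\0\0".encode("utf-8")
print(read_c_string(raw).decode("utf-8"))                          # 〇
print([item.decode("utf-8") for item in read_nul_separated(raw)])  # ['〇', '一', '二']
```

Treating the buffer as an ordinary C string reproduces the single '〇' shown in the session above, while walking past each NUL recovers the remaining items.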

How should we fix it?

  • return a single string with 99 embedded null characters
  • return a 100-tuple of strings

What should we return if the value is not defined (in most locales) -- empty string (current behavior), empty tuple or None?

cc @methane

Linked PRs

serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Oct 4, 2024
@serhiy-storchaka serhiy-storchaka moved this from Todo to In Progress in Locale issues Oct 4, 2024
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Oct 8, 2024
Now it returns a tuple of up to 100 strings (an empty tuple on most locales).
Previously it returned the first item of that tuple or an empty string.
serhiy-storchaka added a commit that referenced this issue Oct 9, 2024
Now it returns a tuple of up to 100 strings (an empty tuple on most locales).
Previously it returned the first item of that tuple or an empty string.
@encukou
Member

encukou commented Oct 9, 2024

@freakboy3742: this fix broke the iOS buildbot. You might have opinions on how to handle this on non-glibc platforms :)

efimov-mikhail pushed a commit to efimov-mikhail/cpython that referenced this issue Oct 9, 2024
…124974)

Now it returns a tuple of up to 100 strings (an empty tuple on most locales).
Previously it returned the first item of that tuple or an empty string.
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Oct 9, 2024
@serhiy-storchaka
Member Author

It is interesting that ALT_DIGITS is a FreeBSD extension, and that it was inherited by Apple platforms. I wonder whether it is unset only for Japanese, or also for other locales (not installed on the buildbots): fa_IR, lzh_TW, my_MM, or_IN, shn_MM?

freakboy3742 pushed a commit that referenced this issue Oct 9, 2024
Skip the locale.ALT_DIGITS test on all Apple platforms, not just macOS.
@freakboy3742
Contributor

@serhiy-storchaka AFAICT, jp_JP is the only locale in the test set that is defined, and it doesn't define a value for ALT_DIGITS. The others fall back as unrecognised locales. Whether that can be fixed by installing something else on the buildbot/test machine, I can't say, but I haven't explicitly installed a JP locale on my machine.

serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Oct 10, 2024
…thonGH-124974)

Now it returns a tuple of up to 100 strings (an empty tuple on most locales).
Previously it returned the first item of that tuple or an empty string.
(cherry picked from commit 21c04e1)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Oct 10, 2024
…thonGH-124974)

Now it returns a tuple of up to 100 strings (an empty tuple on most locales).
Previously it returned the first item of that tuple or an empty string.
(cherry picked from commit 21c04e1)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@JacobCoffee JacobCoffee added the type-bug An unexpected behavior, bug, or error label Oct 10, 2024
freakboy3742 pushed a commit that referenced this issue Oct 10, 2024
… (#125232)

Returns a tuple of up to 100 strings for ALT_DIGITS lookup (an empty tuple on most locales).
Previously it returned the first item of that tuple or an empty string.
(cherry picked from commit 21c04e1)
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Oct 10, 2024
…thonGH-124974) (pythonGH-125232)

(cherry picked from commit 26a9318)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Returns a tuple of up to 100 strings for ALT_DIGITS lookup (an empty tuple on most locales).
Previously it returned the first item of that tuple or an empty string.
(cherry picked from commit 21c04e1)
serhiy-storchaka added a commit that referenced this issue Oct 11, 2024
…H-124974) (GH-125232) (GH-125284)

(cherry picked from commit 26a9318)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Returns a tuple of up to 100 strings for ALT_DIGITS lookup (an empty tuple on most locales).
Previously it returned the first item of that tuple or an empty string.
(cherry picked from commit 21c04e1)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@github-project-automation github-project-automation bot moved this from In Progress to Done in Locale issues Oct 11, 2024
@serhiy-storchaka
Member Author

Now it could be used in the strptime() implementation (#53161).

@encukou
Member

encukou commented Oct 11, 2024

The 3.12 buildbot now fails: https://buildbot.python.org/#/builders/1185/builds/971

@encukou encukou reopened this Oct 11, 2024
@github-project-automation github-project-automation bot moved this from Done to In Progress in Locale issues Oct 11, 2024
@serhiy-storchaka
Member Author

Fixed by #125311.

@kulikjak
Contributor

Hi, correct me if I am wrong, but I don't think that ALT_DIGITS is a glibc specific item. I found it here:
https://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html
and it exists, for example, on Solaris/Illumos and FreeBSD as well (the few I tested).

As it's implemented now, it's very Linux-specific.

On Solaris, the newly added test is failing for two reasons:

  1. different locales have alt_digits defined
  2. the strings are semicolon-separated (as the specification above says they should be) rather than null-separated

On FreeBSD, I didn't find any locale that had alt_digits defined (so the tests will likely fail as well?), but I don't know about the format it returns.

I can look into parsing the semicolon-separated string for platforms where that is the case.

@github-project-automation github-project-automation bot moved this from Done to In Progress in Locale issues Oct 15, 2024
@serhiy-storchaka
Member Author

Thank you for pointing this out, @kulikjak. It was used in the Python sources as optionally defined, and it was not even documented in the Linux manpages, so I thought that it was an unofficial extension.

Does it mean that locale.nl_langinfo(locale.ALT_DIGITS) now crashes on Solaris? What was the output of the following script on a still-working Python?

import locale, subprocess
# List every locale installed on this system.
alllocales = subprocess.check_output(['locale', '-a']).decode().split()
for loc in alllocales:
    # Skip names with an explicit encoding or modifier.
    if '.' in loc or '@' in loc:
        continue
    try:
        _ = locale.setlocale(locale.LC_ALL, loc)
    except locale.Error:
        continue
    # Report only the locales that define alternative digits.
    alt_digits = locale.nl_langinfo(locale.ALT_DIGITS)
    if alt_digits:
        print(loc, len(alt_digits), alt_digits)

On Linux I now get

az_IR 100 ('۰۰', '۰۱', '۰۲', '۰۳', '۰۴', '۰۵', '۰۶', '۰۷', '۰۸', '۰۹', '۱۰', '۱۱', '۱۲', '۱۳', '۱۴', '۱۵', '۱۶', '۱۷', '۱۸', '۱۹', '۲۰', '۲۱', '۲۲', '۲۳', '۲۴', '۲۵', '۲۶', '۲۷', '۲۸', '۲۹', '۳۰', '۳۱', '۳۲', '۳۳', '۳۴', '۳۵', '۳۶', '۳۷', '۳۸', '۳۹', '۴۰', '۴۱', '۴۲', '۴۳', '۴۴', '۴۵', '۴۶', '۴۷', '۴۸', '۴۹', '۵۰', '۵۱', '۵۲', '۵۳', '۵۴', '۵۵', '۵۶', '۵۷', '۵۸', '۵۹', '۶۰', '۶۱', '۶۲', '۶۳', '۶۴', '۶۵', '۶۶', '۶۷', '۶۸', '۶۹', '۷۰', '۷۱', '۷۲', '۷۳', '۷۴', '۷۵', '۷۶', '۷۷', '۷۸', '۷۹', '۸۰', '۸۱', '۸۲', '۸۳', '۸۴', '۸۵', '۸۶', '۸۷', '۸۸', '۸۹', '۹۰', '۹۱', '۹۲', '۹۳', '۹۴', '۹۵', '۹۶', '۹۷', '۹۸', '۹۹')
fa_IR 100 ('۰۰', '۰۱', '۰۲', '۰۳', '۰۴', '۰۵', '۰۶', '۰۷', '۰۸', '۰۹', '۱۰', '۱۱', '۱۲', '۱۳', '۱۴', '۱۵', '۱۶', '۱۷', '۱۸', '۱۹', '۲۰', '۲۱', '۲۲', '۲۳', '۲۴', '۲۵', '۲۶', '۲۷', '۲۸', '۲۹', '۳۰', '۳۱', '۳۲', '۳۳', '۳۴', '۳۵', '۳۶', '۳۷', '۳۸', '۳۹', '۴۰', '۴۱', '۴۲', '۴۳', '۴۴', '۴۵', '۴۶', '۴۷', '۴۸', '۴۹', '۵۰', '۵۱', '۵۲', '۵۳', '۵۴', '۵۵', '۵۶', '۵۷', '۵۸', '۵۹', '۶۰', '۶۱', '۶۲', '۶۳', '۶۴', '۶۵', '۶۶', '۶۷', '۶۸', '۶۹', '۷۰', '۷۱', '۷۲', '۷۳', '۷۴', '۷۵', '۷۶', '۷۷', '۷۸', '۷۹', '۸۰', '۸۱', '۸۲', '۸۳', '۸۴', '۸۵', '۸۶', '۸۷', '۸۸', '۸۹', '۹۰', '۹۱', '۹۲', '۹۳', '۹۴', '۹۵', '۹۶', '۹۷', '۹۸', '۹۹')
ja_JP 100 ('〇', '一', '二', '三', '四', '五', '六', '七', '八', '九', '十', '十一', '十二', '十三', '十四', '十五', '十六', '十七', '十八', '十九', '二十', '二十一', '二十二', '二十三', '二十四', '二十五', '二十六', '二十七', '二十八', '二十九', '三十', '三十一', '三十二', '三十三', '三十四', '三十五', '三十六', '三十七', '三十八', '三十九', '四十', '四十一', '四十二', '四十三', '四十四', '四十五', '四十六', '四十七', '四十八', '四十九', '五十', '五十一', '五十二', '五十三', '五十四', '五十五', '五十六', '五十七', '五十八', '五十九', '六十', '六十一', '六十二', '六十三', '六十四', '六十五', '六十六', '六十七', '六十八', '六十九', '七十', '七十一', '七十二', '七十三', '七十四', '七十五', '七十六', '七十七', '七十八', '七十九', '八十', '八十一', '八十二', '八十三', '八十四', '八十五', '八十六', '八十七', '八十八', '八十九', '九十', '九十一', '九十二', '九十三', '九十四', '九十五', '九十六', '九十七', '九十八', '九十九')
japanese 100 ('〇', '一', '二', '三', '四', '五', '六', '七', '八', '九', '十', '十一', '十二', '十三', '十四', '十五', '十六', '十七', '十八', '十九', '二十', '二十一', '二十二', '二十三', '二十四', '二十五', '二十六', '二十七', '二十八', '二十九', '三十', '三十一', '三十二', '三十三', '三十四', '三十五', '三十六', '三十七', '三十八', '三十九', '四十', '四十一', '四十二', '四十三', '四十四', '四十五', '四十六', '四十七', '四十八', '四十九', '五十', '五十一', '五十二', '五十三', '五十四', '五十五', '五十六', '五十七', '五十八', '五十九', '六十', '六十一', '六十二', '六十三', '六十四', '六十五', '六十六', '六十七', '六十八', '六十九', '七十', '七十一', '七十二', '七十三', '七十四', '七十五', '七十六', '七十七', '七十八', '七十九', '八十', '八十一', '八十二', '八十三', '八十四', '八十五', '八十六', '八十七', '八十八', '八十九', '九十', '九十一', '九十二', '九十三', '九十四', '九十五', '九十六', '九十七', '九十八', '九十九')
lzh_TW 32 ('〇', '一', '二', '三', '四', '五', '六', '七', '八', '九', '十', '十一', '十二', '十三', '十四', '十五', '十六', '十七', '十八', '十九', '廿', '廿一', '廿二', '廿三', '廿四', '廿五', '廿六', '廿七', '廿八', '廿九', '卅', '卅一')
my_MM 100 ('၀၀', '၀၁', '၀၂', '၀၃', '၀၄', '၀၅', '၀၆', '၀၇', '၀၈', '၀၉', '၁၀', '၁၁', '၁၂', '၁၃', '၁၄', '၁၅', '၁၆', '၁၇', '၁၈', '၁၉', '၂၀', '၂၁', '၂၂', '၂၃', '၂၄', '၂၅', '၂၆', '၂၇', '၂၈', '၂၉', '၃၀', '၃၁', '၃၂', '၃၃', '၃၄', '၃၅', '၃၆', '၃၇', '၃၈', '၃၉', '၄၀', '၄၁', '၄၂', '၄၃', '၄၄', '၄၅', '၄၆', '၄၇', '၄၈', '၄၉', '၅၀', '၅၁', '၅၂', '၅၃', '၅၄', '၅၅', '၅၆', '၅၇', '၅၈', '၅၉', '၆၀', '၆၁', '၆၂', '၆၃', '၆၄', '၆၅', '၆၆', '၆၇', '၆၈', '၆၉', '၇၀', '၇၁', '၇၂', '၇၃', '၇၄', '၇၅', '၇၆', '၇၇', '၇၈', '၇၉', '၈၀', '၈၁', '၈၂', '၈၃', '၈၄', '၈၅', '၈၆', '၈၇', '၈၈', '၈၉', '၉၀', '၉၁', '၉၂', '၉၃', '၉၄', '၉၅', '၉၆', '၉၇', '၉၈', '၉၉')
or_IN 100 ('୦', '୧', '୨', '୩', '୪', '୫', '୬', '୭', '୮', '୯', '୧୦', '୧୧', '୧୨', '୧୩', '୧୪', '୧୫', '୧୬', '୧୭', '୧୮', '୧୯', '୨୦', '୨୧', '୨୨', '୨୩', '୨୪', '୨୫', '୨୬', '୨୭', '୨୮', '୨୯', '୩୦', '୩୧', '୩୨', '୩୩', '୩୪', '୩୫', '୩୬', '୩୭', '୩୮', '୩୯', '୪୦', '୪୧', '୪୨', '୪୩', '୪୪', '୪୫', '୪୬', '୪୭', '୪୮', '୪୯', '୫୦', '୫୧', '୫୨', '୫୩', '୫୪', '୫୫', '୫୬', '୫୭', '୫୮', '୫୯', '୬୦', '୬୧', '୬୨', '୬୩', '୬୪', '୬୫', '୬୬', '୬୭', '୬୮', '୬୯', '୭୦', '୭୧', '୭୨', '୭୩', '୭୪', '୭୫', '୭୬', '୭୭', '୭୮', '୭୯', '୮୦', '୮୧', '୮୨', '୮୩', '୮୪', '୮୫', '୮୬', '୮୭', '୮୮', '୮୯', '୯୦', '୯୧', '୯୨', '୯୩', '୯୪', '୯୫', '୯୬', '୯୭', '୯୮', '୯୯')
shn_MM 100 ('႐႐', '႐႑', '႐႒', '႐႓', '႐႔', '႐႕', '႐႖', '႐႗', '႐႘', '႐႙', '႑႐', '႑႑', '႑႒', '႑႓', '႑႔', '႑႕', '႑႖', '႑႗', '႑႘', '႑႙', '႒႐', '႒႑', '႒႒', '႒႓', '႒႔', '႒႕', '႒႖', '႒႗', '႒႘', '႒႙', '႓႐', '႓႑', '႓႒', '႓႓', '႓႔', '႓႕', '႓႖', '႓႗', '႓႘', '႓႙', '႔႐', '႔႑', '႔႒', '႔႓', '႔႔', '႔႕', '႔႖', '႔႗', '႔႘', '႔႙', '႕႐', '႕႑', '႕႒', '႕႓', '႕႔', '႕႕', '႕႖', '႕႗', '႕႘', '႕႙', '႖႐', '႖႑', '႖႒', '႖႓', '႖႔', '႖႕', '႖႖', '႖႗', '႖႘', '႖႙', '႗႐', '႗႑', '႗႒', '႗႓', '႗႔', '႗႕', '႗႖', '႗႗', '႗႘', '႗႙', '႘႐', '႘႑', '႘႒', '႘႓', '႘႔', '႘႕', '႘႖', '႘႗', '႘႘', '႘႙', '႙႐', '႙႑', '႙႒', '႙႓', '႙႔', '႙႕', '႙႖', '႙႗', '႙႘', '႙႙')

@kulikjak
Contributor

... I thought that it was an unofficial extension.

Oh I get that - I didn't even know it existed before the test started failing :).

Does it mean that locale.nl_langinfo(locale.ALT_DIGITS) now crashes on Solaris? What was the output of the following script on a still-working Python?

It doesn't crash, only the test fails with:

Traceback (most recent call last):
  File "/..../Lib/test/test__locale.py", line 214, in test_alt_digits_nl_langinfo
    self.assertEqual(len(alt_digits), count)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 0 != 100

I ran your script (with the '.' check removed, as most of our locales have a .UTF-8 or other suffix) on the latest 3.13, and the output is:

ar_AE.UTF-8 1 ('٠;١;٢;٣;٤;٥;٦;٧;٨;٩;١٠;١١;١٢;١٣;١٤;١٥;١٦;١٧;١٨;١٩;٢٠;٢١;٢٢;٢٣;٢٤;٢٥;٢٦;٢٧;٢٨;٢٩;٣٠;٣١;٣٢;٣٣;٣٤;٣٥;٣٦;٣٧;٣٨;٣٩;٤٠;٤١;٤٢;٤٣;٤٤;٤٥;٤٦;٤٧;٤٨;٤٩;٥٠;٥١;٥٢;٥٣;٥٤;٥٥;٥٦;٥٧;٥٨;٥٩;٦٠;٦١;٦٢;٦٣;٦٤;٦٥;٦٦;٦٧;٦٨;٦٩;٧٠;٧١;٧٢;٧٣;٧٤;٧٥;٧٦;٧٧;٧٨;٧٩;٨٠;٨١;٨٢;٨٣;٨٤;٨٥;٨٦;٨٧;٨٨;٨٩;٩٠;٩١;٩٢;٩٣;٩٤;٩٥;٩٦;٩٧;٩٨;٩٩',)
ar_BH.UTF-8 1 ('٠;١;٢;٣;٤;٥;٦;٧;٨;٩;١٠;١١;١٢;١٣;١٤;١٥;١٦;١٧;١٨;١٩;٢٠;٢١;٢٢;٢٣;٢٤;٢٥;٢٦;٢٧;٢٨;٢٩;٣٠;٣١;٣٢;٣٣;٣٤;٣٥;٣٦;٣٧;٣٨;٣٩;٤٠;٤١;٤٢;٤٣;٤٤;٤٥;٤٦;٤٧;٤٨;٤٩;٥٠;٥١;٥٢;٥٣;٥٤;٥٥;٥٦;٥٧;٥٨;٥٩;٦٠;٦١;٦٢;٦٣;٦٤;٦٥;٦٦;٦٧;٦٨;٦٩;٧٠;٧١;٧٢;٧٣;٧٤;٧٥;٧٦;٧٧;٧٨;٧٩;٨٠;٨١;٨٢;٨٣;٨٤;٨٥;٨٦;٨٧;٨٨;٨٩;٩٠;٩١;٩٢;٩٣;٩٤;٩٥;٩٦;٩٧;٩٨;٩٩',)
ar_EG.UTF-8 1 ('٠;١;٢;٣;٤;٥;٦;٧;٨;٩;١٠;١١;١٢;١٣;١٤;١٥;١٦;١٧;١٨;١٩;٢٠;٢١;٢٢;٢٣;٢٤;٢٥;٢٦;٢٧;٢٨;٢٩;٣٠;٣١;٣٢;٣٣;٣٤;٣٥;٣٦;٣٧;٣٨;٣٩;٤٠;٤١;٤٢;٤٣;٤٤;٤٥;٤٦;٤٧;٤٨;٤٩;٥٠;٥١;٥٢;٥٣;٥٤;٥٥;٥٦;٥٧;٥٨;٥٩;٦٠;٦١;٦٢;٦٣;٦٤;٦٥;٦٦;٦٧;٦٨;٦٩;٧٠;٧١;٧٢;٧٣;٧٤;٧٥;٧٦;٧٧;٧٨;٧٩;٨٠;٨١;٨٢;٨٣;٨٤;٨٥;٨٦;٨٧;٨٨;٨٩;٩٠;٩١;٩٢;٩٣;٩٤;٩٥;٩٦;٩٧;٩٨;٩٩',)
ar_IQ.UTF-8 1 ('٠;١;٢;٣;٤;٥;٦;٧;٨;٩;١٠;١١;١٢;١٣;١٤;١٥;١٦;١٧;١٨;١٩;٢٠;٢١;٢٢;٢٣;٢٤;٢٥;٢٦;٢٧;٢٨;٢٩;٣٠;٣١;٣٢;٣٣;٣٤;٣٥;٣٦;٣٧;٣٨;٣٩;٤٠;٤١;٤٢;٤٣;٤٤;٤٥;٤٦;٤٧;٤٨;٤٩;٥٠;٥١;٥٢;٥٣;٥٤;٥٥;٥٦;٥٧;٥٨;٥٩;٦٠;٦١;٦٢;٦٣;٦٤;٦٥;٦٦;٦٧;٦٨;٦٩;٧٠;٧١;٧٢;٧٣;٧٤;٧٥;٧٦;٧٧;٧٨;٧٩;٨٠;٨١;٨٢;٨٣;٨٤;٨٥;٨٦;٨٧;٨٨;٨٩;٩٠;٩١;٩٢;٩٣;٩٤;٩٥;٩٦;٩٧;٩٨;٩٩',)
ar_JO.UTF-8 1 ('٠;١;٢;٣;٤;٥;٦;٧;٨;٩;١٠;١١;١٢;١٣;١٤;١٥;١٦;١٧;١٨;١٩;٢٠;٢١;٢٢;٢٣;٢٤;٢٥;٢٦;٢٧;٢٨;٢٩;٣٠;٣١;٣٢;٣٣;٣٤;٣٥;٣٦;٣٧;٣٨;٣٩;٤٠;٤١;٤٢;٤٣;٤٤;٤٥;٤٦;٤٧;٤٨;٤٩;٥٠;٥١;٥٢;٥٣;٥٤;٥٥;٥٦;٥٧;٥٨;٥٩;٦٠;٦١;٦٢;٦٣;٦٤;٦٥;٦٦;٦٧;٦٨;٦٩;٧٠;٧١;٧٢;٧٣;٧٤;٧٥;٧٦;٧٧;٧٨;٧٩;٨٠;٨١;٨٢;٨٣;٨٤;٨٥;٨٦;٨٧;٨٨;٨٩;٩٠;٩١;٩٢;٩٣;٩٤;٩٥;٩٦;٩٧;٩٨;٩٩',)
ar_KW.UTF-8 1 ('٠;١;٢;٣;٤;٥;٦;٧;٨;٩;١٠;١١;١٢;١٣;١٤;١٥;١٦;١٧;١٨;١٩;٢٠;٢١;٢٢;٢٣;٢٤;٢٥;٢٦;٢٧;٢٨;٢٩;٣٠;٣١;٣٢;٣٣;٣٤;٣٥;٣٦;٣٧;٣٨;٣٩;٤٠;٤١;٤٢;٤٣;٤٤;٤٥;٤٦;٤٧;٤٨;٤٩;٥٠;٥١;٥٢;٥٣;٥٤;٥٥;٥٦;٥٧;٥٨;٥٩;٦٠;٦١;٦٢;٦٣;٦٤;٦٥;٦٦;٦٧;٦٨;٦٩;٧٠;٧١;٧٢;٧٣;٧٤;٧٥;٧٦;٧٧;٧٨;٧٩;٨٠;٨١;٨٢;٨٣;٨٤;٨٥;٨٦;٨٧;٨٨;٨٩;٩٠;٩١;٩٢;٩٣;٩٤;٩٥;٩٦;٩٧;٩٨;٩٩',)
ar_OM.UTF-8 1 ('٠;١;٢;٣;٤;٥;٦;٧;٨;٩;١٠;١١;١٢;١٣;١٤;١٥;١٦;١٧;١٨;١٩;٢٠;٢١;٢٢;٢٣;٢٤;٢٥;٢٦;٢٧;٢٨;٢٩;٣٠;٣١;٣٢;٣٣;٣٤;٣٥;٣٦;٣٧;٣٨;٣٩;٤٠;٤١;٤٢;٤٣;٤٤;٤٥;٤٦;٤٧;٤٨;٤٩;٥٠;٥١;٥٢;٥٣;٥٤;٥٥;٥٦;٥٧;٥٨;٥٩;٦٠;٦١;٦٢;٦٣;٦٤;٦٥;٦٦;٦٧;٦٨;٦٩;٧٠;٧١;٧٢;٧٣;٧٤;٧٥;٧٦;٧٧;٧٨;٧٩;٨٠;٨١;٨٢;٨٣;٨٤;٨٥;٨٦;٨٧;٨٨;٨٩;٩٠;٩١;٩٢;٩٣;٩٤;٩٥;٩٦;٩٧;٩٨;٩٩',)
ar_QA.UTF-8 1 ('٠;١;٢;٣;٤;٥;٦;٧;٨;٩;١٠;١١;١٢;١٣;١٤;١٥;١٦;١٧;١٨;١٩;٢٠;٢١;٢٢;٢٣;٢٤;٢٥;٢٦;٢٧;٢٨;٢٩;٣٠;٣١;٣٢;٣٣;٣٤;٣٥;٣٦;٣٧;٣٨;٣٩;٤٠;٤١;٤٢;٤٣;٤٤;٤٥;٤٦;٤٧;٤٨;٤٩;٥٠;٥١;٥٢;٥٣;٥٤;٥٥;٥٦;٥٧;٥٨;٥٩;٦٠;٦١;٦٢;٦٣;٦٤;٦٥;٦٦;٦٧;٦٨;٦٩;٧٠;٧١;٧٢;٧٣;٧٤;٧٥;٧٦;٧٧;٧٨;٧٩;٨٠;٨١;٨٢;٨٣;٨٤;٨٥;٨٦;٨٧;٨٨;٨٩;٩٠;٩١;٩٢;٩٣;٩٤;٩٥;٩٦;٩٧;٩٨;٩٩',)
ar_SA.UTF-8 1 ('٠;١;٢;٣;٤;٥;٦;٧;٨;٩;١٠;١١;١٢;١٣;١٤;١٥;١٦;١٧;١٨;١٩;٢٠;٢١;٢٢;٢٣;٢٤;٢٥;٢٦;٢٧;٢٨;٢٩;٣٠;٣١;٣٢;٣٣;٣٤;٣٥;٣٦;٣٧;٣٨;٣٩;٤٠;٤١;٤٢;٤٣;٤٤;٤٥;٤٦;٤٧;٤٨;٤٩;٥٠;٥١;٥٢;٥٣;٥٤;٥٥;٥٦;٥٧;٥٨;٥٩;٦٠;٦١;٦٢;٦٣;٦٤;٦٥;٦٦;٦٧;٦٨;٦٩;٧٠;٧١;٧٢;٧٣;٧٤;٧٥;٧٦;٧٧;٧٨;٧٩;٨٠;٨١;٨٢;٨٣;٨٤;٨٥;٨٦;٨٧;٨٨;٨٩;٩٠;٩١;٩٢;٩٣;٩٤;٩٥;٩٦;٩٧;٩٨;٩٩',)
ar_YE.UTF-8 1 ('٠;١;٢;٣;٤;٥;٦;٧;٨;٩;١٠;١١;١٢;١٣;١٤;١٥;١٦;١٧;١٨;١٩;٢٠;٢١;٢٢;٢٣;٢٤;٢٥;٢٦;٢٧;٢٨;٢٩;٣٠;٣١;٣٢;٣٣;٣٤;٣٥;٣٦;٣٧;٣٨;٣٩;٤٠;٤١;٤٢;٤٣;٤٤;٤٥;٤٦;٤٧;٤٨;٤٩;٥٠;٥١;٥٢;٥٣;٥٤;٥٥;٥٦;٥٧;٥٨;٥٩;٦٠;٦١;٦٢;٦٣;٦٤;٦٥;٦٦;٦٧;٦٨;٦٩;٧٠;٧١;٧٢;٧٣;٧٤;٧٥;٧٦;٧٧;٧٨;٧٩;٨٠;٨١;٨٢;٨٣;٨٤;٨٥;٨٦;٨٧;٨٨;٨٩;٩٠;٩١;٩٢;٩٣;٩٤;٩٥;٩٦;٩٧;٩٨;٩٩',)
as_IN.UTF-8 1 ('০;১;২;৩;৪;৫;৬;৭;৮;৯;১০;১১;১২;১৩;১৪;১৫;১৬;১৭;১৮;১৯;২০;২১;২২;২৩;২৪;২৫;২৬;২৭;২৮;২৯;৩০;৩১;৩২;৩৩;৩৪;৩৫;৩৬;৩৭;৩৮;৩৯;৪০;৪১;৪২;৪৩;৪৪;৪৫;৪৬;৪৭;৪৮;৪৯;৫০;৫১;৫২;৫৩;৫৪;৫৫;৫৬;৫৭;৫৮;৫৯;৬০;৬১;৬২;৬৩;৬৪;৬৫;৬৬;৬৭;৬৮;৬৯;৭০;৭১;৭২;৭৩;৭৪;৭৫;৭৬;৭৭;৭৮;৭৯;৮০;৮১;৮২;৮৩;৮৪;৮৫;৮৬;৮৭;৮৮;৮৯;৯০;৯১;৯২;৯৩;৯৪;৯৫;৯৬;৯৭;৯৮;৯৯',)
bn_IN.UTF-8 1 ('০;১;২;৩;৪;৫;৬;৭;৮;৯;১০;১১;১২;১৩;১৪;১৫;১৬;১৭;১৮;১৯;২০;২১;২২;২৩;২৪;২৫;২৬;২৭;২৮;২৯;৩০;৩১;৩২;৩৩;৩৪;৩৫;৩৬;৩৭;৩৮;৩৯;৪০;৪১;৪২;৪৩;৪৪;৪৫;৪৬;৪৭;৪৮;৪৯;৫০;৫১;৫২;৫৩;৫৪;৫৫;৫৬;৫৭;৫৮;৫৯;৬০;৬১;৬২;৬৩;৬৪;৬৫;৬৬;৬৭;৬৮;৬৯;৭০;৭১;৭২;৭৩;৭৪;৭৫;৭৬;৭৭;৭৮;৭৯;৮০;৮১;৮২;৮৩;৮৪;৮৫;৮৬;৮৭;৮৮;৮৯;৯০;৯১;৯২;৯৩;৯৪;৯৫;৯৬;৯৭;৯৮;৯৯',)
ja_JP.PCK 1 ('零;一;二;三;四;五;六;七;八;九;十;十一;十二;十三;十四;十五;十六;十七;十八;十九;二十;二十一;二十二;二十三;二十四;二十五;二十六;二十七;二十八;二十九;三十;三十一;三十二;三十三;三十四;三十五;三十六;三十七;三十八;三十九;四十;四十一;四十二;四十三;四十四;四十五;四十六;四十七;四十八;四十九;五十;五十一;五十二;五十三;五十四;五十五;五十六;五十七;五十八;五十九;六十;六十一;六十二;六十三;六十四;六十五;六十六;六十七;六十八;六十九;七十;七十一;七十二;七十三;七十四;七十五;七十六;七十七;七十八;七十九;八十;八十一;八十二;八十三;八十四;八十五;八十六;八十七;八十八;八十九;九十;九十一;九十二;九十三;九十四;九十五;九十六;九十七;九十八;九十九;百',)
ks_IN.UTF-8 1 ('۰;۱;۲;۳;۴;۵;۶;۷;۸;۹;۱۰;۱۱;۱۲;۱۳;۱۴;۱۵;۱۶;۱۷;۱۸;۱۹;۲۰;۲۱;۲۲;۲۳;۲۴;۲۵;۲۶;۲۷;۲۸;۲۹;۳۰;۳۱;۳۲;۳۳;۳۴;۳۵;۳۶;۳۷;۳۸;۳۹;۴۰;۴۱;۴۲;۴۳;۴۴;۴۵;۴۶;۴۷;۴۸;۴۹;۵۰;۵۱;۵۲;۵۳;۵۴;۵۵;۵۶;۵۷;۵۸;۵۹;۶۰;۶۱;۶۲;۶۳;۶۴;۶۵;۶۶;۶۷;۶۸;۶۹;۷۰;۷۱;۷۲;۷۳;۷۴;۷۵;۷۶;۷۷;۷۸;۷۹;۸۰;۸۱;۸۲;۸۳;۸۴;۸۵;۸۶;۸۷;۸۸;۸۹;۹۰;۹۱;۹۲;۹۳;۹۴;۹۵;۹۶;۹۷;۹۸;۹۹',)
mr_IN.UTF-8 1 ('०;१;२;३;४;५;६;७;८;९;१०;११;१२;१३;१४;१५;१६;१७;१८;१९;२०;२१;२२;२३;२४;२५;२६;२७;२८;२९;३०;३१;३२;३३;३४;३५;३६;३७;३८;३९;४०;४१;४२;४३;४४;४५;४६;४७;४८;४९;५०;५१;५२;५३;५४;५५;५६;५७;५८;५९;६०;६१;६२;६३;६४;६५;६६;६७;६८;६९;७०;७१;७२;७३;७४;७५;७६;७७;७८;७९;८०;८१;८२;८३;८४;८५;८६;८७;८८;८९;९०;९१;९२;९३;९४;९५;९६;९७;९८;९९',)
th_TH.TIS620 1 ('๐;๑;๒;๓;๔;๕;๖;๗;๘;๙',)
zh_CN.GB18030 1 ('零;一;二;三;四;五;六;七;八;九;十;十一;十二;十三;十四;十五;十六;十七;十八;十九;二十;二十一;二十二;二十三;二十四;二十五;二十六;二十七;二十八;二十九;三十;三十一;三十二;三十三;三十四;三十五;三十六;三十七;三十八;三十九;四十;四十一;四十二;四十三;四十四;四十五;四十六;四十七;四十八;四十九;五十;五十一;五十二;五十三;五十四;五十五;五十六;五十七;五十八;五十九;六十;六十一;六十二;六十三;六十四;六十五;六十六;六十七;六十八;六十九;七十;七十一;七十二;七十三;七十四;七十五;七十六;七十七;七十八;七十九;八十;八十一;八十二;八十三;八十四;八十五;八十六;八十七;八十八;八十九;九十;九十一;九十二;九十三;九十四;九十五;九十六;九十七;九十八;九十九;百',)
zh_HK.BIG5HK 1 ('零;一;二;三;四;五;六;七;八;九;十;十一;十二;十三;十四;十五;十六;十七;十八;十九;二十;二十一;二十二;二十三;二十四;二十五;二十六;二十七;二十八;二十九;三十;三十一;三十二;三十三;三十四;三十五;三十六;三十七;三十八;三十九;四十;四十一;四十二;四十三;四十四;四十五;四十六;四十七;四十八;四十九;五十;五十一;五十二;五十三;五十四;五十五;五十六;五十七;五十八;五十九;六十;六十一;六十二;六十三;六十四;六十五;六十六;六十七;六十八;六十九;七十;七十一;七十二;七十三;七十四;七十五;七十六;七十七;七十八;七十九;八十;八十一;八十二;八十三;八十四;八十五;八十六;八十七;八十八;八十九;九十;九十一;九十二;九十三;九十四;九十五;九十六;九十七;九十八;九十九;百',)

Before, it looked like this (3.13.0):

ar_AE.UTF-8 289 ٠;١;٢;٣;٤;٥;٦;٧;٨;٩;١٠;١١;١٢;١٣;١٤;١٥;١٦;١٧;١٨;١٩;٢٠;٢١;٢٢;٢٣;٢٤;٢٥;٢٦;٢٧;٢٨;٢٩;٣٠;٣١;٣٢;٣٣;٣٤;٣٥;٣٦;٣٧;٣٨;٣٩;٤٠;٤١;٤٢;٤٣;٤٤;٤٥;٤٦;٤٧;٤٨;٤٩;٥٠;٥١;٥٢;٥٣;٥٤;٥٥;٥٦;٥٧;٥٨;٥٩;٦٠;٦١;٦٢;٦٣;٦٤;٦٥;٦٦;٦٧;٦٨;٦٩;٧٠;٧١;٧٢;٧٣;٧٤;٧٥;٧٦;٧٧;٧٨;٧٩;٨٠;٨١;٨٢;٨٣;٨٤;٨٥;٨٦;٨٧;٨٨;٨٩;٩٠;٩١;٩٢;٩٣;٩٤;٩٥;٩٦;٩٧;٩٨;٩٩
...

@serhiy-storchaka
Member Author

Thank you. Indeed, these look like correct semicolon-separated values.

locale.nl_langinfo(locale.ALT_DIGITS) always returned an empty result on *BSD and macOS, and it is not implemented on Windows, so Linux was the only tier 1 platform on which it returned anything, and there it was broken. But since the POSIX specification says that the values should be semicolon-separated, it seems that glibc is the one that is broken. I understand why they did this: it is more convenient for C code. On the Python side, it would be more convenient to return a tuple, but that would be a breaking change on Solaris, the only platform that implemented it right. So I think that we should fix the result on glibc and return a semicolon-separated string. I hope that no locale uses a semicolon as an alternative digit or in the ERA items.

@kulikjak
Contributor

Makes sense, although I think that if we want the tuple (which is indeed nicer on the Python side), we can implement alternative parsing (for Solaris and possibly other more compliant platforms) that uses ; rather than \0 as a separator as well.
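The alternative parsing suggested here could look roughly like the following sketch. `split_alt_digits` is a hypothetical helper, not part of the `locale` module, and it assumes the raw value has already been decoded to a `str` that uses either NUL or `;` as the separator.

```python
def split_alt_digits(raw: str) -> tuple:
    """Split a raw ALT_DIGITS value on NUL or ';' into a tuple of strings."""
    if not raw:
        return ()
    # Prefer NUL when present (glibc-style), otherwise ';' (POSIX-style).
    sep = "\0" if "\0" in raw else ";"
    # Drop trailing separators so a terminator does not yield empty items.
    return tuple(raw.rstrip(sep).split(sep))

print(split_alt_digits("〇;一;二"))     # ('〇', '一', '二')
print(split_alt_digits("〇\0一\0二\0"))  # ('〇', '一', '二')
```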

@serhiy-storchaka
Member Author

It is interesting that a similar issue was fixed in Perl at the beginning of this year (Perl/perl5#21833). They return a single string, replacing null characters with semicolons. Actually, they also support other separators, but I do not think there is a reason to do this.
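The single-string approach can be sketched as follows. This is an illustration only, not the Perl or CPython code: `join_with_semicolons` is a hypothetical helper, and the check for a semicolon inside an item is an added assumption that mirrors the concern raised earlier in the thread.

```python
def join_with_semicolons(items):
    """Join already-split ALT_DIGITS items into one ';'-separated string."""
    items = list(items)
    for item in items:
        if ";" in item:
            # A semicolon inside an item would make the result ambiguous.
            raise ValueError(f"item contains a semicolon: {item!r}")
    return ";".join(items)

print(join_with_semicolons(["〇", "一", "二"]))  # 〇;一;二
```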

serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Oct 21, 2024
… a string again

This is a follow up of pythonGH-124974. Only Glibc needed a fix.
Now the returned value is a string consisting of semicolon-separated
symbols on all Posix platforms.
serhiy-storchaka added a commit that referenced this issue Oct 21, 2024
…ing again (GH-125774)

This is a follow up of GH-124974. Only Glibc needed a fix.
Now the returned value is a string consisting of semicolon-separated
symbols on all Posix platforms.
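With the string form described in this commit message, calling code can split the value itself. A minimal sketch follows: `current_alt_digits` is a hypothetical wrapper, and it assumes that `locale.ALT_DIGITS` or `locale.nl_langinfo` may be missing on some platforms (for example Windows), in which case it returns an empty tuple.

```python
import locale

def current_alt_digits() -> tuple:
    """Return the alternative digits of the current locale as a tuple."""
    item = getattr(locale, "ALT_DIGITS", None)
    nl_langinfo = getattr(locale, "nl_langinfo", None)
    if item is None or nl_langinfo is None:
        return ()  # not available on this platform
    raw = nl_langinfo(item)
    # An empty string means the locale defines no alternative digits.
    return tuple(raw.split(";")) if raw else ()

print(len(current_alt_digits()))
```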
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Oct 21, 2024
… a string again (pythonGH-125774)

This is a follow up of pythonGH-124974. Only Glibc needed a fix.
Now the returned value is a string consisting of semicolon-separated
symbols on all Posix platforms.
(cherry picked from commit dcc4fb2)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Oct 21, 2024
… a string again (pythonGH-125774)

This is a follow up of pythonGH-124974. Only Glibc needed a fix.
Now the returned value is a string consisting of semicolon-separated
symbols on all Posix platforms.
(cherry picked from commit dcc4fb2)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@github-project-automation github-project-automation bot moved this from In Progress to Done in Locale issues Oct 21, 2024
serhiy-storchaka added a commit that referenced this issue Oct 21, 2024
…g a string again (GH-125774) (GH-125805)

This is a follow up of GH-124974. Only Glibc needed a fix.
Now the returned value is a string consisting of semicolon-separated
symbols on all Posix platforms.
(cherry picked from commit dcc4fb2)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
serhiy-storchaka added a commit that referenced this issue Oct 21, 2024
…g a string again (GH-125774) (GH-125804)

This is a follow up of GH-124974. Only Glibc needed a fix.
Now the returned value is a string consisting of semicolon-separated
symbols on all Posix platforms.
(cherry picked from commit dcc4fb2)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
5 participants