locale.nl_langinfo(locale.ERA) does not work for past eras #126727

Closed
serhiy-storchaka opened this issue Nov 12, 2024 · 2 comments
Labels
3.12: bugs and security fixes
3.13: bugs and security fixes
3.14: new features, bugs and security fixes
type-bug: An unexpected behavior, bug, or error

Comments

@serhiy-storchaka
Member

serhiy-storchaka commented Nov 12, 2024

Bug report

According to the POSIX specification (https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap07.html#tag_07_03_05_02), nl_langinfo(ERA) should return a string containing semicolon-separated era description segments. But glibc uses NUL instead of a semicolon as the separator. As a result, locale.nl_langinfo(locale.ERA) in Python returns only the first segment, which corresponds to the last (current) era. For example, in the Japanese locale the result cannot be used for dates before 2020:

>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'ja_JP')
'ja_JP'
>>> locale.nl_langinfo(locale.ERA)
'+:2:2020/01/01:+*:令和:%EC%Ey年'

This issue is similar to #124969, but at least the result can be used for the current date.
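
To illustrate why only the first segment comes back: a value read as a C string ends at the first NUL byte. The sketch below simulates that with ctypes. The buffer layout is an assumption based on the description above (segments separated by NUL), and the second segment is taken from the ja_JP output later in this thread; this is not the actual code in the locale module.

```python
import ctypes

# Assumed raw layout: era segments separated by NUL bytes, as described
# above for glibc. This buffer is simulated, not obtained from libc.
segments = [
    "+:2:2020/01/01:+*:令和:%EC%Ey年",
    "+:1:2019/05/01:2019/12/31:令和:%EC元年",
]
raw = "\0".join(segments).encode("utf-8")

# create_string_buffer copies the bytes and appends a terminating NUL.
buf = ctypes.create_string_buffer(raw)

# Reading the buffer as a C string stops at the first NUL byte,
# so only the first (current era) segment is visible.
as_c_string = buf.value.decode("utf-8")

# Reading the whole buffer and splitting on NUL recovers every segment,
# which can then be joined with semicolons as POSIX describes.
all_segments = [s.decode("utf-8") for s in buf.raw.split(b"\0") if s]
joined = ";".join(all_segments)

print(as_c_string)
print(joined)
```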

cc @methane, @kulikjak.


@serhiy-storchaka serhiy-storchaka added type-bug An unexpected behavior, bug, or error 3.12 bugs and security fixes 3.13 bugs and security fixes 3.14 new features, bugs and security fixes labels Nov 12, 2024
@serhiy-storchaka serhiy-storchaka moved this from Todo to In Progress in Locale issues Nov 12, 2024
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Nov 12, 2024
It now returns multiple era description segments separated by semicolons.
Previously it only returned the first segment on platforms with Glibc.
@serhiy-storchaka
Member Author

On my computer (Linux) the following script

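import locale, subprocess

# List every locale installed on this machine.
alllocales = subprocess.check_output(['locale', '-a']).decode().split()
for loc in alllocales:
    # Skip names with an explicit encoding or modifier.
    if '.' in loc or '@' in loc:
        continue
    try:
        _ = locale.setlocale(locale.LC_ALL, loc)
    except locale.Error:
        continue
    era = locale.nl_langinfo(locale.ERA)
    if era:
        # Locale name, number of semicolon separators, full ERA string.
        print(loc, era.count(';'), era)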

now produces the following output:

cmn_TW 2 +:2:1913/01/01:+*:民國:%EC%Ey年;+:1:1912/01/01:1912/12/31:民國:%EC元年;+:1:1911/12/31:-*:民前:%EC%Ey年
hak_TW 2 +:2:1913/01/01:+*:民國:%EC%Ey年;+:1:1912/01/01:1912/12/31:民國:%EC元年;+:1:1911/12/31:-*:民前:%EC%Ey年
ja_JP 10 +:2:2020/01/01:+*:令和:%EC%Ey年;+:1:2019/05/01:2019/12/31:令和:%EC元年;+:2:1990/01/01:2019/04/30:平成:%EC%Ey年;+:1:1989/01/08:1989/12/31:平成:%EC元年;+:2:1927/01/01:1989/01/07:昭和:%EC%Ey年;+:1:1926/12/25:1926/12/31:昭和:%EC元年;+:2:1913/01/01:1926/12/24:大正:%EC%Ey年;+:1:1912/07/30:1912/12/31:大正:%EC元年;+:6:1873/01/01:1912/07/29:明治:%EC%Ey年;+:1:0001/01/01:1872/12/31:西暦:%EC%Ey年;+:1:-0001/12/31:-*:紀元前:%EC%Ey年
japanese 10 +:2:2020/01/01:+*:令和:%EC%Ey年;+:1:2019/05/01:2019/12/31:令和:%EC元年;+:2:1990/01/01:2019/04/30:平成:%EC%Ey年;+:1:1989/01/08:1989/12/31:平成:%EC元年;+:2:1927/01/01:1989/01/07:昭和:%EC%Ey年;+:1:1926/12/25:1926/12/31:昭和:%EC元年;+:2:1913/01/01:1926/12/24:大正:%EC%Ey年;+:1:1912/07/30:1912/12/31:大正:%EC元年;+:6:1873/01/01:1912/07/29:明治:%EC%Ey年;+:1:0001/01/01:1872/12/31:西暦:%EC%Ey年;+:1:-0001/12/31:-*:紀元前:%EC%Ey年
lo_LA 0 +:1:-543/01/01:+*:ພ.ສ.:%EC %Ey
lzh_TW 2 +:2:1913/01/01:+*:民國:%EC%Ey年;+:1:1912/01/01:1912/12/31:民國:%EC元年;+:1:1911/12/31:-*:民前:%EC%Ey年
nan_TW 2 +:2:1913/01/01:+*:民國:%EC%Ey年;+:1:1912/01/01:1912/12/31:民國:%EC元年;+:1:1911/12/31:-*:民前:%EC%Ey年
thai 0 +:1:-543/01/01:+*:พ.ศ.:%EC %Ey
th_TH 0 +:1:-543/01/01:+*:พ.ศ.:%EC %Ey
zh_TW 2 +:2:1913/01/01:+*:民國:%EC%Ey年;+:1:1912/01/01:1912/12/31:民國:%EC元年;+:1:1911/12/31:-*:民前:%EC%Ey年

The ERA values are not set on FreeBSD and Illumos, and I suppose they are not set on macOS and Solaris either. Currently they seem to be set only on Linux.
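
The segments in the output above follow the POSIX era layout, direction:offset:start_date:end_date:era_name:era_format. Below is a minimal sketch of splitting the returned string into fields, assuming that no field other than the last contains a colon and that no field contains a semicolon. `EraSegment` and `parse_era` are hypothetical names for illustration, not part of the locale module.

```python
from dataclasses import dataclass


@dataclass
class EraSegment:
    direction: str   # "+" or "-"
    offset: int      # year number of the first year of the era
    start_date: str  # yyyy/mm/dd, the year may be negative
    end_date: str    # yyyy/mm/dd, or "+*" / "-*" for an open-ended era
    name: str        # era name, may be empty
    fmt: str         # format string used for the era year


def parse_era(era: str) -> list[EraSegment]:
    """Split an ERA string into its semicolon-separated segments."""
    segments = []
    for chunk in era.split(";"):
        if not chunk:
            continue
        # maxsplit=5 keeps any further colons inside the format field.
        direction, offset, start, end, name, fmt = chunk.split(":", 5)
        segments.append(
            EraSegment(direction, int(offset), start, end, name, fmt)
        )
    return segments


# The zh_TW value from the output above.
zh_tw = (
    "+:2:1913/01/01:+*:民國:%EC%Ey年;"
    "+:1:1912/01/01:1912/12/31:民國:%EC元年;"
    "+:1:1911/12/31:-*:民前:%EC%Ey年"
)
for seg in parse_era(zh_tw):
    print(seg.name, seg.start_date, seg.end_date)
```

Dates are left as strings here because they can have negative years, which datetime.date cannot represent.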

serhiy-storchaka added a commit that referenced this issue Nov 21, 2024
It now returns multiple era description segments separated by semicolons.
Previously it only returned the first segment on platforms with Glibc.
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Nov 21, 2024
…126730)

It now returns multiple era description segments separated by semicolons.
Previously it only returned the first segment on platforms with Glibc.
(cherry picked from commit 4803cd0)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Nov 21, 2024
…126730)

It now returns multiple era description segments separated by semicolons.
Previously it only returned the first segment on platforms with Glibc.
(cherry picked from commit 4803cd0)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@github-project-automation github-project-automation bot moved this from In Progress to Done in Locale issues Nov 21, 2024
serhiy-storchaka added a commit that referenced this issue Nov 21, 2024
…127098)

It now returns multiple era description segments separated by semicolons.
Previously it only returned the first segment on platforms with Glibc.
(cherry picked from commit 4803cd0)
serhiy-storchaka added a commit that referenced this issue Nov 21, 2024
…127097)

It now returns multiple era description segments separated by semicolons.
Previously it only returned the first segment on platforms with Glibc.
(cherry picked from commit 4803cd0)
@kulikjak
Contributor

Hi @serhiy-storchaka, I am sorry I couldn't test this earlier.

While they are indeed not set on Illumos, we have them on Oracle Solaris for some locales:

ja_JP.PCK 4 +:2:2020/01/01:+*:令和:%EC%Ey年;+:1:2019/05/01:2019/12/31:令和:%EC元年;+:2:1990/01/01:2019/04/30:平成:%EC%Ey年;+:1:1989/01/08:1989/12/31:平成:%EC元年;+:2:1927/01/01:1989/01/07:昭和:%EC%Ey年
ja_JP.UTF-8 4 +:2:2020/01/01:+*:令和:%EC%Ey年;+:1:2019/05/01:2019/12/31:令和:%EC元年;+:2:1990/01/01:2019/04/30:平成:%EC%Ey年;+:1:1989/01/08:1989/12/31:平成:%EC元年;+:2:1927/01/01:1989/01/07:昭和:%EC%Ey年
ja_JP.eucJP 4 +:2:2020/01/01:+*:令和:%EC%Ey年;+:1:2019/05/01:2019/12/31:令和:%EC元年;+:2:1990/01/01:2019/04/30:平成:%EC%Ey年;+:1:1989/01/08:1989/12/31:平成:%EC元年;+:2:1927/01/01:1989/01/07:昭和:%EC%Ey年
th_TH.TIS620 0 +:0:-543/01/01:+*::พ.ศ. %Ey
zh_CN.GB18030 1 +:0:0000/01/01:+*:公元:%EC%Ey年;+:1:-0001/12/31:-*:公元前:%EC%Ey年
zh_CN.GB18030@pinyin 1 +:0:0000/01/01:+*:公元:%EC%Ey年;+:1:-0001/12/31:-*:公元前:%EC%Ey年
zh_CN.GB18030@radical 1 +:0:0000/01/01:+*:公元:%EC%Ey年;+:1:-0001/12/31:-*:公元前:%EC%Ey年
zh_CN.GB18030@stroke 1 +:0:0000/01/01:+*:公元:%EC%Ey年;+:1:-0001/12/31:-*:公元前:%EC%Ey年
zh_TW.BIG5@zhuyin 4 +:2:1913/01/01:+*:中華民國:%EC%Ey年;+:1:1912/01/01:1912/12/31:中華民國:%EC元年;+:1:1911/12/31:-*:民國:%EC%Ey年;+:1:1/1/1:1911/12/31:西元:%EC%Ey年;+:1:-1/12/31:-*:西元前:%EC%Ey年
zh_TW.EUC@zhuyin 4 +:2:1913/01/01:+*:中華民國:%EC%Ey年;+:1:1912/01/01:1912/12/31:中華民國:%EC元年;+:1:1911/12/31:-*:民國:%EC%Ey年;+:1:1/1/1:1911/12/31:西元:%EC%Ey年;+:1:-1/12/31:-*:西元前:%EC%Ey年
zh_TW.UTF-8@zhuyin 4 +:2:1913/01/01:+*:中華民國:%EC%Ey年;+:1:1912/01/01:1912/12/31:中華民國:%EC元年;+:1:1911/12/31:-*:民國:%EC%Ey年;+:1:1/1/1:1911/12/31:西元:%EC%Ey年;+:1:-1/12/31:-*:西元前:%EC%Ey年

Unfortunately, we have fewer Japanese eras than Linux, which causes the newly added test to fail; I created #127327 to handle this.
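
Since the number of segments differs between platforms, a check that does not hard-code the count may be more portable. The sketch below validates only the shape of each segment. It is a hypothetical helper (`era_is_well_formed` is an invented name), not the test in CPython and not the change made in #127327, and the regular expression is an assumption derived from the values shown in this thread.

```python
import re

# Dates look like yyyy/mm/dd with an optionally negative year; the end
# date may also be one of the open-ended markers "+*" or "-*".
_DATE = r"-?\d+/\d+/\d+"
_SEGMENT = re.compile(
    rf"[+-]:\d+:{_DATE}:(?:{_DATE}|[+-]\*):[^:;]*:[^;]+"
)


def era_is_well_formed(era: str) -> bool:
    """Return True if every semicolon-separated segment has six fields."""
    if not era:
        return True  # locales without era data return an empty string
    return all(_SEGMENT.fullmatch(seg) for seg in era.split(";"))


# The ja_JP.UTF-8 value reported for Oracle Solaris above (5 segments).
solaris_ja = (
    "+:2:2020/01/01:+*:令和:%EC%Ey年;"
    "+:1:2019/05/01:2019/12/31:令和:%EC元年;"
    "+:2:1990/01/01:2019/04/30:平成:%EC%Ey年;"
    "+:1:1989/01/08:1989/12/31:平成:%EC元年;"
    "+:2:1927/01/01:1989/01/07:昭和:%EC%Ey年"
)
print(era_is_well_formed(solaris_ja))
```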
