Deprecate locale.getdefaultlocale() function #90817

Closed
vstinner opened this issue Feb 6, 2022 · 24 comments
Labels
3.11 (only security fixes), stdlib (Python modules in the Lib dir)

Comments

@vstinner
Member

vstinner commented Feb 6, 2022

BPO 46659
Nosy @malemburg, @vstinner, @serhiy-storchaka, @eryksun
PRs
  • bpo-46659: calendar uses locale.getlocale() #31166
  • bpo-46659: test.support avoids locale.getdefaultlocale() #31167
  • bpo-46659: Update the test on the mbcs codec alias #31168
  • bpo-46659: Deprecate locale.getdefaultlocale() #31206
  • bpo-46659: Enhance LocaleTextCalendar for C locale #31214
  • bpo-46659: Fix the MBCS codec alias on Windows #31218
Files
  • cal_locale.py
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2022-02-24.13:41:35.585>
    created_at = <Date 2022-02-06.17:33:14.432>
    labels = ['library', '3.11']
    title = 'Deprecate locale.getdefaultlocale() function'
    updated_at = <Date 2022-02-24.14:53:20.800>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2022-02-24.14:53:20.800>
    actor = 'lemburg'
    assignee = 'none'
    closed = True
    closed_date = <Date 2022-02-24.13:41:35.585>
    closer = 'vstinner'
    components = ['Library (Lib)']
    creation = <Date 2022-02-06.17:33:14.432>
    creator = 'vstinner'
    dependencies = []
    files = ['50606']
    hgrepos = []
    issue_num = 46659
    keywords = ['patch']
    message_count = 19.0
    messages = ['412647', '412652', '412664', '412666', '412667', '412668', '412687', '412800', '412819', '412825', '412826', '412827', '412829', '412842', '413744', '413745', '413907', '413910', '413915']
    nosy_count = 4.0
    nosy_names = ['lemburg', 'vstinner', 'serhiy.storchaka', 'eryksun']
    pr_nums = ['31166', '31167', '31168', '31206', '31214', '31218']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue46659'
    versions = ['Python 3.11']

    @vstinner
    Member Author

    vstinner commented Feb 6, 2022

    The locale.getdefaultlocale() function only relies on environment variables. At Python startup, Python calls setlocale() to set the LC_CTYPE locale to the user preferred locale.

    Since Python 3.7, if the LC_CTYPE locale is "C" or "POSIX", PEP-538 sets the LC_CTYPE locale to a UTF-8 variant if available, and PEP-540 ignores the locale and forces the usage of the UTF-8 encoding. The *effective* encoding used by Python is inconsistent with environment variables.

    Moreover, if setlocale() is called to set the LC_CTYPE locale to a locale different than the user locale, again, environment variables are inconsistent with the effective locale.

    In these cases, locale.getdefaultlocale() result is not the expected locale and it can lead to mojibake and other issues.
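
A minimal sketch of the mismatch described above, using only the standard library. It puts the locale name hinted at by the environment variables next to what the running interpreter actually uses. The helper name `describe_effective_encoding` and the choice of variables to inspect are illustrative assumptions, not something taken from this issue.

```python
import locale
import os
import sys


def describe_effective_encoding():
    """Collect what the environment says and what Python actually uses.

    Hypothetical helper for illustration only.
    """
    # What the environment variables alone suggest (may be unset or stale).
    env_hint = (
        os.environ.get("LC_ALL")
        or os.environ.get("LC_CTYPE")
        or os.environ.get("LANG")
    )
    return {
        "env_hint": env_hint,
        # Query-only call: passing None returns the current LC_CTYPE setting.
        "lc_ctype": locale.setlocale(locale.LC_CTYPE, None),
        # True when the Python UTF-8 Mode (PEP 540) is enabled.
        "utf8_mode": bool(sys.flags.utf8_mode),
        # Encoding reported for the current locale (UTF-8 in UTF-8 Mode);
        # False means "do not call setlocale() as a side effect".
        "preferred_encoding": locale.getpreferredencoding(False),
    }


info = describe_effective_encoding()
for key, value in info.items():
    print(f"{key}: {value!r}")
```

If `env_hint` and `lc_ctype` disagree, or `utf8_mode` is true while the environment names a non-UTF-8 locale, code that trusts the environment variables alone would likely pick the wrong encoding.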

    For these reasons, I propose to deprecate locale.getdefaultlocale(): setlocale(), getpreferredencoding() and getlocale() should be used instead.

    For the background on these issues, see recent issue:

    @vstinner vstinner added 3.11 only security fixes stdlib Python modules in the Lib dir labels Feb 6, 2022
    @vstinner
    Member Author

    vstinner commented Feb 6, 2022

    cal_locale.py: Test calendar.LocaleTextCalendar() default locale, manual test for #75349.

    @vstinner
    Member Author

    vstinner commented Feb 6, 2022

    New changeset 04dd60e by Victor Stinner in branch 'main':
    bpo-46659: Update the test on the mbcs codec alias (GH-31168)
    04dd60e

    @vstinner
    Member Author

    vstinner commented Feb 6, 2022

    New changeset 06b8f16 by Victor Stinner in branch 'main':
    bpo-46659: test.support avoids locale.getdefaultlocale() (GH-31167)
    06b8f16

    @malemburg
    Member

    For these reasons, I propose to deprecate locale.getdefaultlocale(): setlocale(), getpreferredencoding() and getlocale() should be used instead.

    Please see the discussion on https://bugs.python.org/issue43552: locale.getpreferredencoding() needs to be deprecated as well. Instead we should have a single locale.getencoding() as outlined there... perhaps in a separate ticket ?! Thanks.

    @vstinner
    Member Author

    vstinner commented Feb 6, 2022

    Please see the discussion on https://bugs.python.org/issue43552: locale.getpreferredencoding() needs to be deprecated as well. Instead we should have a single locale.getencoding() as outlined there... perhaps in a separate ticket ?! Thanks.

    Yeah, I read this issue. But these things are too complicated :-) I prefer to move step by step.

    Once locale.getencoding() (or a similar function) is added, we can update the deprecation message.

    I hope to be able to deprecate getdefaultlocale() and to add such new function in Python 3.11.

    @vstinner
    Member Author

    vstinner commented Feb 6, 2022

    New changeset 04dd60e by Victor Stinner in branch 'main':
    bpo-46659: Update the test on the mbcs codec alias (GH-31168)

    This change is not correct, I created bpo-46668 to fix it.

    @vstinner
    Member Author

    vstinner commented Feb 7, 2022

    New changeset 7a0486e by Victor Stinner in branch 'main':
    bpo-46659: calendar uses locale.getlocale() (GH-31166)
    7a0486e

    @serhiy-storchaka
    Member

    getdefaultlocale() falls back to LANG and LANGUAGE. It also allows specifying the list of environment variables to look up. How could this use case be covered with getlocale()?
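
For readers who need the environment-variable lookup itself rather than the current locale, here is a rough sketch of that fallback done by hand. The function name `locale_name_from_env`, its `environ` parameter, and the default variable order are assumptions made for illustration. It returns the raw string from the environment and does not try to parse it into a (language, encoding) pair.

```python
import os


def locale_name_from_env(
    envvars=("LC_ALL", "LC_CTYPE", "LANG", "LANGUAGE"), environ=None
):
    """Return the first non-empty locale name found in `envvars`, or None.

    Hypothetical helper: the default order is an assumption, and the result
    only reflects the environment, not the locale the process is using.
    """
    if environ is None:
        environ = os.environ
    for variable in envvars:
        value = environ.get(variable)
        if not value:
            continue
        if variable == "LANGUAGE":
            # LANGUAGE holds a colon-separated priority list; keep the first.
            value = value.split(":")[0]
        return value
    return None


print(locale_name_from_env())
```

As discussed in this issue, a value obtained this way can disagree with the effective LC_CTYPE locale, so it is probably best limited to choosing a user-interface language rather than an encoding.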

    @eryksun
    Contributor

    eryksun commented Feb 8, 2022

    getdefaultlocale() falls back to LANG and LANGUAGE.

    _Py_SetLocaleFromEnv(LC_CTYPE) (e.g. setlocale(LC_CTYPE, "")) gets called at startup, except for the isolated configuration [1].

    I think calendar.Locale*Calendar should try the LC_CTYPE locale if LC_TIME is "C", i.e. (None, None). Otherwise, it's introducing new default behavior. For example, with LC_ALL set to "ru_RU.utf8":

    3.8:

        >>> locale.getlocale(locale.LC_TIME)
        (None, None)
        >>> locale.getlocale(locale.LC_CTYPE)
        ('ru_RU', 'UTF-8')
        >>> cal = calendar.LocaleTextCalendar()
        >>> cal.formatweekday(0, 15)
        '  Понедельник  '

    3.11.0a5+:

        >>> locale.getlocale(locale.LC_TIME)
        (None, None)
        >>> locale.getlocale(locale.LC_CTYPE)
        ('ru_RU', 'UTF-8')
        >>> cal = calendar.LocaleTextCalendar()
        >>> cal.formatweekday(0, 15)
        '     Monday    '
        >>> locale.setlocale(locale.LC_TIME, '')
        'ru_RU.utf8'
        >>> cal = calendar.LocaleTextCalendar()
        >>> cal.formatweekday(0, 15)
        '  Понедельник  '

    [1] https://docs.python.org/3/c-api/init_config.html?#isolated-configuration

    @vstinner
    Member Author

    vstinner commented Feb 8, 2022

    Serhiy: "getdefaultlocale() falls back to LANG and LANGUAGE. It also allows specifying the list of environment variables to look up. How could this use case be covered with getlocale()?"

    What's your use case to use env vars rather than the current LC_CTYPE locale?

    My concern is that when setlocale() is called, the current LC_CTYPE locale is inconsistent and you can get mojibake and other issues.

    See for example:
    https://bugs.python.org/issue43552#msg389069

    Marc-Andre Lemburg wants to deprecate it:
    https://bugs.python.org/issue43552#msg389076

    @vstinner
    Member Author

    vstinner commented Feb 8, 2022

    I think calendar.Locale*Calendar should try the LC_CTYPE locale if LC_TIME is "C", i.e. (None, None). Otherwise, it's introducing new default behavior. For example, with LC_ALL set to "ru_RU.utf8": (...)

    Oh. Serhiy asked me to use LC_TIME rather than LC_CTYPE.

    See also my example in the PR:
    #31166 (comment)

    @eryksun
    Contributor

    eryksun commented Feb 8, 2022

    Oh. Serhiy asked me to use LC_TIME rather than LC_CTYPE.

    Since Locale*Calendar is documented as not being thread safe, __init__() could get the real default via setlocale(LC_TIME, "") when locale=None and the current LC_TIME is "C". Restore it back to "C" after getting the default. That should usually match the behavior from previous versions that called getdefaultlocale(). In cases where it differs, it's fixing a bug because the default LC_TIME is the correct default.
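
The save, query, and restore sequence suggested here could look roughly like the sketch below. It is not the code that was merged into calendar. The function name `default_time_locale` is hypothetical, and like Locale*Calendar itself it is not thread safe because it temporarily changes the process-wide locale.

```python
import locale


def default_time_locale():
    """Return the user's preferred LC_TIME locale name.

    Hypothetical helper: if LC_TIME is currently "C" or "POSIX", briefly
    switch to the user default to read its name, then restore the original.
    """
    saved = locale.setlocale(locale.LC_TIME, None)  # query only
    if saved not in ("C", "POSIX"):
        # A locale was already selected explicitly; use it as is.
        return saved
    try:
        # The empty string selects the user's default from the environment.
        return locale.setlocale(locale.LC_TIME, "")
    except locale.Error:
        # The environment names a locale this system does not support.
        return saved
    finally:
        # Always put LC_TIME back the way it was found.
        locale.setlocale(locale.LC_TIME, saved)


print(default_time_locale())
```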

    @vstinner
    Member Author

    vstinner commented Feb 8, 2022

    Eryk: I created #75397 which uses the user preferred locale if the current LC_TIME locale is "C" or "POSIX".

    Moreover, it no longer gets the current locale when the class is created. If locale=locale is passed, just use the current LC_TIME (or the user preferred locale if the locale is "C" or "POSIX").

    @vstinner
    Member Author

    New changeset ccbe804 by Victor Stinner in branch 'main':
    bpo-46659: Fix the MBCS codec alias on Windows (GH-31218)
    ccbe804

    @vstinner
    Member Author

    New changeset b899126 by Victor Stinner in branch 'main':
    bpo-46659: Deprecate locale.getdefaultlocale() (GH-31206)
    b899126

    @vstinner
    Member Author

    New changeset 4fccf91 by Victor Stinner in branch 'main':
    bpo-46659: Enhance LocaleTextCalendar for C locale (GH-31214)
    4fccf91

    @vstinner
    Member Author

    locale.getdefaultlocale() is now deprecated.

    calendar now uses locale.setlocale() instead of locale.getdefaultlocale().

    The ANSI code page alias to MBCS now has better tests and better comments.

    Thanks Eryk Sun for your very useful feedback!

    @malemburg
    Member

    Thanks, Victor.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    vstinner added a commit that referenced this issue May 25, 2022
    The function was already deprecated in Python 3.11 since it calls
    locale.getdefaultlocale() which was deprecated in Python 3.11.
    miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 25, 2022
    …3196)
    
    The function was already deprecated in Python 3.11 since it calls
    locale.getdefaultlocale() which was deprecated in Python 3.11.
    (cherry picked from commit bf58cd0)
    
    Co-authored-by: Victor Stinner <vstinner@python.org>
    miss-islington added a commit that referenced this issue May 25, 2022
    The function was already deprecated in Python 3.11 since it calls
    locale.getdefaultlocale() which was deprecated in Python 3.11.
    (cherry picked from commit bf58cd0)
    
    Co-authored-by: Victor Stinner <vstinner@python.org>
    @danny0838

    danny0838 commented Dec 11, 2022

    locale.getlocale does not return a correct RFC 1766 language code on Windows, which makes it incompatible with locale.getdefaultlocale. (#82986)

    Our project relies on locale.getdefaultlocale to get the default system language code, which won't work using locale.getlocale due to the above bug. This breaks downward compatibility.

    May I request NOT REMOVING locale.getdefaultlocale until there is a good solution for the bug of locale.getlocale?

    @shineworld

    The return value of locale.getlocale() is not compatible with the typical locale string (Python 3.11.7 on Windows):

        >>> locale.getdefaultlocale()
        ('it_IT', 'cp1252')
        >>> locale.getlocale()
        ('Italian_Italy', '1252')

    'it_IT' makes it easy to load gettext ./locale/<it_IT>/... translation .mo files.

    @vstinner
    Member Author

    The return value of locale.getlocale() is not compatible with the typical locale string

    That's why the deprecation suggests using locale.setlocale(). Hum, the API is surprising, but locale.setlocale() should be used to get the current locale: locale.setlocale(locale.LC_CTYPE, None) gets the current LC_CTYPE locale for example.
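
A short sketch of the query-only use of setlocale() mentioned here, next to getlocale() for comparison. The wrapper names `current_locale_name` and `current_locale_tuple` are hypothetical. The exact strings returned depend on the platform and C library, so no output is shown.

```python
import locale


def current_locale_name(category=locale.LC_CTYPE):
    """Return the C library's own name for the current locale of `category`."""
    # Passing None as the second argument only queries; nothing is changed.
    return locale.setlocale(category, None)


def current_locale_tuple(category=locale.LC_CTYPE):
    """Return getlocale()'s (language code, encoding) view of `category`."""
    try:
        return locale.getlocale(category)
    except ValueError:
        # getlocale() raises ValueError when it cannot parse the locale name.
        return (None, None)


print(current_locale_name())
print(current_locale_tuple())
```

The first function reports the name exactly as the platform spells it, while the second goes through the locale module's parsing and aliasing, which is where the Windows difference reported above shows up.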

    @vstinner
    Member Author

    Maybe the deprecation message should be clarified. I dislike locale.getlocale(): it can return an invalid locale that is not accepted by setlocale(). I wanted to deprecate it.

    @malemburg
    Member

    malemburg commented Apr 16, 2024

    The point of having getlocale() is to be able to normalize the returned values to make them compatible to setlocale().

    The C lib API of setlocale() to actually fetch the current locale values is completely counterintuitive, which is why I had added getlocale() instead. On top of this, setlocale() does not always return values which you can feed back into it.

    @shineworld: Could you please check what locale._setlocale(locale.LC_CTYPE) returns on your Windows system? It's possible that we'll have to add more aliases to the table to fix this. Thanks.

    BTW: Perhaps better to open a separate ticket for that problem...
