bpo-46659: Update the test on the mbcs codec alias #31168

vstinner · 2022-02-06T18:41:45Z

Move the test on the "mbcs" codec alias from test_site to
test_codecs. Moreover, the test now uses
locale.getpreferredencoding(False) rather than
locale.getdefaultlocale() to get the ANSI code page.

https://bugs.python.org/issue46659
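A minimal sketch of reading the ANSI code page name the way the updated test does, through locale.getpreferredencoding(False). The helper name ansi_code_page() is hypothetical and not part of CPython. It returns None off Windows, and also whenever the reported encoding is not a "cp..." name, which covers UTF-8 mode.

```python
import locale
import sys


def ansi_code_page():
    """Return the ANSI code page name (e.g. "cp1252"), or None.

    Hypothetical helper for illustration only. Returns None when not on
    Windows, or when getpreferredencoding(False) does not report a "cp..."
    name (for example in UTF-8 mode).
    """
    if sys.platform != "win32":
        return None
    # On Windows, outside UTF-8 mode, this is "cp" + the ANSI code page number.
    encoding = locale.getpreferredencoding(False)
    return encoding if encoding.startswith("cp") else None


print(ansi_code_page())
```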

The encodings module now registers the _alias_mbcs() codec search function before the search_function() codec search function. Previously, _alias_mbcs() was never used, since search_function() already resolved code page names such as "cp1252" to Python's own codec modules.

Fix the test_codecs.test_mbcs_alias() test: use the current ANSI code page, not a fake ANSI code page number.

Remove the test_site.test_aliasing_mbcs() test: the alias is now implemented in the encodings module, no longer in the site module.
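Why the registration order decides whether _alias_mbcs() is ever reached can be sketched with two toy search functions. Everything here is illustrative: DEMO_NAME and make_search_function() are hypothetical, and latin-1's encode/decode functions merely stand in for a real codec. codecs.lookup() asks the registered search functions in registration order and keeps the first result that is not None.

```python
import codecs

# Hypothetical encoding name, used only for this demonstration.
DEMO_NAME = "demo_search_order"


def make_search_function(label):
    """Build a codec search function that claims DEMO_NAME and tags the result."""
    def search(normalized_name):
        if normalized_name != DEMO_NAME:
            return None  # not ours: let the next search function try
        latin1 = codecs.lookup("latin-1")
        return codecs.CodecInfo(latin1.encode, latin1.decode, name=label)
    return search


# codecs.lookup() queries search functions in registration order and keeps the
# first non-None result, so "first" shadows "second" for DEMO_NAME.
codecs.register(make_search_function("first"))
codecs.register(make_search_function("second"))

print(codecs.lookup(DEMO_NAME).name)  # first
```

Swapping the two register() calls makes the lookup report "second" instead (in a fresh interpreter, because codecs.lookup() caches its results).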

eryksun · 2022-02-06T21:55:36Z

Lib/test/test_codecs.py

+        # The encodings module creates an "mbcs" alias to the ANSI code page
+        codec = codecs.lookup(encoding)
+        self.assertEqual(codec.name, "mbcs")


This was never true before. With 1252 as my ANSI code page, I checked codecs.lookup('cp1252') in 2.7, 3.4, 3.5, 3.6, 3.9, and 3.10, and none of them return the "mbcs" encoding. It's not equivalent, and not supposed to be. The implementation of "cp1252" should be cross-platform, regardless of whether we're on a Windows system with 1252 as the ANSI code page, as opposed to a Windows system with some other ANSI code page, or a Linux or macOS system.

The differences are that "mbcs" maps every byte, whereas our code-page encodings do not map undefined bytes, and the "replace" handler of "mbcs" uses a best-fit mapping (e.g. "α" -> "a") when encoding text, instead of mapping all undefined characters to "?".

This issue is worse than I expected; I created https://bugs.python.org/issue46668 to discuss it.
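The cp1252 side of these differences can be checked on any platform. This sketch (the decodes_strictly() helper is hypothetical) shows byte 0x81, which is undefined in Python's cp1252 table, being rejected by the strict handler, and the "replace" handler producing "?" for "α". The "mbcs" lines run only on Windows and depend on the current ANSI code page, so no output is claimed for them; per the comment above, on a 1252 system "mbcs" maps every byte and best-fits "α" to "a".

```python
import sys


def decodes_strictly(data, encoding):
    """Return True if data decodes without error under the strict handler."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# Python's cp1252 codec is a cross-platform charmap table; 0x81 is undefined.
print(decodes_strictly(b"\x81", "cp1252"))  # False

# "replace" substitutes U+FFFD when decoding and "?" when encoding.
print(b"\x81".decode("cp1252", "replace") == "\ufffd")  # True
print("α".encode("cp1252", "replace"))  # b'?'

if sys.platform == "win32":
    # "mbcs" exists only on Windows and follows the current ANSI code page,
    # so these results vary from system to system.
    print(ascii(b"\x81".decode("mbcs", "replace")))
    print("α".encode("mbcs", "replace"))
```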

eryksun · 2022-02-06T21:56:14Z

Lib/test/test_codecs.py

+        # On Windows, the encoding name must be the ANSI code page
+        encoding = locale.getpreferredencoding(False)
+        self.assertTrue(encoding.startswith('cp'), encoding)


This will fail if PYTHONUTF8 is set to 1 in the environment, because UTF-8 mode then overrides both getpreferredencoding(False) and _get_locale_encoding().
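One way to keep such a check from failing under PYTHONUTF8 is to skip it when sys.flags.utf8_mode is set. This is a hedged sketch, not the test that was merged; the class and method names are hypothetical.

```python
import locale
import sys
import unittest


class AnsiCodePageTest(unittest.TestCase):
    """Hypothetical test, not the one in CPython's test_codecs."""

    @unittest.skipUnless(sys.platform == "win32", "requires Windows")
    @unittest.skipIf(sys.flags.utf8_mode,
                     "UTF-8 mode overrides getpreferredencoding(False)")
    def test_preferred_encoding_is_ansi_code_page(self):
        # Outside UTF-8 mode on Windows, this should be "cp" + the ANSI code page.
        encoding = locale.getpreferredencoding(False)
        self.assertTrue(encoding.startswith("cp"), encoding)
```

Stacking skipUnless() and skipIf() keeps the two reasons for skipping separate in the test report.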

vstinner added the skip news label Feb 6, 2022

the-knights-who-say-ni added the CLA signed label Feb 6, 2022

bedevere-bot added the awaiting core review and tests (Tests in the Lib/test dir) labels Feb 6, 2022

vstinner added 2 commits February 6, 2022 21:23

Update test_codecs.test_basics()

6a4c886

vstinner merged commit 04dd60e into python:main Feb 6, 2022

bedevere-bot removed the awaiting core review label Feb 6, 2022

vstinner deleted the mbcs_alias branch February 6, 2022 20:50

eryksun reviewed Feb 6, 2022


vstinner mentioned this pull request Dec 11, 2022

Deprecate locale.getdefaultlocale() function #90817 (closed)
