bpo-46659: Update the test on the mbcs codec alias #31168

Merged
merged 2 commits into from
Feb 6, 2022

Conversation

vstinner
Member

@vstinner vstinner commented Feb 6, 2022

Move the test on the "mbcs" codec alias from test_site to
test_codecs. Moreover, the test now uses
locale.getpreferredencoding(False) rather than
locale.getdefaultlocale() to get the ANSI code page.
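
As a rough illustration of the two calls mentioned above, the sketch below reads the encoding both ways. The variable names are made up for this example, and it assumes `locale.getdefaultlocale()` may be deprecated or absent in newer Python versions, so the call is guarded.

```python
import locale
import warnings

# Encoding reported for the current locale, without calling setlocale().
preferred = locale.getpreferredencoding(False)

# Legacy lookup: getdefaultlocale() returns a (language, encoding) tuple.
# It is guarded because it is deprecated in recent Python versions and can
# raise ValueError when the environment names an unknown locale.
legacy = None
get_default = getattr(locale, "getdefaultlocale", None)
if get_default is not None:
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        try:
            legacy = get_default()[1]
        except ValueError:
            legacy = None

print("getpreferredencoding(False):", preferred)
print("getdefaultlocale()[1]:", legacy)
```

The two values can differ in spelling or content, which is one reason a test has to pick a single source for the ANSI code page name.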

https://bugs.python.org/issue46659

The encodings module now registers the _alias_mbcs() codec search
function before the search_function() codec search function.
Previously, _alias_mbcs() was never used.
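
Why registration order matters can be sketched with two toy search functions. This is not the code in the encodings module; `DEMO_NAME`, `first_search` and `second_search` are hypothetical names used only here. The codec registry consults search functions in the order they were registered and stops at the first one that returns a `CodecInfo`.

```python
import codecs

DEMO_NAME = "demo_order_codec"  # hypothetical name, not a real codec

def first_search(name):
    # Registered first, so it is consulted before second_search.
    if name == DEMO_NAME:
        return codecs.lookup("cp1252")
    return None

def second_search(name):
    # Also claims DEMO_NAME, but is reached only if earlier functions return None.
    if name == DEMO_NAME:
        return codecs.lookup("latin-1")
    return None

codecs.register(first_search)
codecs.register(second_search)
try:
    winner = codecs.lookup(DEMO_NAME).name
finally:
    # codecs.unregister() exists since Python 3.10; skip cleanup on older versions.
    if hasattr(codecs, "unregister"):
        codecs.unregister(first_search)
        codecs.unregister(second_search)

print(winner)  # cp1252
```

A search function registered after one that already answers for the same name is never asked, which matches the "never used" situation described above.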

Fix the test_codecs.test_mbcs_alias() test: use the current ANSI code
page, not a fake ANSI code page number.

Remove the test_site.test_aliasing_mbcs() test: the alias is now
implemented in the encodings module, no longer in the site module.
@vstinner vstinner merged commit 04dd60e into python:main Feb 6, 2022
@vstinner vstinner deleted the mbcs_alias branch February 6, 2022 20:50
Comment on lines +3198 to +3200
# The encodings module creates an "mbcs" alias to the ANSI code page
codec = codecs.lookup(encoding)
self.assertEqual(codec.name, "mbcs")
Contributor

@eryksun eryksun Feb 6, 2022

This was never true before. With 1252 as my ANSI code page, I checked codecs.lookup('cp1252') in 2.7, 3.4, 3.5, 3.6, 3.9, and 3.10, and none of them return the "mbcs" encoding. It's not equivalent, and not supposed to be. The implementation of "cp1252" should be cross-platform, regardless of whether we're on a Windows system with 1252 as the ANSI code page, as opposed to a Windows system with some other ANSI code page, or a Linux or macOS system.

The differences are that "mbcs" maps every byte, whereas our code-page encodings do not map undefined bytes, and the "replace" handler of "mbcs" uses a best-fit mapping (e.g. "α" -> "a") when encoding text, instead of mapping all undefined characters to "?".
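
The cross-platform half of that comparison can be checked on any system; a minimal sketch follows. The "mbcs" branch runs only on Windows, and its output depends on the active ANSI code page, so no result is asserted for it here.

```python
import sys

# Byte 0x81 is undefined in Python's cross-platform cp1252 codec,
# so strict decoding raises instead of mapping it to a character.
try:
    b"\x81".decode("cp1252")
    strict_decode_failed = False
except UnicodeDecodeError:
    strict_decode_failed = True

# U+03B1 (Greek small letter alpha) has no cp1252 mapping; the "replace"
# handler substitutes "?" rather than a best-fit letter.
replaced = "\u03b1".encode("cp1252", "replace")

print(strict_decode_failed, replaced)

if sys.platform == "win32":
    # Results here depend on the system's ANSI code page.
    print("\u03b1".encode("mbcs", "replace"))
    print(b"\x81".decode("mbcs"))
```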

Member Author

This issue is worse than I expected; I created https://bugs.python.org/issue46668 to discuss it.

Comment on lines +3194 to +3196
# On Windows, the encoding name must be the ANSI code page
encoding = locale.getpreferredencoding(False)
self.assertTrue(encoding.startswith('cp'), encoding)
Contributor

This will fail if PYTHONUTF8 is set in the environment, because it overrides getpreferredencoding(False) and _get_locale_encoding().
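
The override can be observed by running a child interpreter with `PYTHONUTF8=1`. The helper name `preferred_encoding_in_child` is hypothetical and exists only for this sketch.

```python
import os
import subprocess
import sys

def preferred_encoding_in_child(extra_env):
    """Run a child Python with extra environment variables and return
    what locale.getpreferredencoding(False) reports there."""
    env = dict(os.environ)
    env.update(extra_env)
    result = subprocess.run(
        [sys.executable, "-c",
         "import locale; print(locale.getpreferredencoding(False))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()

# With the UTF-8 Mode enabled, the reported encoding is UTF-8 (the exact
# capitalization varies between Python versions), so it does not start with "cp".
utf8_mode_encoding = preferred_encoding_in_child({"PYTHONUTF8": "1"})
print(utf8_mode_encoding)
```

One possible guard, offered as a suggestion rather than what the PR does, is to check `sys.flags.utf8_mode` before asserting on a "cp" prefix.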

Labels
skip news tests Tests in the Lib/test dir
4 participants