bpo-46659: Update the test on the mbcs codec alias #31168
The encodings module now registers the _alias_mbcs() codec search function before the search_function() codec search function. Previously, _alias_mbcs() was never used.

Fix the test_codecs.test_mbcs_alias() test: use the current ANSI code page, not a fake ANSI code page number.

Remove the test_site.test_aliasing_mbcs() test: the alias is now implemented in the encodings module, no longer in the site module.
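Registration order matters here because codecs.lookup() tries the registered search functions one after another and, in CPython, uses the first result that is not None. A minimal sketch of that behaviour follows; the alias name and both search functions are hypothetical and are not part of this PR.

```python
import codecs

# Hypothetical alias name (letters and digits only, so lookup()
# normalization leaves it unchanged).
ALIAS = "demoalias46659"

def first_search(name):
    # Registered first: resolves the alias to the UTF-8 codec.
    if name == ALIAS:
        return codecs.lookup("utf-8")
    return None

def second_search(name):
    # Registered second: would resolve the same alias to Latin-1,
    # but it is never consulted because first_search already matched.
    if name == ALIAS:
        return codecs.lookup("latin-1")
    return None

codecs.register(first_search)
codecs.register(second_search)

resolved = codecs.lookup(ALIAS).name
print(resolved)
```

By the same logic, an alias function registered after a search function that already resolves the name never gets a chance to run.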
# The encodings module create a "mbcs" alias to the ANSI code page
codec = codecs.lookup(encoding)
self.assertEqual(codec.name, "mbcs")
This was never true before. With 1252 as my ANSI code page, I checked codecs.lookup('cp1252') in 2.7, 3.4, 3.5, 3.6, 3.9, and 3.10, and none of them return the "mbcs" encoding. It's not equivalent, and not supposed to be. The implementation of "cp1252" should be cross-platform, regardless of whether we're on a Windows system with 1252 as the ANSI code page, as opposed to a Windows system with some other ANSI code page, or a Linux or macOS system.
The differences are that "mbcs" maps every byte, whereas our code-page encodings do not map undefined bytes, and the "replace" handler of "mbcs" uses a best-fit mapping (e.g. "α" -> "a") when encoding text, instead of mapping all undefined characters to "?".
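The cross-platform half of this comparison can be checked on any system. The sketch below is illustrative only: it exercises Python's own "cp1252" codec, and it touches "mbcs" only on Windows, where the result depends on the machine's ANSI code page.

```python
import codecs
import sys

# cp1252 leaves byte 0x81 undefined, so strict decoding fails.
try:
    b"\x81".decode("cp1252")
    undefined_byte_rejected = False
except UnicodeDecodeError:
    undefined_byte_rejected = True

# With the "replace" handler, cp1252 encodes an unmappable
# character as "?" (no best-fit mapping).
replaced = "α".encode("cp1252", "replace")

# The codec name reported by lookup() for "cp1252".
cp1252_name = codecs.lookup("cp1252").name

print(undefined_byte_rejected, replaced, cp1252_name)

if sys.platform == "win32":
    # "mbcs" exists only on Windows; its output depends on the
    # current ANSI code page, so no result is asserted here.
    print("α".encode("mbcs", "replace"))
    try:
        print(repr(b"\x81".decode("mbcs")))
    except UnicodeDecodeError as exc:
        print("mbcs rejected the byte:", exc)
```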
This issue is worse than I expected. I created https://bugs.python.org/issue46668 to discuss it.
# On Windows, the encoding name must be the ANSI code page
encoding = locale.getpreferredencoding(False)
self.assertTrue(encoding.startswith('cp'), encoding)
This will fail if PYTHONUTF8 is set in the environment, because it overrides getpreferredencoding(False) and _get_locale_encoding().
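One possible way to guard the assertion, sketched here under the assumption that checking sys.flags.utf8_mode is enough to detect the override. The helper name is hypothetical and this is not the fix adopted in the PR.

```python
import locale
import sys

def preferred_encoding_is_ansi_code_page():
    # Hypothetical helper: True only where the "cp..." assertion
    # is expected to hold.
    if sys.platform != "win32":
        return False
    # The Python UTF-8 Mode (PYTHONUTF8=1 or -X utf8) overrides the
    # locale encoding, so the ANSI code page is not reported then.
    return not sys.flags.utf8_mode

encoding = locale.getpreferredencoding(False)
if preferred_encoding_is_ansi_code_page():
    assert encoding.startswith("cp"), encoding
print(encoding)
```

In a unittest-based test, the same condition could be used to skip the check instead of letting it fail.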
Move the test on the "mbcs" codec alias from test_site to
test_codecs. Moreover, the test now uses
locale.getpreferredencoding(False) rather than
locale.getdefaultlocale() to get the ANSI code page.
https://bugs.python.org/issue46659