gh-101180: Fix a bug where iso2022_jp_3 and iso2022_jp_2004 codecs read out of bounds #111695
…ecs read out of bounds

iso2022_jp_3 and iso2022_jp_2004 codecs read out of bounds when encoding a Unicode combining character sequence. This bug causes the following error:

    $ python3 -c "print('\u304b\u309a'.encode('iso2022_jp_2004'))"
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    UnicodeEncodeError: 'iso2022_jp_2004' codec can't encode character '\u309a' in position 1: illegal multibyte sequence

This commit fixes the out-of-bounds read.
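For checking a given interpreter, the one-liner from the commit message can be wrapped in a small script. This is only a sketch: `try_encode` is a hypothetical helper name, not part of CPython, and the result depends on whether the interpreter includes this fix, so no particular output is assumed here.

```python
def try_encode(text: str, codec: str):
    """Return ("ok", encoded bytes) or ("error", message) for one encode attempt.

    Hypothetical helper for illustration; it only wraps str.encode().
    """
    try:
        return ("ok", text.encode(codec))
    except UnicodeEncodeError as exc:
        return ("error", str(exc))


# U+304B HIRAGANA LETTER KA followed by
# U+309A COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK,
# the combining character sequence from the report above.
sample = "\u304b\u309a"

for codec in ("iso2022_jp_3", "iso2022_jp_2004"):
    status, detail = try_encode(sample, codec)
    print(codec, status, detail)
```

An "error" status with "illegal multibyte sequence" matches the traceback quoted in the commit message; an "ok" status suggests the interpreter already carries the fix or one of its backports.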
iso2022_jp_3 and iso2022_jp_2004 are upward compatible with iso2022_jp. In addition to the iso2022_jp tests, we will test the following characters added in iso2022_jp_3 and iso2022_jp_2004.

    JIS X 0213        Unicode
    ----------------  ---------------------------------------------
    Plane 1 \x2E\x23  U+3402         Basic Multilingual Plane
    Plane 1 \x2E\x22  U+2000B        Supplementary Ideographic Plane
    Plane 1 \x24\x77  U+304B U+309A  Combining Character Sequence
    Plane 2 \x21\x22  U+4E02         Basic Multilingual Plane
    Plane 2 \x7E\x76  U+2A6B2        Supplementary Ideographic Plane

The difference between iso2022_jp_3 and iso2022_jp_2004 is the difference between JIS X 0213:2000 and JIS X 0213:2004. We also test the following character, which was added between JIS X 0213:2000 and JIS X 0213:2004.

    JIS X 0213:2004   Unicode
    ----------------  -------
    Plane 1 \x2E\x21  U+4FF1

Escape sequences to designate a JIS X 0213 character set to G0:

    character set            ESC sequence
    -----------------------  ---------------------------
    JIS X 0213:2000 Plane 1  ESC 2/4 2/8 4/15  ESC $ ( O
    JIS X 0213:2000 Plane 2  ESC 2/4 2/8 5/0   ESC $ ( P
    JIS X 0213:2004 Plane 1  ESC 2/4 2/8 5/1   ESC $ ( Q
    JIS X 0213:2004 Plane 2  ESC 2/4 2/8 5/0   ESC $ ( P
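The escape sequences in the commit message are given twice, once in column/row notation (`2/4`) and once as characters (`$`). A short sketch of how the two relate, assuming the usual ISO 2022 convention that `c/r` denotes the byte `16 * c + r`; `from_col_row` and `DESIGNATIONS` are illustrative names, not CPython code:

```python
ESC = 0x1B  # the ESCAPE control character


def from_col_row(notation: str) -> bytes:
    """Convert a sequence such as 'ESC 2/4 2/8 4/15' into raw bytes.

    Each 'c/r' token is a code-table column and row, i.e. byte 16*c + r.
    """
    out = bytearray()
    for token in notation.split():
        if token == "ESC":
            out.append(ESC)
        else:
            col, row = token.split("/")
            out.append(int(col) * 16 + int(row))
    return bytes(out)


# Designations copied from the commit message.
DESIGNATIONS = {
    "JIS X 0213:2000 Plane 1": "ESC 2/4 2/8 4/15",
    "JIS X 0213:2000 Plane 2": "ESC 2/4 2/8 5/0",
    "JIS X 0213:2004 Plane 1": "ESC 2/4 2/8 5/1",
    "JIS X 0213:2004 Plane 2": "ESC 2/4 2/8 5/0",
}

for name, notation in DESIGNATIONS.items():
    print(name, from_col_row(notation))
```

Plane 2 uses the same final byte (`P`) in both editions, so only the Plane 1 designation distinguishes JIS X 0213:2000 from JIS X 0213:2004.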
I will take a look at this PR by next week :)
LGTM.
LGTM too
Sorry, @moriyama and @corona10, I could not cleanly backport this to
…ecs read out of bounds (pythongh-111695) (cherry picked from commit c8faa35) Co-authored-by: Masayuki Moriyama <masayuki.moriyama@miraclelinux.com>
GH-111769 is a backport of this pull request to the 3.12 branch.
GH-111771 is a backport of this pull request to the 3.11 branch.
…004 codecs read out of bounds (pythongh-111695) (cherry picked from commit c8faa35) Co-authored-by: Masayuki Moriyama <masayuki.moriyama@miraclelinux.com>
GH-111780 is a backport of this pull request to the 3.9 branch.
…04 codecs read out of bounds (pythongh-111695) (cherry picked from commit c8faa35) Co-authored-by: Masayuki Moriyama <masayuki.moriyama@miraclelinux.com>
GH-111781 is a backport of this pull request to the 3.8 branch.
Thank you for merging the fix.
…ecs read out of bounds (pythongh-111695)
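The iso2022_jp_3 and iso2022_jp_2004 codecs read out of bounds when encoding a Unicode combining character sequence.
This bug causes a UnicodeEncodeError ("illegal multibyte sequence") when encoding input such as '\u304b\u309a' with iso2022_jp_2004.
This PR fixes the out-of-bounds read.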