-
Notifications
You must be signed in to change notification settings - Fork 8.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce breaking changes to ReadConsoleOutput #13321
Conversation
714dbf9
to
f1eb06a
Compare
d74ced1
to
444d44f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, from what I remember about the discussion in #13339, the tl;dr was "this never worked right, so changing what it does in the broken case shouldn't make anyone more broke", then
:ship_it:
// If .AsciiChar and .UnicodeChar have the same offset (since they're a union), | ||
// we can just write the latter with a byte-sized value to set the former | ||
// _and_ simultaneously clear the upper byte of .UnicodeChar to 0. Nice! | ||
static_assert(offsetof(CHAR_INFO, Char.AsciiChar) == offsetof(CHAR_INFO, Char.UnicodeChar)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might just be me being obtuse, but it seems weird to me that this static_assert
is inside both loops, and not like, at the top of the method, or just at the top of the file or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other assertion is in the DBCS test file unfortunately. I felt like keeping those assertions close to the code is better since there are alternative ways to write this code (for instance assigning UnicodeChar
and AsciiChar
independently).
@msftbot do not merge unless @DHowett approves |
Hello @lhecker! Because you've given me some instructions on how to help merge this pull request, I'll be modifying my merge approach. Here's how I understand your requirements for merging this pull request:
If this doesn't seem right to you, you can tell me to cancel these instructions and use the auto-merge policy that has been configured for this repository. Try telling me "forget everything I just told you". |
🎉 Handy links: |
#13626 contains a small "regression" compared to #13321: It now began to store trailers in the buffer wherever possible to allow a region of the buffer to be backed up and restored via Read/WriteConsoleOutput. But we're unfortunately still ill-equipped to handle anything but UCS-2 via WriteConsoleOutput, so it's best to again ignore trailers just like in #13321. ## Validation Steps Performed * Added unit test ✅
#13626 contains a small "regression" compared to #13321: It now began to store trailers in the buffer wherever possible to allow a region of the buffer to be backed up and restored via Read/WriteConsoleOutput. But we're unfortunately still ill-equipped to handle anything but UCS-2 via WriteConsoleOutput, so it's best to again ignore trailers just like in #13321. ## Validation Steps Performed * Added unit test ✅ (cherry picked from commit 9dcdcac) Service-Card-Id: 88719297 Service-Version: 1.17
#8000 will change the way we store text from a strict grid/matrix where
one UTF16 character or surrogate pair always equals 1 column with the
possibility of joining exactly 2 to a wide character pair, to a dynamic
buffer where 1 or more characters can form 1 or more columns in any
arbitrary combination. Our long term goal is to properly support both
complex grapheme clusters like Emojis and complex ligatures that a wider
than 2 columns. This change requires us to break our API as
ReadConsoleOutputA/W
assumes the existence of exactly this grid/matrixstorage. Since we store wide characters like "い" as a single codepoint
that is simply marked as being 2 columns wide in the future, we cannot
reconstruct trailing DBCS characters that were written to the buffer
like we used to. On the other hand this new behavior allows us to
implement better Unicode support and most likely significantly improve
our performance.
Minor breaking changes
ReadConsoleOutputA
will now always zero the high byte in(CHAR_INFO).Char.UnicodeChar
. Only the.AsciiChar
can be usedthen. This prevents users from storing "additional" data in the
terminal buffer.
ReadConsoleOutputA
will now zero the.AsciiChar
if it fails toconvert the Unicode character into an appropriate DBCS.
being a wide character. In these cases
WriteConsoleOutputA
willnow return
0x00
instead of0x44
(the lower half of い's codepoint
0x3044
).Major breaking changes
ReadConsoleOutputW
will now repeat the leading Unicode charactertwice and ignore the trailing one.
0x3044 0xabcd
withWriteConsoleOutputW
used to yield the same0x3044 0xabcd
if readback with
ReadConsoleOutputW
. This worked because conhosteffectively ignored the trailing codepoint, allowing one to
"smuggle" data. In the future this trailing character will be
discarded and produce
0x3044 0x3044
instead.WriteConsoleOutputA
can be done withcode page 932 (Shift-JIS) and the DBCS
0x82 0xa2
. If read backwith
ReadConsoleOutputW
this would previously yield the twoUnicode characters
0x3044 0xffff
. After this commit it'll yield0x3044 0x3044
.Alternative approaches
It's possible to "tag"/"mark" written data as originating from
WriteConsoleOutputA/W
so that it can be reconstructed accurately lateron. However this lead to implementation complexities that we're actively
trying to avoid in the new buffer implementation. Effectively
everything that touches the buffer's text would have to handle these
marks and either write or clear them. Given the most likely small amount
of users who depend on the current quirky behavior, it'd be an
unwarranted maintenance and performance burden and prevent Windows
Terminal to ever truly migrate to full Unicode support.
Validation Steps Performed