Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix display of the "low ASCII" glyphs in PC code pages #1964

Merged
merged 2 commits into from
Sep 20, 2019

Conversation

j4james
Copy link
Collaborator

@j4james j4james commented Jul 14, 2019

Summary of the Pull Request

In the legacy console, it used to be possible to write out characters from the C0 range of a PC code page (e.g. CP437), and get the actual glyphs defined for those code points (at least those that weren't processed as control codes). In the v2 console this stopped working so you'd get an FFFD replacement glyph (�) for those characters instead. This PR fixes the issue so the correct glyphs are displayed again.

PR Checklist

Detailed Description of the Pull Request / Additional comments

There was already code in place to achieve this in the WriteCharsLegacy method. It used the GetStringTypeW method to determine the character type of the value being output, and if it was a C1_CNTRL character it performed the appropriate mapping. The problem was that the test of the character type flag was done as a direct comparision, when it should have been a bit test, so the condition was never met.

With this condition fixed, the code also needed to be reordered slightly to handle the null character. That had a special-case mapping to space, which was previously performed after the control test, but since a null character now successfully matches C1_CNTRL, it no longer falls through to that special case. To address that, I've had to move the null check above the control test.

Validation Steps Performed

I've tested this manually, by trying to output all the characters in the affected range (ASCII values 0 to 31, and 127, excluding the actual control codes 8,9,10 and 13). In all cases they now match the output that the legacy console produced.

Note that this only applies to PC code pages that have glyphs defined for the C0 range, so it won't work with the UTF-8 code page, but that was to be expected - the legacy console behaved the same way.

Also, note that this only works when the ENABLE_PROCESSED_OUTPUT console mode is set. That seems wrong to me (I'd expect the glyphs to work in both cases), but that's the way the legacy console behaved as well, so if that's a bug it's a separate issue.

I haven't added any unit tests, because I expect the behaviour of some of these characters to change over time (as support is added for more control codes), which could then cause the tests to fail. But if that's not a concern, I could probably add something to the ScreenBufferTests (perhaps with a comment warning that the tests might be expected to fail in the future).

j4james added 2 commits July 14, 2019 13:27
… bit test rather than an equality comparison, otherwise it won't match anything.
…'t be reached now that the type check is working correctly.
@j4james
Copy link
Collaborator Author

j4james commented Jul 14, 2019

Having done some more testing, I've found another issue related to this change. I think it's an existing bug in the conpty code rather than a problem with the PR itself (see the issue referenced above), but this "fix" clearly makes things worse, so it might be best to put it on hold for now.

@zadjii-msft
Copy link
Member

So my only big concern with this is how this behaves relative to the v1 console. If you have a test application that writes an ESC to the buffer, and then uses ReadConsoleOutput to get the character that's in the buffer, what happens? Is it still an ESC 0x1b in both v1 and v2? Does the behavior change at all? If not, then I'm 100% okay with this.

@zadjii-msft zadjii-msft added Product-Conhost For issues in the Console codebase Issue-Bug It either shouldn't be doing this or needs an investigation. labels Sep 20, 2019
Copy link
Member

@zadjii-msft zadjii-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually now that I've read through the rest of that original issue, this seems good to me. Thanks!

@j4james
Copy link
Collaborator Author

j4james commented Sep 20, 2019

Just to be clear, if you write an ESC to the stdout, ReadConsoleOutput will return that as U+2190 after this PR is applied (assuming one of the PC code pages). But that is what the v1 console returns as well. It's the current implementation - returning U+001B - that is different from the legacy behaviour.

@DHowett-MSFT
Copy link
Contributor

Excellent.

At some point in the future (in a non-compatible mode) we may want to return $C0 + 0x2400 to get the Control Pictures

@zadjii-msft
Copy link
Member

It's the current implementation - returning U+001B - that is different from the legacy behaviour.

DESIRE TO SHIP INTENSIFIES

@DHowett-MSFT DHowett-MSFT merged commit 9102c5d into microsoft:master Sep 20, 2019
@j4james j4james deleted the fix-control-glyphs branch September 22, 2019 09:36
DHowett-MSFT pushed a commit that referenced this pull request Sep 23, 2019
In the legacy console, it used to be possible to write out characters
from the C0 range of a PC code page (e.g. CP437), and get the actual
glyphs defined for those code points (at least those that weren't
processed as control codes). In the v2 console this stopped working so
you'd get an FFFD replacement glyph (�) for those characters instead.
This PR fixes the issue so the correct glyphs are displayed again.

There was already code in place to achieve this in the
`WriteCharsLegacy` method. It used the `GetStringTypeW` method to
determine the character type of the value being output, and if it was a
`C1_CNTRL` character it performed the appropriate mapping. The problem
was that the test of the character type flag was done as a direct
comparision, when it should have been a bit test, so the condition was
never met.

With this condition fixed, the code also needed to be reordered slightly
to handle the null character. That had a special-case mapping to space,
which was previously performed after the control test, but since a null
character now successfully matches `C1_CNTRL`, it no longer falls
through to that special case. To address that, I've had to move the null
check above the control test.

I've tested this manually, by trying to output all the characters in the
affected range (ASCII values 0 to 31, and 127, excluding the actual
control codes 8,9,10 and 13). In all cases they now match the output
that the legacy console produced.

Note that this only applies to PC code pages that have glyphs defined
for the C0 range, so it won't work with the UTF-8 code page, but that
was to be expected - the legacy console behaved the same way.

Also, note that this only works when the `ENABLE_PROCESSED_OUTPUT`
console mode is set. That seems wrong to me (I'd expect the glyphs to
work in both cases), but that's the way the legacy console behaved as
well, so if that's a bug it's a separate issue.

I haven't added any unit tests, because I expect the behaviour of some
of these characters to change over time (as support is added for more
control codes), which could then cause the tests to fail. But if that's
not a concern, I could probably add something to the ScreenBufferTests
(perhaps with a comment warning that the tests might be expected to fail
in the future).

Closes #166.

(cherry picked from commit 9102c5d)
@ghost
Copy link

ghost commented Sep 24, 2019

🎉Windows Terminal Preview v0.5.2661.0 has been released which incorporates this pull request.:tada:

Handy links:

@DHowett-MSFT
Copy link
Contributor

This went out for conhost in insider build 19002! Thanks 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Issue-Bug It either shouldn't be doing this or needs an investigation. Product-Conhost For issues in the Console codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CP437 "low ASCII" characters not working in the v2 console
3 participants