Fix display of the "low ASCII" glyphs in PC code pages #1964

j4james · 2019-07-14T13:53:58Z

Summary of the Pull Request

In the legacy console, it used to be possible to write out characters from the C0 range of a PC code page (e.g. CP437), and get the actual glyphs defined for those code points (at least those that weren't processed as control codes). In the v2 console this stopped working so you'd get an FFFD replacement glyph (�) for those characters instead. This PR fixes the issue so the correct glyphs are displayed again.

PR Checklist

Closes CP437 "low ASCII" characters not working in the v2 console #166
CLA signed. If not, go over here and sign the CLA
Tests added/passed
Requires documentation to be updated
I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: CP437 "low ASCII" characters not working in the v2 console #166

Detailed Description of the Pull Request / Additional comments

There was already code in place to achieve this in the WriteCharsLegacy method. It used the GetStringTypeW method to determine the character type of the value being output, and if it was a C1_CNTRL character it performed the appropriate mapping. The problem was that the test of the character type flag was done as a direct comparison, when it should have been a bit test, so the condition was never met.
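
As a rough illustration of the difference between the two tests (a standalone sketch, not the actual conhost code): the constants below are local stand-ins carrying the Windows SDK values of `C1_CNTRL` and `C1_DEFINED`, and the sample type word, with a second flag set alongside `C1_CNTRL`, is an assumption about what `GetStringTypeW` reports for a control character.

```cpp
#include <cstdint>

// Local stand-ins for two CT_CTYPE1 flags from the Windows SDK headers,
// so the sketch compiles without <windows.h>.
constexpr std::uint16_t kC1Cntrl = 0x0020;   // C1_CNTRL
constexpr std::uint16_t kC1Defined = 0x0200; // C1_DEFINED

// Broken form: true only when C1_CNTRL is the sole flag in the type word.
constexpr bool IsControlByEquality(std::uint16_t charType)
{
    return charType == kC1Cntrl;
}

// Fixed form: true whenever the C1_CNTRL bit is set, whatever else is set.
constexpr bool IsControlByBitTest(std::uint16_t charType)
{
    return (charType & kC1Cntrl) != 0;
}

// Assumed type word for a control character: C1_CNTRL plus one more flag.
constexpr std::uint16_t kSampleType = kC1Cntrl | kC1Defined;
```

With `kSampleType`, `IsControlByEquality` returns false and `IsControlByBitTest` returns true, which is why the equality form never matched once any other flag was present.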

With this condition fixed, the code also needed to be reordered slightly to handle the null character. That had a special-case mapping to space, which was previously performed after the control test, but since a null character now successfully matches C1_CNTRL, it no longer falls through to that special case. To address that, I've had to move the null check above the control test.
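
The ordering issue can be sketched as follows. This is a simplified model rather than the real WriteCharsLegacy logic: `GlyphForControl` is a hypothetical stand-in for the code page glyph mapping (only ESC is filled in), and the type word passed in is assumed to have the `C1_CNTRL` bit set for NUL.

```cpp
#include <cstdint>

constexpr std::uint16_t kC1Cntrl = 0x0020;   // value of C1_CNTRL in the Windows SDK
constexpr std::uint16_t kC1Defined = 0x0200; // value of C1_DEFINED in the Windows SDK

// Hypothetical stand-in for the glyph lookup. Only ESC is mapped (to U+2190);
// every other character passes through unchanged.
constexpr char16_t GlyphForControl(char16_t ch)
{
    return ch == 0x1B ? static_cast<char16_t>(0x2190) : ch;
}

// Order after the fix: NUL is mapped to space before the control test runs.
constexpr char16_t MapNullFirst(char16_t ch, std::uint16_t charType)
{
    if (ch == u'\0') return u' ';
    if ((charType & kC1Cntrl) != 0) return GlyphForControl(ch);
    return ch;
}

// Previous order combined with a working bit test: NUL takes the control
// branch, so the space mapping below it is never reached for NUL.
constexpr char16_t MapControlFirst(char16_t ch, std::uint16_t charType)
{
    if ((charType & kC1Cntrl) != 0) return GlyphForControl(ch);
    if (ch == u'\0') return u' ';
    return ch;
}
```

In this model, `MapNullFirst(u'\0', kC1Cntrl | kC1Defined)` yields a space, while `MapControlFirst` with the same arguments returns NUL unchanged.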

Validation Steps Performed

I've tested this manually, by trying to output all the characters in the affected range (ASCII values 0 to 31, and 127, excluding the actual control codes 8, 9, 10 and 13). In all cases they now match the output that the legacy console produced.
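
For reference, the tested range and the glyphs expected under CP437 can be written out as a small table. This is an illustrative sketch only: the table is the commonly documented CP437 assignment for the low range and is not taken from the PR, the function names are hypothetical, and entry 0 is set to a space to mirror the null special case described above.

```cpp
// Unicode code points of the CP437 glyphs for byte values 0x00..0x1F.
constexpr char16_t kCp437LowGlyphs[32] = {
    0x0020, 0x263A, 0x263B, 0x2665, 0x2666, 0x2663, 0x2660, 0x2022,
    0x25D8, 0x25CB, 0x25D9, 0x2642, 0x2640, 0x266A, 0x266B, 0x263C,
    0x25BA, 0x25C4, 0x2195, 0x203C, 0x00B6, 0x00A7, 0x25AC, 0x21A8,
    0x2191, 0x2193, 0x2192, 0x2190, 0x221F, 0x2194, 0x25B2, 0x25BC,
};

// True for the values covered by the manual test: 0 to 31 and 127,
// excluding 8, 9, 10 and 13, which are processed as control codes.
constexpr bool IsTestedCodePoint(unsigned c)
{
    return (c <= 31 || c == 127) && c != 8 && c != 9 && c != 10 && c != 13;
}

// Glyph for a 7-bit value under CP437. Values above 0x7F are outside the
// scope of this sketch and yield U+FFFD.
constexpr char16_t Cp437Glyph(unsigned c)
{
    return c < 32     ? kCp437LowGlyphs[c]
         : c == 0x7F  ? static_cast<char16_t>(0x2302)
         : c < 0x7F   ? static_cast<char16_t>(c)
                      : static_cast<char16_t>(0xFFFD);
}

// Number of distinct values in the tested range.
constexpr int CountTestedCodePoints()
{
    int count = 0;
    for (unsigned c = 0; c < 128; ++c)
    {
        if (IsTestedCodePoint(c)) ++count;
    }
    return count;
}
```

A manual check along these lines would loop over every value for which `IsTestedCodePoint` is true, write it to the console, and compare what is displayed against `Cp437Glyph`.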

Note that this only applies to PC code pages that have glyphs defined for the C0 range, so it won't work with the UTF-8 code page, but that was to be expected - the legacy console behaved the same way.

Also, note that this only works when the ENABLE_PROCESSED_OUTPUT console mode is set. That seems wrong to me (I'd expect the glyphs to work in both cases), but that's the way the legacy console behaved as well, so if that's a bug it's a separate issue.

I haven't added any unit tests, because I expect the behaviour of some of these characters to change over time (as support is added for more control codes), which could then cause the tests to fail. But if that's not a concern, I could probably add something to the ScreenBufferTests (perhaps with a comment warning that the tests might be expected to fail in the future).


j4james · 2019-07-14T17:42:39Z

Having done some more testing, I've found another issue related to this change. I think it's an existing bug in the conpty code rather than a problem with the PR itself (see the issue referenced above), but this "fix" clearly makes things worse, so it might be best to put it on hold for now.

zadjii-msft · 2019-09-20T15:07:55Z

So my only big concern with this is how this behaves relative to the v1 console. If you have a test application that writes an ESC to the buffer, and then uses ReadConsoleOutput to get the character that's in the buffer, what happens? Is it still an ESC 0x1b in both v1 and v2? Does the behavior change at all? If not, then I'm 100% okay with this.

zadjii-msft

Actually now that I've read through the rest of that original issue, this seems good to me. Thanks!

j4james · 2019-09-20T20:01:10Z

Just to be clear, if you write an ESC to the stdout, ReadConsoleOutput will return that as U+2190 after this PR is applied (assuming one of the PC code pages). But that is what the v1 console returns as well. It's the current implementation - returning U+001B - that is different from the legacy behaviour.

DHowett-MSFT · 2019-09-20T20:16:03Z

Excellent.

At some point in the future (in a non-compatible mode) we may want to return $C0 + 0x2400 to get the Control Pictures.
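
Read literally, that suggestion adds 0x2400 to a C0 code point, which lands in the Unicode Control Pictures block. A minimal sketch of the idea follows, with a hypothetical function name; the DEL case (U+2421) is an addition of this example and is not something proposed in the thread.

```cpp
// Maps a C0 control code to its Unicode Control Pictures symbol.
// Other characters are returned unchanged.
constexpr char32_t ControlPicture(char32_t ch)
{
    return ch < 0x20    ? static_cast<char32_t>(ch + 0x2400) // U+2400..U+241F
         : ch == 0x7F   ? static_cast<char32_t>(0x2421)      // SYMBOL FOR DELETE
                        : ch;
}
```

Under this mapping ESC (0x1B) would be returned as U+241B instead of the CP437 arrow U+2190.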

zadjii-msft · 2019-09-20T20:17:43Z

> It's the current implementation - returning U+001B - that is different from the legacy behaviour.

DESIRE TO SHIP INTENSIFIES

Closes #166. (cherry picked from commit 9102c5d)

ghost · 2019-09-24T16:26:21Z

🎉 Windows Terminal Preview v0.5.2661.0 has been released which incorporates this pull request. 🎉

DHowett-MSFT · 2019-10-17T22:24:47Z

This went out for conhost in insider build 19002! Thanks 😄

j4james added 2 commits July 14, 2019 13:27

Fix the character type check in WriteCharsLegacy, which needs to be a bit test rather than an equality comparison, otherwise it won't match anything. (0a9f47e)

Move the null check before the character type check, otherwise it won't be reached now that the type check is working correctly. (aaf9f79)

j4james mentioned this pull request Jul 14, 2019

Escape sequences behave strangely over conpty when VirtualTerminalLevel is set #1965

Closed

zadjii-msft requested review from DHowett-MSFT, zadjii-msft and miniksa September 20, 2019 15:08

zadjii-msft added the Product-Conhost and Issue-Bug labels Sep 20, 2019

zadjii-msft approved these changes Sep 20, 2019

DHowett-MSFT approved these changes Sep 20, 2019

DHowett-MSFT merged commit 9102c5d into microsoft:master Sep 20, 2019

j4james deleted the fix-control-glyphs branch September 22, 2019 09:36

ghost mentioned this pull request Sep 24, 2019

CP437 "low ASCII" characters not working in the v2 console #166

Closed

j4james mentioned this pull request Jan 31, 2020

C0 characters in screen buffer are incorrectly interpreted as controls by conpty #4363

Closed
