Skip to content

Conversation

@compnerd
Copy link
Member

This adjusts the Windows console to switch the codepage to UTF-8. This
is important as the default codepage (CP437) does not allow for UTF-8
output, but expects ASCII. However, strings in Swift are assumed to be
UTF-8, which means that there is now a conversion mismatch.

Because the console mode persists beyond the duration of the application
as it is state local to the console and not the C runtime, we should
restore the state of the console before termination. We do this by
registering a termination handler via atexit. This means that an
abnormal termination (e.g. via fatalError) will irrevocably alter the
state of the console (interestingly enough, chcp will still report the
original console codepage even though the console will internally be set
to UTF-8).

Fixes: SR-13807

Replace this paragraph with a description of your changes and rationale. Provide links to external references/discussions if appropriate.

Resolves SR-NNNN.

This adjusts the Windows console to switch the codepage to UTF-8.  This
is important as the default codepage (CP437) does not allow for UTF-8
output, but expects ASCII.  However, strings in Swift are assumed to be
UTF-8, which means that there is now a conversion mismatch.

Because the console mode persists beyond the duration of the application
as it is state local to the console and not the C runtime, we should
restore the state of the console before termination.  We do this by
registering a termination handler via `atexit`.  This means that an
abnormal termination (e.g. via `fatalError`) will irrevocably alter the
state of the console (interestingly enough, `chcp` will still report the
original console codepage even though the console will internally be set
to UTF-8).

Fixes: SR-13807
@compnerd
Copy link
Member Author

@swift-ci please test

@compnerd
Copy link
Member Author

CC: @jckarter @mikeash

@compnerd
Copy link
Member Author

Pre (CMD):
image

Post (CMD):
image

Post (Windows Terminal):
image

@compnerd compnerd requested review from jckarter and mikeash December 22, 2021 00:15
@svanimpe
Copy link

@compnerd Do you know where that additional space comes in front of the emoji comes from?

@compnerd
Copy link
Member Author

I suspect the font. It appears in the editor as well, though there is no explicit space there. Most likely it is adding padding for the combination.

@compnerd
Copy link
Member Author

compnerd commented Dec 23, 2021

S:\tmp\SR-13807>.\main | "C:\Program Files (x86)\Vim\vim82\xxd.exe"
00000000: f09f a4a6 e280 8de2 9982 efb8 8f0d 0a    ...............

There is no space there that is being emitted. That is a rendering artifact. It may also be a bug in the rendering for Windows terminal - it might treat the combining as a glyph and offset a cell when rendering.

@stevapple
Copy link
Contributor

stevapple commented Jan 1, 2022

I bet there should be an input (SetConsoleCP) counterpart?

@xwu
Copy link
Collaborator

xwu commented Jan 2, 2022

Bad news on that front:

Unfortunately UTF-8 input still doesn’t work, and setting the input code page doesn’t do anything despite reporting success:

SetConsoleCP(CP_UTF8);  // doesn't work

If you care about reading interactive Unicode input, you’re stuck bypassing the C runtime since it’s still broken.

https://nullprogram.com/blog/2021/12/30/

@compnerd
Copy link
Member Author

compnerd commented Jan 3, 2022

Right, I don't think that the input code page would be helpful which is why I didn't add that in, you would want ReadFile any way, at which point, you likely know what you are doing.

Edit: we might be able to do something by using ReadConsoleInput and SetConsoleCP, but I'm not sure how to handle the threading there as we do some gymnastics to deal with the locking in readLine. However, that would be an invasive enough change to warrant a separate PR (and I'm not convinced that it is as important as this).

@compnerd compnerd merged commit f60d633 into swiftlang:main Jan 4, 2022
@compnerd compnerd deleted the utf-8 branch January 4, 2022 18:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants