System.ConsoleKeyInfo can not handle Unicode surrogate pair and Emoji Sequences #27828
Comments
...this can't be unique to emoji, presumably? Shouldn't some upper-range characters and combining characters suffer the same fate? Also, I'm assuming this would be triggered by copy/paste input, so I don't think it's just soft keyboards we have to worry about.
This is a problem with surrogate pairs in general. It's not clear how we would address this issue without introducing some form of a breaking change.
To me, it seems like the logical course of action would be to introduce a new API based around Now that I think about it, it would probably also make sense to have a key-reading API that returns a string, since many languages such as Guarani use characters which one might input with a single keypress, and yet are composed of multiple code points. Having However, both seem to be an issue for PSReadLine, which leads to a longstanding bug in PowerShell. |
Is there a way to get surrogate pair from Edit: |
The design of key code and key char dates back to the IBM PC keyboard controller. This is unlike the *nix way of letting the console device and the program itself escape these key presses and pass them as a stream. I think the proper way is to add a string or ReadOnlySpan typed property in
I'm assuming the scenario is "I want to be able to get an entire grapheme cluster via If you return An alternative solution might be to return Now, is this acceptable? Maybe your application doesn't interpret these as individual to-be-displayed characters and stitches them all together after the fact anyway. But if you're stitching them together, presumably you could stitch together char-by-char using the current API, and no string-by-string / Rune-by-Rune API is needed. This is one of those weird things where the requested change needs to be laid out very specifically. Things that might seem obvious to one person might not seem obvious to another, and it could have a ripple effect which upends the proposed solution.
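To make the "stitch together char-by-char" idea concrete, here is a minimal sketch. `SurrogateStitcher` and `Stitch` are hypothetical names, not part of any .NET API; only `char.IsHighSurrogate`, `char.IsLowSurrogate`, and `char.ConvertToUtf32` are real framework calls. It assumes both halves of a surrogate pair actually arrive as consecutive chars, which is exactly what the issue reporter says ReadKey does not guarantee.

```csharp
using System;
using System.Collections.Generic;

public static class SurrogateStitcher
{
    // Hypothetical helper: combines UTF-16 code units (for example, successive
    // ConsoleKeyInfo.KeyChar values) into Unicode code points.
    // Unpaired surrogates are replaced with U+FFFD.
    public static List<int> Stitch(IEnumerable<char> units)
    {
        var codePoints = new List<int>();
        char pendingHigh = '\0';
        bool hasPending = false;

        foreach (char c in units)
        {
            if (hasPending)
            {
                hasPending = false;
                if (char.IsLowSurrogate(c))
                {
                    // Valid pair: combine into one supplementary code point.
                    codePoints.Add(char.ConvertToUtf32(pendingHigh, c));
                    continue;
                }
                // The high surrogate was never completed.
                codePoints.Add(0xFFFD);
            }

            if (char.IsHighSurrogate(c))
            {
                pendingHigh = c;
                hasPending = true;
            }
            else if (char.IsLowSurrogate(c))
            {
                // Low surrogate with no preceding high surrogate.
                codePoints.Add(0xFFFD);
            }
            else
            {
                codePoints.Add(c);
            }
        }

        if (hasPending)
        {
            codePoints.Add(0xFFFD);
        }
        return codePoints;
    }
}
```

A caller could feed this the KeyChar of each key read in a loop. If the second half of a pair is dropped, as reported in this issue, the helper can only emit U+FFFD for that key.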
The problem of In short, I think we should at least let I think it's OK to get The problem is that we do need a Use I have submitted my API suggestion at #51085 which I think makes sense for us. |
@DHowett I think the problem is rooted in KEY_EVENT_RECORD (https://docs.microsoft.com/en-us/windows/console/key-event-record-str), which uses a single WCHAR to store the translated Unicode character, whereas the input today can be a surrogate pair or a sequence of Unicode code points.
Other applications seem capable of handling surrogate pairs in the WCHAR-typed field of two KEY_EVENT_RECORDs just fine.
@DHowett that's true, since we can handle surrogate pairs spread across two or more KEY_EVENT_RECORDs, but how do we deal with the same Unicode code point yielding a different string length with other font types?
The font cannot change the length of a string. If you're talking about the column count (the perceived space taken up by the string if printed to a console), that's just off topic for this issue :). That's also one of the harder issues in terminal emulation.
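As a small illustration of that point, the sketch below measures one string three ways, none of which depends on a font. `TextMeasures` and `Measure` are hypothetical names; `string.EnumerateRunes` (available from .NET Core 3.0) and `StringInfo.LengthInTextElements` are real APIs. Column count is deliberately left out, since .NET has no built-in call for it.

```csharp
using System;
using System.Globalization;

public static class TextMeasures
{
    // Hypothetical helper: returns the number of UTF-16 code units,
    // Unicode scalar values, and text elements in a string.
    public static (int CodeUnits, int Scalars, int TextElements) Measure(string text)
    {
        int scalars = 0;
        foreach (var rune in text.EnumerateRunes())
        {
            scalars++;
        }

        int textElements = new StringInfo(text).LengthInTextElements;
        return (text.Length, scalars, textElements);
    }
}
```

For a single supplementary-plane emoji such as U+1F600, this gives two code units, one scalar, and one text element, whatever font the console uses.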
If you really want your mind to melt, spec out what behavior your app will have when the user hits BACKSPACE immediately after entering a complex multi-scalar emoji. :)
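One possible BACKSPACE policy, sketched under assumptions: delete the whole last text element rather than the last char. `LineEditing` and `RemoveLastTextElement` are hypothetical names; `StringInfo.ParseCombiningCharacters` is a real .NET API that returns the start index of each text element. How multi-scalar emoji are segmented depends on the runtime version, so this is not a definitive answer to the question above.

```csharp
using System;
using System.Globalization;

public static class LineEditing
{
    // Hypothetical helper: removes the last text element (as segmented by
    // StringInfo) from the input buffer, instead of the last UTF-16 code unit.
    public static string RemoveLastTextElement(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return string.Empty;
        }

        // Start indices of every text element; non-empty input yields at least one.
        int[] starts = StringInfo.ParseCombiningCharacters(text);
        return text.Substring(0, starts[starts.Length - 1]);
    }
}
```

Deleting only the last char would leave a dangling high surrogate behind, which is the same kind of garbage the issue describes on the input side.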
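Let me repeat the issue again: The problem with ReadKey is that it may return the first half of a surrogate pair, and the next ReadKey call will return the next keypress, not the remaining part of the surrogate pair. This produces garbage output and is impossible to work around in user code.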
@WenceyWang you may need to deal with |
Windows has a soft keyboard that can type emoji directly.
An emoji is a sequence of chars that cannot be stored in a single char, while System.ConsoleKeyInfo uses a char to store the content of the pressed key.
In my test, System.Console.ReadKey returns a System.ConsoleKeyInfo whose KeyChar is the first char of the emoji (a sequence of surrogate pairs, possibly 10+ chars), and I have no way to get the other chars.
The problem with ReadKey is that it may return the first half of a surrogate pair, and the next ReadKey call will return the next keypress, not the remaining part of the surrogate pair.
This produces garbage output and is impossible to work around in user code.
This problem also applies to keyboards for other scripts and languages.