Console UTF-8 input is misbehaving on Windows #43295
Comments
Tagging subscribers to this area: @eiriktsarpalis, @jeffhandley |
Can also reproduce with Chinese characters, but they are replaced with |
Can also reproduce with .NET Framework 4.7.2 |
FWIW, I don't see this on my machine. I get this:
with:
I'd guess we have different code pages set globally by default somewhere, as Console uses the win32 GetConsoleCP/GetConsoleOutputCP functions on Windows to determine what encoding to use. |
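For anyone comparing machines, here is a minimal sketch that prints what the two Win32 calls return next to what Console reports. The class name CodePageCheck and the helper IsUtf8CodePage are made up for illustration; the P/Invoke declarations assume the documented kernel32 signatures (both functions take no arguments and return a UINT). It only does anything useful on Windows.

```csharp
using System;
using System.Runtime.InteropServices;

static class CodePageCheck
{
    // Win32: UINT GetConsoleCP(void) and UINT GetConsoleOutputCP(void) in kernel32.
    [DllImport("kernel32.dll")]
    private static extern uint GetConsoleCP();

    [DllImport("kernel32.dll")]
    private static extern uint GetConsoleOutputCP();

    // Hypothetical helper: 65001 is the Windows code page identifier for UTF-8.
    public static bool IsUtf8CodePage(uint codePage) => codePage == 65001;

    static void Main()
    {
        if (!RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
        {
            Console.WriteLine("This check only applies on Windows.");
            return;
        }

        uint input = GetConsoleCP();
        uint output = GetConsoleOutputCP();

        Console.WriteLine($"GetConsoleCP:           {input} (UTF-8: {IsUtf8CodePage(input)})");
        Console.WriteLine($"GetConsoleOutputCP:     {output} (UTF-8: {IsUtf8CodePage(output)})");
        Console.WriteLine($"Console.InputEncoding:  {Console.InputEncoding.CodePage} ({Console.InputEncoding.WebName})");
        Console.WriteLine($"Console.OutputEncoding: {Console.OutputEncoding.CodePage} ({Console.OutputEncoding.WebName})");
    }
}
```

Running this in each terminal before and after chcp 65001 should show whether two machines really start from different code pages.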
This should be the key point. Check "use UTF-8 for non-Unicode programs". |
@huoyaoyuan I mentioned in the bug description that I have tried both with that setting off and on; it doesn't seem to make any difference. I think whatever you set with @stephentoub for context, my region settings are: |
@alexrp could you please send the content of the registry key Also, after changing the Region Settings option to enable UTF-8, did you reboot the machine? One more thing: which font are you using in the console? |
For me, it's 65001
I have had this turned on for a year
Just Consolas. It should be unrelated. |
Yes.
I don't think it matters since the text gets garbled at input, not output, but: CMD and PowerShell use Lucida Console, Windows Terminal and mintty use Consolas. |
I can repro this, but only when I change the console code page to 65001. Note that the output for 437 is also not exactly the same as the input (ø => o)
|
Output for 437 is expected. Input redirection works as expected too:
So only direct read from console is broken. |
This might be interesting:
using System;
using System.IO;
using System.Linq;
using System.Text;
namespace Test
{
    static class Program
    {
        static void PrintHex(Span<byte> bytes)
        {
            foreach (byte x in bytes)
            {
                Console.Write($"{x:X2} ");
            }
            Console.WriteLine();
        }
        static void Main()
        {
            string problematic = @"abcæøådef";
            Console.WriteLine(Console.InputEncoding);
            Console.WriteLine(Console.OutputEncoding);
            Console.WriteLine();
            Console.WriteLine($"original: {problematic}");
            Console.Write(" Text : ");
            Console.WriteLine(" Result : {0}", Console.ReadLine());
            Stream stdin = Console.OpenStandardInput();
            Console.Write(" Text : ");
            byte[] bytes = new byte[100];
            int readBytes = stdin.Read(bytes);
            Span<byte> input = new Span<byte>(bytes).Slice(0, readBytes);
            Console.Write(" input: ");
            PrintHex(input);
            Console.Write("in. conv: ");
            Console.Write(Console.InputEncoding.GetString(input));
            Console.Write("original: ");
            PrintHex(Console.InputEncoding.GetBytes(problematic));
        }
    }
}
|
I suspect the problem might be us using ReadFile ( I see people report issues with that: I think we should always go to the other code path and possibly do some conversion there (or hard-code the input encoding to whatever ReadConsole is using on Windows) |
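As a rough illustration of the "other code path", here is a sketch that reads a line through ReadConsoleW, which hands back UTF-16 directly, instead of going through ReadFile and the input code page. This is not the runtime's implementation: the class ReadConsoleSketch, the 1024-character buffer, and the TrimLineEnding helper are assumptions for the example. It also assumes stdin is an actual console; ReadConsoleW fails when input is redirected, and a line longer than the buffer would come back in pieces.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

static class ReadConsoleSketch
{
    private const int STD_INPUT_HANDLE = -10;

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern IntPtr GetStdHandle(int nStdHandle);

    // Win32: BOOL ReadConsoleW(HANDLE, LPVOID, DWORD, LPDWORD, PCONSOLE_READCONSOLE_CONTROL)
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern bool ReadConsoleW(
        IntPtr hConsoleInput,
        [Out] char[] lpBuffer,
        uint nNumberOfCharsToRead,
        out uint lpNumberOfCharsRead,
        IntPtr pInputControl);

    // Hypothetical helper: turns the filled part of the buffer into a string
    // and drops the trailing CR/LF that line-mode console input includes.
    public static string TrimLineEnding(char[] buffer, int count) =>
        new string(buffer, 0, count).TrimEnd('\r', '\n');

    // Reads one line of console input as UTF-16, bypassing the input code page.
    public static string ReadLineUtf16()
    {
        IntPtr handle = GetStdHandle(STD_INPUT_HANDLE);
        char[] buffer = new char[1024]; // arbitrary size chosen for the sketch

        if (!ReadConsoleW(handle, buffer, (uint)buffer.Length, out uint read, IntPtr.Zero))
        {
            // Fails, for example, when stdin is redirected from a file or pipe.
            throw new Win32Exception(Marshal.GetLastWin32Error());
        }

        return TrimLineEnding(buffer, (int)read);
    }

    static void Main()
    {
        if (!RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
        {
            Console.WriteLine("ReadConsoleW is only available on Windows.");
            return;
        }

        Console.Write(" Text : ");
        Console.WriteLine(" Result : {0}", ReadLineUtf16());
    }
}
```

Comparing the result of this with Console.ReadLine() for the same input would show whether the garbling is specific to the ReadFile path.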
@danmosemsft would this meet the bar for 5.0/servicing? This affects every customer using .NET console apps with non-ASCII characters. I know this repros at minimum in 3.1 and likely in lower versions as well. |
@krwq I recommend we get the fix into 6.0.0. After that, if we receive enough reports of users being blocked by this bug, we'd consider down-level servicing. We would need validation from users who have encountered this that the behavior is indeed fixed with the 6.0.0 builds. |
Can reproduce on .NET 6 |
This also reproduces in .NET 7 p5:
Console.InputEncoding = Encoding.UTF8;
Console.WriteLine($"Current Encoding: {Console.InputEncoding}");
Console.WriteLine($"Input: {Console.ReadLine()}");
With this code, entering '가나다' displays nothing: the string returned from Console.ReadLine appears to be three NUL characters instead of the characters that were typed. It seems to work fine with |
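Since NUL characters print as nothing, a small variation of the repro can make them visible by dumping each UTF-16 code unit of the returned string. This is only a diagnostic sketch; the class InputDump and the helper ToCodeUnits are names invented for the example.

```csharp
using System;
using System.Linq;
using System.Text;

static class InputDump
{
    // Formats every UTF-16 code unit as U+XXXX so that NUL shows up as U+0000.
    public static string ToCodeUnits(string text) =>
        string.Join(" ", text.Select(c => $"U+{(int)c:X4}"));

    static void Main()
    {
        Console.InputEncoding = Encoding.UTF8;
        Console.WriteLine($"Current Encoding: {Console.InputEncoding}");

        string line = Console.ReadLine() ?? string.Empty;
        Console.WriteLine($"Length: {line.Length}");
        Console.WriteLine($"Code units: {ToCodeUnits(line)}");
    }
}
```

For a correctly read '가나다' the dump would be U+AC00 U+B098 U+B2E4; with the behavior described above it would be three U+0000 entries instead.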
[+] Added NameRegen command: Now you can rename archive files without editing all entries manually [+] Dictionary file support: To bypass dotnet/runtime#43295
Looks like it's fixed by microsoft/terminal#14745. |
@o-sdn-o thank you for letting us know! In such case, I am closing the issue. |
Description
Example run:
The non-ASCII characters are basically being replaced with NULs for some reason.
This happens in all terminals I've tried (CMD, PowerShell, Windows Terminal, mintty). I checked chcp in all conhost-based terminals, and it reported 65001 (UTF-8) everywhere. I've also enabled global UTF-8 in Windows region settings just for good measure (enabling/disabling it appears to make no difference).
What is fascinating here is that this only seems to happen in .NET Core processes. No other programs in any of the terminals I've tried have issues processing non-ASCII characters. For example, things like this work in all of them:
What is even more fascinating is that if you P/Invoke ReadFile to read from standard input in the .NET Core program instead of using System.Console, you get the same issue: The read is successful but non-ASCII characters are just replaced with NULs.
So the question is: Why are .NET Core processes special? What does .NET Core do that seemingly makes ReadFile misbehave?
Configuration