Consider changing the default console encoding on Windows to UTF-16 #31466
Comments
This seems reasonable. We'd need to understand it from a compat perspective. Most console apps expect the user's codepage when reading stdin, purely by default, so their authors may not even realize it. Changing the default here might create issues when piping between apps.
Shouldn't modern applications be totally agnostic to codepages? All strings should only be processed in some variation of Unicode.
@huoyaoyuan I agree. The issue here is deciding whether it's okay to make this change, which could be breaking for a small number of specific use cases. I think user I/O over the console should be mostly unaffected, but as was brought up, this could break applications that assume a codepage when piping standard input and output. However, such applications are already very fragile, since they rely on the user having the correct codepage set system-wide, so they're likely to break even without this change. In my opinion, this change would fix more than it breaks for the vast majority of users.
I've seen some other threads complaining that the console defaults to UTF-8 instead of the user's codepage on Windows, but these are a few years old, so I'm a bit confused now. Was the default behaviour changed to use codepages instead?
@scalablecory Would a change like this be a good fit for a major release like .NET 5 or 6? After all, those usually have a list of migration notes, compared to more minor releases.
In my tests on Windows, .NET Core's console I/O defaults to the user's local codepage. This adds overhead, as .NET's UTF-16 strings have to be converted to the codepage, just for the console to convert them back to UTF-16. On top of this, it means that many characters can't be represented properly, which I discovered by accident when I was unable to enter a euro sign in my program because my codepage (932) doesn't have it. There's an easy fix for this, which is to change the console encoding manually, but unless the programmer is doing fuzz testing, this problem could be hard to discover.
With that in mind, I propose that the default encoding for console I/O on Windows be changed to UTF-16, allowing low-overhead and lossless passing around of strings between the console and .NET programs.
On Mac and Linux, the default already seems to be UTF-8, which is a sensible choice on those platforms, and I think Windows should be brought in line with those other two by using a Unicode encoding (UTF-16) for its I/O as well.
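To illustrate the euro sign problem described above, here is a rough sketch in Python (not .NET) of what happens when a string makes a round trip through codepage 932 compared with UTF-16. The helper name `round_trip` and the sample string are made up for illustration, and replacing unrepresentable characters with `?` is an assumption about the fallback behaviour, not a statement of how the .NET console actually handles them.

```python
def round_trip(text: str, codepage: str) -> str:
    """Encode text to the given encoding and decode it back.

    Hypothetical helper: characters the encoding cannot represent are
    replaced with "?" (an assumed fallback, chosen for illustration).
    """
    return text.encode(codepage, errors="replace").decode(codepage)


sample = "Price: €5"

# Codepage 932 (Shift-JIS based) has no euro sign, so the character is lost.
via_cp932 = round_trip(sample, "cp932")

# UTF-16 can represent every Unicode character, so the trip is lossless.
via_utf16 = round_trip(sample, "utf-16-le")

print(via_cp932)
print(via_utf16)
```

The ASCII part of the string survives either way, which is why this kind of loss is easy to miss unless non-ASCII input is tested.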