Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set .net compilation output stream to utf8 #73865

Conversation

akien-mga
Copy link
Member

@akien-mga akien-mga commented Feb 24, 2023

Set the .net compilation output stream to utf8 to prevent garbled characters in other languages.

Squashed version of #73861 by @dream-young-soul.

Set the .net compilation output stream to utf8 to prevent garbled characters
in other languages.
@RedworkDE
Copy link
Member

RedworkDE commented Feb 24, 2023

(Presumably) Fixes #71326 and #64276 (tho that is milestone 3.6)

Upstream fix: dotnet/sdk#29755, but I am not sure how this interacts with this / how or if the encoding of stdin/stdout is propagated on windows; also that fix is just in the .NET 8 previews and a backport candidate for 7.0.300.

Didn't / can't test this, on my system other languages work just fine with master (at least it looks ok and google correctly translates the output).

cc: @dream-young-soul

@akien-mga akien-mga added cherrypick:3.x Considered for cherry-picking into a future 3.x release bug and removed enhancement labels Feb 24, 2023
@akien-mga
Copy link
Member Author

CC @bruvzg @timothyqiu

Copy link
Member

@bruvzg bruvzg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually using UTF-8 for I/O is a good idea, but Windows encoding handling is a huge mess, so I can't tell if it is for everyone.

Also, I'm not sure if it will always work with this part of code:

// Try to convert from default ANSI code page to Unicode.
LocalVector<wchar_t> wchars;
int total_wchars = MultiByteToWideChar(CP_ACP, 0, p_bytes, p_size, nullptr, 0);
if (total_wchars > 0) {
wchars.resize(total_wchars);
if (MultiByteToWideChar(CP_ACP, 0, p_bytes, p_size, wchars.ptr(), total_wchars) == 0) {
wchars.clear();
}
}

@timothyqiu
Copy link
Member

Also, I'm not sure if it will always work with this part of code:

These use cases are in the C# land, using System.Diagnostics.Process to launch and interact with a new process. So OS::execute() is not relevant.

@@ -55,7 +56,8 @@ public static string FindDotNetExe()
process.StartInfo = new ProcessStartInfo(dotNetExe, "--list-sdks")
{
UseShellExecute = false,
RedirectStandardOutput = true
RedirectStandardOutput = true,
StandardOutputEncoding = Encoding.UTF8,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem necessary here since we later set DOTNET_CLI_UI_LANGUAGE to en-US for this command.

@RedworkDE
Copy link
Member

After way too much googling, diving into framework sources and some experiments:

  • I don't think this can break things on non-windows platforms because there the default encoding already is UTF8 (tho without BOM)
  • After the upstream fix this is required on Windows 10+ to ensure that we read the same encoding as dotnet
  • Without that this PR can break the encoding in different ways compared to before:
    • if the system uses 65001 (utf8) as the default code page everything will continue working correctly with all languages
    • if the system used another code page, that couldn't display all required characters, things will now be broken in a different way
    • if the system used another code page, that could display all required chars, but used the chars 128-255 to do so, things will now be broken but worked before
    • if godot used a non-65001 cp and dotnet used 65001, (not sure how this can happen) this PR fixes the problems
  • To correctly fix all issues before the upstream patch we would need to force the dotnet process to use the utf8 cp in all cases
    • but it is not possible to change the code page of a different process without hooking into it and executing code there OR sharing a console with that process, but that would mean changing the code page of the godot process, which will probably code other issues somewhere.
    • we could probably create a intermediate process that creates a new console changes it to utf8 and then runs dotnet
    • it looks like creating the dotnet process with the DETACHED_PROCESS flag, which look to cause to process to use cp 0, which on .net core also maps to utf8, BUT you can't pass it when using the Process.Start api.

TL/DR: It's complicated, but it looks like it goes into the right direction, i'll have a look how much of a pain passing DETACHED_PROCESS actually is.

@RedworkDE
Copy link
Member

RedworkDE commented Feb 25, 2023

TL/DR: It's complicated, but it looks like it goes into the right direction, i'll have a look how much of a pain passing DETACHED_PROCESS actually is.

As expected an enormous pain since you need to re-implement IO redirection, so it's probably not worth it.

  1. This PRs should also set the stderr encoding:
diff --git a/modules/mono/editor/GodotTools/GodotTools/Build/BuildSystem.cs b/modules/mono/editor/GodotTools/GodotTools/Build/BuildSystem.cs
index d550c36b82..c9b686413c 100644
--- a/modules/mono/editor/GodotTools/GodotTools/Build/BuildSystem.cs
+++ b/modules/mono/editor/GodotTools/GodotTools/Build/BuildSystem.cs
@@ -38,6 +38,8 @@ namespace GodotTools.Build
             startInfo.RedirectStandardError = true;
             startInfo.UseShellExecute = false;
             startInfo.CreateNoWindow = true;
+            startInfo.StandardOutputEncoding = Encoding.UTF8;
+            startInfo.StandardErrorEncoding = Encoding.UTF8;
             startInfo.EnvironmentVariables["DOTNET_CLI_UI_LANGUAGE"]
                 = ((string)editorSettings.GetSetting("interface/editor/editor_language")).Replace('_', '-');

@@ -102,6 +104,8 @@ namespace GodotTools.Build
             startInfo.RedirectStandardOutput = true;
             startInfo.RedirectStandardError = true;
             startInfo.UseShellExecute = false;
+            startInfo.StandardOutputEncoding = Encoding.UTF8;
+            startInfo.StandardErrorEncoding = Encoding.UTF8;
             startInfo.EnvironmentVariables["DOTNET_CLI_UI_LANGUAGE"]
                 = ((string)editorSettings.GetSetting("interface/editor/editor_language")).Replace('_', '-');

diff --git a/modules/mono/editor/GodotTools/GodotTools/Build/DotNetFinder.cs b/modules/mono/editor/GodotTools/GodotTools/Build/DotNetFinder.cs
index b437c7e742..27b36d9ad1 100644
--- a/modules/mono/editor/GodotTools/GodotTools/Build/DotNetFinder.cs
+++ b/modules/mono/editor/GodotTools/GodotTools/Build/DotNetFinder.cs
@@ -4,6 +4,7 @@ using System.Diagnostics;
 using System.Diagnostics.CodeAnalysis;
 using System.IO;
 using System.Runtime.InteropServices;
+using System.Text;
 using JetBrains.Annotations;
 using OS = GodotTools.Utils.OS;

@@ -55,7 +56,9 @@ namespace GodotTools.Build
             process.StartInfo = new ProcessStartInfo(dotNetExe, "--list-sdks")
             {
                 UseShellExecute = false,
-                RedirectStandardOutput = true
+                RedirectStandardOutput = true,
+                StandardOutputEncoding = Encoding.UTF8,
             };

             process.StartInfo.EnvironmentVariables["DOTNET_CLI_UI_LANGUAGE"] = "en-US";

Edit: can't be in the last one, because stderr is not redirected.

  1. I Found a VM that uses CP 437 as the default, so I could actually test this and it turns out all that research and I am missing something, because in practice this PR just works somehow, for all combinations of SDK 7.0.102, 8.0.100-preview; German and Chinese; and console version and not.
    So, idk, if this just work then or if there are some edge cases where there are issues / regressions with this.

@akien-mga
Copy link
Member Author

I'm not really familiar with this to update it properly, I just PR'ed it to help @dream-young-soul with Git issues. I'd suggest you take this over in a new PR @RedworkDE if you're interested, as you did a lot of great research.

@akien-mga
Copy link
Member Author

Superseded by #74065.

@akien-mga akien-mga closed this Feb 27, 2023
@akien-mga akien-mga deleted the dream-young-soul/dotnet-process-startinfo-utf8 branch February 27, 2023 17:22
@akien-mga akien-mga added archived and removed cherrypick:3.x Considered for cherry-picking into a future 3.x release cherrypick:4.0 labels Feb 27, 2023
@aaronfranke aaronfranke removed this from the 4.1 milestone Feb 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MSBuild output garbled Wrong Chinese encoding for MSBuild errors in the editor
8 participants