Invalid encoding when running a specific executable on Windows #196

Closed
mtaku3 opened this issue Feb 25, 2023 · 4 comments
mtaku3 commented Feb 25, 2023

Version

ver 3.6.0

Details

Some executables vary their output encoding based on the console's default encoding. CliWrap lets you change the encoding of the standard output stream reader, but it never changes the console's default encoding, so the characters may come out garbled.
.NET's Process class can change the console's default encoding through ProcessStartInfo.StandardOutputEncoding.
The reproduction code tries to echo the character ϧ (U+03E7).
CliWrapImpl reads the output in the console's default encoding, so the character is garbled.
DotNetImpl reads the output as UTF-8 because that is what ProcessStartInfo.StandardOutputEncoding specifies, so the character is not garbled.
A .NET console application's default encoding is the same as the console's default encoding, so the character may still be garbled when printed. (Console.OutputEncoding must match the code page reported by chcp in cmd.)
Even so, you can see that ProcessStartInfo.StandardOutputEncoding changes the console's default encoding for the process and prevents the characters from being consumed in the wrong encoding.
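
For readers who cannot open the linked reproduction, here is a minimal sketch of what a DotNetImpl-style call could look like with the plain Process class. It is not the code from the linked repository: the class name DotNetImplSketch and the helper CreateStartInfo are made up for illustration, and the file name echo.bat is borrowed from later in this thread. It only shows where ProcessStartInfo.StandardOutputEncoding is set. Whether that setting also affects the child's console code page is exactly what the rest of the thread investigates.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Threading.Tasks;

public static class DotNetImplSketch
{
    // Hypothetical helper: redirects stdout and asks .NET to decode it as UTF-8.
    public static ProcessStartInfo CreateStartInfo(string fileName) =>
        new ProcessStartInfo(fileName)
        {
            RedirectStandardOutput = true,
            UseShellExecute = false, // required when redirecting streams
            StandardOutputEncoding = Encoding.UTF8
        };

    public static async Task<string> RunAsync(string fileName)
    {
        using var process = Process.Start(CreateStartInfo(fileName))
            ?? throw new InvalidOperationException("Process failed to start.");

        // Read everything the child writes to stdout, then wait for it to exit.
        var output = await process.StandardOutput.ReadToEndAsync();
        await process.WaitForExitAsync(); // available on .NET 5 and later
        return output;
    }

    public static async Task Main()
    {
        // Assumes an echo.bat next to the executable, as in the thread.
        Console.Write(await RunAsync("echo.bat"));
    }
}
```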

Steps to reproduce

Reproduction code is here.
Just clone, build and run it.
One thing to note is that the console's default encoding varies with the system locale, according to a Microsoft article, so the behavior may vary as well. Please let me know if you can't reproduce this.

mtaku3 added the bug label Feb 25, 2023
Owner

Tyrrrz commented Feb 25, 2023

Hey @mtaku3.

I was able to reproduce the discrepancy with your help, thank you. I'm now trying to figure out whether this is a bug and how to address it.

Furthermore, I tried to replicate your scenario in the CliWrap tests (which run against a .NET executable instead of a batch file) and it didn't work. It seems that ProcessStartInfo.StandardOutputEncoding does not have any effect on certain types of programs, or at least on .NET console applications.

I also tried to dig through the documentation to see whether this is an edge case or an OS-specific behavior but I was not able to find any official information regarding this scenario. The docs you linked (https://learn.microsoft.com/en-us/dotnet/api/system.diagnostics.processstartinfo.standardoutputencoding?view=net-7.0) don't provide a lot of useful information beyond this remark:

Setting this property does not guarantee that the process will use the specified encoding. The application should be tested to determine which encodings the process supports.

Question: does your original use case also involve a batch file? If not, what kind of program is it?

Owner

Tyrrrz commented Feb 25, 2023

As an immediate workaround, you can use this extension method:

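using System.Text;
using CliWrap;
using CliWrap.Builders;

public static class CommandExtensions
{
    // Runs the original command as: cmd /c chcp <code page> >nul && <target> <arguments>
    // so the code page is switched inside that cmd session before the target starts.
    public static Command WithChcpWrapper(this Command command, Encoding encoding)
    {
        return Cli.Wrap("cmd")
            .WithArguments(a => a
                .Add("/c")
                .Add(
                    new ArgumentsBuilder()
                        .Add("chcp")
                        .Add(encoding.CodePage)
                        .Add(">nul")
                        .Add("&&")
                        .Add(command.TargetFilePath)
                        .Add(command.Arguments, false)
                        .Build(),
                    false
                )
            )
            // Carry over the rest of the original command's configuration.
            .WithWorkingDirectory(command.WorkingDirPath)
            .WithEnvironmentVariables(command.EnvironmentVariables)
            .WithCredentials(command.Credentials)
            .WithStandardInputPipe(command.StandardInputPipe)
            .WithStandardOutputPipe(command.StandardOutputPipe)
            .WithStandardErrorPipe(command.StandardErrorPipe)
            .WithValidation(command.Validation);
    }
}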

It wraps your existing command in cmd and sets the encoding within that session. You can use it like so:

private static async Task CliWrapImpl()
{
    await Cli.Wrap("echo.bat")
        .WithStandardOutputPipe(PipeTarget.ToDelegate(Console.WriteLine, Encoding.UTF8))
        .WithChcpWrapper(Encoding.UTF8)
        .ExecuteBufferedAsync();
}

Note that I removed ./ from the path because cmd trips up on paths starting with . unless they're quoted, and CliWrap doesn't quote . because it's not considered a special character. You may want to tweak it a bit.

Owner

Tyrrrz commented Feb 25, 2023

Actually, after further testing, it seems that just making this change is enough to get it working. Can you test it out, @mtaku3?

private static async Task CliWrapImpl()
{
    await Cli.Wrap("./echo.bat")
-       .WithStandardOutputPipe(PipeTarget.ToDelegate(Console.WriteLine))
+       .WithStandardOutputPipe(PipeTarget.ToDelegate(Console.WriteLine, Encoding.UTF8))
        .ExecuteAsync();
}

Author

mtaku3 commented Feb 26, 2023

I was going about it the wrong way. You are right!
I was doing it like this:

Cli.Wrap("./example.exe")
	.WithStandardOutputPipe(PipeTarget.ToDelegate(Console.WriteLine))
	.Observe(Encoding.UTF8, Encoding.UTF8, forciblyCloseCTS.Token, gracefullyCloseCTS.Token)
	.Subscribe();

But I didn't notice that the encoding parameters of Observe() affect only the observable that Observe() creates. They have no effect on the pipe that is merged in by WithStandardOutputPipe().
Passing Encoding.UTF8 to WithStandardOutputPipe solved my issue.
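
To make the failure mode concrete, the following sketch decodes the UTF-8 bytes of ϧ (U+03E7) with two different reader encodings. It does not use CliWrap at all. The class and method names are invented for illustration, and ISO-8859-1 merely stands in for "some non-UTF-8 console code page" because .NET supports it without extra packages. The actual default on a given machine depends on its locale.

```csharp
using System;
using System.Text;

public static class EncodingMismatchDemo
{
    // Encodes text as UTF-8 (what the child process writes) and decodes the
    // bytes with readerEncoding (what the pipe reader assumes).
    public static string DecodeUtf8BytesAs(string text, Encoding readerEncoding)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        return readerEncoding.GetString(bytes);
    }

    public static void Main()
    {
        // Stand-in for a non-UTF-8 default; real consoles may use another code page.
        var latin1 = Encoding.GetEncoding("iso-8859-1");

        Console.OutputEncoding = Encoding.UTF8;

        // Reader encoding matches the writer: the character survives.
        Console.WriteLine(DecodeUtf8BytesAs("\u03E7", Encoding.UTF8));

        // Reader encoding differs: the two bytes 0xCF 0xA7 become "Ï§".
        Console.WriteLine(DecodeUtf8BytesAs("\u03E7", latin1));
    }
}
```

This is the same mismatch that occurs when the pipe passed to WithStandardOutputPipe is left on its default encoding while the process emits UTF-8.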

For your reference, I was trying to run a process written in Go that uses the log package for logging. The log package doesn't have a feature to set the encoding and seems to depend on the console.

Also, I tested with another batch file that prints the current code page via chcp, and ran it through both CliWrapImpl and DotNetImpl. Both print the same code page, so it looks like neither of them has any effect on the console's encoding. So to change the console's encoding, I have to use something like your WithChcpWrapper.
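
A check along those lines could be sketched as follows. This is not the batch file used above. It simply runs cmd /c chcp as a child process (Windows only) and extracts the number from the output. The names CodePageCheck and ParseCodePage are hypothetical. Because the chcp message is localized, the parser takes the first run of digits rather than matching the English text.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class CodePageCheck
{
    // Extracts the first number from chcp output such as "Active code page: 65001".
    public static int? ParseCodePage(string chcpOutput)
    {
        var match = Regex.Match(chcpOutput, @"\d+");
        return match.Success ? int.Parse(match.Value) : (int?)null;
    }

    public static void Main()
    {
        var startInfo = new ProcessStartInfo("cmd", "/c chcp")
        {
            RedirectStandardOutput = true,
            UseShellExecute = false
        };

        using var process = Process.Start(startInfo)
            ?? throw new InvalidOperationException("cmd failed to start.");

        var output = process.StandardOutput.ReadToEnd();
        process.WaitForExit();

        // Prints the code page the child cmd session reports.
        Console.WriteLine($"Child console code page: {ParseCodePage(output)}");
    }
}
```

Setting StandardOutputEncoding on startInfo and running this again is one way to see whether the reported code page changes.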

Thank you.
