4 changes: 4 additions & 0 deletions src/Compilers/CSharp/Test/CommandLine/CommandLineTests.cs
@@ -5791,6 +5791,10 @@ public void Codepage()
parsedArgs.Errors.Verify();
Assert.Equal("Unicode (UTF-8)", parsedArgs.Encoding.EncodingName);

parsedArgs = DefaultParse(new[] { "/CodePage:1252", "a.cs" }, WorkingDirectory);
parsedArgs.Errors.Verify();
Member

@jjonescz jjonescz Nov 21, 2025

Would this fail without the fix? #Resolved

Contributor Author

@AlekseyTs AlekseyTs Nov 21, 2025

> Would this fail without the fix?

Yes, when run on CoreCLR the test fails without the fix with the error from the issue.
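
A minimal sketch of the runtime behavior behind this answer, assuming a .NET (CoreCLR) runtime where code page 1252 is not available until `CodePagesEncodingProvider` is registered. `IsSupported` is a hypothetical helper written for illustration and is not part of the PR.

```csharp
using System;
using System.Text;

// Hypothetical helper: reports whether Encoding.GetEncoding accepts the code page.
static bool IsSupported(int codepage)
{
    try
    {
        Encoding.GetEncoding(codepage);
        return true;
    }
    catch (NotSupportedException)
    {
        // Thrown when the code page is not supported by the underlying platform.
        return false;
    }
}

// On .NET Framework both lines are expected to report support; on CoreCLR the
// first check is expected to fail until the provider is registered.
Console.WriteLine($"1252 supported before registration: {IsSupported(1252)}");
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Console.WriteLine($"1252 supported after registration: {IsSupported(1252)}");
```

The new test case with `/CodePage:1252` exercises the first situation: without the fix, the parser reports the lookup failure instead of retrying after registration.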

Assert.Equal("Western European (Windows)", parsedArgs.Encoding.EncodingName);

// error
parsedArgs = DefaultParse(new[] { "/codepage:0", "a.cs" }, WorkingDirectory);
parsedArgs.Errors.Verify(Diagnostic(ErrorCode.FTL_BadCodepage).WithArguments("0"));
40 changes: 40 additions & 0 deletions src/Compilers/Core/Portable/CommandLine/CommandLineParser.cs
@@ -24,6 +24,7 @@ public abstract class CommandLineParser
internal readonly bool IsScriptCommandLineParser;
private static readonly char[] s_searchPatternTrimChars = new char[] { '\t', '\n', '\v', '\f', '\r', ' ', '\x0085', '\x00a0' };
internal const string ErrorLogOptionFormat = "<file>[,version={1|1.0|2|2.1}]";
private static bool s_registeredEncodingProvider = CodePagesEncodingProvider.Instance == null;
Member

@jjonescz jjonescz Nov 21, 2025

How can CodePagesEncodingProvider.Instance be null here? #Resolved

Contributor Author

I do not control the behavior of the API. In theory, the result of the property could be null. We have similar checks in other places in our codebase. If it cannot be null, there is no harm in checking it anyway.


internal CommandLineParser(CommonMessageProvider messageProvider, bool isScriptCommandLineParser)
{
@@ -1220,10 +1221,49 @@ internal IEnumerable<CommandLineSourceFile> ParseRecurseArgument(string arg, str
&& long.TryParse(arg, NumberStyles.None, CultureInfo.InvariantCulture, out long codepage)
&& (codepage > 0))
{
try_again:
try
{
return Encoding.GetEncoding((int)codepage);
}
catch (NotSupportedException) when (!s_registeredEncodingProvider)
{
// From documentation:
// - 'GetEncoding' throws NotSupportedException when codepage is not supported by the underlying platform.
// - 'CodePagesEncodingProvider.Instance' gets an encoding provider for code pages supported
// in the desktop .NET Framework but not by the current underlying platform.
// - 'Encoding.RegisterProvider' makes character encodings available on a platform that does not otherwise support them.
// * Once the encoding provider is registered, the encodings that it supports can be retrieved by calling any
// Encoding.GetEncoding overload.
// * Registering an encoding provider by using the 'RegisterProvider' method also affects the behavior of
// GetEncoding(Int32) when passed an argument of 0.
// * If multiple providers are registered, GetEncoding(Int32) attempts to retrieve the encoding from the most recently
// registered provider first.
// * If the 'RegisterProvider' method is called to register multiple providers that handle the same encoding,
// the last registered provider is the one used for all encoding and decoding operations. Any previously registered providers are ignored.
// * If the same encoding provider is used in multiple calls to the 'RegisterProvider' method,
// only the first method call registers the provider. Subsequent calls are ignored.
//
// Given all that:
// - We don't call 'Encoding.RegisterProvider' unconditionally to avoid changing environment
// that is already configured to support the requested codepage. We call it only when we encounter
// a 'NotSupportedException'.
// - We also do not attempt to call 'Encoding.RegisterProvider' more than once.
try
{
// Ignore any exceptions from an attempt to register the provider.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Member

@agocke agocke Nov 21, 2025

Why not call RegisterProvider unconditionally before calling GetEncoding? The reason I ask is that my memory was that providing a specific encoding to the compiler was relatively rare. I think we usually don’t get a specific encoding and just assume utf8. And I think utf8 is both the most common encoding (inc ascii) and the recommended encoding.

So if we’re already in an edge case, could we assume that if someone needs a specific code page it might be a non-registered one? #Resolved

Contributor Author

From the API documentation, my understanding is that calling Encoding.RegisterProvider might override the previous setup. So, if the environment was properly set up to support a certain encoding, this call might change that setup, i.e. the encoding would still be supported, but by a different provider. Therefore, I felt it made sense not to touch the providers unless we run into a failure.

Member

Consider capturing this in a comment, it probably won't be obvious to future readers.

Member

Def agree. The logic/flow here is complex for me (nested try/catches with gotos). I think this def warrants some explanation on what's going on.
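
For readers following this thread, the register-on-failure flow under discussion can be sketched without the `goto`. This is only an illustration under stated assumptions, not the PR's code: `TryGetEncoding` and the `registeredProvider` flag are hypothetical names standing in for the parser method and its static field.

```csharp
using System;
using System.Text;

// Hypothetical helper: look up an encoding, registering the code pages
// provider at most once, and only after a NotSupportedException.
static Encoding? TryGetEncoding(int codepage, ref bool registeredProvider)
{
    while (true)
    {
        try
        {
            return Encoding.GetEncoding(codepage);
        }
        catch (NotSupportedException) when (!registeredProvider)
        {
            try
            {
                // Registration is deferred to this point so that an environment
                // that already supports the code page is left untouched.
                Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
            }
            catch
            {
                // Ignore registration failures; the retry decides the outcome.
            }

            // Setting the flag disables the filter above, so the loop runs
            // at most one more time.
            registeredProvider = true;
        }
        catch (Exception)
        {
            return null;
        }
    }
}

bool registered = false;
Encoding? encoding = TryGetEncoding(1252, ref registered);
Console.WriteLine(encoding?.EncodingName ?? "unsupported");
```

The loop retries exactly once: a second `NotSupportedException` no longer matches the filtered handler and falls through to the general handler, which returns null.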

}
catch
{
}

s_registeredEncodingProvider = true;

// Try to get the encoding again after attempting to register the provider.
// Since we set `s_registeredEncodingProvider` to true, we won't get here again.
goto try_again;
}
catch (Exception)
{
return null;
@@ -1863,6 +1863,10 @@ End Module").Path
parsedArgs.Errors.Verify()
Assert.Equal("Unicode (UTF-8)", parsedArgs.Encoding.EncodingName)

parsedArgs = DefaultParse({"/CodePage:1252", "a.vb"}, _baseDirectory)
parsedArgs.Errors.Verify()
Assert.Equal("Western European (Windows)", parsedArgs.Encoding.EncodingName)

' errors
parsedArgs = DefaultParse({"/codepage:0", "a.vb"}, _baseDirectory)
parsedArgs.Errors.Verify(Diagnostic(ERRID.ERR_BadCodepage).WithArguments("0"))