Add a GetEncodings method to System.Text.EncodingProvider to support enumerating available character encodings #25819
Comments
This cannot be an abstract method. Adding an abstract method to an existing type is a breaking change.
I do not think we need a new overload for |
@mklement0 thanks for the proposal. For the open question, System.Text.Encoding.GetEncodings() should report all encodings, including the provider encodings. I am not sure why this would be a compatibility issue. I cannot think of any breaking scenario here, as we would return a superset of what we used to return. Could you please elaborate on that? |
To be clear, my preference too is for If there's no concern about backward compatibility (see below) and no need to support discovering just the default encodings while additional ones happen to be registered, then I'm happy to
I'm personally not worried about backward compatibility in this case, but conceivably there is code out there that relies on So: Do we need to worry about retaining a predictable, invariant way to discover the set of default encodings?
Thanks for pointing that out; I need guidance as to what to propose instead. |
Please remove the open question, then.
Make the method virtual instead of abstract. |
@tarekgh: Thanks - done. I've also added details re return behavior (enumeration order); let me know if that makes sense. |
The System.Text.Encoding.GetEncodings documentation doesn't promise any sort order. Where did you get that from? |
You're right - I didn't check the docs, I just inferred it from the de facto behavior. Do you want me to say that the order is unspecified? What about when forming the union between default and registered encodings? |
I prefer not to mention anything about the order. We just promise to return the whole set supported by the core and the providers. |
@tarekgh: Done (I've added a note that no particular enumeration order is guaranteed). |
Thanks, @mklement0. I have also added the following statement to the proposal's integration section. Let me know if you have any concerns with it.
|
CC @krwq |
Thanks, @tarekgh - good addition (I've streamlined the wording a bit). |
There may be another scenario if the user wants to get only the default list: public virtual System.Text.EncodingInfo[] GetEncodings (bool IncludeDefaultsOnly = false); |
@iSazonov: This was discussed above, but both @jkotas and @tarekgh think it's not necessary. (On a side note: My guess is that separate method overloads are preferred to optional parameters; may be worth adding a clarification to the coding style document.) |
|
@mklement0 do you want to help in the implementation? |
@mklement0 Ping me too if you want. |
I gave it a shot, but hit roadblocks (see below - some may require renewed discussion and sign-off). Given my inexperience, I suggest someone more experienced take this on - @iSazonov, please feel free to take over. Implementing this API requires CoreCLR (CoreLib) changes too, and I don't know how to coordinate that (even just adding a new virtual
On the CoreFX side:
|
@mklement0 thanks for trying. We'll try to get to this at some point. |
I think you have raised a good point too. We may need to revisit the EncodingInfo class to enable constructing it outside CoreCLR and to provide the needed functionality. We'll try to think more about this one.
public sealed partial class EncodingInfo
{
internal EncodingInfo() { }
public int CodePage { get { throw null; } }
public string DisplayName { get { throw null; } }
public string Name { get { throw null; } }
public override bool Equals(object value) { throw null; }
public System.Text.Encoding GetEncoding() { throw null; }
public override int GetHashCode() { throw null; }
} |
@mklement0 I don't have experience contributing to CoreCLR/CoreFX 😕
I believe it will take two PRs: one in the CoreCLR repo and one in the CoreFX repo. After the first one is merged, it will be automatically replicated to the CoreFX repo. After that, the second PR in CoreFX can continue. |
@tarekgh Do you have any news about EncodingInfo? |
No update yet, because it is fairly low priority for now. But I suggest adding the following public constructor to EncodingInfo: public EncodingInfo(int codePage, string name, string displayName) { } |
namespace System.Text
{
public partial class EncodingProvider
{
public virtual IEnumerable<EncodingInfo> GetEncodings();
}
public partial class EncodingInfo
{
public EncodingInfo(EncodingProvider provider, int codePage, string name, string displayName);
}
} |
This API proposal arose out of #25804.
This proposal was approved before, but we then discovered that the original proposal returned EncodingInfo objects even though no public constructor is exposed for this type.
The delta in this proposal is a public constructor for EncodingInfo that allows such objects to be created.
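To illustrate why the public constructor matters, here is a minimal sketch of a third-party provider that overrides GetEncodings(). It assumes the API shape shown in the declaration above; SampleEncodingProvider, the name "x-sample-latin1", and the display name are hypothetical and exist only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical provider, for illustration only: it exposes a single
// encoding under a made-up name and maps it to the built-in Latin-1 encoding.
public sealed class SampleEncodingProvider : EncodingProvider
{
    private const int SampleCodePage = 28591;            // ISO-8859-1
    private const string SampleName = "x-sample-latin1"; // hypothetical name

    // Encoding.Latin1 is a static property, so returning it does not
    // re-enter the registered providers via Encoding.GetEncoding().
    public override Encoding GetEncoding(int codepage) =>
        codepage == SampleCodePage ? Encoding.Latin1 : null;

    public override Encoding GetEncoding(string name) =>
        string.Equals(name, SampleName, StringComparison.OrdinalIgnoreCase)
            ? Encoding.Latin1
            : null;

    // Without a public EncodingInfo constructor, code outside the core
    // library could not create the objects this method has to return.
    public override IEnumerable<EncodingInfo> GetEncodings()
    {
        yield return new EncodingInfo(
            this, SampleCodePage, SampleName, "Sample Latin-1 (illustrative)");
    }
}
```

Passing the provider to the EncodingInfo constructor presumably lets EncodingInfo.GetEncoding() resolve the encoding through that same provider rather than through the default lookup.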
Rationale and Use Cases
As of this writing:
System.Text.Encoding.GetEncodings() only ever enumerates the default encodings available.
System.Text.CodePagesEncodingProvider.Instance lacks a method for enumerating additional encodings registered via System.Text.Encoding.RegisterProvider(EncodingProvider), which - via a call to System.Text.Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance) - makes the Windows code-page character encodings available to .NET Core.
However, it is desirable to have the ability to enumerate all available encodings:
Even in the abstract it seems like an awkward omission not to be able to reflect on the available encodings; more concretely, consider the following use cases, both related to PowerShell Core:
PowerShell Core commands such as Get-Content for reading text files support an -Encoding parameter that directly accepts System.Text.Encoding instances, so users expect to be able to discover all available encodings and/or be assisted in selecting a specific encoding with tab completion.
A Convert-TextFile command will allow conversion between all available character encodings, so their discovery / ease of selection is important there too.
Proposed API
Add a GetEncodings() method to the abstract class System.Text.EncodingProvider.
System.Text.CodePagesEncodingProvider would then implement that method to return the encodings it provides.
Specifically, a call to System.Text.CodePagesEncodingProvider.Instance.GetEncodings() would then return the System.Text.EncodingInfo instances representing that provider's encodings.
As with System.Text.Encoding.GetEncodings(), no particular enumeration order is guaranteed.
Integration into System.Text.Encoding.GetEncodings()
Once System.Text.Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance) has been called, System.Text.Encoding.GetEncodings() will enumerate the additional encodings too, using the newly introduced method on the provider if registered.
In other words: System.Text.Encoding.GetEncodings() will return whatever encodings are currently available - whether just the default set by default or the union of the default set and the additional encodings after provider registration.
As with the existing enumeration, the encodings (EncodingInfo instances) are returned in no guaranteed order.
If there is more than one registered provider that supports a given encoding, the returned list will contain only one entry for it.
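The integration described above could be exercised with a sketch like the following. It assumes a runtime in which the proposed behavior is in place; EncodingEnumerationDemo and its method names are hypothetical. No concrete counts are asserted, because the set of encodings depends on the platform and runtime.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class EncodingEnumerationDemo
{
    // Counts the enumerable encodings before and after registering the
    // code-pages provider. Under the proposal, 'after' should be at least
    // as large as 'before', since a superset is returned.
    public static (int Before, int After) CountEncodings()
    {
        int before = Encoding.GetEncodings().Length;
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        int after = Encoding.GetEncodings().Length;
        return (before, after);
    }

    // Checks the "only one entry per encoding" rule by comparing the
    // number of distinct code pages with the total number of entries.
    public static bool HasNoDuplicateCodePages()
    {
        EncodingInfo[] infos = Encoding.GetEncodings();
        return infos.Select(info => info.CodePage).Distinct().Count() == infos.Length;
    }
}
```

Because no enumeration order is guaranteed, callers that need a stable listing (for tab completion, say) would have to sort the result themselves, for example by Name.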
Updates (most recent ones first)
The GetEncodings method is now virtual rather than abstract.