TypeNameParser API #97566
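Tagging subscribers to this area: @dotnet/area-system-reflection-metadata
Issue Details
I am working on the design of a new type name parser. It's going to include a new type that represents the parsed type name, the type name parser itself and most likely an options bag for its customization. It may also include an AssemblyNameParser, but I am not sure whether this is going to be required (nobody has asked for a standalone assembly name parser). For brevity I am going to call these types TypeName, TypeNameParser and TypeNameParserOptions:
public sealed class TypeName
{
// properties that describe the parsed type, like its generic arguments or array rank
}
public ref struct TypeNameParser
{
public static TypeName Parse(ReadOnlySpan<char> name, TypeNameParserOptions? options = null);
}
public class TypeNameParserOptions
{
// properties that describe customizable settings, like the max allowed recursion depth
}
Before I submit the proposal, I want to have something that replaces:
- the TypeNameParser used by System.Private.CoreLib
- the TypeNameParser used by ILVerification
- the TypeNameParser used by ILCompiler*
I'll try to replace the Roslyn parser too, but I can't promise that (please let me know if this is a must-have).
The thing I am not sure of is where the mentioned types should belong. For example, currently Type.GetType(string name) is part of CoreLib. I believe that my proposal should include a new Type method for loading a type from a parsed name:
public class Type
{
public static Type? GetType(TypeName typeName);
}
So those who have parsed the type name and verified it could load the type without parsing the type name again. This leads me to think that TypeName should be part of CoreLib. But can I at the same time ship this type in an OOB package like System.Reflection.Metadata? …
I can also simply not extend the Type class, move the Load method to TypeName itself and reference the TypeNameParser source as a link in CoreLib, with something like this:
#if SYSTEM_PRIVATE_CORELIB
internal
#else
public
#endif
struct TypeNameParser
But this would lead to a situation where .NET 9 apps would load two type name parsers: one internal from CoreLib and another, public one from the OOB package. @jkotas @GrabYourPitchforks are there any better solutions?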
Yes, that's fine. I do not see a problem with it. I do not think that a public …
@jkotas @GrabYourPitchforks I've started porting System.Private.CoreLib to the new parser and have stumbled upon a type name that uses single square brackets (instead of double ones) to represent a generic type name: runtime/src/libraries/System.Runtime/tests/System.Reflection.Tests/ModuleTests.cs, line 63 (commit 52e1ad3).
I would expect the type name to be represented in the following way (this is what …
- System.Nullable`1[System.Int32]
+ System.Nullable`1[[System.Int32]]
Should this syntax be supported by the new parser? If so, only by CoreLib for backward compatibility, or also by the new public API? I am asking because it's the only test that uses such syntax (that I could find by running …
Yes, it should be supported. It has been supported by the runtime APIs for the last 20+ years. I do not see a reason why we would drop it. The double square brackets are required to avoid ambiguity in fully qualified type names. They are not needed for non-fully qualified type names (type names without an assembly name). We do not have a formal spec for the type name grammar. #4416 has an attempt to create one.
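The two spellings can be checked against the existing Type.GetType API. This is a minimal sketch; the class and method names are made up for illustration. Without assembly names, single brackets are unambiguous because a comma can only separate type arguments. Once an argument carries an assembly name, whose display form contains commas of its own, every argument needs its own inner brackets.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Illustrative helpers (hypothetical names) that resolve closed generic types
// through both bracket styles accepted by Type.GetType.
public static class GenericArgumentSyntax
{
    // Single brackets: the argument is a plain full name, as in the ModuleTests case.
    public static Type? NullableSingle() =>
        Type.GetType("System.Nullable`1[System.Int32]");

    // Single brackets with two arguments: the comma separates type arguments.
    public static Type? DictionarySingle() =>
        Type.GetType("System.Collections.Generic.Dictionary`2[System.String,System.Int32]");

    // Double brackets: each argument is assembly-qualified, and the assembly
    // display name contains commas, so every argument is wrapped in inner brackets.
    public static Type? DictionaryDouble()
    {
        string coreLib = typeof(string).Assembly.FullName!;
        return Type.GetType(
            "System.Collections.Generic.Dictionary`2" +
            $"[[System.String, {coreLib}],[System.Int32, {coreLib}]]");
    }
}
```

All three calls are expected to return the same instances as typeof(int?) and typeof(Dictionary<string, int>).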
@jkotas Thank you!
Progress
I was able to port CLR, Mono, NativeAOT and internal tools to use the new parser (#97864). My current plan is the following:
Design: TypeName
I decided to represent type name information with a single type:
public sealed class TypeName : IEquatable<TypeName>
{
internal TypeName() { } // encapsulation: only the parser can create new instances
public System.Reflection.AssemblyName? AssemblyName { get; }
public string AssemblyQualifiedName { get; }
public System.Reflection.Metadata.TypeName? ContainingType { get; } // non-null for nested types
public bool IsArray { get; } // true for [], [*] and [,,,]
public bool IsConstructedGenericType { get; }
public bool IsElementalType { get; }
public bool IsManagedPointerType { get; }
public bool IsNestedType { get; }
public bool IsSzArrayType { get; }
public bool IsUnmanagedPointerType { get; }
public bool IsVariableBoundArrayType { get; }
public string Name { get; }
public System.Reflection.Metadata.TypeName? UnderlyingType { get; }
public int GetArrayRank();
public System.Reflection.Metadata.TypeName[] GetGenericArguments();
}
Questions:
Design: customization
It seems to me that the most important aspect of the design is customization. Some examples:
BTW, we could also introduce an optional flag for the CLR to set the default parsing rules (example: set it to …
I am not sure to what degree we should allow users to customize the parser. A few ideas I have are listed below.
Option bag with virtual validation methods
We literally expose the ability to implement custom validation.
public class TypeNameParserOptions
{
public TypeNameParserOptions() { }
public bool AllowFullyQualifiedName { get; set; } = true;
public int MaxRecursiveDepth { get; set; } = int.MaxValue;
public bool ThrowOnError { get; set; } = true;
public virtual bool ValidateTypeName(System.ReadOnlySpan<char> candidate); // on an invalid character: returns false when ThrowOnError is false, throws when it is true
public virtual bool ValidateAssemblyName(System.ReadOnlySpan<char> candidate);
public virtual ReadOnlySpan<char> TrimStart(ReadOnlySpan<char> input); // span.TrimStart() by default, span.TrimStart(' ') for strict parser
}
public ref struct TypeNameParser
{
public static TypeName? Parse(ReadOnlySpan<char> typeName, TypeNameParserOptions? options = null);
}
It has a perf penalty: every time we need non-default settings, we need to allocate a class instance (or maintain a static cache) and pay the virtual method overhead (this may disappear in the future with new JIT versions).
Two dedicated parse methods
IMO …
public ref struct TypeNameParser
{
public static TypeName? Parse(ReadOnlySpan<char> typeName, bool allowFullyQualifiedName = true, bool throwOnError = true);
public static TypeName? ParseStrict(ReadOnlySpan<char> typeName, bool allowFullyQualifiedName = true, bool throwOnError = true, int maxRecursiveDepth = 10);
}
It opens the door to introducing more and more overloads in the future. @jkotas @GrabYourPitchforks please let me know what you think.
Shouldn't the parsing methods be in …
I think parsed AssemblyName is problematic. It would need to have the culture resolved into a CultureInfo that is not hardened against untrusted source. The caller needs to be able to control whether culture gets resolved into CultureInfo.
In general, we have a strong preference for consistent naming and I would expect it to be the case here as well. I do not think we want to be creative with inventing new names for existing concepts. I think the names should match 1:1 with the established names used by reflection (System.Type) if possible.
I do not see what is unsafe about these. These are very arbitrary policies. Do these policies need to be part of the System.Reflection.Metadata parser? If the caller does not like non-ASCII characters or spaces for whatever reason, they can implement the policy as part of their TypeName to actual type resolver.
Do we also need a cap on max total complexity of the type (ie number of TypeName objects that the parser creates internally)?
An immutable options bag for things like MaxRecursiveDepth makes sense to me. Validation callbacks for names do not make sense to me.
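To make the immutable-options suggestion concrete, here is a sketch under our own assumptions: HypotheticalParseOptions and DepthCheck are invented names, not part of any proposed API. Options are fixed at construction through init-only properties and can be cached in a static field, so a non-default configuration costs one allocation and no virtual calls. The depth check is deliberately simplistic: it counts bracket nesting (generic-argument and array brackets alike) and skips backslash-escaped characters, whereas a real parser would enforce the limit while building the tree.

```csharp
using System;

// Hypothetical immutable options bag: values can only be set at construction time.
public sealed class HypotheticalParseOptions
{
    public static HypotheticalParseOptions Default { get; } = new();

    public int MaxRecursiveDepth { get; init; } = 10;
}

public static class DepthCheck
{
    // A non-default configuration allocated once and reused for every call.
    public static readonly HypotheticalParseOptions Strict = new() { MaxRecursiveDepth = 2 };

    // Returns true when the deepest '[' nesting in the name does not exceed the limit.
    // A backslash escapes the following character, which is then ignored.
    public static bool IsWithinDepth(ReadOnlySpan<char> name, HypotheticalParseOptions options)
    {
        int depth = 0, max = 0;
        for (int i = 0; i < name.Length; i++)
        {
            char c = name[i];
            if (c == '\\') { i++; continue; } // skip the escaped character
            if (c == '[') max = Math.Max(max, ++depth);
            else if (c == ']') depth--;
        }
        return max <= options.MaxRecursiveDepth;
    }
}
```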
Could you please create a code example of how to use this parser to build a hardened runtime type name resolver with allow lists, etc.? It will help us to resolve some of the API design questions above.
I've analyzed some of our first-party customers' code bases and used the new API to implement safe … @jkotas PTAL and let me know what you think about it.
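One common shape for such a resolver is sketched below. It deliberately avoids the proposed TypeName API so that it compiles on its own, and AllowListTypeResolver is a hypothetical name. The application registers the types it is willing to materialize up front, and untrusted names are only ever looked up in that table, so the input can never trigger assembly loading. In the design discussed here, the untrusted string would first be parsed into a TypeName, and the resolver would look up its FullName (and, for generics, each argument recursively) instead of a raw string.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical allow-list resolver: maps full type names to Type instances
// chosen ahead of time by the application.
public sealed class AllowListTypeResolver
{
    private readonly Dictionary<string, Type> _allowed = new(StringComparer.Ordinal);

    public AllowListTypeResolver(params Type[] allowedTypes)
    {
        foreach (Type type in allowedTypes)
        {
            // Only non-generic types are registered in this sketch: the FullName of a
            // constructed generic embeds assembly-qualified arguments, which is the
            // case a structured TypeName is meant to handle.
            if (!type.IsGenericType && type.FullName is string fullName)
                _allowed[fullName] = type;
        }
    }

    // Never calls Type.GetType: unknown names simply fail to resolve.
    public bool TryResolve(string fullName, out Type? type) =>
        _allowed.TryGetValue(fullName, out type);
}
```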
Excellent point! I've applied this suggestion and it looks much better.
I do not understand the comment for AllowFullyQualifiedName or what it is trying to protect against. If somebody creates a type name like the one in the comment, … Looks reasonable to me otherwise.
namespace System.Reflection.Metadata;
public sealed class TypeName : IEquatable<TypeName>
{
internal TypeName() { }
public string? AssemblySimpleName { get; }
public string AssemblyQualifiedName { get; }
public TypeName? DeclaringType { get; }
public string FullName { get; }
public bool IsArray { get; }
public bool IsConstructedGenericType { get; }
public bool IsByRef { get; }
public bool IsNested { get; }
public bool IsSZArray { get; }
public bool IsPointer { get; }
public bool IsVariableBoundArray { get; }
public string Name { get; }
public int Complexity { get; }
public TypeName GetElementType();
public TypeName GetGenericTypeDefinition();
public static TypeName Parse(ReadOnlySpan<char> typeName, TypeNameParseOptions? options = null);
public static bool TryParse(ReadOnlySpan<char> typeName, [NotNullWhenAttribute(true)] out TypeName? result, TypeNameParseOptions? options = null);
public int GetArrayRank();
public Reflection.AssemblyName? GetAssemblyName();
public ReadOnlyMemory<TypeName> GetGenericArguments();
}
public sealed class TypeNameParseOptions
{
public TypeNameParseOptions() { }
public bool AllowFullyQualifiedName { get; set; } = true;
public int MaxTotalComplexity { get; set; } = 10;
public bool StrictValidation { get; set; } = false;
}
If that is not a problem, I would like to extend the proposal with a method that compares the parsed type name against a provided type and can take type forwarding into account.
public class TypeName
{
public bool Matches(Type type, bool includeTypeForwards = true);
}
cc @terrajobst
This introduces a dependency on System.Type. I think we have said that this API should not depend on System.Type.
Also, it is not clear where the type forwards would come from. Would this API trigger assembly loading and potential execution of code from the assemblies involved?
@jkotas this came up in my review of the SafePayloadReader. I think it's common for consumers to want to match an instance of …
FWIW, I think getting a …
I think the idea is that if the passed-in instance of …
We do not mark forwarded types with …
What's the downside of exposing that policy on … Seems much more sensible than …
This is binary-formatter-specific functionality hiding under a name that is not binary-formatter-specific and does not express what the API actually does. The proper way to check whether a type matches a type name is to resolve the type name into a type and see whether it produced the same type instance. That is how the runtime and the .NET compilers do this check. The policy implemented by this API has a number of nuances: How do you compare versions? How do you compare strong names? There can be a mismatch between where TypeForwardedFromAttribute points to and where that type is actually forwarded to. Different consumers can have different requirements for how to handle these. I think this type of check belongs in binary-formatter-specific code. If somebody has a clone of binary formatter, they can copy this logic and tweak it any way they want. I expect that it would not be the only piece of binary-formatter-specific logic copied by binary formatter clones.
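The resolve-then-compare check described above takes only a few lines with the existing reflection API. This is a sketch with a made-up helper name. Note that Type.GetType may load assemblies named in the input, which is exactly the concern raised about untrusted names, so this shape only suits trusted input; for untrusted input the resolution step would go through an allow list instead.

```csharp
#nullable enable
using System;

public static class TypeNameMatching
{
    // Resolve the name with the runtime's own loader and compare the resulting
    // Type instance with the expected one. Forwarders are followed by the loader
    // during resolution, so no separate forwarding policy is needed here.
    // Caution: Type.GetType may load assemblies named in the input.
    public static bool Matches(string typeName, Type expected)
    {
        Type? resolved = Type.GetType(typeName, throwOnError: false);
        return resolved is not null && resolved == expected;
    }
}
```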
Let me take that back. @jkotas is right, this is not the job of the …
namespace System.Reflection.Metadata
{
public sealed class TypeName : IEquatable<TypeName>
{
internal TypeName() { }
public string? AssemblySimpleName { get; }
public string AssemblyQualifiedName { get; }
public TypeName? DeclaringType { get; }
public string FullName { get; }
public bool IsArray { get; }
public bool IsConstructedGenericType { get; }
public bool IsByRef { get; }
public bool IsNested { get; }
public bool IsSZArray { get; }
public bool IsPointer { get; }
public bool IsVariableBoundArray { get; }
public string Name { get; }
public bool IsSimple { get; }
public TypeName GetElementType();
public TypeName GetGenericTypeDefinition();
public static TypeName Parse(ReadOnlySpan<char> typeName, TypeNameParseOptions? options = null);
public static bool TryParse(ReadOnlySpan<char> typeName, [NotNullWhenAttribute(true)] out TypeName? result, TypeNameParseOptions? options = null);
public int GetArrayRank();
public Reflection.AssemblyName? GetAssemblyName();
public ImmutableArray<TypeName> GetGenericArguments();
public int GetNodeCount();
}
public sealed class TypeNameParseOptions
{
public TypeNameParseOptions() { }
public int MaxNodes { get; set; } = 10;
}
}
namespace System
{
partial class Type
{
public bool IsSimple { get; }
}
}
Have you considered how this round-tripping will work in the face of, say, function pointers? Consider the following code:
var type = typeof(C);
var member = type.GetMethod("M")!;
var parameters = member.GetParameters();
Console.WriteLine(parameters[0].ParameterType.AssemblyQualifiedName);
class C {
public unsafe static void M(delegate*<int, void> f) {}
}
The type of …
That means there is some amount of lossiness when round-tripping between a type in metadata -> …
I think defaulting …
A generic type counts as the sum of the parameter node counts, plus 1.
It's basically the number of words that appear in the type name, plus the number of
The final number is TBD, they have to do some research to come up with something that looks like it can handle any "reasonable" type.
If there's a type that you can envision someone actually working with which has a node count that high, please share it. This is 50: ValueTuple<int[][], int[][], int[][], int[][], int[][], int[][], int[][], int[][], int[][], int[][], int[][], int[][], int[][], int[][], int[][], int[][], string> or List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<int>>>>>(etc) |
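The counting rule in this comment can be sketched against System.Type, used here only as a convenient stand-in for the proposed TypeName; the class name ArgumentSumRule is ours. The truncated sentence above is read here as "plus one per array, pointer or by-ref layer", which is an assumption. Under this reading each int[][] contributes 3, string contributes 1, and the tuple itself adds 1. Note that the generic type definition is not counted, unlike in the definition that ended up in the proposal.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Sketch of the "sum of arguments plus 1" rule (our reading of the comment):
// a leaf type counts 1, an array/pointer/by-ref adds 1 on top of its element,
// and a constructed generic counts 1 plus the sum of its arguments' counts.
public static class ArgumentSumRule
{
    public static int Count(Type type)
    {
        if (type.HasElementType)
            return 1 + Count(type.GetElementType()!);
        if (type.IsConstructedGenericType)
        {
            int sum = 1; // the constructed type itself
            foreach (Type arg in type.GetGenericArguments())
                sum += Count(arg);
            return sum;
        }
        return 1;
    }
}
```

Under this rule Dictionary<string, List<int[][]>> scores 6, and ValueTuple<List<string>, Dictionary<string, int>, int> scores 7.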
This is what I would expect, yes. And I think it would be more intuitive if it worked this way. But this is not what the original post says:
It says here that a generic type like …
Why doesn't this include an API to get a … If someone ever needs this, they're just going to do the less efficient thing of … And vice versa - …
Part of my pushback would be that …
Going back to 1 … EDIT: I was wrong, apparently you don't need that …
Well, yes, but there's …
But this is 10:
ValueTuple<List<string>, Dictionary<string, int>, int>
This certainly doesn't look unreasonable to me. I have definitely written type names that are this long. And there should probably be a good amount of wiggle room. It's not like someone is likely to hit OOM by parsing a type name with 50 nodes (unless the strings are too long, but those are not part of the calculation anyway).
Background and Motivation
For years, various teams at Microsoft and outside of it have been implementing their own type name parsers.
None of them were suitable for untrusted input. We want to put an end to this and provide a single, public API for parsing type names from untrusted input.
The new APIs need to be shipped in an OOB package that supports older target frameworks, as we have first-party customers running on .NET Framework who are going to use it.
Proposed API
The parser has two modes: … Type.GetType).
To prevent unbounded recursion for inputs like typeof(List<List<List<List<List<...>>>>>).FullName, the parser introduces the concept of complexity: the total number of TypeName instances that would be created if you were to totally deconstruct this instance and visit each intermediate TypeName that occurs as part of deconstruction.
- int and Person each have a complexity of 1 because they're standalone types.
- int[] has a complexity of 2 because fully inspecting it involves inspecting the array type itself, plus unwrapping the underlying type (int) and inspecting that.
- Dictionary<string, List<int[][]>> has a complexity of 8 because fully visiting it involves inspecting 8 TypeName instances total:
  - Dictionary<string, List<int[][]>> (the original type)
  - Dictionary`2 (the generic type definition)
  - string (a type argument of Dictionary)
  - List<int[][]> (a type argument of Dictionary)
  - List`1 (the generic type definition)
  - int[][] (a type argument of List)
  - int[] (the underlying type of int[][])
  - int (the underlying type of int[])
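The complexity definition above can be sketched with System.Type standing in for the proposed TypeName; the class name is ours, and System.Type is used only so the sketch runs without the new package. It counts the type itself, then recursively its generic type definition, each type argument, and the element type of arrays, pointers and by-refs.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Sketch of the "full deconstruction" complexity: every node visited while
// taking a type apart is counted once.
public static class DeconstructionComplexity
{
    public static int Count(Type type)
    {
        int count = 1; // the type itself
        if (type.HasElementType)
            count += Count(type.GetElementType()!); // underlying type of an array/pointer/by-ref
        if (type.IsConstructedGenericType)
        {
            count += Count(type.GetGenericTypeDefinition()); // e.g. Dictionary`2
            foreach (Type arg in type.GetGenericArguments())
                count += Count(arg);
        }
        return count;
    }
}
```

With this definition Dictionary<string, List<int[][]>> comes out as 8, matching the walkthrough, and ValueTuple<List<string>, Dictionary<string, int>, int> as 10, matching the figure quoted in the thread.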
Returned information matches the System.Type APIs: Name, FullName, AssemblyQualifiedName, etc.
Usage Examples
Sample serialization binder (whole code with tests can be found here):
Risks
Introducing changes to the behavior of strict mode in the future can break users.
If we ever do that, it will only be to enforce new security best practices, and it will break only very unusual input.
Initial issue description:
I am working on the design of a new type name parser. It's going to include a new type that represents the parsed type name, the type name parser itself and most likely an options bag for its customization. It may also include an AssemblyNameParser, but I am not sure whether this is going to be required (nobody has asked for a standalone assembly name parser). For brevity I am going to call these types TypeName, TypeNameParser and TypeNameParserOptions.
Before I submit the proposal, I want to have something that replaces:
- the TypeNameParser used by System.Private.CoreLib
- the TypeNameParser used by ILVerification
- the TypeNameParser used by ILCompiler*
I'll try to replace the Roslyn parser too, but I can't promise that (please let me know if this is a must-have).
The thing I am not sure of is where the mentioned types should belong. For example, currently Type.GetType(string name) is part of CoreLib. I believe that my proposal should include a new Type method for loading a type from a parsed name, so those who have parsed the type name and verified it could load the type without parsing the type name again. This leads me to think that TypeName should be part of CoreLib.
But can I at the same time ship this type in an OOB package like System.Reflection.Metadata?
- … net9.0, so I would need to exclude it for this TFM
- … netstandard2.0, which could lead to a situation where a .NET 9 app references a netstandard2.0 library that references the package, leading to two types with the same name being loaded and a runtime error when the TypeName from the OOB package is passed to Type.Load(TypeName) in CoreLib?
I can also simply not extend the Type class, move the Load method to TypeName itself and reference the TypeNameParser as a link in CoreLib, with something like this: …
But this would lead to a situation where .NET 9 apps would load two type name parsers: one internal from CoreLib and another, public one from the OOB package. @jkotas @GrabYourPitchforks are there any better solutions?