-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[API Proposal]: RuntimeHelpers.GetAssemblyFromRvaSpan API #71268
Comments
Tagging subscribers to this area: @dotnet/area-system-memory Issue DetailsBackground and motivation
C# has support for easily declaring embedded PE data, via ReadOnlySpan<byte> span1 = new byte[] { 1, 2, 3, 4 };
ReadOnlySpan<byte> span2 = "Hello world"u8; Both result in two spans wrapping embedded PE data. There is a big limitation with this though, in that there's no safe way to then pass this data around over the heap without incurring additional allocations. You can certainly create a custom We should have a fast, safe, trimming-friendly way to pass embedded PE data around on the heap. cc. @AaronRobinsonMSFT @tannergooding API ProposalThe idea is to solve this by adding the following intrinsic API: namespace System.Runtime.CompilerServices;
public static class RuntimeHelpers
{
public static Assembly GetAssemblyForRvaSpan(ReadOnlySpan<byte> span);
} This API would resolve the assembly from a span generated from an RVA field, and return it. If the span does not wrap an RVA field, it can just throw. This way, developers would be able to just pass around arbitrary types wrapping both the assembly and the raw pointer to the span data, allowing them to not allocate anything while still making the code safe for assembly unloading. This could be useful in all sorts of cases where an API Usagestatic ReadOnlyMemory<byte> GetSomeTextAsUtf8Memory()
{
ReadOnlySpan<byte> data = "Hello world"u8;
Assembly assembly = RuntimeHelpers.GetAssemblyForRvaSpan(data);
void* ptr = Unsafe.AsPointer(ref MemoryMarshal.GetReference(data));
return new RvaSpanMemoryManager(ptr, data.Length, assembly).Memory;
} Same thing for any other arbitrary embedded binary blob. Alternative DesignsNot sure if a RisksNone that I can see.
|
I'm not really seeing the benefit here. Is the constant being reference from another assembly? I think the above could just as easily be written as follows: static ReadOnlyMemory<byte> GetSomeTextAsUtf8Memory()
{
ReadOnlySpan<byte> data = "Hello world"u8;
Assembly assembly = Assembly.GetExecutingAssembly();
void* ptr = Unsafe.AsPointer(ref MemoryMarshal.GetReference(data));
return new RvaSpanMemoryManager(ptr, data.Length, assembly).Memory;
} Even if that is the case sharing these constants seems odd as it exposes fields, which I thought was not desirable for public API surface. |
Another option would be to only expose |
Could you share some context on this? Assuming the access is safe with the source assembly being properly tracked, why would that not be desireable? As in, the data would be abstracted away with a memory/span anyway, so why would the recommendation be to instead allocate an array and return that instead of the raw embedded data? 🤔 That said, I completely missed the fact in most cases |
What I would prefer is to make Roslyn optimize |
Given some more thoughts on this following @AaronRobinsonMSFT's comment, and realized we don't even actually need an assembly in the first place (which he also mentioned might be slower to retrieve than copying an array, if small enough), but a static ReadOnlyMemory<byte> GetSomeTextAsUtf8Memory()
{
ReadOnlySpan<byte> data = "Hello world"u8;
void* ptr = Unsafe.AsPointer(ref MemoryMarshal.GetReference(data));
return new RvaSpanMemoryManager(ptr, data.Length, typeof(ThisContainingType)).Memory;
} This is much faster than using I still like @jkoritzinsky's suggestions of maybe having some type to support this so that developers don't have to worry about this logic and do everything manually (and I also personally would be nice to just have built-in support for passing safe constant data around with no allocation), but that can also be done in a separate, dedicated proposal if we wanted. Feel free to either close this or repurpose this just for this new memory manager proposal 🙂 |
I don't think there is any need for a new API in this case. Either this can be done in a manner that avoids ALC unloading concerns using an allocation and/or this requires an update to ECMA to note the fixed nature of constants. |
Background and motivation
C# has support for easily declaring embedded PE data, via
ReadOnlySpan<byte>
values. For instance:Both result in two spans wrapping embedded PE data. There is a big limitation with this though, in that there's no safe way to then pass this data around over the heap without incurring additional allocations. You can certainly create a custom
MemoryManager<byte>
and then useUnsafe.AsPointer(ref MemoryMarshal.GetReference(span))
to wrap the pointer to the data, which is fixed, but this doesn't account for assembly unloading, as the fact that underlying RVA field is being referenced will not be seen by the GC. This is especially problematic due to not having anUtf8String
type (which will likely not come at all at this point), but it applies to all embedded binary blobs in general. The second you want to pass them around on the heap, you need to just callToArray()
on them and allocate, so you can then pass them as aReadOnlyMemory<byte>
, or you need to give up on safety and just YOLO it with the workaround mentioned above. Neither of these is ideal at all.We should have a fast, safe, trimming-friendly way to pass embedded PE data around on the heap.
cc. @AaronRobinsonMSFT @tannergooding
API Proposal
The idea is to solve this by adding the following intrinsic API:
This API would resolve the assembly from a span generated from an RVA field, and return it. If the span does not wrap an RVA field, it can just throw. This way, developers would be able to just pass around arbitrary types wrapping both the assembly and the raw pointer to the span data, allowing them to not allocate anything while still making the code safe for assembly unloading.
This could be useful in all sorts of cases where an
Utf8String
could've been used, and where arrays/memorys are instead used today after callingToArray()
on UTF8 literals, which is not ideal. Other cases where people might want to pass around embedded binary blobs could leverage this too. For instance, I'm currently using the hack mentioned above in ComputeSharp.D2D1, but I'd really like to just switch to a safe approach here instead without having to give up on keeping that code path allocation free.API Usage
Same thing for any other arbitrary embedded binary blob.
Alternative Designs
Not sure if a
RuntimeFieldHandle
also keeps the source assembly alive. If that's the case we might decide to have an intrinsic to get that from an RVA span, though it might be less useful than having anAssembly
object. Not sure which would be easier to implement. I should note, this proposal is not necessarily with the best possible API shape, it's meant to also start a discussion on getting some way to do this though, because this being not doable at all is just really really unfortunate.Open questions
C# 11 and .NET 7 introduced the ability to also have constant data for primitive types that can account for endianness of a system. This means that in some cases, the returned span might not directly wrap an RVA field. Do we want to support those too somehow, or is the current proposal to only support
byte
spans enough? We might also just decide to extend support to additional types in the future, so the two don't necessarily have to be done at the same time.Risks
None that I can see.
Currently developers are forced to either use unsafe code or to waste allocations, so can't be worse than that.
The API would be hidden away in compiler services, so it's not meant for general use anyway.
The text was updated successfully, but these errors were encountered: