-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Efficient API to stream from a MetadataReader to a MetadataBuilder with modifications #30443
Comments
This feels like the domain of a higher level API that can offer you a sort of DOM that you can read/edit/write. I think Mono.Cecil would be more appropriate. |
I agree with Nick's Mono.Cecil recommendation. That said I have a prototype for a low-level code that efficiently hydrates PEBuilder from PEReader. It's however more complicated than it looks. The blocker at this point is the lack of representation for native resources on @jnm2 If you'd like to help move it forward I would be happy to accept PR with support for native resource reading and serialization. This is also something that multiple other systems could use (e.g. we could remove the native resource serialization logic from Roslyn, F#, etc.). Just fair warning - it's likely quite a bit of work (including public API design). Once the hydration is done there is of course the question of additions/updates. I don't think we want to support arbitrary metadata updates. That'd mean building a DOM etc. and Mono.Cecil already does that. What would be potentially useful though would be the ability to apply EnC delta metadata after the MetadataBuilder has been hydrated from the reader. That might have some quite interesting applications. |
Would you expect Mono.Cecil to be slower than hydrating a MetadataBuilder from a MetadataReader in one pass, skipping parts you want to remove? I don't think I want a DOM for this. If additions, modifications and deletions can be done in a general single-pass way with no intermediate representation between MetadataReader and MetadataBuilder, that's what I'm after. Could that converge with your prototype, @tmat? |
Yes.
I understand. Some operations can be done without DOM. It also depends what kind of APIs do you expect to be exposed for these operations. As I said, I can imagine an efficient API that applies EnC delta to a fully hydrated MetadataBuilder. I'm guessing most of the operations you want to do could be encoded as an EnC delta. First step is to implement round-tripping. That's what my prototype is trying to do. And that itself is non-trivial. |
Could something like this be both generally useful and efficient? (uses Dictionary, could be swapped for a more efficient map) public class MetadataVisitor
{
private readonly Dictionary<Handle, Handle> handleMap = new Dictionary<Handle, Handle>();
protected MetadataVisitor(MetadataReader reader, MetadataBuilder builder)
{
Reader = reader;
Builder = builder ?? throw new ArgumentNullException(nameof(builder));
}
protected MetadataReader Reader { get; }
protected MetadataBuilder Builder { get; }
public void VisitAll()
{
VisitAssemblyDefinition();
foreach (var handle in Reader.AssemblyFiles)
VisitAssemblyFile(handle);
foreach (var handle in Reader.AssemblyReferences)
VisitAssemblyReference(handle);
foreach (var handle in Reader.CustomDebugInformation)
VisitCustomDebugInformation(handle);
// ...
}
private Handle MapHandle(Handle readerHandle, Func<Handle, Handle> visitor)
{
if (readerHandle.IsNil) return default;
if (!handleMap.TryGetValue(readerHandle, out var mappedHandle))
handleMap.Add(readerHandle, mappedHandle = visitor.Invoke(mappedHandle));
return mappedHandle;
}
public StringHandle MapString(StringHandle readerHandle)
{
return (StringHandle)MapHandle(readerHandle, h =>
Builder.GetOrAddString(Reader.GetString((StringHandle)h)));
}
public GuidHandle MapGuid(GuidHandle readerHandle)
{
return (GuidHandle)MapHandle(readerHandle, h =>
Builder.GetOrAddGuid(Reader.GetGuid((GuidHandle)h)));
}
public BlobHandle MapBlob(BlobHandle readerHandle)
{
return (BlobHandle)MapHandle(readerHandle, h =>
Builder.GetOrAddBlob(Reader.GetBlobBytes((BlobHandle)h)));
}
public EntityHandle MapEntityHandle(EntityHandle readerHandle)
{
return readerHandle.Kind switch
{
HandleKind.AssemblyDefinition => EntityHandle.AssemblyDefinition,
HandleKind.AssemblyFile => (EntityHandle)MapAssemblyFile((AssemblyFileHandle)readerHandle),
HandleKind.AssemblyReference => MapAssemblyReference((AssemblyReferenceHandle)readerHandle),
HandleKind.CustomDebugInformation => MapCustomDebugInformation((CustomDebugInformationHandle)readerHandle),
_ => throw new NotImplementedException()
};
}
protected virtual AssemblyDefinitionHandle VisitAssemblyDefinition()
{
var assemblyDefinition = Reader.GetAssemblyDefinition();
return Builder.AddAssembly(
MapString(assemblyDefinition.Name),
assemblyDefinition.Version,
MapString(assemblyDefinition.Culture),
MapBlob(assemblyDefinition.PublicKey),
assemblyDefinition.Flags,
assemblyDefinition.HashAlgorithm);
}
public AssemblyFileHandle MapAssemblyFile(AssemblyFileHandle readerHandle)
{
return (AssemblyFileHandle)MapHandle(readerHandle, h => VisitAssemblyFile((AssemblyFileHandle)h));
}
protected virtual AssemblyFileHandle VisitAssemblyFile(AssemblyFileHandle readerHandle)
{
var assemblyFile = Reader.GetAssemblyFile(readerHandle);
return Builder.AddAssemblyFile(
MapString(assemblyFile.Name),
MapBlob(assemblyFile.HashValue),
assemblyFile.ContainsMetadata);
}
public AssemblyReferenceHandle MapAssemblyReference(AssemblyReferenceHandle readerHandle)
{
return (AssemblyReferenceHandle)MapHandle(readerHandle, h => VisitAssemblyReference((AssemblyReferenceHandle)h));
}
protected virtual AssemblyReferenceHandle VisitAssemblyReference(AssemblyReferenceHandle readerHandle)
{
var assemblyReference = Reader.GetAssemblyReference(readerHandle);
return Builder.AddAssemblyReference(
MapString(assemblyReference.Name),
assemblyReference.Version,
MapString(assemblyReference.Culture),
MapBlob(assemblyReference.PublicKeyOrToken),
assemblyReference.Flags,
MapBlob(assemblyReference.HashValue));
}
public CustomDebugInformationHandle MapCustomDebugInformation(CustomDebugInformationHandle readerHandle)
{
return (CustomDebugInformationHandle)MapHandle(readerHandle, h => VisitCustomDebugInformation((CustomDebugInformationHandle)h));
}
protected virtual CustomDebugInformationHandle VisitCustomDebugInformation(CustomDebugInformationHandle readerHandle)
{
var customDebugInformation = Reader.GetCustomDebugInformation(readerHandle);
return Builder.AddCustomDebugInformation(
MapEntityHandle(customDebugInformation.Parent),
MapGuid(customDebugInformation.Kind),
MapBlob(customDebugInformation.Value)); // Maybe this becomes invalid and should be dropped if not parsed?
}
} |
Copying the rows from reader to builder is the easy part. I already have very efficient code for that. It's not useful on its own though without the rest of the re-hydration code all the way up to PEReader, which includes native resources. |
@tmat It sounds like your prototype starts by copying everything rather than planning for some things to be added, modified or skipped in the initial copy. I'm not sure I see that as the most efficient way to add and modify and delete. How does deleting rows work when row handles are already serialized in blobs? Let's say for example that I want to strip out private types, modify private fields, and leave most tables uncopied altogether, but preserve attributes which are referencing many of the remaining types in the now-smaller table. Does calculating an EnC diff solve this problem? Is calculating and applying that diff close enough in efficiency to only copying what is needed in the first place? |
It's an early prototype :) Obviously, this requires more work. I started with the goal to roundtrip any assembly without modifying the metadata. Already turned out to be non-trivial problem. |
@tmat Did anything become of your rewriter? I've found myself in need of similar. |
Rewriting IL using System.Reflection.Metadata seems to bit a bit of a mind-melting endeavor. Due to the order-dependent relations between the tables, and most binary blobs needing row references to be patched, I haven't been able to find a scalable pattern for a single-pass read and write. It feels like something like this should be in theory possible. Maybe with something like a visitor pattern?
I'd like to do some things that are purely additive, such as inserting a module initializer, and others that subtract or modify rows, like generating ref assemblies from third party framework implementation assemblies. Some of these things will slow down every build because I want to put them in post-processing NuGet packages, so one goal is absolutely minimal overhead.
Does this sound like something that can be made generally useful and that you'd be willing to accept in SR.Metadata and keep up to date?
The text was updated successfully, but these errors were encountered: