-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PDB support for ilasm #5051
Comments
When I start using ILASM from VS environment instead of ILASM from the CoreCLR build, I ran into an assertion for a test, JIT\IL_Conformance\Old\directed\ldloca_s_r8: https://github.com/dotnet/coreclr/blob/master/src/jit/codegenxarch.cpp#L7400-L7404 The MSILs from both ILASMs are technically identical, but only difference is the presence of PDB file due to https://github.com/dotnet/coreclr/issues/2982. When PDB file is present, JIT ensures spilling individual il instructions for good debugging experience. As shown below, in the case of NoPDB, the type for the local address on stack (from ldloca) is converted to an integer type since it is used for type cast to double. Note initially (when creating addr node) JIT starts from TY_BYREF conservatively for such var address. On the other hand in the case of PDB which failed to assert, JIT spills this address node to a local var. By the time JIT imports conv.r8, JIT uses the local var with TY_BYREF. The existing "IsVarAddr" API didn't capture such case so it ended up with feeding local var with byref type to cast operation, which is unexpected. There may be several ways to address this issue. The way I fix is to introduce a flag to local var (I can't simply add a flag to tree-node whose bits run out so conflict with other use). When JIT has to spill such var address (on stack) to a local var, JIT records such bit to the local var. I extends the implementation of ```IsVarAddr``` to ```impIsVarAddrOnStack``` in order to catch this case as well. So, when JIT needs to hammer the type like using "impBashVarAddrsToI", JIT now uses this API, instead. Without PDB (pass) ``` [0] ldloca.s 0 [1] dup [2] conv.r8 * stmtExpr void (top level) (IL 0x000... ???) | /--* cast double <- long --> cast from long to double | | \--* addr long --> This is local adress on stack, so type is hammered to long | | \--* lclVar double V00 loc0 \--* = double ``` With PDB (fail) ``` [0] ldloca.s 0 * stmtExpr void (top level) (IL 0x000... ???) | /--* addr byref | | \--* lclVar double V00 loc0 \--* = byref \--* lclVar byref V256 tmp0 --> Address is spilled to local var. [1] dup * stmtExpr void (top level) (IL 0x002... ???) \--* no_op void [ 2] conv.r8 * stmtExpr void (top level) (IL 0x003... ???) | /--* cast double <- byref --> !Assertion: Can't convert byref to double where the src is lclVar. | | \--* lclVar byref V256 tmp0 \--* = double ``` With PDB after fix (pass) ``` [0] ldloca.s 0 * stmtExpr void (top level) (IL 0x000... ???) | /--* addr byref | | \--* lclVar double V00 loc0 \--* = byref \--* lclVar byref V256 tmp0 --> Address is spilled to local var. JIT sets a bit to it. [1] dup * stmtExpr void (top level) (IL 0x002... ???) \--* no_op void [ 2] conv.r8 * stmtExpr void (top level) (IL 0x003... ???) | /--* cast double <- long --> byref is hammered to long same as the case of noPDB. | | \--* lclVar long V256 tmp0 \--* = double ```
When I start using ILASM from VS environment instead of ILASM from the CoreCLR build, I ran into the following assertion in codegenxarch.cpp. ``` noway_assert(op1->OperGet() == GT_LCL_VAR_ADDR || op1->OperGet() == GT_LCL_FLD_ADDR); ``` for the test, JIT\IL_Conformance\Old\directed\ldloca_s_r8. The MSILs from both ILASMs are technically identical, but only difference is the presence of PDB file due to https://github.com/dotnet/coreclr/issues/2982. When PDB file is present, JIT ensures spilling each il instruction for good debugging experience. As shown below, in the case of NoPDB, the type for the local address on stack (from ldloca) is converted to an integer type since it is used for type cast to double. Note initially (when creating addr node) JIT starts from TY_BYREF conservatively for such var address. On the other hand in the case of PDB which failed to assert, JIT spills this address node to a local var. By the time JIT imports conv.r8, JIT uses the local var with TY_BYREF. There may be several ways to address this issue. The way I fix is to introduce a flag to local var (I can't simply add a flag to tree-node whose bits run out so conflict with other use), and propagate a bit when such var address node is spilled. The assertion is relaxed based on this bit for GT_LCL_VAR. Preivously I attempted to aggressively hammer the type by refactoing ```impBashVarAddrsToI``` in various ways, but it was not straightforward to handle other assertions that are popped up. Without PDB (pass) ``` [0] ldloca.s 0 [1] dup [2] conv.r8 | /--* cast double <- long --> cast from long to double | | \--* addr long --> This is local adress on stack, so type is hammered to long | | \--* lclVar double V00 loc0 \--* = double ``` With PDB (fail) ``` [0] ldloca.s 0 * stmtExpr void (top level) (IL 0x000... ???) | /--* addr byref | | \--* lclVar double V00 loc0 \--* = byref \--* lclVar byref V256 tmp0 --> Address is spilled to local var. [1] dup * stmtExpr void (top level) (IL 0x002... ???) \--* no_op void [ 2] conv.r8 * stmtExpr void (top level) (IL 0x003... ???) | /--* cast double <- byref --> !Assertion: Can't convert byref to double where the src is lclVar. | | \--* lclVar byref V256 tmp0 \--* = double ```
WPF recently went down a path of trying to do some IL modification and we just ran into this issue. There are other avenues for us to explore, but this was blocking the path of least resistance. It seems like there are no plans to support /DEBUG in 3.0, is this accurate? |
@davidwr @jkotas @briansul @janvorli - Do we have anyone on the team who is an official owner for ilasm? I'm trying to find out if there would be any pushback adding a portable PDB writer to ilasm. An initial thought would be hosting CoreCLR so that we could invoke the writer that is in System.Reflection.Metadata. [EDIT]: To clarify I am only proposing a light-up dependency. If the machine where ILAsm was running was missing any of the dependencies we could print a warning and then proceed emitting the dll as ILAsm is operating now. |
Adding others who may be interested in this: @MichalStrehovsky @tmat @AaronRobinsonMSFT |
@noahfalk From my conversations with @RussKeldorph there is no official owner of ILAsm. I think this is a reasonable request for support. My only concern is the additional assets that would be needed to make this "just work". If I recall, ILAsm is now xcopy-able and as mentioned this work would fail gracefully if additional assets weren't present (i.e. missing Runtime). That however doesn't fully capture what would be required to make this scenario actually reliable. At a minimum the following would be needed to have ILAsm create PDBs:
Based on the above, we now have 3 artifacts that must be present and a system dependency. This is going to necessitate an update to deployment instructions and to nuget packages, not to mention the alluded to error message that indicates ILAsm isn't going to be able to perform PDB creation. Not saying this is impossible or that we shouldn't do this work since it would be beneficial for me as well. I just want to make sure the above costs and expectations are known. |
Correction: @dotnet/jit-contrib officially owns ILAsm (and ILDasm). dotnet/coreclr#23840 and dotnet/coreclr#25144 might be tangentially related to requirements in case you weren't aware of them. /cc @BruceForstall |
@RussKeldorph @AaronRobinsonMSFT - thanks for the info and feedback : )
Does anyone know where instructions can be found? If they exist they weren't easily discoverable at least : )
I'm not certain what expectations people have on that package, but tentatively I could see at least a few options: |
I wonder if it wouldn't be better to invert the approach. Make ILasm.exe a managed, self-contained .NET Core app and put all the C++ code into a native dll that gets loaded by the ILasm.exe app. |
Agree with @tmat, but let's take it a step further and use CoreRT and link-in the native bits so you still have a single exe. |
@tmat - @MichalStrehovsky and @jaredpar appeared to have strong feelings that ilasm should not have a full .Net Core dependency and there has been work to explicitly remove those dependencies. I'm sure they can represent their positions better than I can, but it appeared to be a combination of wanting single-file xcopy deploy, wanting a small size on disk, and difficulties dealing with .Net Core versioning. I don't have a horse in this particular race, if you can convince them its fine by me : ) @MsAbby - I'm not up-to-date with what CoreRT can do these days, but initially I'd be wary for a few reasons: |
IIRC the issues we had with .NET Core dependencies would not be present with self-contained .NET Core App. |
I guess it could be a good use case for |
I think not having PDBs is a pain for everyone who uses ILAsm right now. If we add support for it, everyone will want to use it. It would be sad if we make it an option that inflicts setup pain whenever someone wants to use it. While I'm one of the biggest proponents of not writing C++ for things that can/should be written in C#, CoreCLR has a really bad startup time. It's not suitable for a tool like ILAsm that gets potentially invoked thousands of times. The startup delay would add up to minutes in places like the CoreCLR test tree. If we rely on CoreCLR for this, we'll end up having to design an ILAsm compiler server (like Roslyn did, to solve this exact problem). The Windows build lab compiles their managed build tools with .NET Native and they shave off minutes of CPU time per build by doing that. After dotnet/coreclr#25144, ILAsm carries its own copy of IMetaDataEmit. We could extend that to generate the portable PDB structures. Portable PDB is based upon the ECMA-335 format which is exactly what |
I do not think that the CoreCLR startup would be a problem for ilasm. Note that ilasm has been paying for CoreCLR startup until the very recent David's change. Roslyn is running a few orders of magnitude more code than ilasm would ever run (even if it was completely rewritten in C#). However, I agree that writing the portable PDB writer in C++ sounds much easier than dealing with all the managed/unmanaged interop. (Rewriting ilasm/ildasm in C# would avoid the interop as well, but that is more work....) |
Did CoreCLR have to do the full startup dance when |
ilasm/ildasm used to call ‘coreclr_initialize’: https://github.com/dotnet/coreclr/blob/release/3.0/src/ildasm/unixcoreclrloader/coreclrloader.cpp#L53 . You are right that initializing whole VM is not necessary for unmanaged metadata reader/writer. I was just pointing out that it is not that expensive to drive the design decision. |
This would be wonderful for the larger community and solve at least one big issue facing an internal partner team. |
Interesting, I hadn't initially considered this but especially for the limited subset of pdb data that the IL assembler currently writes it does seem reasonable. Thanks for raising this suggestion! To add some additional context this issue started because a customer was asking how they could unblock themselves. For the path where we did hosting and reused the existing C# writer I had some small samples showing code similar to what they would need to write. I am a little concerned that a new C++ writer will feel like a more daunting task. I'd guess the amount of code needed will be less, but there will be more effort involved for someone with minimal knowledge of the format or any of the pre-existing code to learn what needs to be written. I'll do some investigation to evaluate better what that might look like, but if anyone has suggestions of particular patterns/examples we should look at do let me know. Thanks! |
I wrote a Portable PDB reader in native a while ago. I pushed it up to my github (https://github.com/AaronRobinsonMSFT/PPDB). A writer would be significantly more work, but it could serve as a start to help someone understand the format. |
After getting another opportunity to chat with our potential contributor on email my concerns were unfounded, they sounded happy with the approach of writing a portable PDB writer in C++. Does anyone have any objections to this plan? @dotnet/jit-contrib - since Russ says you guys own ilasm could someone confirm this plan has your thumbs up? @MichalStrehovsky - Above you were suggesting that we might use the implementation of IMetaDataEmit as a starting point for the portable PDB writer? Do you have a recommendation for what elements of the metadata implementation would be worth re-using and how we'd tie the overall solution together? I'll admit my first reaction was that the metadata codebase is a complex layering of many abstractions and caters to numerous other goals aside from emitting files so it may be more challenging than necessary. For example compared to something like @AaronRobinsonMSFT 's C++ portable PDB reader Aaron's code feels far easier to quickly understand and modify. Nonetheless I don't want to succumb to NIH syndrome and my first impressions may not be very accurate. I'd very much appreciate getting your thoughts on it. Thanks! |
I think it will have to be tied into the metadata emitter if we want to support embedded portable PDBs. Otherwise it can be just duplicated on the side. Duplication tends to be a better implementation strategy in a legacy codebase, as you point out (I don't know if anyone who's still on the team really did any work in /src/md besides Karel and Mei-Chin many years go). The writer has many oddities to support EnC, metadata merging used by link.exe, and stuff like that. Avoiding duplication is better for long-term maintainability though. Portable PDB format basically just defines a set of extra tables on top of the existing type system tables in the ECMA-335 file format. The IMetadataEmit interface is basically a set of methods that add entries into the individual tables ("add new type", "add new method", "add new field", etc.). AFAIK the result of the ILAsm already uses IMetadataEmit to generate the metadata. It then wraps the "naked" metadata stream in a PE file with an ICeeFileGen. I would look into just extending IMetadataEmit (adding IMetadataEmit3?) with methods that add entries into the new tables ("add new document", "add new method debug information", etc.). ILAsm can then use it to emit the debug info: for the embedded PDB case, we would just add debug table entries into the existing dispenser and the saving of that would naturally fall out. For the sidecar PDB case, ILAsm would grab a vanilla dispenser, use it, and save the naked stream by calling |
As far as I know the embedded PDB is discrete from any metadata the IL assembly happens to have (ie the resulting PE file has two metadata headers inside somewhere). I was assuming that we'd need a second instance of an emit interface if we went this route, one for the standard assembly metadata and a 2nd one for the portable PDB metadata. Is that what you were suggesting and I just misunderstood, or were you envisioning a scheme where a single emitter accumulates data for all tables and it understands which tables go into which blob? Thanks! |
I spoke with @noahfalk offline, and I'm ok with whichever native code plan we end up with, as the trade-offs get worked out. |
I only know what's in the spec. I assumed that since it says "metadata can (but doesn’t need to) be stored in the same metadata section of the PE/COFF file as the type system metadata", and the spec doesn't specify how we would find it in a PE file, it would be in the same streams as the type system tables. Looking at things in a hex editor, it doesn't appear to be the case (the contents of the I guess I'll defer to @tmat on that. |
Portable PDB metadata tables are designed so that they can be stored in the same metadata stream as type system metadata. But it turned out that it's better to store them separately, so Embedded PDBs don't use that capability. Instead, Embedded PDBs are stored compressed in debug directory. https://github.com/dotnet/corefx/blob/master/src/System.Reflection.Metadata/specs/PE-COFF.md#embedded-portable-pdb-debug-directory-entry-type-17 |
If we don't anticipate they'll be stored into the type system tables in the future, maybe creating |
Ilasm can now generate PDB in portable PDB format #37702 |
@ivanbasov awesome work! would it make sense to use |
@akoeplinger thank you. I agree it would be nice to have consistency across the tools, but I have introduced a separate argument in order to avoid ambiguity with existing ilasm So the question is not about whether |
@ivanpovazan I would agree that taking inspiration from the
Since the
At this point we can support |
I'm fine with the approach @AaronRobinsonMSFT suggested above. The thing I care about most is that we don't break any CLI syntax for scripts/build projects already succesfully using ilasm. Having new syntax that is understandable/consistent with csc is certainly nice as well. I don't fret about it too much because I assume there are a relatively small set of people in the world who will need to format ad-hoc ilasm commands. |
The follow up work is tracked by #39436 |
Currently ilasm /DEBUG* is ignored on CoreCLR.
The text was updated successfully, but these errors were encountered: