PDB support for ilasm #5051

kyulee1 · 2016-02-02T14:35:24Z

Currently ilasm /DEBUG* is ignored on CoreCLR.

When I start using ILASM from VS environment instead of ILASM from the CoreCLR build, I ran into an assertion for a test, JIT\IL_Conformance\Old\directed\ldloca_s_r8: https://github.com/dotnet/coreclr/blob/master/src/jit/codegenxarch.cpp#L7400-L7404 The MSILs from both ILASMs are technically identical, but only difference is the presence of PDB file due to https://github.com/dotnet/coreclr/issues/2982. When PDB file is present, JIT ensures spilling individual il instructions for good debugging experience. As shown below, in the case of NoPDB, the type for the local address on stack (from ldloca) is converted to an integer type since it is used for type cast to double. Note initially (when creating addr node) JIT starts from TY_BYREF conservatively for such var address. On the other hand in the case of PDB which failed to assert, JIT spills this address node to a local var. By the time JIT imports conv.r8, JIT uses the local var with TY_BYREF. The existing "IsVarAddr" API didn't capture such case so it ended up with feeding local var with byref type to cast operation, which is unexpected. There may be several ways to address this issue. The way I fix is to introduce a flag to local var (I can't simply add a flag to tree-node whose bits run out so conflict with other use). When JIT has to spill such var address (on stack) to a local var, JIT records such bit to the local var. I extends the implementation of ```IsVarAddr``` to ```impIsVarAddrOnStack``` in order to catch this case as well. So, when JIT needs to hammer the type like using "impBashVarAddrsToI", JIT now uses this API, instead. Without PDB (pass) ``` [0] ldloca.s 0 [1] dup [2] conv.r8 * stmtExpr void (top level) (IL 0x000... ???) | /--* cast double <- long --> cast from long to double | | \--* addr long --> This is local adress on stack, so type is hammered to long | | \--* lclVar double V00 loc0 \--* = double ``` With PDB (fail) ``` [0] ldloca.s 0 * stmtExpr void (top level) (IL 0x000... ???) | /--* addr byref | | \--* lclVar double V00 loc0 \--* = byref \--* lclVar byref V256 tmp0 --> Address is spilled to local var. [1] dup * stmtExpr void (top level) (IL 0x002... ???) \--* no_op void [ 2] conv.r8 * stmtExpr void (top level) (IL 0x003... ???) | /--* cast double <- byref --> !Assertion: Can't convert byref to double where the src is lclVar. | | \--* lclVar byref V256 tmp0 \--* = double ``` With PDB after fix (pass) ``` [0] ldloca.s 0 * stmtExpr void (top level) (IL 0x000... ???) | /--* addr byref | | \--* lclVar double V00 loc0 \--* = byref \--* lclVar byref V256 tmp0 --> Address is spilled to local var. JIT sets a bit to it. [1] dup * stmtExpr void (top level) (IL 0x002... ???) \--* no_op void [ 2] conv.r8 * stmtExpr void (top level) (IL 0x003... ???) | /--* cast double <- long --> byref is hammered to long same as the case of noPDB. | | \--* lclVar long V256 tmp0 \--* = double ```

When I start using ILASM from VS environment instead of ILASM from the CoreCLR build, I ran into the following assertion in codegenxarch.cpp. ``` noway_assert(op1->OperGet() == GT_LCL_VAR_ADDR || op1->OperGet() == GT_LCL_FLD_ADDR); ``` for the test, JIT\IL_Conformance\Old\directed\ldloca_s_r8. The MSILs from both ILASMs are technically identical, but only difference is the presence of PDB file due to https://github.com/dotnet/coreclr/issues/2982. When PDB file is present, JIT ensures spilling each il instruction for good debugging experience. As shown below, in the case of NoPDB, the type for the local address on stack (from ldloca) is converted to an integer type since it is used for type cast to double. Note initially (when creating addr node) JIT starts from TY_BYREF conservatively for such var address. On the other hand in the case of PDB which failed to assert, JIT spills this address node to a local var. By the time JIT imports conv.r8, JIT uses the local var with TY_BYREF. There may be several ways to address this issue. The way I fix is to introduce a flag to local var (I can't simply add a flag to tree-node whose bits run out so conflict with other use), and propagate a bit when such var address node is spilled. The assertion is relaxed based on this bit for GT_LCL_VAR. Preivously I attempted to aggressively hammer the type by refactoing ```impBashVarAddrsToI``` in various ways, but it was not straightforward to handle other assertions that are popped up. Without PDB (pass) ``` [0] ldloca.s 0 [1] dup [2] conv.r8 | /--* cast double <- long --> cast from long to double | | \--* addr long --> This is local adress on stack, so type is hammered to long | | \--* lclVar double V00 loc0 \--* = double ``` With PDB (fail) ``` [0] ldloca.s 0 * stmtExpr void (top level) (IL 0x000... ???) | /--* addr byref | | \--* lclVar double V00 loc0 \--* = byref \--* lclVar byref V256 tmp0 --> Address is spilled to local var. [1] dup * stmtExpr void (top level) (IL 0x002... ???) \--* no_op void [ 2] conv.r8 * stmtExpr void (top level) (IL 0x003... ???) | /--* cast double <- byref --> !Assertion: Can't convert byref to double where the src is lclVar. | | \--* lclVar byref V256 tmp0 \--* = double ```

rladuca · 2019-05-17T16:10:13Z

WPF recently went down a path of trying to do some IL modification and we just ran into this issue. There are other avenues for us to explore, but this was blocking the path of least resistance.

It seems like there are no plans to support /DEBUG in 3.0, is this accurate?

@vatsan-madhavan

noahfalk · 2019-07-31T18:11:20Z

@davidwr @jkotas @briansul @janvorli - Do we have anyone on the team who is an official owner for ilasm?

I'm trying to find out if there would be any pushback adding a portable PDB writer to ilasm. An initial thought would be hosting CoreCLR so that we could invoke the writer that is in System.Reflection.Metadata.

[EDIT]: To clarify I am only proposing a light-up dependency. If the machine where ILAsm was running was missing any of the dependencies we could print a warning and then proceed emitting the dll as ILAsm is operating now.

noahfalk · 2019-07-31T22:17:50Z

Adding others who may be interested in this: @MichalStrehovsky @tmat @AaronRobinsonMSFT

AaronRobinsonMSFT · 2019-07-31T22:37:55Z

@noahfalk From my conversations with @RussKeldorph there is no official owner of ILAsm.

I think this is a reasonable request for support. My only concern is the additional assets that would be needed to make this "just work". If I recall, ILAsm is now xcopy-able and as mentioned this work would fail gracefully if additional assets weren't present (i.e. missing Runtime). That however doesn't fully capture what would be required to make this scenario actually reliable. At a minimum the following would be needed to have ILAsm create PDBs:

Targeted framework installed on the system.
nethost.dll which will find the installed Runtime to load [Aside to @noahfalk we have an import lib, not a static lib - sorry for the confusion].
A *.runtimeconfig.json to indicate framework requirements.
- Yes, we can get away without this but in my opinion that simply creates an even less reliable scenario.
Managed assembly with the implementation.

Based on the above, we now have 3 artifacts that must be present and a system dependency. This is going to necessitate an update to deployment instructions and to nuget packages, not to mention the alluded to error message that indicates ILAsm isn't going to be able to perform PDB creation.

Not saying this is impossible or that we shouldn't do this work since it would be beneficial for me as well. I just want to make sure the above costs and expectations are known.

RussKeldorph · 2019-07-31T23:06:39Z

Correction: @dotnet/jit-contrib officially owns ILAsm (and ILDasm).

dotnet/coreclr#23840 and dotnet/coreclr#25144 might be tangentially related to requirements in case you weren't aware of them.

/cc @BruceForstall

noahfalk · 2019-08-01T00:52:24Z

@RussKeldorph @AaronRobinsonMSFT - thanks for the info and feedback : )

This is going to necessitate an update to deployment instructions

Does anyone know where instructions can be found? If they exist they weren't easily discoverable at least : )

… and to nuget packages

I'm not certain what expectations people have on that package, but tentatively I could see at least a few options:
a) Add nothing to the NuGet packages. For use cases that need PDBs to be emitted the onus lies with the package user to compose the artifacts together.
b) Add all the artifacts except the framework to the existing NuGet package. This clearly makes the package a bit larger, but hopefully isn't overly burdensome on the use cases that didn't care about PDBs. Onus remains on the package user to furnish a runtime.
c) For completeness, add all artifacts + the framework either in the package or as NuGet dependencies. I'm assuming this option is not desirable because of the greatly increased size and previous efforts to expressly avoid a runtime dependency.

tmat · 2019-08-01T01:01:52Z

I wonder if it wouldn't be better to invert the approach. Make ILasm.exe a managed, self-contained .NET Core app and put all the C++ code into a native dll that gets loaded by the ILasm.exe app.

mjsabby · 2019-08-01T01:07:00Z

Agree with @tmat, but let's take it a step further and use CoreRT and link-in the native bits so you still have a single exe.

noahfalk · 2019-08-01T01:38:13Z

@tmat - @MichalStrehovsky and @jaredpar appeared to have strong feelings that ilasm should not have a full .Net Core dependency and there has been work to explicitly remove those dependencies. I'm sure they can represent their positions better than I can, but it appeared to be a combination of wanting single-file xcopy deploy, wanting a small size on disk, and difficulties dealing with .Net Core versioning. I don't have a horse in this particular race, if you can convince them its fine by me : )

@MsAbby - I'm not up-to-date with what CoreRT can do these days, but initially I'd be wary for a few reasons:
a) Platform reach - its not clear that CoreRT has the same breadth at the same quality bar as .Net Core.
b) Supportability - Last I knew CoreRT doesn't have any support for debugging and it doesn't seem reasonable to ship code we won't be able to debug

tmat · 2019-08-01T02:12:11Z

IIRC the issues we had with .NET Core dependencies would not be present with self-contained .NET Core App.

mjsabby · 2019-08-01T02:21:57Z

I guess it could be a good use case for /p:PublishSingleFile=true then.

MichalStrehovsky · 2019-08-01T07:53:21Z

I think not having PDBs is a pain for everyone who uses ILAsm right now. If we add support for it, everyone will want to use it. It would be sad if we make it an option that inflicts setup pain whenever someone wants to use it.

While I'm one of the biggest proponents of not writing C++ for things that can/should be written in C#, CoreCLR has a really bad startup time. It's not suitable for a tool like ILAsm that gets potentially invoked thousands of times. The startup delay would add up to minutes in places like the CoreCLR test tree. If we rely on CoreCLR for this, we'll end up having to design an ILAsm compiler server (like Roslyn did, to solve this exact problem). The Windows build lab compiles their managed build tools with .NET Native and they shave off minutes of CPU time per build by doing that.

After dotnet/coreclr#25144, ILAsm carries its own copy of IMetaDataEmit. We could extend that to generate the portable PDB structures. Portable PDB is based upon the ECMA-335 format which is exactly what IMetaDataEmit generates - it should have the necessary facilities.

jkotas · 2019-08-01T08:31:04Z

I do not think that the CoreCLR startup would be a problem for ilasm. Note that ilasm has been paying for CoreCLR startup until the very recent David's change. Roslyn is running a few orders of magnitude more code than ilasm would ever run (even if it was completely rewritten in C#).

However, I agree that writing the portable PDB writer in C++ sounds much easier than dealing with all the managed/unmanaged interop. (Rewriting ilasm/ildasm in C# would avoid the interop as well, but that is more work....)

MichalStrehovsky · 2019-08-01T08:38:31Z

Note that ilasm has been paying for CoreCLR startup until the very recent David's change

Did CoreCLR have to do the full startup dance when MetaDataGetDispenser was called? I would expect MetaDataGetDispenser to be pretty lightweight (metadata APIs sit on a pretty low level within the runtime). MetaDataGetDispenser is the only thing ILAsm used - I would expect the startup cost of the VM to be orders of magnitude higher than that.

jkotas · 2019-08-01T11:21:55Z

ilasm/ildasm used to call ‘coreclr_initialize’: https://github.com/dotnet/coreclr/blob/release/3.0/src/ildasm/unixcoreclrloader/coreclrloader.cpp#L53 . You are right that initializing whole VM is not necessary for unmanaged metadata reader/writer. I was just pointing out that it is not that expensive to drive the design decision.

AaronRobinsonMSFT · 2019-08-01T15:21:47Z

... writing the portable PDB writer in C++ ...

This would be wonderful for the larger community and solve at least one big issue facing an internal partner team.

noahfalk · 2019-08-01T22:34:56Z

... writing the portable PDB writer in C++ …

Interesting, I hadn't initially considered this but especially for the limited subset of pdb data that the IL assembler currently writes it does seem reasonable. Thanks for raising this suggestion! To add some additional context this issue started because a customer was asking how they could unblock themselves. For the path where we did hosting and reused the existing C# writer I had some small samples showing code similar to what they would need to write. I am a little concerned that a new C++ writer will feel like a more daunting task. I'd guess the amount of code needed will be less, but there will be more effort involved for someone with minimal knowledge of the format or any of the pre-existing code to learn what needs to be written. I'll do some investigation to evaluate better what that might look like, but if anyone has suggestions of particular patterns/examples we should look at do let me know. Thanks!

AaronRobinsonMSFT · 2019-08-02T04:20:34Z

I wrote a Portable PDB reader in native a while ago. I pushed it up to my github (https://github.com/AaronRobinsonMSFT/PPDB). A writer would be significantly more work, but it could serve as a start to help someone understand the format.

noahfalk · 2019-08-02T06:24:30Z

After getting another opportunity to chat with our potential contributor on email my concerns were unfounded, they sounded happy with the approach of writing a portable PDB writer in C++. Does anyone have any objections to this plan?

@dotnet/jit-contrib - since Russ says you guys own ilasm could someone confirm this plan has your thumbs up?

@MichalStrehovsky - Above you were suggesting that we might use the implementation of IMetaDataEmit as a starting point for the portable PDB writer? Do you have a recommendation for what elements of the metadata implementation would be worth re-using and how we'd tie the overall solution together? I'll admit my first reaction was that the metadata codebase is a complex layering of many abstractions and caters to numerous other goals aside from emitting files so it may be more challenging than necessary. For example compared to something like @AaronRobinsonMSFT 's C++ portable PDB reader Aaron's code feels far easier to quickly understand and modify. Nonetheless I don't want to succumb to NIH syndrome and my first impressions may not be very accurate. I'd very much appreciate getting your thoughts on it. Thanks!

MichalStrehovsky · 2019-08-02T10:32:46Z

I'll admit my first reaction was that the metadata codebase is a complex layering of many abstractions and caters to numerous other goals aside from emitting files so it may be more challenging than necessary

I think it will have to be tied into the metadata emitter if we want to support embedded portable PDBs. Otherwise it can be just duplicated on the side. Duplication tends to be a better implementation strategy in a legacy codebase, as you point out (I don't know if anyone who's still on the team really did any work in /src/md besides Karel and Mei-Chin many years go). The writer has many oddities to support EnC, metadata merging used by link.exe, and stuff like that. Avoiding duplication is better for long-term maintainability though.

Portable PDB format basically just defines a set of extra tables on top of the existing type system tables in the ECMA-335 file format. The IMetadataEmit interface is basically a set of methods that add entries into the individual tables ("add new type", "add new method", "add new field", etc.). AFAIK the result of the Save() method is what a standalone Portable PDB is (a "naked" metadata stream).

ILAsm already uses IMetadataEmit to generate the metadata. It then wraps the "naked" metadata stream in a PE file with an ICeeFileGen.

I would look into just extending IMetadataEmit (adding IMetadataEmit3?) with methods that add entries into the new tables ("add new document", "add new method debug information", etc.).

ILAsm can then use it to emit the debug info: for the embedded PDB case, we would just add debug table entries into the existing dispenser and the saving of that would naturally fall out. For the sidecar PDB case, ILAsm would grab a vanilla dispenser, use it, and save the naked stream by calling Save() on the dispenser.

noahfalk · 2019-08-02T20:50:28Z

for the embedded PDB case, we would just add debug table entries into the existing dispenser and the saving of that would naturally fall out

As far as I know the embedded PDB is discrete from any metadata the IL assembly happens to have (ie the resulting PE file has two metadata headers inside somewhere). I was assuming that we'd need a second instance of an emit interface if we went this route, one for the standard assembly metadata and a 2nd one for the portable PDB metadata. Is that what you were suggesting and I just misunderstood, or were you envisioning a scheme where a single emitter accumulates data for all tables and it understands which tables go into which blob? Thanks!

BruceForstall · 2019-08-03T01:30:14Z

I spoke with @noahfalk offline, and I'm ok with whichever native code plan we end up with, as the trade-offs get worked out.

MichalStrehovsky · 2019-08-03T12:48:53Z

As far as I know the embedded PDB is discrete from any metadata the IL assembly happens to have

I only know what's in the spec. I assumed that since it says "metadata can (but doesn’t need to) be stored in the same metadata section of the PE/COFF file as the type system metadata", and the spec doesn't specify how we would find it in a PE file, it would be in the same streams as the type system tables. Looking at things in a hex editor, it doesn't appear to be the case (the contents of the #~ stream where the tables are stored is identical between -debug:portable and -debug:embedded). I can't find the other #~ stream (there's only one occurrence of this literal in the file).

I guess I'll defer to @tmat on that.

tmat · 2019-08-05T04:10:06Z

Portable PDB metadata tables are designed so that they can be stored in the same metadata stream as type system metadata. But it turned out that it's better to store them separately, so Embedded PDBs don't use that capability. Instead, Embedded PDBs are stored compressed in debug directory. https://github.com/dotnet/corefx/blob/master/src/System.Reflection.Metadata/specs/PE-COFF.md#embedded-portable-pdb-debug-directory-entry-type-17

MichalStrehovsky · 2019-08-05T07:45:35Z

But it turned out that it's better to store them separately

If we don't anticipate they'll be stored into the type system tables in the future, maybe creating IMetaDataEmit3 on the existing emitter is not needed and this can be in a new class that looks a lot like the existing metadata emitter. I guess it can be a judgement call. I don't have strong opinions.

ivanpovazan · 2020-07-14T07:56:17Z

Ilasm can now generate PDB in portable PDB format #37702
This feature is available through command line arguments: -debug -pdbfmt=portable

akoeplinger · 2020-07-14T10:47:51Z

@ivanbasov awesome work! would it make sense to use -debug:portable instead of a separate argument to be consistent with csc/roslyn?

ivanpovazan · 2020-07-14T11:34:13Z

@akoeplinger thank you.

I agree it would be nice to have consistency across the tools, but I have introduced a separate argument in order to avoid ambiguity with existing ilasm -debug options:
-debug:opt
-debug:impl

So the question is not about whether -pdbfmt and -debug options can be merged but whether it will be clear enough for ilasm users.
@noahfalk what do you think?

AaronRobinsonMSFT · 2020-07-14T17:40:39Z

So the question is not about whether -pdbfmt and -debug options can be merged but whether it will be clear enough for ilasm users.

@ivanpovazan I would agree that taking inspiration from the csc options is appropriate here.

-debug[+|-]                   Emit debugging information
-debug:{full|pdbonly|portable|embedded}
                              Specify debugging type ('full' is default,
                              'portable' is a cross-platform format,
                              'embedded' is a cross-platform format embedded into
                              the target .dll or .exe)

Since the debug flag appears to have a predefined set of options for ilasm and I personally wouldn't want to deal with the matrix of multiple argument, I propose slightly modifying your suggestion to align with existing ilasm syntax and acknowledge csc convention.

-debugfmt={full|portable}

At this point we can support full and portable. In the future we can add embedded as the need arises.

noahfalk · 2020-07-15T23:23:13Z

I'm fine with the approach @AaronRobinsonMSFT suggested above. The thing I care about most is that we don't break any CLI syntax for scripts/build projects already succesfully using ilasm. Having new syntax that is understandable/consistent with csc is certainly nice as well. I don't fret about it too much because I assume there are a relatively small set of people in the world who will need to format ad-hoc ilasm commands.

jkotas · 2020-12-11T21:37:31Z

The follow up work is tracked by #39436

msftgits transferred this issue from dotnet/coreclr Jan 30, 2020

msftgits added this to the Future milestone Jan 30, 2020

ivanpovazan mentioned this issue Jun 10, 2020

Portable PDB support for ilasm #37702

Merged

rladuca mentioned this issue Jul 14, 2020

Wpf depends on packages no longer produced in runtime dotnet/wpf#3173

Closed

ivanpovazan mentioned this issue Jul 16, 2020

Change ilasm command line arguments for PDB generation to match csc convention #39436

Closed

jkotas closed this as completed Dec 11, 2020

ghost locked as resolved and limited conversation to collaborators Jan 10, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PDB support for ilasm #5051

PDB support for ilasm #5051

kyulee1 commented Feb 2, 2016

rladuca commented May 17, 2019

noahfalk commented Jul 31, 2019 •

edited

Loading

noahfalk commented Jul 31, 2019

AaronRobinsonMSFT commented Jul 31, 2019

RussKeldorph commented Jul 31, 2019 •

edited

Loading

noahfalk commented Aug 1, 2019

tmat commented Aug 1, 2019

mjsabby commented Aug 1, 2019

noahfalk commented Aug 1, 2019

tmat commented Aug 1, 2019

mjsabby commented Aug 1, 2019

MichalStrehovsky commented Aug 1, 2019

jkotas commented Aug 1, 2019 •

edited

Loading

MichalStrehovsky commented Aug 1, 2019

jkotas commented Aug 1, 2019

AaronRobinsonMSFT commented Aug 1, 2019

noahfalk commented Aug 1, 2019

AaronRobinsonMSFT commented Aug 2, 2019

noahfalk commented Aug 2, 2019

MichalStrehovsky commented Aug 2, 2019

noahfalk commented Aug 2, 2019

BruceForstall commented Aug 3, 2019

MichalStrehovsky commented Aug 3, 2019

tmat commented Aug 5, 2019

MichalStrehovsky commented Aug 5, 2019

ivanpovazan commented Jul 14, 2020

akoeplinger commented Jul 14, 2020

ivanpovazan commented Jul 14, 2020 •

edited

Loading

AaronRobinsonMSFT commented Jul 14, 2020

noahfalk commented Jul 15, 2020

jkotas commented Dec 11, 2020

PDB support for ilasm #5051

PDB support for ilasm #5051

Comments

kyulee1 commented Feb 2, 2016

rladuca commented May 17, 2019

noahfalk commented Jul 31, 2019 • edited Loading

noahfalk commented Jul 31, 2019

AaronRobinsonMSFT commented Jul 31, 2019

RussKeldorph commented Jul 31, 2019 • edited Loading

noahfalk commented Aug 1, 2019

tmat commented Aug 1, 2019

mjsabby commented Aug 1, 2019

noahfalk commented Aug 1, 2019

tmat commented Aug 1, 2019

mjsabby commented Aug 1, 2019

MichalStrehovsky commented Aug 1, 2019

jkotas commented Aug 1, 2019 • edited Loading

MichalStrehovsky commented Aug 1, 2019

jkotas commented Aug 1, 2019

AaronRobinsonMSFT commented Aug 1, 2019

noahfalk commented Aug 1, 2019

AaronRobinsonMSFT commented Aug 2, 2019

noahfalk commented Aug 2, 2019

MichalStrehovsky commented Aug 2, 2019

noahfalk commented Aug 2, 2019

BruceForstall commented Aug 3, 2019

MichalStrehovsky commented Aug 3, 2019

tmat commented Aug 5, 2019

MichalStrehovsky commented Aug 5, 2019

ivanpovazan commented Jul 14, 2020

akoeplinger commented Jul 14, 2020

ivanpovazan commented Jul 14, 2020 • edited Loading

AaronRobinsonMSFT commented Jul 14, 2020

noahfalk commented Jul 15, 2020

jkotas commented Dec 11, 2020

noahfalk commented Jul 31, 2019 •

edited

Loading

RussKeldorph commented Jul 31, 2019 •

edited

Loading

jkotas commented Aug 1, 2019 •

edited

Loading

ivanpovazan commented Jul 14, 2020 •

edited

Loading