Start tracking debug info for inlined statements #61220

jakobbotsch · 2021-11-04T19:38:03Z

This change consists of two commits that are probably most easily reviewed commit by commit.

The first commit, 75c5b6e, changes IL_OFFSETX to a DebugInfo structure that carries around the extra bit flags that we were previously packing into the integer and also an InlineContext. Except for the changes to the emitter it is mainly a mechanical change: we primarily update uses of the old IL offset to use the IL offset in the DebugInfo as the storage location, but no attempts are made to fix up things or to track more information. The emitter used sentinel values specified as IL_OFFSETX, so I have changed the emitter parts to use new values to signify these.

The second commit 75c5b6e is more interesting. It changes the JIT to create proper debug information for statements from inlinees. The commit message is pretty good, so I'll copy it here:
Add support for tracking debug information in statements coming from
inlinees. Changes:

Turn on compDbgInfo in inlinees. We use the implicit boundaries from
the inline root, but we do not use any explicit boundaries. That is,
we do not query the EE for explicit boundaries for the inlinee.
Create InlineContexts eagerly and use them during import. All
DebugInfo created in the JIT is in a "consistent" state, meaning that
we never see an IL location set without a corresponding inline
context. This was difficult before as InlineContexts would be created
quite late, after the importer for the inlinee had run. We now create
it eagerly and attach it to debug info during importation. Later, when
we figure out whether an inline succeeded or not, we mark it as
succeeded or failed.
Stop carrying InlineContext around unconditionally in Statement. The
inline context is now only part of the debug info, which may not be
set. Inlining needs the inline context to create new inline contexts
and to check for recursive inlines. Previously it retrieved it from
the inline statement, but due to the above change we now have to get
it from somewhere else. To do this we now keep it unconditionally
together with InlineCandidateInfo so that we can retrieve it later.
Validate all created debug information when associated with a
statement. This is done by creating a bitvector containing IL
locations that mark the beginning of IL instructions, and validating
that all IL offsets point to these when Statement::SetDebugInfo is
called.
While we track debug info in statements from inlinees, the runtime
side is still not hooked up. Currently we track the information until
we get to rationalize, where we normalize all debug info back to the
root inserted as GT_IL_OFFSET nodes. The change is free of any diffs
due to this normalization. We also track IL offsets as part of basic
blocks: these are also normalized to be in the root.

As mentioned, there are no diffs (debug info or code) with this change due to the fact that we normalize all debug info back to the root during rationalize. I expect to continue my work on SPGO by dumping the debug info and validate that it helps with accuracy on optimized code before I hook it up in the remaining parts of the back end and in the runtime.

One thing I'm not sure of is how to display the new debug info in dumps. For a simple application

public class Program
{
    public static void Main()
    {
        Foo();
    }

    public static void Foo()
    {
        Bar();
    }

    public static void Bar()
    {
        System.Console.WriteLine("Bar");
    }
}

the JIT dump will display the new debug info in statements as follows:

***** BB01
STMT00003 ( INL02 @ 0x000[E-] ... ??? ) <- INL01 @ 0x000[E-] <- INLRT @ 0x000[E-]
               [000005] --C-G-------              ▌  CALL      void   System.Console.WriteLine
               [000004] ------------ arg0         └──▌  CNS_STR   ref   <string constant>

***** BB01
STMT00001 ( 0x005[E-] ... ??? )
               [000001] ------------              ▌  RETURN    void

Essentially, we can see that the call to System.WriteLine is actually from INL02, where it is at IL offset 0. In turn, this came from INL01 at IL offset 0, which came from the root at IL offset 0. The values in the brackets are the flags associated with the location: E means it is a stack Empty point, and the next flag is C which means it is a call. I have used this a few times to track down debug info diffs due to different flags, but not sure if it just too much visual clutter now.
The inline tree is displayed like the following (subject to change in #61208):

**************** Inline Tree
Inlines into 06000001 [via ExtendedDefaultPolicy] Program:Main()
  [INL01 IL=0000 TR=000000 06000002] [callee: below ALWAYS_INLINE size] Program:Foo()
    [INL02 IL=0000 TR=000002 06000003] [callee: below ALWAYS_INLINE size] Program:Bar()
      [INL00 IL=0005 TR=000005 0600009A] [FAILED: callee: noinline per IL/cached result] System.Console:WriteLine(System.String)

If anyone has suggestions on whether they want to see either the debug info or inline tree changed or presented differently then I'm all ears.

cc @dotnet/jit-contrib

Remove IL_OFFSETX in favor of a DebugInfo structure. Previously we were packing extra information into the upper bits of IL_OFFSETX, which are now separate bit fields on a new ILLocation structure. DebugInfo contains an ILLocation and also an inline context, which will be used in the future when tracking debug info inside of inlinees. Another problem with IL_OFFSETX was that there were several sentinel values used to describe prologs, epilogs and no mappings. However these were only used in code-gen, so refactor codegen to track this separately instead of having to muddle it into IL_OFFSETX. This makes it clearer what we can expect from IL offsets during JIT. This change is no-diff and PIN also shows that TP is not negatively affected; in fact, there seems to be a small TP gain, maybe because we don't have to handle sentinel values anymore.

Add support for tracking debug information in statements coming from inlinees. Changes: * Turn on compDbgInfo in inlinees. We use the implicit boundaries from the inline root, but we do not use any explicit boundaries. That is, we do not query the EE for explicit boundaries for the inlinee. * Create InlineContexts eagerly and use them during import. All DebugInfo created in the JIT is in a "consistent" state, meaning that we never see an IL location set without a corresponding inline context. This was difficult before as InlineContexts would be created quite late, after the importer for the inlinee had run. We now create it eagerly and attach it to debug info during importation. Later, when we figure out whether an inline succeeded or not, we mark it as succeeded or failed. * Stop carrying InlineContext around unconditionally in Statement. The inline context is now only part of the debug info, which may not be set. Inlining needs the inline context to create new inline contexts and to check for recursive inlines. Previously it retrieved it from the inline statement, but due to the above change we now have to get it from somewhere else. To do this we now keep it unconditionally together with InlineCandidateInfo so that we can retrieve it later. * Validate all created debug information when associated with a statement. This is done by creating a bitvector containing IL locations that mark the beginning of IL instructions, and validating that all IL offsets point to these when Statement::SetDebugInfo is called. * While we track debug info in statements from inlinees, the runtime side is still not hooked up. Currently we track the information until we get to rationalize, where we normalize all debug info back to the root inserted as GT_IL_OFFSET nodes. The change is free of any diffs due to this normalization. We also track IL offsets as part of basic blocks: these are also normalized to be in the root.

ghost · 2021-11-04T19:38:14Z

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

This change consists of two commits that are probably most easily reviewed commit by commit.

The first commit, 75c5b6e, changes IL_OFFSETX to a DebugInfo structure that carries around the extra bit flags that we were previously packing into the integer and also an InlineContext. Except for the changes to the emitter it is mainly a mechanical change: we primarily update uses of the old IL offset to use the IL offset in the DebugInfo as the storage location, but no attempts are made to fix up things or to track more information. The emitter used sentinel values specified as IL_OFFSETX, so I have changed the emitter parts to use new values to signify these.

The second commit 75c5b6e is more interesting. It changes the JIT to create proper debug information for statements from inlinees. The commit message is pretty good, so I'll copy it here:
Add support for tracking debug information in statements coming from
inlinees. Changes:

Turn on compDbgInfo in inlinees. We use the implicit boundaries from
the inline root, but we do not use any explicit boundaries. That is,
we do not query the EE for explicit boundaries for the inlinee.
Create InlineContexts eagerly and use them during import. All
DebugInfo created in the JIT is in a "consistent" state, meaning that
we never see an IL location set without a corresponding inline
context. This was difficult before as InlineContexts would be created
quite late, after the importer for the inlinee had run. We now create
it eagerly and attach it to debug info during importation. Later, when
we figure out whether an inline succeeded or not, we mark it as
succeeded or failed.
Stop carrying InlineContext around unconditionally in Statement. The
inline context is now only part of the debug info, which may not be
set. Inlining needs the inline context to create new inline contexts
and to check for recursive inlines. Previously it retrieved it from
the inline statement, but due to the above change we now have to get
it from somewhere else. To do this we now keep it unconditionally
together with InlineCandidateInfo so that we can retrieve it later.
Validate all created debug information when associated with a
statement. This is done by creating a bitvector containing IL
locations that mark the beginning of IL instructions, and validating
that all IL offsets point to these when Statement::SetDebugInfo is
called.
While we track debug info in statements from inlinees, the runtime
side is still not hooked up. Currently we track the information until
we get to rationalize, where we normalize all debug info back to the
root inserted as GT_IL_OFFSET nodes. The change is free of any diffs
due to this normalization. We also track IL offsets as part of basic
blocks: these are also normalized to be in the root.

As mentioned, there are no diffs (debug info or code) with this change due to the fact that we normalize all debug info back to the root during rationalize. I expect to continue my work on SPGO by dumping the debug info and validate that it helps with accuracy on optimized code before I hook it up in the remaining parts of the back end and in the runtime.

One thing I'm not sure of is how to display the new debug info in dumps. For a simple application

public class Program
{
    public static void Main()
    {
        Foo();
    }

    public static void Foo()
    {
        Bar();
    }

    public static void Bar()
    {
        System.Console.WriteLine("Bar");
    }
}

the JIT dump will display the new debug info in statements as follows:

***** BB01
STMT00003 ( INL02 @ 0x000[E-] ... ??? ) <- INL01 @ 0x000[E-] <- INLRT @ 0x000[E-]
               [000005] --C-G-------              ▌  CALL      void   System.Console.WriteLine
               [000004] ------------ arg0         └──▌  CNS_STR   ref   <string constant>

***** BB01
STMT00001 ( 0x005[E-] ... ??? )
               [000001] ------------              ▌  RETURN    void

Essentially, we can see that the call to System.WriteLine is actually from INL02, where it is at IL offset 0. In turn, this came from INL01 at IL offset 0, which came from the root at IL offset 0. The inline tree is displayed like the following (subject to change in #61208):

**************** Inline Tree
Inlines into 06000001 [via ExtendedDefaultPolicy] Program:Main()
  [INL01 IL=0000 TR=000000 06000002] [callee: below ALWAYS_INLINE size] Program:Foo()
    [INL02 IL=0000 TR=000002 06000003] [callee: below ALWAYS_INLINE size] Program:Bar()
      [INL00 IL=0005 TR=000005 0600009A] [FAILED: callee: noinline per IL/cached result] System.Console:WriteLine(System.String)

If anyone has suggestions on whether they want to see this changed or presented differently then I'm all ears.

cc @dotnet/jit-contrib

Author:	jakobbotsch
Assignees:	-
Labels:	`area-CodeGen-coreclr`
Milestone:	-

jakobbotsch · 2021-11-04T19:40:09Z

PIN shows around 0.06% more instructions retired with SPMI over libraries.pmi, which is probably close to noise. As mentioned above, no SPMI diff in either code or debug info.

AndyAyersMS

Overall this looks great. I'm fine with the extra dump output as is.

Some small changes / comment issues here and there.

I also suggest moving some of the new classes to their own files. You can defer this if you like (or decide it's not a good idea).

AndyAyersMS · 2021-11-09T03:42:55Z

src/coreclr/jit/codegencommon.cpp

@@ -10899,9 +10780,9 @@ void CodeGen::genIPmappingGen()
            if ((block->bbRefs > 1) && (stmt != nullptr))
            {
                bool found = false;
-                if (stmt->GetILOffsetX() != BAD_IL_OFFSET)
+                if (stmt->GetDebugInfo() != BAD_IL_OFFSET)


Should this be an IsValid() check?

Yep, looks like this code is under #if 0, I'll update the block so that it compiles when it is enabled. It looks like something I may wish to try to enable in the future.

AndyAyersMS · 2021-11-09T03:45:15Z

src/coreclr/jit/gentree.h

@@ -6134,16 +6139,141 @@ struct GenTreeRetExpr : public GenTree

 class InlineContext;

+// Represents information about the location of an IL instruction.
+class ILLocation


I wonder if we should put these two classes in some other (perhaps new?) header, as they're more general than trees?

Good idea, I'll add debuginfo.h/cpp and move it there.

AndyAyersMS · 2021-11-09T03:46:03Z

src/coreclr/jit/gentree.cpp

+//------------------------------------------------------------------------
+// GetParent: Get debug info for the parent statement that inlined the
+// statement for this debug info.
+//


Add comment for the argument?

Also, nit: maybe par -> parent?

AndyAyersMS · 2021-11-09T03:51:57Z

src/coreclr/jit/compiler.h

@@ -2515,6 +2515,25 @@ inline LoopFlags& operator&=(LoopFlags& a, LoopFlags b)
    return a = (LoopFlags)((unsigned short)a & (unsigned short)b);
 }

+//  The following holds information about instr offsets in terms of generated code.


Maybe these go into the new header too?

I think I will leave these here, they are not new, I just moved them out of the Compiler class so they are not nested anymore.

AndyAyersMS · 2021-11-09T03:54:18Z

src/coreclr/jit/fgbasic.cpp

+    // Track offsets where IL instructions begin in DEBUG builds. Used to
+    // validate debug info generated by the JIT.
+    assert(codeSize == compInlineContext->GetILSize());
+    INDEBUG(FixedBitVect* ilInstsSet = FixedBitVect::bitVectInit(codeSize, this));


I like that you added this validation.

AndyAyersMS · 2021-11-09T04:00:44Z

src/coreclr/jit/gentree.cpp

+//    For invalid ILLocations, we print '???'.
+//    Otherwise the offset and flags are printed in the format 0xabc[EC].
+//
+void ILLocation::Dump() const


I guess if you move the class definitions to a new header these should move to a new .cpp file?

src/coreclr/jit/importer.cpp

AndyAyersMS · 2021-11-09T04:02:55Z

src/coreclr/jit/importer.cpp

 }

 /*****************************************************************************
- * Returns current IL offset with stack-empty and call-instruction info incorporated
+ * Remember the IL offset (including stack-empty info) for the trees we will


Update this comment block to new-style?

Will do.

Related to this function, I have been thinking that as a future change we might want to unconditionally record DebugInfo for all statements created during import and instead record all necessary information to determine much later (e.g. in the emitter) whether we should actually communicate that particular debug info back to the runtime. For instance, we would record all the different kinds of boundaries that the runtime can ask for as information in the ILLocation and then query the requested boundaries in the emitter instead.

The benefit would be that we can track debug information diffs in a much more "stable" way that does not depend on what the runtime tells us at a particular time. One challenge is with GenTreeCall nodes where we use a map on the side that is only populated when the runtime actually asks for boundaries to support the "method X returned Y" feature.

…ents

jakobbotsch · 2021-11-09T19:08:07Z

I addressed the feedback, can you take another look @AndyAyersMS?

AndyAyersMS

LGTM.

jakobbotsch added 2 commits November 4, 2021 19:54

jakobbotsch requested a review from AndyAyersMS November 4, 2021 19:38

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Nov 4, 2021

Run jit-format

de5e24b

AndyAyersMS reviewed Nov 9, 2021

View reviewed changes

jakobbotsch added 4 commits November 9, 2021 12:49

Address feedback

fe1088e

Merge remote-tracking branch 'upstream/main' into debug-info-improvem…

316c451

…ents

Fix bad merge

19c1174

Run jit-format

652b8a1

AndyAyersMS approved these changes Nov 9, 2021

View reviewed changes

jakobbotsch merged commit 731364d into dotnet:main Nov 9, 2021

jakobbotsch deleted the debug-info-improvements branch November 9, 2021 19:32

ghost locked as resolved and limited conversation to collaborators Dec 10, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Start tracking debug info for inlined statements #61220

Start tracking debug info for inlined statements #61220

jakobbotsch commented Nov 4, 2021 •

edited

Loading

ghost commented Nov 4, 2021

jakobbotsch commented Nov 4, 2021

AndyAyersMS left a comment

AndyAyersMS Nov 9, 2021

jakobbotsch Nov 9, 2021

AndyAyersMS Nov 9, 2021

jakobbotsch Nov 9, 2021 •

edited

Loading

AndyAyersMS Nov 9, 2021

AndyAyersMS Nov 9, 2021

jakobbotsch Nov 9, 2021

AndyAyersMS Nov 9, 2021

AndyAyersMS Nov 9, 2021

AndyAyersMS Nov 9, 2021

jakobbotsch Nov 9, 2021

jakobbotsch commented Nov 9, 2021 •

edited

Loading

AndyAyersMS left a comment

Start tracking debug info for inlined statements #61220

Start tracking debug info for inlined statements #61220

Conversation

jakobbotsch commented Nov 4, 2021 • edited Loading

ghost commented Nov 4, 2021

jakobbotsch commented Nov 4, 2021

AndyAyersMS left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jakobbotsch Nov 9, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jakobbotsch commented Nov 9, 2021 • edited Loading

AndyAyersMS left a comment

Choose a reason for hiding this comment

jakobbotsch commented Nov 4, 2021 •

edited

Loading

jakobbotsch Nov 9, 2021 •

edited

Loading

jakobbotsch commented Nov 9, 2021 •

edited

Loading