Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Debug information incorrect #222

Closed
kquick opened this issue Jun 20, 2023 · 6 comments · Fixed by #228
Closed

Debug information incorrect #222

kquick opened this issue Jun 20, 2023 · 6 comments · Fixed by #228

Comments

@kquick
Copy link
Member

kquick commented Jun 20, 2023

With the change from 68d26a5 there is a regression that affects debug information. This was noticed in a client of this library when file and line information for functions was no longer available and tracked down to the guard statement in this identified commit.

It appears that this causes some internal structure synchronization issues of some kind. I will work on the test case more, but I added a smallprog.c file (compile via clang with -c -emit-llvm -O0 -g and then use llvm-disasm) which exhibits this metadata issue. In the previous implementation (without the guard), the following metadata is observed:

...
!28 = !{!"clang version 12.0.1"}
!29 = !{!"llvm.loop.mustprogress"}
!30 = !DIDerivedType(tag: 15, line: 0, baseType: !9, size: 64, align: 0, offset: 0, flags: 0)
!31 = !{null, !30}
!32 = distinct !DISubroutineType(flags: 0, types: !31)
!33 = distinct !DISubprogram(scope: !4, name: "sp_receive", file: !4, line: 17, type: !41, isLocal: false, isDefinition: false, scopeLine: 17, virtuality: 0, virtualIndex: 0, flags: 0, isOptimized: false, unit: !3, retainedNodes: !5)
!34 = distinct !DILexicalBlock(scope: !33, file: !4, line: 18, column: 24)
!35 = distinct !DILexicalBlock(scope: !34, file: !4, line: 20, column: 9)
!36 = distinct !DILexicalBlock(scope: !35, file: !4, line: 20, column: 9)
!37 = distinct !DILexicalBlock(scope: !36, file: !4, line: 20, column: 53)
...

With the newer code with the guard statement in place, the output changes:

...
!28 = !{!"clang version 12.0.1"}
!29 = !{!"llvm.loop.mustprogress"}
!30 = !DIDerivedType(tag: 15, line: 0, baseType: !9, size: 64, align: 0, offset: 0, flags: 0)
!31 = !{null, !30}
!32 = distinct !DISubprogram(scope: !4, name: "sp_receive", file: !4, line: 17, type: !40, isLocal: false, isDefinition: false, scopeLine: 17, virtuality: 0, virtualIndex: 0, flags: 0, isOptimized: false, unit: !3, retainedNodes: !5)
!32 = distinct !DISubroutineType(flags: 0, types: !31)
!33 = distinct !DILexicalBlock(scope: !32, file: !4, line: 18, column: 24)
!34 = distinct !DILexicalBlock(scope: !33, file: !4, line: 20, column: 9)
!35 = distinct !DILexicalBlock(scope: !34, file: !4, line: 20, column: 9)
!36 = distinct !DILexicalBlock(scope: !35, file: !4, line: 20, column: 53)
...

Note in particular that there are two !32 entries now, where the original !33 entry is now preceding the original !32 entry with the same label.

It does not appear to be possible to compare this to llvm-dis output from LLVM itself because the metadata mapping appears to be somewhat different, although there are not duplicates in the llvm-dis output either.

@RyanGlScott
Copy link
Contributor

Good catch.

I'm reluctant to revert these changes wholesale, as LLVM 13+ simply won't work without them. (Printing DIArgLists in the top-level metadata list will cause llvm-as to choke.) I wonder if there is another way to fix the issue that wouldn't require such drastic measures.

Do you have a sense for what is causing there to be duplicate !32 entries? It's not clear to me why this happens, since I was under the impression that metadata indices had already been resolved (and made unique) by the time we reach the unnamedEntries function.

@kquick
Copy link
Member Author

kquick commented Jun 20, 2023

I haven't had time to track it down. My hunch is that there are two parallel tables or lists somewhere and this drops the entries from one but the output is perhaps iterating over the other, causing issues when there is no correlation. That's a complete guess though.

I have so far only used this reverted form with pre-built bitcode from LLVM 12 or earlier; I have not tried it on LLVM 13 or later so hadn't yet discovered that it was broken. I'm fine with using this branch for the moment without merging, since my target is still in the progress of supporting later LLVM versions. I have a couple of other tasks to take care of before I can come back and track this down though, so I'll coordinate with you in case you get to it first.

@RyanGlScott
Copy link
Contributor

RyanGlScott commented Jun 21, 2023

Indeed, filtering out entries in unnamedEntries occurs far too late in the pipeline. Aside from the issue you pointed out, the generated bitcode also has a dangling reference to !6 (where !6 is a DIExpression()) here:

!5 = !{}
!7 = !DIGlobalVariableExpression(var: !2, expr: !6)
!8 = !{!7}

I suspect that the filtering will need to occur at the point when the unnamed metadata entries are first added to the map that llvm-pretty-bc-parser's internals use, as that is the point when indices are assigned to each entry.

@RyanGlScott
Copy link
Contributor

RyanGlScott commented Jun 30, 2023

I believe part of the issue is in how we register DIExpression metadata:

29 -> label "METADATA_EXPRESSION" $ do
isDistinct <- parseField r 0 nonzero
diExpr <- DebugInfoExpression . DIExpression <$> parseFields r 1 numeric
return $! updateMetadataTable (addDebugInfo isDistinct diExpr) pm

Where addDebugInfo adds a distinct, numbered entry to the mtNodes in a MetadataTable. As far as I can tell, that numbering is used to determine how the list of unnamed metadata in an LLVM module is printed later. (See here, in unnamedMetadata, which consults mtNodes.)

There are other ways to add metadata to the table without adding them to mtNodes, however. One such way is through addMdValue:

addMdValue :: Typed PValue -> MetadataTable -> MetadataTable
addMdValue tv mt = mt { mtEntries = addValue tv' (mtEntries mt) }
where
-- explicitly make a metadata value out of a normal value
tv' = Typed { typedType = PrimType Metadata
, typedValue = ValMd (ValMdValue tv)
}

Unlike addDebugInfo, this doesn't add metadata to the mtNodes, but rather to mtEntries. I believe this could offer us a way to insert DIExpressions without needing to create entries for them in the top-level list of unnamed metadata.

@RyanGlScott
Copy link
Contributor

addMdValue didn't quite do what I needed, but with a slightly tweaked version of addMdValue, I was able to come up with a working prototype:

diff --git a/src/Data/LLVM/BitCode/IR/Metadata.hs b/src/Data/LLVM/BitCode/IR/Metadata.hs
index b3b4bf2..02eace3 100644
--- a/src/Data/LLVM/BitCode/IR/Metadata.hs
+++ b/src/Data/LLVM/BitCode/IR/Metadata.hs
@@ -90,6 +90,13 @@ addMdValue tv mt = mt { mtEntries = addValue tv' (mtEntries mt) }
               , typedValue = ValMd (ValMdValue tv)
               }
 
+addMdDebugInfo :: DebugInfo' Int -> MetadataTable -> MetadataTable
+addMdDebugInfo di mt = mt { mtEntries = addValue tv (mtEntries mt) }
+  where
+  tv = Typed { typedType  = PrimType Metadata
+             , typedValue = ValMd (ValMdDebugInfo di)
+             }
+
 nameNode :: Bool -> Bool -> Int -> MetadataTable -> MetadataTable
 nameNode fnLocal isDistinct ix mt = mt
   { mtNodes    = IntMap.insert ix (fnLocal,isDistinct,mtNextNode mt) (mtNodes mt)
@@ -336,8 +343,7 @@ unnamedEntries pm = bimap Seq.fromList Seq.fromList (partitionEithers (mapMaybe
   lookupNode ref d ix = do
     tv <- lookupValueTableAbs ref (mtEntries mt)
     case tv of
-      Typed { typedValue = ValMd v } -> do
-        guard (not (mustAppearInline v))
+      Typed { typedValue = ValMd v } ->
         pure $! PartialUnnamedMd
           { pumIndex    = ix
           , pumValues   = v
@@ -345,14 +351,6 @@ unnamedEntries pm = bimap Seq.fromList Seq.fromList (partitionEithers (mapMaybe
           }
       _ -> error "Impossible: Only ValMds are stored in mtEntries"
 
-  -- DIExpressions and DIArgLists are always printed inline and should never be
-  -- printed in the global list of unnamed metadata. See
-  -- https://github.com/llvm/llvm-project/blob/65600cb2a7e940babf6c493503b9d3fd19f8cb06/llvm/lib/IR/AsmWriter.cpp#L1242-L1245
-  mustAppearInline :: PValMd -> Bool
-  mustAppearInline (ValMdDebugInfo (DebugInfoExpression{})) = True
-  mustAppearInline (ValMdDebugInfo (DebugInfoArgList{})) = True
-  mustAppearInline _ = False
-
 type InstrMdAttachments = Map.Map Int [(KindMd,PValMd)]
 
 type PKindMd = Int
@@ -996,9 +994,9 @@ parseMetadataEntry vt mt pm (fromEntry -> Just r) =
         (addDebugInfo isDistinct (DebugInfoLocalVariable dilv)) pm
 
     29 -> label "METADATA_EXPRESSION" $ do
-      isDistinct <- parseField r 0 nonzero
+      _isDistinct <- parseField r 0 nonzero
       diExpr     <- DebugInfoExpression . DIExpression <$> parseFields r 1 numeric
-      return $! updateMetadataTable (addDebugInfo isDistinct diExpr) pm
+      return $! updateMetadataTable (addMdDebugInfo diExpr) pm
 
     30 -> label "METADATA_OBJC_PROPERTY" $ do
       -- TODO
@@ -1138,17 +1136,8 @@ parseMetadataEntry vt mt pm (fromEntry -> Just r) =
       dial <- DIArgList
         <$> (map (mdForwardRef cxt mt) <$> parseFields r 0 numeric)
       return $! updateMetadataTable
-        -- Unlike most other forms of metadata, METADATA_ARG_LIST never updates
-        -- the @IsDistinct@ value. At least, that's my reading of the source
-        -- code here:
-        -- https://github.com/llvm/llvm-project/blob/8bad4ae679df6fc7dbd016dccbd3da34206e836b/llvm/lib/Bitcode/Reader/MetadataLoader.cpp#L2142-L2158
-        --
-        -- As a result, we use False below, which is the default value of
-        -- IsDistinct set here. It doesn't actually matter that much whether it
-        -- is True or False, since we filter out DIArgLists from the list of
-        -- global unnamed metadata entries anyway (see `unnamedEntries`). As
-        -- such, the value of IsDistinct is never used for anything meaningful.
-        (addDebugInfo False (DebugInfoArgList dial)) pm
+        (addMdDebugInfo (DebugInfoArgList dial)) pm
+
 
     code -> fail ("unknown record code: " ++ show code)

This passes disasm-test on LLVM 10 and later, which is great. The only part that makes me a little iffy is the fact that we now ignore the isDistinct value for DIExpression, which seems a bit funny. Thinking about it some more, it seems entirely plausible that isDistinct is vestigial for DIExpression. The only way you'd ever be able to observe it is when printing a DIExpression in a top-level list of unnamed metadata, but that can never happen for DIExpressions precisely because of this invariant.

@RyanGlScott
Copy link
Contributor

Thinking about it some more, it seems entirely plausible that isDistinct is vestigial for DIExpression.

I asked about this on the LLVM Discord, and they agree that DIExpression doesn't make use of isDistinct. As such, I think we can ignore it entirely.

RyanGlScott added a commit that referenced this issue Jul 3, 2023
Instead, put them in `mtEntries`, which aren't used when populating top-level
metadata lists. For more information, see the new `Note [Printing metadata
inline]`.

Fixes #222.

This is currently marked as a draft pending some issues with the `smallprog.ll`
test case that I added, which still need to be investigated.
kquick added a commit that referenced this issue Aug 8, 2023
RyanGlScott added a commit that referenced this issue Aug 15, 2023
Apple's fork of LLVM uses an ever-so-slightly different bitcode format than the
upstream version of LLVM. In particular, it adds an additional record to
`METADATA_DERIVED_TYPE`, which means that we must be slightly more permissive
during parsing to allow Apple LLVM bitcodes through. See the new `Note [Apple
LLVM]` for a slightly more detailed explanation. As that Note explains, this is
good enough for now, but we may want to think about a more robust solution in
the future.

I have added an `apple-llvm.bc` regression test to `disasm-test`, but it
doesn't yet pass the test suite for unrelated reasons documented in #222. I
have added a `known_bugs` entry for this test case until that issue is
resolved.

Fixes #235.
RyanGlScott added a commit that referenced this issue Aug 15, 2023
Apple's fork of LLVM uses an ever-so-slightly different bitcode format than the
upstream version of LLVM. In particular, it adds an additional record to
`METADATA_DERIVED_TYPE`, which means that we must be slightly more permissive
during parsing to allow Apple LLVM bitcodes through. See the new `Note [Apple
LLVM]` for a slightly more detailed explanation. As that Note explains, this is
good enough for now, but we may want to think about a more robust solution in
the future.

I have added an `apple-llvm.bc` regression test to `disasm-test`, but it
doesn't yet pass the test suite for unrelated reasons documented in #222. I
have added a `known_bugs` entry for this test case until that issue is
resolved.

Fixes #235.
RyanGlScott added a commit that referenced this issue Sep 26, 2023
Instead, put them in `mtEntries`, which aren't used when populating top-level
metadata lists. For more information, see the new `Note [Printing metadata
inline]`.

Fixes #222.

This is currently marked as a draft pending some issues with the `smallprog.ll`
test case that I added, which still need to be investigated.
RyanGlScott added a commit that referenced this issue Sep 26, 2023
Instead, put them in `mtEntries`, which aren't used when populating top-level
metadata lists. For more information, see the new `Note [Printing metadata
inline]`.

Fixes #222. This also allows us to remove some `known_bugs` files.
RyanGlScott added a commit that referenced this issue Sep 26, 2023
Instead, put them in `mtEntries`, which aren't used when populating top-level
metadata lists. For more information, see the new `Note [Printing metadata
inline]`.

Fixes #222. This also allows us to remove some `known_bugs` files.
RyanGlScott added a commit that referenced this issue Sep 26, 2023
Instead, put them in `mtEntries`, which aren't used when populating top-level
metadata lists. For more information, see the new `Note [Printing metadata
inline]`.

Fixes #222. This also allows us to remove some `known_bugs` files.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants