Parsing failures #158

kquick · 2021-07-20T21:44:19Z

I have some candidate code (PX4_Autopilot) that is having parser decode failures with multiple symptoms:

Failure to decode an fcmp operation value. The operation values defined are 0-15, but the value that llvm-pretty-bc-parser is attempting to resolve is 24.
ignoring the above by defaulting to ftrue, I'm seeing invalid operand values for the instruction.

Here's the llvm-dis output:

    %147 = load float, float* %146, align 4, !dbg !217760
    %148 = fmul nsz arcp float %147, 0x3F8457AE140000000, !dbg !217761
    %149 = fcmp nsz arcp ogt float %145, %148, !dbg !217769

for llvm-pretty-bc-parser, the llvm-disasm has problems on %149, where the default to ftrue still shows:

   %149 = fcmp true float %145, %147, !dbg ...`

Note the second operand is decoded as %147 whereas llvm decodes to %148.

It's unknown if there are other parsing problems, this one just happened to fail the parser due to the lookup in (1) above and therefore be detectable.

The text was updated successfully, but these errors were encountered:

kquick · 2021-07-20T21:46:56Z

n.b., the llvm-disasm helpfully shows the 0x3F8457AE140000000 constant in register %148 to be 1.0e-2

RyanGlScott · 2021-09-02T00:40:24Z

I think I'm encountering the same issue, or at least very similar one. To reproduce, compile the following file:

double a = 0;
void b() {
  for (int c = 0; c < 2; c++)
    a++;
  if (a)
    a = 7;
}

int main() {
  b();
  return 0;
}

With:

$ clang -O1 test.c -S -emit-llvm -fno-discard-value-names
$ clang -O1 test.c -c -emit-llvm -fno-discard-value-names

This will produce the following bitcode with Clang 10:

; ModuleID = 'test.c'
source_filename = "test.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

@a = dso_local local_unnamed_addr global double 0.000000e+00, align 8

; Function Attrs: nofree norecurse nounwind uwtable
define dso_local void @b() local_unnamed_addr #0 {
entry:
  %0 = load double, double* @a, align 8, !tbaa !2
  br label %for.body

for.cond.cleanup:                                 ; preds = %for.body
  %tobool = fcmp une double %inc, 0.000000e+00
  %storemerge = select i1 %tobool, double 7.000000e+00, double %inc
  store double %storemerge, double* @a, align 8, !tbaa !2
  ret void

for.body:                                         ; preds = %entry, %for.body
  %1 = phi double [ %0, %entry ], [ %inc, %for.body ]
  %c.04 = phi i32 [ 0, %entry ], [ %inc1, %for.body ]
  %inc = fadd double %1, 1.000000e+00
  %inc1 = add nuw nsw i32 %c.04, 1
  %cmp = icmp eq i32 %c.04, 0
  br i1 %cmp, label %for.body, label %for.cond.cleanup
}

; Function Attrs: nofree norecurse nounwind uwtable
define dso_local i32 @main() local_unnamed_addr #0 {
entry:
  call void @b()
  ret i32 0
}

attributes #0 = { nofree norecurse nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.module.flags = !{!0}
!llvm.ident = !{!1}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"clang version 10.0.0-4ubuntu1 "}
!2 = !{!3, !3, i64 0}
!3 = !{!"double", !4, i64 0}
!4 = !{!"omnipotent char", !5, i64 0}
!5 = !{!"Simple C/C++ TBAA"}

Attempting to parse this with llvm-pretty-bc-parser fails on the fcmp une double %inc, 0.000000e+00 instruction:

λ> parseBitCodeFromFile "test.bc"
Left (Error {errContext = ["FUNC_CODE_INST_CMP2","@b","FUNCTION_BLOCK","FUNCTION_BLOCK_ID","value symbol table","MODULE_BLOCK","Bitstream"], errMessage = "parseField: unable to parse record field 4 of record Record {recordCode = 28, recordFields = [FieldLiteral (BitString {bsLength = 35, bsData = 4294967292}),FieldLiteral (BitString {bsLength = 5, bsData = 0}),FieldLiteral (BitString {bsLength = 5, bsData = 8}),FieldLiteral (BitString {bsLength = 5, bsData = 14})]}"})

The code which parses FUNC_CODE_INST_CMP2 is located here:

llvm-pretty-bc-parser/src/Data/LLVM/BitCode/IR/Function.hs

Lines 986 to 1026 in d1a6057

    
           -- [opty,opval,opval,pred] 
        
           code 
        
            |  code == 9 
        
            || code == 28 -> label "FUNC_CODE_INST_CMP2" $ do 
        
             let field = parseField r 
        
             (lhs,ix0) <- getValueTypePair t r 0 
        
                         `mplus` (do i   <- adjustId =<< field 0 numeric 
        
                                     cxt <- getContext 
        
                                     return (forwardRef cxt i t, 1)) 
        
             _predval  <- field ix0 unsigned 
        
             let isfp = isJust $ msum [ do pty <- elimPrimType (typedType lhs) 
        
                                           _   <- elimFloatType pty 
        
                                           return () 
        
                                      , do (_,vty) <- elimVector (typedType lhs) 
        
                                           pty     <- elimPrimType vty 
        
                                           _       <- elimFloatType pty 
        
                                           return () ] 
        
             -- XXX: we're ignoring the fast math flags 
        
             let ix1 | isfp && length (recordFields r) > 3 = ix0 + 1 
        
                     | otherwise                           = ix0 
        
             rhs <- getValue t (typedType lhs) =<< field ix1 numeric 
        
             let ty = typedType lhs 
        
                 parseOp | isPrimTypeOf isFloatingPoint ty || 
        
                           isVectorOf (isPrimTypeOf isFloatingPoint) ty = 
        
                           return . FCmp <=< fcmpOp 
        
                         | otherwise = 
        
                           return . ICmp <=< icmpOp 
        
             op <- field (ix1 + 1) parseOp 
        
             let boolTy = Integer 1 
        
             let rty = case ty of 
        
                   Vector n _ -> Vector n (PrimType boolTy) 
        
                   _          -> PrimType boolTy 
        
             result rty (op lhs (typedValue rhs)) d

Comparing that with the corresponding code in LLVM, I'm skeptical that the code in llvm-pretty-bc-parser is implemented correctly. In particular, the LLVM code parses the fast-math flags, whereas the llvm-pretty-bc-parser code tries to skip over the fast-math flags somewhere in the middle.

RyanGlScott · 2021-09-02T13:23:46Z

Indeed, llvm-pretty-bc-parser can successfully parse fcmp instructions that lack fast-math flags, but it will fail if has fast-math flags. This can easily be seen by compiling the following example with and without the -ffast-math flag:`

double a = 0;
int main() {
  if (a) {
    a++;
  }
  return 0;
}

If you compile without -ffast-math:

$ clang -O3 test.c -emit-llvm -S -fno-discard-value-names
$ clang -O3 test.c -emit-llvm -c -fno-discard-value-names

You will get this test.ll file:

; ModuleID = 'test.c'
source_filename = "test.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

@a = dso_local local_unnamed_addr global double 0.000000e+00, align 8

; Function Attrs: nofree norecurse nounwind uwtable
define dso_local i32 @main() local_unnamed_addr #0 {
entry:
  %0 = load double, double* @a, align 8, !tbaa !2
  %tobool = fcmp fast une double %0, 0.000000e+00
  br i1 %tobool, label %if.then, label %if.end

if.then:                                          ; preds = %entry
  %inc = fadd fast double %0, 1.000000e+00
  store double %inc, double* @a, align 8, !tbaa !2
  br label %if.end

if.end:                                           ; preds = %if.then, %entry
  ret i32 0
}

attributes #0 = { nofree norecurse nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="true" "no-jump-tables"="false" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }

!llvm.module.flags = !{!0}
!llvm.ident = !{!1}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"clang version 10.0.0-4ubuntu1 "}
!2 = !{!3, !3, i64 0}
!3 = !{!"double", !4, i64 0}
!4 = !{!"omnipotent char", !5, i64 0}
!5 = !{!"Simple C/C++ TBAA"}

And llvm-pretty-bc-parser can parse test.bc without issue, since the fcmp instructions lack fast-math flags. If you repeat the same steps with the -ffast-math flag, however:

$ clang -O3 test.c -emit-llvm -S -fno-discard-value-names -ffast-math
$ clang -O3 test.c -emit-llvm -c -fno-discard-value-names -ffast-math

You will get this test.ll file:

; ModuleID = 'test.c'
source_filename = "test.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

@a = dso_local local_unnamed_addr global double 0.000000e+00, align 8

; Function Attrs: nofree norecurse nounwind uwtable
define dso_local i32 @main() local_unnamed_addr #0 {
entry:
  %0 = load double, double* @a, align 8, !tbaa !2
  %tobool = fcmp fast une double %0, 0.000000e+00
  br i1 %tobool, label %if.then, label %if.end

if.then:                                          ; preds = %entry
  %inc = fadd fast double %0, 1.000000e+00
  store double %inc, double* @a, align 8, !tbaa !2
  br label %if.end

if.end:                                           ; preds = %if.then, %entry
  ret i32 0
}

attributes #0 = { nofree norecurse nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="true" "no-jump-tables"="false" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }

!llvm.module.flags = !{!0}
!llvm.ident = !{!1}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"clang version 10.0.0-4ubuntu1 "}
!2 = !{!3, !3, i64 0}
!3 = !{!"double", !4, i64 0}
!4 = !{!"omnipotent char", !5, i64 0}
!5 = !{!"Simple C/C++ TBAA"}

Note that we now have an fcmp instruction with a fast flag. As a consequence, llvm-pretty-bc-parser fails to parse it:

λ> parseBitCodeFromFile "test.bc"
Left (Error {errContext = ["FUNC_CODE_INST_CMP2","@main","FUNCTION_BLOCK","FUNCTION_BLOCK_ID","value symbol table","MODULE_BLOCK","Bitstream"], errMessage = "parseField: unable to parse record field 3 of record Record {recordCode = 28, recordFields = [FieldLiteral (BitString {bsLength = 5, bsData = 1}),FieldLiteral (BitString {bsLength = 5, bsData = 7}),FieldLiteral (BitString {bsLength = 5, bsData = 14}),FieldLiteral (BitString {bsLength = 10, bsData = 254})]}"})

RyanGlScott · 2021-09-02T13:26:34Z

To add a wrinkle in all of this, the example in #158 (comment) demonstrates a situation where an fcmp instruction in an .ll file seemingly lacks fast-math flags, but the corresponding .bc file does have fast-math flags. This can be verified by running llvm-bcanalyzer-10 test.bc -dump, which shows an INST_CMP2 with four operands (one for the comparison type, two for the argument values, and one for the fast-math flags) instead of the usual three:

<INST_CMP2 op0=4294967292 op1=0 op2=8 op3=14/>

I'm not quite sure why this happens, but either way, the culprit is the presence of fast-math flags in the .bc file.

Currently, `llvm-pretty-bc-parser` skips over fast-math flags when parsing. However, it was doing so incorrectly when parsing `FUNC_CODE_INST_CMP2` (i.e., `fcmp` instructions). Per the LLVM source code [here](https://github.com/llvm/llvm-project/blob/b0391dfc737ede147e128fe877045f61fc5e4905/llvm/lib/Bitcode/Reader/BitcodeReader.cpp#L4342-L4377), the fast-math flags are expected to come after all of the other fields, but the code in `llvm-pretty-bc-parser` was assuming that they would appear in the middle. See #158. I addresses this issue by parsing all of the other fields upfront and avoiding parsing the last part, which correspond to fast-math flags. While I was in town, I also made sure to add comments in every other place in the parser where we avoid parsing fast-math flags.

Ptival · 2021-10-02T00:23:22Z

I'm seeing the following with the current master:

  %149 = fcmp ogt float %145, %148, !dbg !DILocation(line: 62, column: 12, scope: !256754, inlinedAt: !256755)

Does this mean the problem is fixed?

RyanGlScott · 2021-10-02T00:33:11Z

I believe it is fixed, but I never figured out how to test against PX4_Autopilot, so I left this issue open as a reminder to do so—see #161 (comment). (Is the code you're testing from PX4_Autopilot?)

kquick · 2021-10-04T18:28:13Z

I was encountering multiple issues with the PX4_Autopilot code base (using build-bom extracted BC files). It was definitely encountering the fcmp issue that is fixed above, but when I patched around that issue, it ran into other problems. Since it's a stream of bits, it's entirely possible that my workaround was insufficient to get things back on track and this was the only issue, but I was unable to use anything later than LLVM9 on that code base and successfully parse it, so I would recommend repeating that experiment to see how things stand now with the above fix in place.

RyanGlScott · 2021-10-04T18:30:37Z

I've been unable to even compile PX4_Autopilot with Clang to obtain .bc files, so I don't know how to test this myself. Can you describe how you did so?

Ptival · 2021-10-06T20:39:54Z

I can confirm that, after fixing some bugs in the handling of fast math flags, I also read the correct operand in a bitcode file obtained via LLVM 10.

I need to fix some other bugs to be able to read the bitcode file obtained from LLVM 12.

@RyanGlScott I am using a nix flake provided by @kquick to get bitcode files from PX4_Autopilot.
To quote Kevin:

The way it works is that the build-bom generate-bitcode BLDCMD runs
BLDCMD but it watches for clang compilations and when it sees one, it
re-issues the clang compilation with options to generate LLVM bitcode.
It then places the generated LLVM bitcode in a tar file and then adds
that tarfile to the ELF file (.o or the exe) in a special section
".llvm_bitcode". The build-bom extract-bitcode is looking for that
section to retrieve the tarfile, from which it can get the llvm
bitcode file.
The reason it uses a tarfile is that tarfiles can be concatentated
together and the result is also a valid tarfile. So linking multiple
.o files will concatenate their sections, which gives you a a tarfile
with all the bitcodes in it.

kquick mentioned this issue Aug 19, 2021

Error when parsing SQLite 3.32.1 with Clang 10: Instruction code 56 is unknown #148

Closed

RyanGlScott mentioned this issue Sep 2, 2021

Clean up treatment of fast-math flags during parsing #161

Merged

kquick self-assigned this Jan 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parsing failures #158

Parsing failures #158

kquick commented Jul 20, 2021

kquick commented Jul 20, 2021

RyanGlScott commented Sep 2, 2021 •

edited

Loading

RyanGlScott commented Sep 2, 2021

RyanGlScott commented Sep 2, 2021

Ptival commented Oct 2, 2021

RyanGlScott commented Oct 2, 2021

kquick commented Oct 4, 2021

RyanGlScott commented Oct 4, 2021

Ptival commented Oct 6, 2021

Parsing failures #158

Parsing failures #158

Comments

kquick commented Jul 20, 2021

kquick commented Jul 20, 2021

RyanGlScott commented Sep 2, 2021 • edited Loading

RyanGlScott commented Sep 2, 2021

RyanGlScott commented Sep 2, 2021

Ptival commented Oct 2, 2021

RyanGlScott commented Oct 2, 2021

kquick commented Oct 4, 2021

RyanGlScott commented Oct 4, 2021

Ptival commented Oct 6, 2021

RyanGlScott commented Sep 2, 2021 •

edited

Loading