Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parsing failures #158

Open
kquick opened this issue Jul 20, 2021 · 9 comments
Open

Parsing failures #158

kquick opened this issue Jul 20, 2021 · 9 comments
Assignees

Comments

@kquick
Copy link
Member

kquick commented Jul 20, 2021

I have some candidate code (PX4_Autopilot) that is having parser decode failures with multiple symptoms:

  1. Failure to decode an fcmp operation value. The operation values defined are 0-15, but the value that llvm-pretty-bc-parser is attempting to resolve is 24.
  2. ignoring the above by defaulting to ftrue, I'm seeing invalid operand values for the instruction.

Here's the llvm-dis output:

    %147 = load float, float* %146, align 4, !dbg !217760
    %148 = fmul nsz arcp float %147, 0x3F8457AE140000000, !dbg !217761
    %149 = fcmp nsz arcp ogt float %145, %148, !dbg !217769

for llvm-pretty-bc-parser, the llvm-disasm has problems on %149, where the default to ftrue still shows:

   %149 = fcmp true float %145, %147, !dbg ...`

Note the second operand is decoded as %147 whereas llvm decodes to %148.

It's unknown if there are other parsing problems, this one just happened to fail the parser due to the lookup in (1) above and therefore be detectable.

@kquick
Copy link
Member Author

kquick commented Jul 20, 2021

n.b., the llvm-disasm helpfully shows the 0x3F8457AE140000000 constant in register %148 to be 1.0e-2

@RyanGlScott
Copy link
Contributor

RyanGlScott commented Sep 2, 2021

I think I'm encountering the same issue, or at least very similar one. To reproduce, compile the following file:

double a = 0;
void b() {
  for (int c = 0; c < 2; c++)
    a++;
  if (a)
    a = 7;
}

int main() {
  b();
  return 0;
}

With:

$ clang -O1 test.c -S -emit-llvm -fno-discard-value-names
$ clang -O1 test.c -c -emit-llvm -fno-discard-value-names

This will produce the following bitcode with Clang 10:

; ModuleID = 'test.c'
source_filename = "test.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

@a = dso_local local_unnamed_addr global double 0.000000e+00, align 8

; Function Attrs: nofree norecurse nounwind uwtable
define dso_local void @b() local_unnamed_addr #0 {
entry:
  %0 = load double, double* @a, align 8, !tbaa !2
  br label %for.body

for.cond.cleanup:                                 ; preds = %for.body
  %tobool = fcmp une double %inc, 0.000000e+00
  %storemerge = select i1 %tobool, double 7.000000e+00, double %inc
  store double %storemerge, double* @a, align 8, !tbaa !2
  ret void

for.body:                                         ; preds = %entry, %for.body
  %1 = phi double [ %0, %entry ], [ %inc, %for.body ]
  %c.04 = phi i32 [ 0, %entry ], [ %inc1, %for.body ]
  %inc = fadd double %1, 1.000000e+00
  %inc1 = add nuw nsw i32 %c.04, 1
  %cmp = icmp eq i32 %c.04, 0
  br i1 %cmp, label %for.body, label %for.cond.cleanup
}

; Function Attrs: nofree norecurse nounwind uwtable
define dso_local i32 @main() local_unnamed_addr #0 {
entry:
  call void @b()
  ret i32 0
}

attributes #0 = { nofree norecurse nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.module.flags = !{!0}
!llvm.ident = !{!1}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"clang version 10.0.0-4ubuntu1 "}
!2 = !{!3, !3, i64 0}
!3 = !{!"double", !4, i64 0}
!4 = !{!"omnipotent char", !5, i64 0}
!5 = !{!"Simple C/C++ TBAA"}

Attempting to parse this with llvm-pretty-bc-parser fails on the fcmp une double %inc, 0.000000e+00 instruction:

λ> parseBitCodeFromFile "test.bc"
Left (Error {errContext = ["FUNC_CODE_INST_CMP2","@b","FUNCTION_BLOCK","FUNCTION_BLOCK_ID","value symbol table","MODULE_BLOCK","Bitstream"], errMessage = "parseField: unable to parse record field 4 of record Record {recordCode = 28, recordFields = [FieldLiteral (BitString {bsLength = 35, bsData = 4294967292}),FieldLiteral (BitString {bsLength = 5, bsData = 0}),FieldLiteral (BitString {bsLength = 5, bsData = 8}),FieldLiteral (BitString {bsLength = 5, bsData = 14})]}"})

The code which parses FUNC_CODE_INST_CMP2 is located here:

-- [opty,opval,opval,pred]
code
| code == 9
|| code == 28 -> label "FUNC_CODE_INST_CMP2" $ do
let field = parseField r
(lhs,ix0) <- getValueTypePair t r 0
`mplus` (do i <- adjustId =<< field 0 numeric
cxt <- getContext
return (forwardRef cxt i t, 1))
_predval <- field ix0 unsigned
let isfp = isJust $ msum [ do pty <- elimPrimType (typedType lhs)
_ <- elimFloatType pty
return ()
, do (_,vty) <- elimVector (typedType lhs)
pty <- elimPrimType vty
_ <- elimFloatType pty
return () ]
-- XXX: we're ignoring the fast math flags
let ix1 | isfp && length (recordFields r) > 3 = ix0 + 1
| otherwise = ix0
rhs <- getValue t (typedType lhs) =<< field ix1 numeric
let ty = typedType lhs
parseOp | isPrimTypeOf isFloatingPoint ty ||
isVectorOf (isPrimTypeOf isFloatingPoint) ty =
return . FCmp <=< fcmpOp
| otherwise =
return . ICmp <=< icmpOp
op <- field (ix1 + 1) parseOp
let boolTy = Integer 1
let rty = case ty of
Vector n _ -> Vector n (PrimType boolTy)
_ -> PrimType boolTy
result rty (op lhs (typedValue rhs)) d

Comparing that with the corresponding code in LLVM, I'm skeptical that the code in llvm-pretty-bc-parser is implemented correctly. In particular, the LLVM code parses the fast-math flags, whereas the llvm-pretty-bc-parser code tries to skip over the fast-math flags somewhere in the middle.

@RyanGlScott
Copy link
Contributor

Indeed, llvm-pretty-bc-parser can successfully parse fcmp instructions that lack fast-math flags, but it will fail if has fast-math flags. This can easily be seen by compiling the following example with and without the -ffast-math flag:`

double a = 0;
int main() {
  if (a) {
    a++;
  }
  return 0;
}

If you compile without -ffast-math:

$ clang -O3 test.c -emit-llvm -S -fno-discard-value-names
$ clang -O3 test.c -emit-llvm -c -fno-discard-value-names

You will get this test.ll file:

; ModuleID = 'test.c'
source_filename = "test.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

@a = dso_local local_unnamed_addr global double 0.000000e+00, align 8

; Function Attrs: nofree norecurse nounwind uwtable
define dso_local i32 @main() local_unnamed_addr #0 {
entry:
  %0 = load double, double* @a, align 8, !tbaa !2
  %tobool = fcmp fast une double %0, 0.000000e+00
  br i1 %tobool, label %if.then, label %if.end

if.then:                                          ; preds = %entry
  %inc = fadd fast double %0, 1.000000e+00
  store double %inc, double* @a, align 8, !tbaa !2
  br label %if.end

if.end:                                           ; preds = %if.then, %entry
  ret i32 0
}

attributes #0 = { nofree norecurse nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="true" "no-jump-tables"="false" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }

!llvm.module.flags = !{!0}
!llvm.ident = !{!1}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"clang version 10.0.0-4ubuntu1 "}
!2 = !{!3, !3, i64 0}
!3 = !{!"double", !4, i64 0}
!4 = !{!"omnipotent char", !5, i64 0}
!5 = !{!"Simple C/C++ TBAA"}

And llvm-pretty-bc-parser can parse test.bc without issue, since the fcmp instructions lack fast-math flags. If you repeat the same steps with the -ffast-math flag, however:

$ clang -O3 test.c -emit-llvm -S -fno-discard-value-names -ffast-math
$ clang -O3 test.c -emit-llvm -c -fno-discard-value-names -ffast-math

You will get this test.ll file:

; ModuleID = 'test.c'
source_filename = "test.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

@a = dso_local local_unnamed_addr global double 0.000000e+00, align 8

; Function Attrs: nofree norecurse nounwind uwtable
define dso_local i32 @main() local_unnamed_addr #0 {
entry:
  %0 = load double, double* @a, align 8, !tbaa !2
  %tobool = fcmp fast une double %0, 0.000000e+00
  br i1 %tobool, label %if.then, label %if.end

if.then:                                          ; preds = %entry
  %inc = fadd fast double %0, 1.000000e+00
  store double %inc, double* @a, align 8, !tbaa !2
  br label %if.end

if.end:                                           ; preds = %if.then, %entry
  ret i32 0
}

attributes #0 = { nofree norecurse nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="true" "no-jump-tables"="false" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="true" "use-soft-float"="false" }

!llvm.module.flags = !{!0}
!llvm.ident = !{!1}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"clang version 10.0.0-4ubuntu1 "}
!2 = !{!3, !3, i64 0}
!3 = !{!"double", !4, i64 0}
!4 = !{!"omnipotent char", !5, i64 0}
!5 = !{!"Simple C/C++ TBAA"}

Note that we now have an fcmp instruction with a fast flag. As a consequence, llvm-pretty-bc-parser fails to parse it:

λ> parseBitCodeFromFile "test.bc"
Left (Error {errContext = ["FUNC_CODE_INST_CMP2","@main","FUNCTION_BLOCK","FUNCTION_BLOCK_ID","value symbol table","MODULE_BLOCK","Bitstream"], errMessage = "parseField: unable to parse record field 3 of record Record {recordCode = 28, recordFields = [FieldLiteral (BitString {bsLength = 5, bsData = 1}),FieldLiteral (BitString {bsLength = 5, bsData = 7}),FieldLiteral (BitString {bsLength = 5, bsData = 14}),FieldLiteral (BitString {bsLength = 10, bsData = 254})]}"})

@RyanGlScott
Copy link
Contributor

To add a wrinkle in all of this, the example in #158 (comment) demonstrates a situation where an fcmp instruction in an .ll file seemingly lacks fast-math flags, but the corresponding .bc file does have fast-math flags. This can be verified by running llvm-bcanalyzer-10 test.bc -dump, which shows an INST_CMP2 with four operands (one for the comparison type, two for the argument values, and one for the fast-math flags) instead of the usual three:

<INST_CMP2 op0=4294967292 op1=0 op2=8 op3=14/>

I'm not quite sure why this happens, but either way, the culprit is the presence of fast-math flags in the .bc file.

RyanGlScott added a commit that referenced this issue Sep 2, 2021
Currently, `llvm-pretty-bc-parser` skips over fast-math flags when parsing.
However, it was doing so incorrectly when parsing `FUNC_CODE_INST_CMP2` (i.e.,
`fcmp` instructions). Per the LLVM source code
[here](https://github.com/llvm/llvm-project/blob/b0391dfc737ede147e128fe877045f61fc5e4905/llvm/lib/Bitcode/Reader/BitcodeReader.cpp#L4342-L4377),
the fast-math flags are expected to come after all of the other fields, but the
code in `llvm-pretty-bc-parser` was assuming that they would appear in the
middle. See #158.

I addresses this issue by parsing all of the other fields upfront and avoiding
parsing the last part, which correspond to fast-math flags. While I was in
town, I also made sure to add comments in every other place in the parser where
we avoid parsing fast-math flags.
RyanGlScott added a commit that referenced this issue Sep 2, 2021
Currently, `llvm-pretty-bc-parser` skips over fast-math flags when parsing.
However, it was doing so incorrectly when parsing `FUNC_CODE_INST_CMP2` (i.e.,
`fcmp` instructions). Per the LLVM source code
[here](https://github.com/llvm/llvm-project/blob/b0391dfc737ede147e128fe877045f61fc5e4905/llvm/lib/Bitcode/Reader/BitcodeReader.cpp#L4342-L4377),
the fast-math flags are expected to come after all of the other fields, but the
code in `llvm-pretty-bc-parser` was assuming that they would appear in the
middle. See #158.

I addresses this issue by parsing all of the other fields upfront and avoiding
parsing the last part, which correspond to fast-math flags. While I was in
town, I also made sure to add comments in every other place in the parser where
we avoid parsing fast-math flags.
RyanGlScott added a commit that referenced this issue Sep 13, 2021
Currently, `llvm-pretty-bc-parser` skips over fast-math flags when parsing.
However, it was doing so incorrectly when parsing `FUNC_CODE_INST_CMP2` (i.e.,
`fcmp` instructions). Per the LLVM source code
[here](https://github.com/llvm/llvm-project/blob/b0391dfc737ede147e128fe877045f61fc5e4905/llvm/lib/Bitcode/Reader/BitcodeReader.cpp#L4342-L4377),
the fast-math flags are expected to come after all of the other fields, but the
code in `llvm-pretty-bc-parser` was assuming that they would appear in the
middle. See #158.

I addresses this issue by parsing all of the other fields upfront and avoiding
parsing the last part, which correspond to fast-math flags. While I was in
town, I also made sure to add comments in every other place in the parser where
we avoid parsing fast-math flags.
RyanGlScott added a commit that referenced this issue Sep 14, 2021
Currently, `llvm-pretty-bc-parser` skips over fast-math flags when parsing.
However, it was doing so incorrectly when parsing `FUNC_CODE_INST_CMP2` (i.e.,
`fcmp` instructions). Per the LLVM source code
[here](https://github.com/llvm/llvm-project/blob/b0391dfc737ede147e128fe877045f61fc5e4905/llvm/lib/Bitcode/Reader/BitcodeReader.cpp#L4342-L4377),
the fast-math flags are expected to come after all of the other fields, but the
code in `llvm-pretty-bc-parser` was assuming that they would appear in the
middle. See #158.

I addresses this issue by parsing all of the other fields upfront and avoiding
parsing the last part, which correspond to fast-math flags. While I was in
town, I also made sure to add comments in every other place in the parser where
we avoid parsing fast-math flags.
RyanGlScott added a commit that referenced this issue Sep 14, 2021
Currently, `llvm-pretty-bc-parser` skips over fast-math flags when parsing.
However, it was doing so incorrectly when parsing `FUNC_CODE_INST_CMP2` (i.e.,
`fcmp` instructions). Per the LLVM source code
[here](https://github.com/llvm/llvm-project/blob/b0391dfc737ede147e128fe877045f61fc5e4905/llvm/lib/Bitcode/Reader/BitcodeReader.cpp#L4342-L4377),
the fast-math flags are expected to come after all of the other fields, but the
code in `llvm-pretty-bc-parser` was assuming that they would appear in the
middle. See #158.

I addresses this issue by parsing all of the other fields upfront and avoiding
parsing the last part, which correspond to fast-math flags. While I was in
town, I also made sure to add comments in every other place in the parser where
we avoid parsing fast-math flags.
RyanGlScott added a commit that referenced this issue Sep 16, 2021
Currently, `llvm-pretty-bc-parser` skips over fast-math flags when parsing.
However, it was doing so incorrectly when parsing `FUNC_CODE_INST_CMP2` (i.e.,
`fcmp` instructions). Per the LLVM source code
[here](https://github.com/llvm/llvm-project/blob/b0391dfc737ede147e128fe877045f61fc5e4905/llvm/lib/Bitcode/Reader/BitcodeReader.cpp#L4342-L4377),
the fast-math flags are expected to come after all of the other fields, but the
code in `llvm-pretty-bc-parser` was assuming that they would appear in the
middle. See #158.

I addresses this issue by parsing all of the other fields upfront and avoiding
parsing the last part, which correspond to fast-math flags. While I was in
town, I also made sure to add comments in every other place in the parser where
we avoid parsing fast-math flags.
@Ptival
Copy link
Contributor

Ptival commented Oct 2, 2021

I'm seeing the following with the current master:

  %149 = fcmp ogt float %145, %148, !dbg !DILocation(line: 62, column: 12, scope: !256754, inlinedAt: !256755)

Does this mean the problem is fixed?

@RyanGlScott
Copy link
Contributor

I believe it is fixed, but I never figured out how to test against PX4_Autopilot, so I left this issue open as a reminder to do so—see #161 (comment). (Is the code you're testing from PX4_Autopilot?)

@kquick
Copy link
Member Author

kquick commented Oct 4, 2021

I was encountering multiple issues with the PX4_Autopilot code base (using build-bom extracted BC files). It was definitely encountering the fcmp issue that is fixed above, but when I patched around that issue, it ran into other problems. Since it's a stream of bits, it's entirely possible that my workaround was insufficient to get things back on track and this was the only issue, but I was unable to use anything later than LLVM9 on that code base and successfully parse it, so I would recommend repeating that experiment to see how things stand now with the above fix in place.

@RyanGlScott
Copy link
Contributor

I've been unable to even compile PX4_Autopilot with Clang to obtain .bc files, so I don't know how to test this myself. Can you describe how you did so?

@Ptival
Copy link
Contributor

Ptival commented Oct 6, 2021

I can confirm that, after fixing some bugs in the handling of fast math flags, I also read the correct operand in a bitcode file obtained via LLVM 10.

I need to fix some other bugs to be able to read the bitcode file obtained from LLVM 12.

@RyanGlScott I am using a nix flake provided by @kquick to get bitcode files from PX4_Autopilot.
To quote Kevin:

The way it works is that the build-bom generate-bitcode BLDCMD runs
BLDCMD but it watches for clang compilations and when it sees one, it
re-issues the clang compilation with options to generate LLVM bitcode.
It then places the generated LLVM bitcode in a tar file and then adds
that tarfile to the ELF file (.o or the exe) in a special section
".llvm_bitcode". The build-bom extract-bitcode is looking for that
section to retrieve the tarfile, from which it can get the llvm
bitcode file.
The reason it uses a tarfile is that tarfiles can be concatentated
together and the result is also a valid tarfile. So linking multiple
.o files will concatenate their sections, which gives you a a tarfile
with all the bitcodes in it.

@kquick kquick self-assigned this Jan 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants