Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow parsing METADATA_DERIVED_TYPE from Apple LLVM #247

Merged
merged 1 commit into from
Aug 15, 2023
Merged

Conversation

RyanGlScott
Copy link
Contributor

@RyanGlScott RyanGlScott commented Aug 14, 2023

Apple's fork of LLVM uses an ever-so-slightly different bitcode format than the upstream version of LLVM. In particular, it adds an additional record to METADATA_DERIVED_TYPE, which means that we must be slightly more permissive during parsing to allow Apple LLVM bitcodes through. See the new Note [Apple LLVM] for a slightly more detailed explanation. As that Note explains, this is good enough for now, but we may want to think about a more robust solution in the future.

I have added an apple-llvm.bc regression test to disasm-test, but it doesn't yet pass the test suite for unrelated reasons documented in #222. I have added a known_bugs entry for this test case until that issue is resolved.

Fixes #235.

@RyanGlScott
Copy link
Contributor Author

I'm not quite sure of the best way to test this, since the bug depends on a very particular .bc file to trigger, and we don't have any tests that just check if parseBitCodeFromFile *.bc works without errors (AFAICT). We could check in the corresponding .ll file as part of a disasm-test regression test, but that would only work if you run the test suite with Apple Clang on macOS.

Copy link
Member

@kquick kquick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added PR #249 which adds the ability to test these. I don't have the Apple bitcode file, but I believe you have one corresponding the Ryan Mcleeary's minimal example; please feel free to update #249 with that Apple-generated bitcode and corresponding .ll. You could even merge that PR before this one and rebase this one to ensure the fix here, although it's a pretty trivial change, so that's not critical.

@RyanGlScott
Copy link
Contributor Author

Thanks, @kquick! This would indeed be helpful. Sadly, now that I've actually tried out the swap.bc file from #235 with the bc_src_tessts directory, I've realized that there is another obstacle:

$ cabal run disasm-test -- -p swap
Disassembly tests
  rawBC llvm-as 14.0.0
    swap.bc
      llvm-range=pre-llvm15
        swap: llvm-dis: error: Not an int attribute (Producer: 'APPLE_1_1403.0.22.14.1_0' Reader: 'LLVM 14.0.0')
FAIL
          Exception: callProcess: llvm-dis "-o" "/tmp/swapllvm-dis80524-0.ll" "disasm-test/bc_src_tests/swap.bc" (exit 1): failed

1 out of 1 tests failed (0.01s)

My upstream version of LLVM really doesn't like Apple LLVM, it seems. I don't feel confident in my ability to hack the .bc file to change the Producer to something that upstream LLVM can parse, so perhaps it's best just to merge this on move on.

@kquick
Copy link
Member

kquick commented Aug 15, 2023

I just made a modification to #249 which allows llvm-dis to fail. The patch comments provide more details, but in general, llvm-dis isn't needed to ensure llvm-pretty-bc-parser can at least parse the bitcode, so I've made that adjustment.

@RyanGlScott
Copy link
Contributor Author

Thank again. Sadly, there's still another obstacle:

$ cabal run disasm-test -- -d True -p swap
Disassembly tests
  rawBC llvm-as 14.0.0
    swap.bc
      llvm-range=pre-llvm15
        swap: ## Running: llvm-dis -o /tmp/swapllvm-dis107457-0.ll disasm-test/bc_src_tests/swap.bc
llvm-dis: error: Not an int attribute (Producer: 'APPLE_1_1403.0.22.14.1_0' Reader: 'LLVM 14.0.0')
## Removing /tmp/swapllvm-dis107457-0.ll
Unnamed metadata (modUnnamedMd) indices: [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,94,95,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175]
FAIL (0.01s)
          disasm-test/Main.hs:786:
          2 duplicated Unnamed metadata (modUnnamedMd) indices

1 out of 1 tests failed (0.03s)

The indices 94 and 95 end up duplicated somehow.

@kquick
Copy link
Member

kquick commented Aug 15, 2023

That is the situation that #228 will fix.

@RyanGlScott
Copy link
Contributor Author

Hah, I managed to forget about that issue entirely. (A lot has occurred in the llvm-pretty-bc-parser world in a short time!) I suppose we could merge this PR as-is and open a separate issue to add a test case once llvm-pretty-bc-parser can handle it.

@kquick
Copy link
Member

kquick commented Aug 15, 2023

Or you can add this test and add an entry in disasm-tests/known-bugs so that the failure is expected and that known-bugs file can be removed when 228 gets merged. That would be my recommendation, but I'm fine with whatever you decide on this.

Apple's fork of LLVM uses an ever-so-slightly different bitcode format than the
upstream version of LLVM. In particular, it adds an additional record to
`METADATA_DERIVED_TYPE`, which means that we must be slightly more permissive
during parsing to allow Apple LLVM bitcodes through. See the new `Note [Apple
LLVM]` for a slightly more detailed explanation. As that Note explains, this is
good enough for now, but we may want to think about a more robust solution in
the future.

I have added an `apple-llvm.bc` regression test to `disasm-test`, but it
doesn't yet pass the test suite for unrelated reasons documented in #222. I
have added a `known_bugs` entry for this test case until that issue is
resolved.

Fixes #235.
@RyanGlScott
Copy link
Contributor Author

Right you are—I also forgot about known_bugs. With that, this PR now has a test case, albeit one that is not yet ready for prime time.

@RyanGlScott RyanGlScott merged commit de2b94d into master Aug 15, 2023
@RyanGlScott RyanGlScott deleted the T235 branch August 15, 2023 18:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Investigate why Xcode Clang-produced binaries fail to parse
2 participants