improves arm-family support #1353

ivg · 2021-10-08T19:19:53Z

The BIL lifter is now enabled for pure-Thumb modes (the M profiles), and `thumb` in the arch field of the specification is correctly mapped back to armvXm to enable proper decoding of instructions in Thumb mode.
We now properly handle the entry point of Thumb binaries that start in the T32 processor state (previously we treated those binaries as A32 and started disassembling from an incorrect entry point, which led to garbage output). Thanks to Benjamin Mourad (@bmourad01), who noticed this and suggested the fix.
To enable generic handling of mangled entry points, we properly set up code pointer alignments and applied them in the new promises that provide names and function starts directly from the specification instead of relying on the image. We can safely ignore the image itself, as it is always built from the specification, and the set of symbols in the image is a strict subset of the symbols provided in the specification.
In addition to rectifying the code alignment, we also add more ARM architectures, including ARMv9 and 32-bit variants of ARMv8.
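A "mangled" entry point here most likely refers to the usual ARM interworking convention, where bit 0 of a code address selects the processor state: an odd entry point means "start in T32 (Thumb) state at the address with bit 0 cleared". The sketch below illustrates that convention only; it is not the code from this PR, and the `state` type and `decode_entry` function are hypothetical names.

```ocaml
(* Hypothetical sketch of the ARM interworking convention: bit 0 of a
   code address encodes the processor state, not part of the address. *)
type state = A32 | T32

(* Split a (possibly mangled) entry point into the state to start in
   and the real address of the first instruction. *)
let decode_entry (addr : int64) : state * int64 =
  if Int64.logand addr 1L = 1L
  then (T32, Int64.logand addr (Int64.lognot 1L)) (* clear the Thumb bit *)
  else (A32, addr)

let () =
  let show (st, a) =
    Printf.printf "%s @ 0x%Lx\n"
      (match st with A32 -> "A32" | T32 -> "T32") a
  in
  show (decode_entry 0x10321L); (* prints "T32 @ 0x10320" *)
  show (decode_entry 0x10320L)  (* prints "A32 @ 0x10320" *)
```

Treating the odd entry point as a plain A32 address is exactly the failure mode described above: decoding would start one byte off and in the wrong instruction set.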

bmourad01 · 2021-10-08T20:21:41Z

lib/arm/arm_target.ml


  let v7 = if not is_bi_endian then v6t2 <: "armv7"
    else CT.Target.declare ~package (ordered "armv7")
        ~parent
        ~nicknames:["armv7"]
+        ~code_alignment:16


Question: does "armv7" refer to ARM mode or Thumb mode here? I'm wondering if having the code_alignment always at 16 is correct.

It refers to both modes. Only ARMv7-M is the thumb-only mode, and, presumably, ARMv8-M as well, though I didn't find any clear wording in the ARMv8 architectural reference.

Hmm, I'm confused then. According to the manual:

E2.6.1 Instruction alignment
A32 instructions are word-aligned.
T32 instructions are halfword-aligned.

So, if this refers to both modes, then does this allow for an A32 instruction to be halfword-aligned?

Yes, we have to keep the least alignment of the two. The proper way would be to attribute code alignment to the encoding rather than to the target, because in the case of ARM there is no single code alignment. So we have to take the upper bound and say that any address that is at least halfword-aligned could be an instruction.
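The reasoning in this thread can be checked with a few lines of arithmetic. The sketch assumes that the `~code_alignment:16` in the diff is measured in bits (16 bits = one halfword); the constant and function names are hypothetical and only illustrate why the target-wide alignment has to be the weaker of the two encodings.

```ocaml
(* Alignments quoted from the manual: A32 is word-aligned,
   T32 is halfword-aligned. Values are in bits (an assumption that
   matches reading ~code_alignment:16 as "16 bits"). *)
let a32_alignment_bits = 32
let t32_alignment_bits = 16

(* A single target covers both encodings, so it must keep the least
   alignment; otherwise valid T32 addresses would be rejected. *)
let target_alignment_bits = min a32_alignment_bits t32_alignment_bits

(* An address may start an instruction only if it is a multiple of
   the alignment expressed in bytes. *)
let may_be_instruction ~alignment_bits addr =
  let bytes = alignment_bits / 8 in
  addr mod bytes = 0

let () =
  List.iter
    (fun addr ->
       Printf.printf "0x%x -> %b\n" addr
         (may_be_instruction ~alignment_bits:target_alignment_bits addr))
    [ 0x8000; 0x8002; 0x8003 ]
```

With the 16-bit target alignment, `0x8002` is accepted as a candidate (it can only be T32), while `0x8003` is rejected; a 32-bit alignment would wrongly rule out `0x8002` as well.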

We remove the dependency on the Bap.Std.Image and instead use the image specification directly. This gives us strictly more symbols, as the image imposes extra constraints, which may hide function starts and their names. More information is not always better, as we now have more chances of getting conflicting knowledge. To preserve as much information as possible without compromising correctness, we leverage our agent-based conflict resolution system. We push all names of which we're not completely sure into possible aliases and use a new agent, `bap:gossiper`, to propose names from that set. To make everything work, we lowered the reliability of the objdump symbolizer (as we want bap to have the final word).

The improved symbolization facility uncovered a small bug in the way the x86 lock intrinsic was implemented: it was named just `"lock"`, which may conflict with a normal function of the same name (as our testsuite uncovered). This commit adds the `x86` prefix to the intrinsic, i.e., `x86:lock`, and properly delimits the locked code with the corresponding `x86:unlock` intrinsic.
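The conflict-resolution idea described above (several agents propose a name, and the proposal from the most reliable agent wins) can be illustrated with a toy model. This is not BAP's Knowledge API: the `reliability` levels, the `proposal` record, `resolve`, and the `spec-symbolizer` agent are all hypothetical; only `bap:gossiper` and objdump come from the text.

```ocaml
(* Toy model of reliability-based conflict resolution; not BAP's API. *)
type reliability = Low | Medium | High

let rank = function Low -> 0 | Medium -> 1 | High -> 2

type proposal = { agent : string; reliability : reliability; name : string }

(* Keep the proposal from the most reliable agent; on a tie the
   earlier proposal in the list wins. *)
let resolve (proposals : proposal list) : proposal option =
  match proposals with
  | [] -> None
  | first :: rest ->
    Some
      (List.fold_left
         (fun acc p ->
            if rank p.reliability > rank acc.reliability then p else acc)
         first rest)

let () =
  let proposals = [
    (* bap:gossiper only proposes names taken from the possible aliases *)
    { agent = "bap:gossiper"; reliability = Low; name = "alias_name" };
    (* objdump's reliability is lowered so that bap has the final word *)
    { agent = "objdump"; reliability = Medium; name = "objdump_name" };
    (* hypothetical agent that reads names from the specification *)
    { agent = "spec-symbolizer"; reliability = High; name = "spec_name" };
  ] in
  match resolve proposals with
  | Some p -> Printf.printf "%s (proposed by %s)\n" p.name p.agent
  | None -> print_endline "no name"
```

Putting uncertain names at the lowest level means they still fill gaps when nobody else has an opinion, but never override a more trusted source.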

ivg · 2021-10-12T17:59:50Z

I moved the symbols and function starts-related work into a separate PR (#1355) to keep things clean. I will now add some more interworking tests for the newly added functionality.


This release brings Ghidra as the new disassembler and lifting backend, significantly improves our Thumb lifter (especially with respect to interworking), adds forward-chaining rules and context variables to the knowledge base, support for LLVM 12, a pass that flattens IR, and a new framework for pattern matching on bytes that leverages the available patterns and actions from the Ghidra project. It also contains many bug fixes and improvements, most notably performance improvements that make bap 30 to 50 percent faster. See below for the full list of changes.

Package-wise, we split bap into three parts: `bap-core`, `bap`, and `bap-extra`. The `bap-core` metapackage contains the minimal set of core packages that is necessary to disassemble the binary; the `bap` package extends this set with various analyses; finally, `bap-extra` includes rarely used or hard-to-install packages, such as the symbolic executor, which is very heavy on installation, and `bap-ghidra`, which is right now in a very experimental stage and is only installable on Ubuntu 18.04, since it requires the libghidra-dev package available from a PPA:

```
sudo add-apt-repository ppa:ivg/ghidra -y
sudo apt-get install libghidra-dev -y
sudo apt-get install libghidra-data -y
```

Changelog
=========

Features
--------

- BinaryAnalysisPlatform/bap#1325 adds armeb abi
- BinaryAnalysisPlatform/bap#1326 adds experimental Ghidra disassembler and lifting backend
- BinaryAnalysisPlatform/bap#1332 adds the flatten pass
- BinaryAnalysisPlatform/bap#1341 adds context variables to the knowledge base
- BinaryAnalysisPlatform/bap#1343 adds register aliases to the Core Theory
- BinaryAnalysisPlatform/bap#1358 adds LLVM 12 support
- BinaryAnalysisPlatform/bap#1360 extends the knowledge monad interface
- BinaryAnalysisPlatform/bap#1363 adds forward-chaining rules and Primus Lisp methods
- BinaryAnalysisPlatform/bap#1364 adds a generic byte pattern matcher based on Ghidra
- BinaryAnalysisPlatform/bap#1365 adds support for the Thumb IT blocks
- BinaryAnalysisPlatform/bap#1369 adds some missing `t2LDR.-i12` instructions to the Thumb lifter

Improvements
------------

- BinaryAnalysisPlatform/bap#1336 improves the `main` function discovery heuristics
- BinaryAnalysisPlatform/bap#1337 adds more Primus Lisp stubs and fixes some existing ones
- BinaryAnalysisPlatform/bap#1342 uses context variables to store the current theory
- BinaryAnalysisPlatform/bap#1344 uses the context variables to store the Primus Lisp state
- BinaryAnalysisPlatform/bap#1355 tweaks symbolization and function start identification facilities
- BinaryAnalysisPlatform/bap#1353 improves arm-family support
- BinaryAnalysisPlatform/bap#1356 stops proposing aliases as potential subroutine names
- BinaryAnalysisPlatform/bap#1361 rewrites knowledge and primus monads
- BinaryAnalysisPlatform/bap#1370 tweaks Primus Lisp's method resolution to keep super methods
- BinaryAnalysisPlatform/bap#1375 error handling and performance tweaks
- BinaryAnalysisPlatform/bap#1378 improves reification of calls in the IR theory (part I)
- BinaryAnalysisPlatform/bap#1379 improves semantics of some ITT instructions
- BinaryAnalysisPlatform/bap#1380 fixes handling of fallthroughs in IR theory

Bug Fixes
---------

- BinaryAnalysisPlatform/bap#1328 fixes C.ABI.Args `popn` and `align_even` operators
- BinaryAnalysisPlatform/bap#1329 fixes frame layout calculation in the Primus loader
- BinaryAnalysisPlatform/bap#1330 fixes the address size computation in the llvm backend
- BinaryAnalysisPlatform/bap#1333 fixes and improves label handling in the IR theory
- BinaryAnalysisPlatform/bap#1338 fixes core:eff theory
- BinaryAnalysisPlatform/bap#1340 fixes the Node.update for graphs with unlabeled nodes
- BinaryAnalysisPlatform/bap#1347 fixes a knowledge base race condition in the run plugin
- BinaryAnalysisPlatform/bap#1348 fixes endianness in the raw loader
- BinaryAnalysisPlatform/bap#1349 short-circuits evaluation of terms in Bap_main.init
- BinaryAnalysisPlatform/bap#1350 fixes variable rewriter and some Primus Lisp symbolic functions
- BinaryAnalysisPlatform/bap#1351 fixes and improves aarch64 lifter
- BinaryAnalysisPlatform/bap#1352 fixes several Primus Lisp stubs
- BinaryAnalysisPlatform/bap#1357 fixes some T32 instructions that access PC
- BinaryAnalysisPlatform/bap#1359 fixes handling of let-bound variables in flatten pass
- BinaryAnalysisPlatform/bap#1366 fixes a bug in the `cmp` semantics
- BinaryAnalysisPlatform/bap#1374 fixes handling of modified immediate constants in ARM T32 encoding
- BinaryAnalysisPlatform/bap#1376 fixes fresh variable generation
- BinaryAnalysisPlatform/bap#1377 fixes the IR theory implementation

Tooling
-------

- BinaryAnalysisPlatform/bap#1319 fixes the shared folder in deb packages
- BinaryAnalysisPlatform/bap#1320 removes sudo from postinst and postrm actions in the deb packages
- BinaryAnalysisPlatform/bap#1321 enables push flag in the publish-docker-image action
- BinaryAnalysisPlatform/bap#1323 fixes the ppx_bap version in the dev-repo opam file
- BinaryAnalysisPlatform/bap#1331 fixes the docker publisher, also enables manual triggering
- BinaryAnalysisPlatform/bap#1327 fixes a typo in the ubuntu dockerfiles
- BinaryAnalysisPlatform/bap#1345 fixes bapdoc
- BinaryAnalysisPlatform/bap#1346 nightly tests are failing due to a bug upstream

bmourad01 reviewed Oct 8, 2021


ivg added 2 commits October 12, 2021 13:44

adds 32-bit variants of armv8 and armv9, specifies alignments

aec82c9

ivg force-pushed the improves-arm-support branch from 7eba7ce to aec82c9 October 12, 2021 17:57

ivg added 3 commits October 13, 2021 14:35

fixes blx pc semantics

fdcaa62

It should be `call arm:unpredictable` instead of an interworking branch (which essentially breaks the disassembler)

assumes that all non word-aligned addresses have the T32 encoding

3b33dde

fixes the test case with a non-word-aligned base

bf3fcc6
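The rule from commit 3b33dde ("assumes that all non word-aligned addresses have the T32 encoding") follows from the alignment quoted earlier: an A32 instruction can never start off a 4-byte boundary. A minimal sketch of that rule is below; the `encoding` type and `encoding_of_address` are hypothetical, and treating word-aligned addresses as undetermined (left to other evidence) is an assumption of the sketch, not something the commit message states.

```ocaml
(* Hypothetical sketch of the commit's rule; addresses are assumed to be
   at least halfword-aligned already (see the code alignment above). *)
type encoding = T32 | Undetermined

let encoding_of_address addr =
  (* A32 instructions are word-aligned, so an address off a 4-byte
     boundary can only hold a T32 instruction. *)
  if addr land 3 <> 0 then T32
  (* Word-aligned addresses are valid for both encodings; in this sketch
     the choice is left to other sources of knowledge. *)
  else Undetermined

let () =
  List.iter
    (fun addr ->
       Printf.printf "0x%x -> %s\n" addr
         (match encoding_of_address addr with
          | T32 -> "T32"
          | Undetermined -> "A32 or T32"))
    [ 0x8000; 0x8002; 0x8006 ]
```

This is also why the test case with a non-word-aligned base had to be fixed: under this rule, code loaded at such a base is decoded as T32.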

ivg merged commit e91bd87 into BinaryAnalysisPlatform:master Oct 14, 2021

ivg mentioned this pull request Dec 8, 2021

releases BAP 2.4.0 ocaml/opam-repository#20177 (merged)

ivg deleted the improves-arm-support branch March 9, 2022 17:19


ivg commented Oct 8, 2021

bmourad01 Oct 8, 2021

ivg Oct 11, 2021 (edited)

bmourad01 Oct 12, 2021

ivg Oct 13, 2021

ivg commented Oct 12, 2021
