overhauls the target/architecture abstraction (3/n) #1227

ivg · 2020-10-02T21:01:46Z

In this episode, we liberate `bap mc` and `bap objdump` from the bonds
of the `Arch.t` representation. We also add the systemz lifter for
demonstration purposes. Of course, the lifter is minimal and far from
being usable, but it serves its didactic purpose well.

The interface of the `bap mc` command is preserved but is extended
with a few more command-line options that provide a great deal of
flexibility. Not only is it now possible to specify the target and
encoding, but it is also possible to pass options directly to the
backend, which is useful for disassembling targets that are not yet
known to BAP. Below is an excerpt from the bap-mc man page
(see `bap mc --help`):

       SETTING ARCHITECTURE

       The target architecture is controlled by several groups of options that
       cannot be used together:

       - arch;
       - target and encoding;
       - triple, backend, cpu, bits, and order.

       The arch option provides the least control but is the easiest to use.
       It relies on the dependency-injection mechanism and lets the target
       support packages (plugins that implement support for the given
       architecture) do their best to guess the target and encoding that
       match the provided name. Use the common names for the architecture
       and it should work. You can use the bits and order options to give more
       hints to the target support packages. They default to 32 and little,
       respectively.

       The target and encoding options provide precise control over the
       selection of the target and the encoding that is used to represent
       machine instructions. The encoding field can be omitted and will be
       deduced from the target. Use bap list targets and bap list encodings
       to get the list of supported targets and encodings, respectively.

       Finally, the triple, backend, cpu,... group of options provides
       full control over the disassembler backend and bypasses the
       dependency-injection mechanism to pass the specified options directly
       to the corresponding backends. This enables disassembling of targets
       and encodings that are not yet supported by BAP. The meanings of the
       options depend entirely on the selected backend and they are passed as
       is to the corresponding arguments of the Disasm_expert.Basic.create
       function. The bits and order options default to 32 and little,
       respectively, and are used to specify the number of bits in the
       target's addresses and the order of bytes in the word. This group of
       options is useful during the implementation and debugging of new
       targets and thus is reserved for experts. Note that when this group is
       used, the semantics of the instructions will not be provided, as it
       commonly requires the target specification.
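
To make the mutual exclusion of the three option groups concrete, here is a minimal OCaml sketch. It is a toy model and not the actual `bap mc` implementation: the `options` and `selection` types and the `select` function are hypothetical names introduced only for this example. Rejecting an empty selection and rejecting an encoding without a target are assumptions of the sketch, not documented behavior.

```ocaml
(* A toy model of the three mutually exclusive option groups.
   Every name below is hypothetical and is not part of BAP. *)

type order = Little | Big

(* The outcome of resolving the command-line options. *)
type selection =
  | By_arch of { arch : string; bits : int; order : order }
  | By_target of { target : string; encoding : string option }
  | By_backend of {
      triple : string option;
      backend : string option;
      cpu : string option;
      bits : int;
      order : order;
    }

(* The raw options, each one possibly absent. *)
type options = {
  arch : string option;
  target : string option;
  encoding : string option;
  triple : string option;
  backend : string option;
  cpu : string option;
  bits : int option;
  order : order option;
}

let empty = {
  arch = None; target = None; encoding = None;
  triple = None; backend = None; cpu = None;
  bits = None; order = None;
}

let select (o : options) : (selection, string) result =
  (* bits and order default to 32 and little, as the man page states *)
  let bits = Option.value o.bits ~default:32 in
  let order = Option.value o.order ~default:Little in
  let uses_target = Option.is_some o.target || Option.is_some o.encoding in
  let uses_backend =
    Option.is_some o.triple || Option.is_some o.backend
    || Option.is_some o.cpu in
  match o.arch, uses_target, uses_backend with
  | Some arch, false, false -> Ok (By_arch { arch; bits; order })
  | None, true, false ->
    (* the encoding may be omitted, the target may not (assumption);
       bits and order are simply ignored here (also an assumption) *)
    (match o.target with
     | Some target -> Ok (By_target { target; encoding = o.encoding })
     | None -> Error "an encoding requires a target")
  | None, false, true ->
    Ok (By_backend { triple = o.triple; backend = o.backend;
                     cpu = o.cpu; bits; order })
  | None, false, false -> Error "no architecture is specified"
  | _ -> Error "the option groups cannot be combined"

let () =
  match select { empty with triple = Some "systemz"; order = Some Big } with
  | Ok (By_backend _) -> print_endline "options go directly to the backend"
  | Ok _ -> print_endline "resolved by the target support packages"
  | Error msg -> prerr_endline msg
```

The variant type mirrors the point made in the excerpt: only the first two groups go through the target support packages, while the third carries raw backend options and therefore has no target to derive instruction semantics from.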

ivg · 2020-10-02T21:03:29Z

This is a small teaser; I will finish it on Monday (it needs a bit more documentation and polishing, and I will probably add a couple more instructions to the systemz lifter).


This PR is the continuation of the BinaryAnalysisPlatform#1225, BinaryAnalysisPlatform#1226, and BinaryAnalysisPlatform#1227 series of changes that focused on replacing the old and inextensible `Arch.t` abstraction with the new `Theory.Target.t` representation. This episode is instigated by the upcoming implementation of the RISCV target. Since RISCV is a target that is not supported by `Arch.t`, it became a good test of the new `Theory.Target.t` abstraction. As the RISCV work showed, we still have lots of code that depends on `Arch.t`, most importantly Primus, which was fully dependent on `Arch.t`.

The main issue was that `Theory.Target.t` doesn't provide any means to encode register classes, which prevented us from using it everywhere in Primus, e.g., we need to know which register is the stack pointer in order to set up the stack. To implement this, we introduce a new abstraction called _role_. A _role_ could generally be applied to any entity, but so far we are only talking about the roles of registers in various targets. The target definition now accepts the `regs` parameter that takes the register file specification with each register assigned one or more roles, e.g., here is the register file specification for 8086,

```ocaml
Theory.Role.Register.[
  [general; integer], main @< index @< segment;
  [stack_pointer], untyped [reg r16 "SP"];
  [frame_pointer], untyped [reg r16 "BP"];
  [Role.index], untyped index;
  [Role.segment], untyped segment;
  [status], untyped flags;
  [integer], untyped [
    reg bool "CF";
    reg bool "PF";
    reg bool "AF";
    reg bool "ZF";
    reg bool "SF";
    reg bool "OF";
  ]
]
```

I.e., we assign a set of roles to a set of registers. We also now have two new functions, `Theory.Target.regs` and `Theory.Target.reg`, that enable querying the register file of the target for registers that fulfill one or more roles. Whilst we publish a limited number of well-known (blessed) roles in the `Theory.Role.Register` module, more roles could be added as users need them. For example, in the code snippet above we have two non-standard roles that are specific to the x86 architectures, `Role.index` and `Role.segment`.

With roles we can drop the dependency on Target in most of the places where it makes sense (I still left it in x86 and other target-specific plugins, which obviously are independent of the newly added architectures).
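
The idea of querying a register file by roles can be illustrated with a small self-contained OCaml sketch. It is only a toy model under loud assumptions: roles are plain strings, the register list is an abridged and illustrative stand-in for the 8086 specification, and the functions `roles_of`, `regs`, and `reg_with` are hypothetical. They do not reproduce the signatures of `Theory.Target.regs` and `Theory.Target.reg`, which may differ.

```ocaml
(* A toy model of roles; this is not the BAP API. Roles are strings and
   the register file is a list of (roles, registers) entries. *)

type role = string
type reg = { name : string; bits : int }
type regfile = (role list * reg list) list

let reg bits name = { name; bits }

(* An abridged, illustrative register file in the spirit of the 8086
   specification: each entry assigns a set of roles to a set of registers. *)
let i8086 : regfile = [
  ["general"; "integer"],
  [reg 16 "AX"; reg 16 "BX"; reg 16 "SP"; reg 16 "BP"];
  ["stack_pointer"], [reg 16 "SP"];
  ["frame_pointer"], [reg 16 "BP"];
  ["status"; "integer"], [reg 1 "CF"; reg 1 "ZF"];
]

(* All roles assigned to a register, collected across every entry. *)
let roles_of (file : regfile) (r : reg) : role list =
  List.concat_map
    (fun (roles, regs) -> if List.mem r regs then roles else [])
    file

(* Every register that fulfills all the requested roles, without
   duplicates and in the order of first appearance. *)
let regs ?(roles = []) (file : regfile) : reg list =
  let all = List.concat_map snd file in
  let uniq =
    List.fold_left
      (fun acc r -> if List.mem r acc then acc else r :: acc)
      [] all
    |> List.rev in
  List.filter
    (fun r ->
       let have = roles_of file r in
       List.for_all (fun role -> List.mem role have) roles)
    uniq

(* The register with the requested roles, provided that exactly one
   register fulfills them. *)
let reg_with ?(roles = []) (file : regfile) : reg option =
  match regs ~roles file with
  | [r] -> Some r
  | _ -> None

let () =
  (* e.g., what a stack setup routine needs to know *)
  match reg_with ~roles:["stack_pointer"] i8086 with
  | Some sp -> Printf.printf "stack pointer: %s (%d bits)\n" sp.name sp.bits
  | None -> print_endline "no unique stack pointer"
```

Because a register accumulates roles from every entry that mentions it, the same register can be found both as a general-purpose register and as the stack pointer, which is what lets generic code ask for a role instead of matching on a concrete architecture.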


ivg marked this pull request as draft October 2, 2020 21:02

ivg force-pushed the overhauls-targets-part-3 branch from 80ace08 to fb48bf6 Compare October 5, 2020 13:51

ivg force-pushed the overhauls-targets-part-3 branch from fb48bf6 to 7912729 Compare October 5, 2020 19:33

ivg marked this pull request as ready for review October 5, 2020 20:28

ivg changed the title ~~[WIP] overhauls the target/architecture abstraction (3/n)~~ overhauls the target/architecture abstraction (3/n) Oct 5, 2020

ivg merged commit 2faaa42 into BinaryAnalysisPlatform:master Oct 5, 2020

ivg mentioned this pull request Feb 22, 2021

overhauls target/value abstraction and introduces roles (4/n) #1275

Merged

ivg deleted the overhauls-targets-part-3 branch December 1, 2021 19:14
