overhauls the target/architecture abstraction (3/n) #1227

Merged on Oct 5, 2020 (1 commit)


@ivg ivg commented Oct 2, 2020

In this episode, we liberate bap mc and bap objdump from the bonds
of the Arch.t representation. We also add the systemz lifter for
demonstration purposes. Of course, the lifter is minimal and far from
being usable, but it serves its didactic purpose well.

The interface of the bap mc command is preserved but is extended
with a few more command-line options that provide a great deal of
flexibility. Not only is it now possible to specify the target and
encoding, it is also possible to pass options directly to the
backend, which is useful for disassembling targets that are not yet
known to BAP. Below is an excerpt from the bap-mc man page
(see bap mc --help):

       SETTING ARCHITECTURE

       The target architecture is controlled by several groups of options that
       cannot be used together:

       - arch;
       - target and encoding;
       - triple, backend, cpu, bits, and order.

       The arch option provides the least control but is easiest to use. It
       relies on the dependency-injection mechanism and lets the target
       support packages (plugins that implement support for the given
       architecture) do their best to guess the target and encoding that
       matches the provided name. Use the common names for the architecture
       and it should work. You can use the bits and order options to give more
       hints to the target support packages. They default to 32 and little,
       respectively.

       The target and encoding options provide precise control over the
       selection of the target and the encoding that is used to represent
       machine instructions. The encoding field can be omitted and will be
       deduced from the target. Use bap list targets and bap list encodings to
       get the list of supported targets and encodings, respectively.

       Finally, the triple, backend, cpu,... group of options provides full
       control over the disassembler backend and bypasses the
       dependency-injection mechanism to pass the specified options directly
       to the corresponding backends. This enables disassembling of targets
       and encodings that are not yet supported by BAP. The meanings of the
       options depend entirely on the selected backend, and they are passed
       as is to the corresponding arguments of the Disasm_expert.Basic.create
       function. The bits and order options default to 32 and little,
       respectively, and specify the number of bits in the target's addresses
       and the order of bytes in the word. This group of options is useful
       during the implementation and debugging of new targets and thus is
       reserved for experts. Note that when this group is used, the semantics
       of the instructions will not be provided, as that commonly requires
       the target specification.
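
The rule that the three option groups cannot be mixed can be sketched as a small, self-contained OCaml model. This is only an illustration under stated assumptions: the `options` and `selection` types, the `resolve` function, and the error messages are hypothetical and are not taken from the bap-mc sources. The excerpt does not say what happens when no group is given, so the sketch simply rejects that case.

```ocaml
(* A toy model of the three mutually exclusive option groups.
   All names below are hypothetical; none of them is part of BAP. *)

type order = Little | Big

(* Raw command-line input: every option may be absent. *)
type options = {
  arch : string option;
  target : string option;
  encoding : string option;
  triple : string option;
  backend : string option;
  cpu : string option;
  bits : int option;
  order : order option;
}

(* The resolved selection: exactly one group is in effect. *)
type selection =
  | By_arch of { name : string; bits : int; order : order }
  | By_target of { target : string; encoding : string option }
  | By_backend of {
      triple : string option;
      backend : string option;
      cpu : string option;
      bits : int;
      order : order;
    }

let empty = {
  arch = None; target = None; encoding = None;
  triple = None; backend = None; cpu = None;
  bits = None; order = None;
}

let resolve (o : options) : (selection, string) result =
  (* bits and order default to 32 and little, as the man page states *)
  let bits = Option.value o.bits ~default:32 in
  let order = Option.value o.order ~default:Little in
  let uses_target = Option.is_some o.target || Option.is_some o.encoding in
  let uses_backend =
    Option.is_some o.triple
    || Option.is_some o.backend
    || Option.is_some o.cpu in
  match o.arch, uses_target, uses_backend with
  | Some name, false, false -> Ok (By_arch { name; bits; order })
  | None, true, false ->
    (* the encoding may be omitted, the target may not *)
    (match o.target with
     | Some target -> Ok (By_target { target; encoding = o.encoding })
     | None -> Error "encoding requires target")
  | None, false, true ->
    Ok (By_backend {
        triple = o.triple; backend = o.backend; cpu = o.cpu; bits; order;
      })
  | None, false, false -> Error "no architecture specified"
  | _ -> Error "option groups cannot be combined"
```

Modelling the result as a variant with one constructor per group makes it impossible for later code to see a mixture of groups; the only place where a mixture can occur is the raw `options` record, and `resolve` rejects it there.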

@ivg ivg marked this pull request as draft October 2, 2020 21:02

ivg commented Oct 2, 2020

This is a small teaser; I will finish it on Monday (it needs a bit more documentation and polishing, and I will probably add a couple more instructions to the systemz lifter).

@ivg ivg force-pushed the overhauls-targets-part-3 branch from 80ace08 to fb48bf6 Compare October 5, 2020 13:51
@ivg ivg force-pushed the overhauls-targets-part-3 branch from fb48bf6 to 7912729 Compare October 5, 2020 19:33
@ivg ivg marked this pull request as ready for review October 5, 2020 20:28
@ivg ivg changed the title [WIP] overhauls the target/architecture abstraction (3/n) overhauls the target/architecture abstraction (3/n) Oct 5, 2020
@ivg ivg merged commit 2faaa42 into BinaryAnalysisPlatform:master Oct 5, 2020
ivg added a commit to ivg/bap that referenced this pull request Feb 22, 2021
This PR is the continuation of the BinaryAnalysisPlatform#1225, BinaryAnalysisPlatform#1226, and BinaryAnalysisPlatform#1227 series of
changes that were focused on substituting the old and inextensible
`Arch.t` abstraction with the new `Theory.Target.t` representation.

This episode is instigated by the upcoming implementation of the
RISCV target. Since RISCV is a target that is not supported by
Arch.t, it became a good test of the new Theory.Target.t abstraction.

As the RISCV work showed, we still have lots of code that depends on
Arch.t, most importantly Primus, which was fully dependent on
Arch.t. The main issue was that Theory.Target.t doesn't provide any
means to encode register classes, which prevented us from using it
everywhere in Primus, e.g., we need to know which register is the
stack pointer in order to set up the stack.

To implement this, we introduce a new abstraction called _role_. A
_role_ could generally be applied to any entity, but so far we are only
talking about the roles of registers in various targets. The target
definition now accepts the `regs` parameter, which takes the register
file specification with each register assigned one or more roles,
e.g., here is the register file specification for 8086:

```ocaml
Theory.Role.Register.[
 [general; integer], main @< index @< segment;
 [stack_pointer], untyped [reg r16 "SP"];
 [frame_pointer], untyped [reg r16 "BP"];
 [Role.index], untyped index;
 [Role.segment], untyped segment;
 [status], untyped flags;
 [integer], untyped [
   reg bool "CF";
   reg bool "PF";
   reg bool "AF";
   reg bool "ZF";
   reg bool "SF";
   reg bool "OF";
 ];
]
```

I.e., we assign a set of roles to a set of registers. We also now have
two new functions, `Theory.Target.regs` and `Theory.Target.reg`, that
enable querying the register file of the target for registers that
fulfill one or more roles. While we publish a limited number of
well-known (blessed) roles in the `Theory.Role.Register` module, more
roles can be added as users need them. For example, in the code snippet
above we have two non-standard roles that are specific to the x86
architectures, `Role.index` and `Role.segment`.
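
The idea of assigning sets of roles to sets of registers and then querying by role can be illustrated with a self-contained toy model. This is a sketch only and deliberately does not use the BAP API: the `role`, `reg`, and `regfile` types and the `regs` and `reg` functions below are hypothetical stand-ins, the contents of `main`, `index`, `segment`, and `flags` are assumed (the snippet above does not show them), and treating a multi-role query as "registers that have every listed role" is also an assumption.

```ocaml
(* A toy model of role-based register lookup. It mimics the idea only
   and is not the Theory.Target interface. *)

type role =
  | General | Integer | Stack_pointer | Frame_pointer
  | Index | Segment | Status

type reg = { name : string; width : int }

(* Each entry assigns a set of roles to a set of registers. *)
type regfile = (role list * reg list) list

let r16 name = { name; width = 16 }
let flag name = { name; width = 1 }

(* Assumed register groups for an 8086-like target. *)
let main = List.map r16 ["AX"; "BX"; "CX"; "DX"]
let index = List.map r16 ["SI"; "DI"; "BP"; "SP"]
let segment = List.map r16 ["CS"; "DS"; "ES"; "SS"]
let flags = List.map flag ["CF"; "PF"; "AF"; "ZF"; "SF"; "OF"]

let i8086 : regfile = [
  [General; Integer], main @ index @ segment;
  [Stack_pointer], [r16 "SP"];
  [Frame_pointer], [r16 "BP"];
  [Index], index;
  [Segment], segment;
  [Status], flags;
  [Integer], flags;
]

(* All registers that have every role in [roles], sorted, no duplicates. *)
let regs ~roles (file : regfile) =
  (* collect the roles of a register across all entries that mention it *)
  let roles_of r =
    List.concat_map
      (fun (rs, members) -> if List.mem r members then rs else [])
      file in
  file
  |> List.concat_map snd
  |> List.sort_uniq compare
  |> List.filter (fun r ->
      let have = roles_of r in
      List.for_all (fun role -> List.mem role have) roles)

(* The register with the given role, if there is exactly one. *)
let reg role file =
  match regs ~roles:[role] file with
  | [r] -> Some r
  | _ -> None
```

With this model, `reg Stack_pointer i8086` finds the stack pointer without the caller knowing anything about the architecture, which is the kind of query a target-independent component needs when it sets up a stack.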

With roles we can drop the dependency on Target in most of the places
where it makes sense (I still left it in x86 and other target-specific
plugins, which obviously are independent of the newly added
architectures).
ivg added a commit that referenced this pull request Feb 22, 2021
@ivg ivg deleted the overhauls-targets-part-3 branch December 1, 2021 19:14