Commit
overhauls the target/architecture abstraction (1/n)
Introduces Theory.Target.t, which supersedes Bap.Std.Arch.t. The old representation suffered from a few problems that we inherited from LLVM. The main issue is that Arch.t is not extensible: adding a new architecture requires changing the Bap.Std code in a way that breaks backward compatibility. Arch.t is also unable to represent the whole variety of computing devices, which is especially relevant to micro-controllers (AVR, PIC) and IoT devices, on which we are currently focusing. Finally, Arch.t is not precise enough to capture the information that is necessary for code generation, a new venue that we are currently exploring.

As a first attempt, which didn't really work, we introduced arch, sub, and other properties to the `core-theory:unit` class in BinaryAnalysisPlatform#1119. The problem with that approach was the stringly typed interface, as `arch` was represented as a simple string. In addition, the proposed properties weren't able to describe uncommon architectures. Finally, it was very awkward to use, since all fields were optional with no good defaults.

This is the second attempt, and it will be split into several pull requests. The first PR, this one, introduces Theory.Target.t but still keeps Arch.t alive, i.e., Arch.t is still used by all internal and external components of BAP. This is to ensure that switching to Target.t doesn't break any existing code. The subsequent pull requests will gradually deprecate the functions that use Arch.t and switch to Target.t everywhere. The most important switch will affect the disassembler/decoder framework, which is currently still stuck on Arch.t.

Just to be clear, after this work is finished, and until BAP 3.0 and maybe even thereafter, Arch.t will still work as it used to, and no code will break or require updates. However, newly added architectures, such as AVR or PIC, i.e., those that could not be represented with Arch.t, will not be available to code that still relies on it.
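To make the extensibility problem concrete, here is a minimal, self-contained sketch that contrasts a closed variant with an abstract type backed by a registry. It is an illustration only and does not reproduce the actual Theory.Target interface: the `Target` module, `declare`, `lookup`, `name`, and `bits` below are hypothetical names, and the `arch` type is a cut-down stand-in for Arch.t.

```ocaml
(* A closed variant in the style of Arch.t (cut down for illustration).
   Adding a case means editing this type and every exhaustive match over it. *)
type arch = [ `armv7 | `x86_64 | `mips ]

let arch_bits : arch -> int = function
  | `armv7 | `mips -> 32
  | `x86_64 -> 64

(* Hypothetical alternative: an abstract type whose values are declared
   at runtime, so that new targets can be added without touching this module. *)
module Target : sig
  type t
  val declare : ?bits:int -> string -> t
  val lookup : string -> t option
  val name : t -> string
  val bits : t -> int
end = struct
  type t = { name : string; bits : int }

  (* All declared targets, indexed by name. *)
  let registry : (string, t) Hashtbl.t = Hashtbl.create 16

  let declare ?(bits = 32) name =
    if Hashtbl.mem registry name then
      invalid_arg ("target already declared: " ^ name);
    let t = { name; bits } in
    Hashtbl.add registry name t;
    t

  let lookup name = Hashtbl.find_opt registry name
  let name t = t.name
  let bits t = t.bits
end

(* A plugin can declare a micro-controller target without any change to
   the module above, which is impossible with the closed [arch] type. *)
let avr = Target.declare ~bits:8 "avr"
```

Code that matches on `arch` keeps working unchanged, but it can never see `avr`, which mirrors the compatibility story described above.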
In addition to Theory.Target.t, we add a few more abstractions and convenience functions, e.g., `Project.empty`, and a completely new interface for generating Project.Input.t, which makes it easier to create projects from strings or other custom data, e.g., `Project.Input.from_string`.

We also add the Source, Language, and Compiler abstractions to the Core Theory of the knowledge base. These abstractions, together with Target, describe the full cycle of program transformation by a compiler, from source code in a given language to a program for the specified target (and the other way around). The Target abstraction itself comes with a few more data types that describe various aspects of the target system, including file formats, ABI, floating-point ABI (FABI), endianness (which is no longer limited to the binary choice between little and big endian), and an extensible data type for storing target-specific options. Finally, all targets are organized into hierarchies and families, which helps in controlling the vast zoo of computer architectures and devices.

Target.t is an abstract, self-describing data type that carries enough information to describe all the details of the architecture. We also provide four library modules, for arm, mips, powerpc, and x86, that expose the currently declared targets. Our LLVM backend is not yet precise enough to recognize many of the supported targets, and right now we don't have analyses that infer the target from the binary, but we will add the `--target` option everywhere in the next PRs (when we switch to Target.t).

As usual, comments, questions, and reviews are very welcome.
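The idea of grouping targets into hierarchies and families can be sketched with a parent link and a membership check. This is a hedged illustration in plain OCaml, not the Theory.Target API: the record fields, the `belongs` function, the `Bi` constructor, and the concrete target names are all assumptions made for the example.

```ocaml
(* Hypothetical endianness type with more than two choices. *)
type endianness = Little | Big | Bi

(* Hypothetical target record; [parent] links a target to its family. *)
type target = {
  name : string;
  parent : target option;
  bits : int;
  endianness : endianness;
}

(* [belongs ~family t] is true if [t] is [family] or one of its descendants. *)
let rec belongs ~family t =
  t.name = family.name
  || (match t.parent with
      | None -> false
      | Some p -> belongs ~family p)

(* Illustrative names only: a root family, a refinement, and a leaf. *)
let arm = { name = "arm"; parent = None; bits = 32; endianness = Bi }
let armv7 = { arm with name = "armv7"; parent = Some arm }
let armv7_le =
  { armv7 with name = "armv7-le"; parent = Some armv7; endianness = Little }

(* An unrelated 8-bit micro-controller target. *)
let avr = { name = "avr"; parent = None; bits = 8; endianness = Little }
```

With such a structure, an analysis written for the whole family can test `belongs ~family:arm t` instead of enumerating every concrete variant.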