Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

preserves brancher information in the BIL code of an instruction #914

Merged
merged 12 commits into from
Jan 30, 2019

Conversation

gitoleg
Copy link
Contributor

@gitoleg gitoleg commented Jan 24, 2019

Introduction

During the program reconstruction we rely on several sources of information, such as lifter and brancher. However, the information that we obtained from the brancher is not stored that leads to a mismatch later. This PR preserves the information provided by the brancher in the BIL code of an instruction. Ideally, the brancher should operate on the BIL level and fix the BIL itself, but we have what we have.

Design

We have two sources of information, D the set of destinations from the lifter and D' the set of destinations form the brancher, which are equally credible for us, but may provide conflicting information. Our task is to generate the BIL code that will have a set of destinations R which preserves the existing destinations (completeness) and do not introduce any false destinations (soundness), or formally R = D + D'. This requirement is not very strong, as it doesn't quantify over branch predicates, so this transformation might introduce extra (and sometimes unnecessary) non-determinism. However, we're trying to limit it as much as possible without making the code to complicate, in particular, we're ensuring that if D' is a subset of D (i.e., there is no new information from the brancher) then we don't touch the code.

Implementation

If we have conflicting information between the brancher and the lifter, we resolve it using a conditional branch predicated with the unknown expression,

if (unknown) jmp d1;
if (unknown) jmp d2;
...
if (unknown) jmp dm;
jmp d;

If there is no conflict, for example, when lifter tells us to jump to an indirect destination, while brancher gives some concrete destinations [d1,d2,...,dm], we spill the following code:

  if (RAX = d1) jmp d1;
  if (RAX = d2) jmp d2;
  ...
  if (RAX = dm) jmp dm;
  jmp RAX

References

Resolves: #899
Previous attempt: #911

@gitoleg gitoleg changed the title stores information from a brancher in a BIL code stores information from a brancher in BIL code Jan 24, 2019
match stage1.lift mem ins with
| Ok bil ->
let bil =
if has_jump bil then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be required for an instruction to have a jump, so we should remove this check and ensure that the BIL code always contains the sum of the destinations returned by the lifter and by the brancher.

@gitoleg gitoleg force-pushed the switch-income-edges branch from 2760b17 to 26f5a01 Compare January 24, 2019 18:37
@gitoleg gitoleg force-pushed the switch-income-edges branch from ce6e6c2 to cfb49e4 Compare January 24, 2019 22:22
Copy link
Member

@ivg ivg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good work, but can you, please, simplify the code generated in the non-deterministic case?

@@ -299,6 +299,46 @@ let create_indexes (dests : dests Addr.Table.t) =

let filter_valid s = {s with inits = Set.inter s.inits s.valid}

let join_destinations dests =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we can stick to a more simple spraying destination in case of the nondeterminism and adopt the following scheme:

if (unk) jmp d1;
...
if (unk) jmp dm;
jmp d;

where d is the original BIL destination and d1,dm is the set of destinations provided by the brancher (alternatively you can just pick an arbitrary d from the whole set, it doesn't really matter).

This code will produce graphs that are simple (ideally, it should map to a single block with multiple when guarded jumps), while your version requires some sophistication during graph generation and with naive approach will spill a graph that is quadratic with respect to the number of destinations (which in case if we will retarget VSA here, could be a very big number).

@ivg ivg changed the title stores information from a brancher in BIL code preserves brancher information in the BIL code of an instruction Jan 25, 2019
@ivg ivg self-assigned this Jan 25, 2019
@gitoleg gitoleg merged commit b4caff9 into BinaryAnalysisPlatform:master Jan 30, 2019
@gitoleg gitoleg deleted the switch-income-edges branch February 1, 2019 19:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incoming edges of switch blocks are missing in the cfg when using the ida brancher
2 participants