implements deadcode-elimination pass optimization #857
Conversation
This PR intends to speed up the dead-code-elimination plugin. First, it adds a cache, so every subsequent run reuses the result of the first one. Second (and this needs to be discussed IRL), it reduces the number of SSA-form computations, which speeds up the whole plugin.
Term.map blk_t sub ~f:(live_def dead),
Term.map blk_t ssa ~f:(fun b -> live_def dead b |> live_phi dead)

let rec run subs =
Translate `run` into a function that takes a list of functions in SSA form and the initial set of dead terms, and returns the final (maximal) set of dead terms. E.g.,
let rec run subs dead =
  let free = ... in
  let dead' = .. in
  if Set.length dead' > Set.length dead
  then run (clean dead' subs) dead'
  else dead
if Set.is_empty dead then subs
else run (Seq.map subs ~f:(clean dead))

let process prog =
Obtain a set of dead terms and use `Term.map` to clean dead terms from each subroutine.
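For illustration, a rough sketch of how these pieces could fit together, assuming the `run` and `clean` functions discussed above (this is not the PR's actual code):

```ocaml
(* analyse the SSA copies, then clean the original program by tid *)
let process prog =
  let subs = Term.enum sub_t prog |> Seq.map ~f:Sub.ssa |> Seq.to_list in
  let dead = run subs Tid.Set.empty in   (* final set of dead tids *)
  Term.map sub_t prog ~f:(clean dead)    (* drop dead defs/phis    *)
```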
let run proj =
let digest = digest proj in
let p =
match Program.Cache.load digest with
Never store the program (or the project) in the cache, only the results of the computation, and then reapply them to the program. Otherwise, your pass will delete the results of upstream passes, which are otherwise transparent to you (e.g., their attributes).
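A hedged sketch of that idea: cache only the set of dead tids and reapply it to the freshly loaded program. `Dead_terms`, `compute_dead`, and `remove_dead` are illustrative names, not the PR's actual definitions:

```ocaml
let run proj =
  let digest = digest proj in
  let prog = Project.program proj in
  let dead = match Dead_terms.Cache.load digest with
    | Some dead -> dead                 (* reuse the cached result *)
    | None ->
      let dead = compute_dead prog in   (* run the analysis        *)
      Dead_terms.Cache.save digest dead;
      dead in
  Project.with_program proj (remove_dead dead prog)
```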
Force-pushed from a25a5a2 to 075ad6e
open Format
include Self()

let union ~init ~f =
Seq.fold ~init ~f:(fun acc x ->
List.fold ~init ~f:(fun acc x ->
FYI, there is `Set.union_list` (we needed this `union` helper just because there is no `union_sequence` function in Core).
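For example, assuming we need the union of the free variables of a list of subroutines, `Set.union_list` removes the need for a hand-rolled helper:

```ocaml
let all_free_vars subs =
  Var.Set.union_list (List.map subs ~f:Sub.free_vars)
```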
@@ -24,8 +25,9 @@ let computed_def_use sub =
def_use_collector#visit_sub sub (Var.Set.empty,Var.Set.empty)

let sub_args sub =
Term.enum arg_t sub |>
Term.enum arg_t sub |> Seq.to_list |>
There is no need to transform it to a list. I was asking only to use a list instead of a memoized sequence for the list of subroutines, not to use lists everywhere.
let process prog =
let subs = Term.enum sub_t prog |>
Seq.map ~f:Sub.ssa |>
Seq.to_list in
Use `Seq.to_list_rev`, as the order doesn't matter.

let run proj =
let digest = digest proj in
let prog = Term.map sub_t (Project.program proj) ~f:propagate_consts in
Why do you do `propagate_consts` before you load something from the cache?
let live_def dead blk = Term.filter def_t ~f:(is_alive dead) blk
let live_phi dead blk = Term.filter phi_t ~f:(is_alive dead) blk

let clean dead ssa =
The clean procedure was doing `propagate_consts` before, which actually affects the dead code elimination procedure. For example:

x := 12
y := x + 1

after constant propagation becomes

x := 12
y := 12 + 1

That makes `x` dead, so it can be removed.
It looks broken. You don't do `propagate_consts` on each step, which prevents dead code elimination from advancing.
It is strange that you're getting the same result with the new algorithm... either you're confused by the caching or not testing at all.
Also, note that your caching should take into account the propagated constants; otherwise, after you remove a variable whose value was propagated, you will end up with undefined variables. Thus your cache should include the set of dead terms as well as the set of substitutions, which should be a `word Var.Map.t Tid.Map.t`.
Sorry, I re-indented the buffer, so a bit more effort may be required for review. Changes: 1) constant propagation and dead code elimination are separated from one another. Previously, `propagate_consts` was called as many times as dead code elimination was performed. Theoretically, these changes could kill more unused code, since we propagate constants until reaching a fixpoint, and only then eliminate dead code. 2) caching takes into account the results of constant propagation too. 3) refactoring.
First of all, constant propagation and dead code elimination are not distributive, thus you can't apply the first N propagations and then M eliminations and hope that it will give you the same result as applying them one after another.
Next, the code became unwieldy because of too much information you need to uphold during the analysis. Now it is hard for me to prove to myself that it is correct.
So my suggestion is to simplify the algorithm at the cost of some efficiency. Instead of collecting changes in the state, we can run the algorithm as it was before the changes (i.e., maps/filters), and after we reach the fixed point we can calculate the diff. The diff would be represented as a mapping

type change =
  | Subst of exp
  | Remove
type diff = change Tid.Map.t

where `Subst exp` means that the right-hand side of the definition at the specified tid must be substituted with the new definition (it is easier and more efficient than keeping track of variable substitutions), and `Remove` means that the corresponding definition should be removed.
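A possible way to apply such a diff to a subroutine, following the proposed `change`/`diff` types (the `apply_diff` name and the exact traversal are assumptions, not part of the review):

```ocaml
let apply_diff (diff : diff) sub =
  Term.map blk_t sub ~f:(fun blk ->
      Term.filter_map def_t blk ~f:(fun def ->
          match Map.find diff (Term.tid def) with
          | Some Remove -> None                             (* drop the definition  *)
          | Some (Subst exp) -> Some (Def.with_rhs def exp) (* new right-hand side  *)
          | None -> Some def))
```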
Seq.fold ~init ~f:(fun acc x ->
Set.union acc (f x))
let union ~init ~f xs =
init :: List.rev_map ~f xs |>
there is no need to create an intermediate list structure here, just do
let union_fold ~f = List.fold ~f:(fun xs x -> Set.union xs (f x))
~init:([], consts, ready) ~f:(fun (subs, consts, ready) sub ->
let sub', vars = propagate_consts sub in
let consts = Consts.update consts (Term.tid sub) vars in
if Sub.equal sub sub' then
Comparison of two subs is a very heavy operation, as you deeply compare each term and each exp. Since constant propagation will always remove a term that it propagated, you can just compare the total number of terms in a subroutine.
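A hedged sketch of such a cheaper progress check: count terms instead of comparing subroutines structurally (the `size` helper is an assumption, not code from the PR):

```ocaml
let size sub =
  Seq.fold (Term.enum blk_t sub) ~init:0 ~f:(fun n blk ->
      n + Term.length def_t blk + Term.length phi_t blk)

(* since propagation only removes or rewrites defs, a smaller
   [size sub'] is enough evidence that something changed *)
```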
This PR brings back the interleaving of constant propagation and dead code elimination. It also optimizes constant propagation with respect to computing the set of virtual variables for propagation: it is now done only once.
Force-pushed from 2fe13c9 to 5fd6436
Updated. Spent some time on refactoring and hope it looks more readable.
Please add constant folding and address the following concern.
Your analysis is performed in the SSA form, but the result is in a non-SSA form. Since you capture variables in your substitutions, I'm afraid that some of them might have non-zero indices. To remove the burden of proving this (and maintaining the set of invariants that corroborate the correctness of this proof), I would suggest simply assuming that it could happen and coping with this possibility by normalizing expressions to the non-SSA form right before you store/load/apply them. The normalization is simple: just forget the indices by translating each variable to its base version.
*)
*)

let defined_once sub =
That's cool. It looks like the original implementation that had only one set actually contained a bug, as it was collecting variables that could be defined more than once.
Besides, since we are already in SSA here, we can extend the analysis to propagate all virtual variables, not only those that are defined once. (We still should refrain from touching true registers, as they might be touched by function calls, interrupts, and other instructions that we don't model quite well.)
body =
Term.map blk_t sub.body ~f:(Blk.map_exp ~f:(fun exp ->
Map.fold vars ~init:exp ~f:(fun ~key:pat ~data:rep ->
Exp.substitute Bil.(var pat) rep)));
We shall add constant folding after the substitution; this will help us to optimize the following code:

t1 := 1
t2 := 2
t3 := t1 + t2
R0 := t3

to

R0 := 3
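A hedged sketch of what that could look like, folding constants right after the substitution (`Exp.fold_consts` is BAP's expression simplifier; the function name is illustrative):

```ocaml
let substitute_and_fold vars exp =
  Map.fold vars ~init:exp ~f:(fun ~key:pat ~data:rep exp ->
      Exp.substitute Bil.(var pat) rep exp)   (* replace propagated vars *)
  |> Exp.fold_consts                          (* then fold 1 + 2, etc.   *)
```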

module Dead_code_data = struct
include Regular.Make(struct
type nonrec t = Tid.Set.t * (exp Var.Map.t) Tid.Map.t [@@deriving bin_io, sexp, compare]
There is no need for `nonrec` here, and there is a need for a comment (or a more descriptive data type, such as a record) to describe what is actually stored, e.g.,

type t = {
  dead : Tid.Set.t;
  substs : subst Tid.Map.t;
} and subst = exp Var.Map.t

Also, instead of a substitution, I would suggest using the whole right-hand side, as it will also capture the constant folding and is easier to use.
We run constant propagation in SSA form, so we don't need to track once-defined variables, since they are obviously defined only once in SSA. Added a cmdline flag for propagating constants in real variables too, i.e. in registers.
Now the set of variables that are affected by the analysis can be defined through the command line.
The dead_code_diff module is a strong rejection; please rewrite it in the suggested manner (or any other manner that is faster, cleaner, and shorter than my proposal).
I didn't look at the other modules yet, will provide a review of them soon.
@@ -0,0 +1,113 @@
open Core_kernel
Ok, we need to do it better. Currently we have the following problems:
- too complex - I want you to target 60 lines of code or so
- ineffective - the implementation is incredibly hungry for memory and is doing too much unnecessary work
- dirty - you're using hashtables :/
The right approach is to start from the bottom up. Define small and easy-to-understand functions, e.g.,

let changes_in_def d1 d2 =
  let different = not (phys_equal d1 d2) && Exp.(Def.rhs d1 <> Def.rhs d2) in
  if different then Tid.Map.singleton (Term.tid d2) (`NewRhs (Def.rhs d2)) else Tid.Map.empty
Then implement `changes_in_jmps`, which will return

[> `NewCond of exp | `NewDest of exp] Tid.Map.t

and
let diff_of_blk b1 b2 =
  let (++) = Map.merge ~f:merge_changes in
  let was = elts ~take:Blk.elts b1 and now = elts ~take:Blk.elts b2 in
  Map.fold2 was now ~init:Tid.Map.empty ~f:(fun ~key:_ ~data acc -> match data with
      | `Both (`Def d1, `Def d2) -> changes_in_def d1 d2 ++ acc
      | `Both (`Jmp j1, `Jmp j2) -> changes_in_jmp j1 j2 ++ acc
      | `Left d1 -> kill d1 ++ acc
      | _ -> acc)
where the `elts` function creates a map of a term's subterms, e.g.,

let elts ~take t =
  Seq.fold (take t) ~init:Tid.Map.empty ~f:(fun elts data ->
      Map.add elts ~key:(Term.tid data) ~data)
Now use the same approach to compare two subroutines, i.e.,

let diff_of_subs s1 s2 =
  let was = elts ~take:(Term.enum blk_t) s1 and now = elts ~take:(Term.enum blk_t) s2 in
  Map.fold2 ...

and... you guessed it, the same at the program level.
module Diff = Dead_code_diff

type level =
make it just an integer
and define what an optimization level means using predicates:

let touch_physical_registers level = level > 2
let touch_flags level = level > 1

etc. We can later parametrize even more behavior, depending on our peculiarities.
let level_equal x y = compare_level x y = 0
let mem levels x = List.mem levels ~equal:level_equal x

let check_level is_flag levels var =
rename to
let is_optimization_allowed level var = ...
else (Set.add dead (Term.tid t))
method! enter_def t dead =
let v = Def.lhs t in
if Set.mem protected (Var.base v) || live v || not (checked v)
The check for allowance should be faster than the protected lookup, thus move it to the front.

let live_def checked dead blk =
Term.filter def_t blk ~f:(fun d ->
if checked (Def.lhs d) then is_alive dead d
🥇 you know what for
if checked (Def.lhs d) then is_alive dead d
else true)

let substitute sub vars =
This will substitute expressions in phi-nodes (which we don't want to do) and will break let-expressions (e.g., if we have [v -> 0] in `R0 := let v = 1 in v + v`, then it will be substituted to `R0 := 0` instead of the correct version `R0 := 2`).
You can use the following function, and then either use `Blk.map_exp ~skip:[`phi]` or not recurse into phi-nodes, if you decide to stick to the term mapper.

let rec substitute vars exp =
  let substituter = object
    inherit Exp.mapper as super
    method! map_let var ~exp ~body =
      let exp = super#map_exp exp in
      let body = substitute (Map.remove vars var) body in
      Bil.let_ var ~exp ~body
    method! map_var v = match Map.find vars v with
      | None -> Bil.var v
      | Some e -> e
  end in
  substituter#map_exp exp
end)#map_sub sub

(* A simple constant propagation. Note, the input is required to be in SSA. *)
let propagate_consts checked sub =
This function does more work than needed: the term visitor will go through jmps and phi-nodes. Maybe it is better to rewrite it as a fold over `Term.enum def_t blk`.
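A rough sketch of such a fold, assuming all we need here is to collect constant definitions; iterating over `Term.enum def_t blk` avoids visiting phi-nodes and jumps (the `consts_of_blk` name is mine):

```ocaml
let consts_of_blk blk =
  Seq.fold (Term.enum def_t blk) ~init:Var.Map.empty ~f:(fun consts def ->
      match Def.rhs def with
      | Bil.Int _ as rhs -> Map.set consts ~key:(Def.lhs def) ~data:rhs
      | _ -> consts)
```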
let clean checked dead sub =
Term.map blk_t sub ~f:(fun b -> live_def checked dead b |> live_phi dead)

let return_from_ssa sub =
we don't need to return from ssa, just drop the optimized program after you compute the optimization data.
let union ~init ~f = List.fold ~init ~f:(fun xs x -> Set.union xs (f x))

let process arch prog levels =
let checked = check_level (is_flag arch) levels in
checked? by whom... please rename.
let process arch prog levels =
let checked = check_level (is_flag arch) levels in
let free = free_vars prog in
let rec run subs =
Apply the optimization to each subroutine independently (except the protected set, which should be computed interprocedurally). Then store the optimization for each subroutine in the cache (do not forget to include the digest of the protected set in the digest of a subroutine).
In general, always think about Google Chrome. The current implementation will require you to load into memory and store in a list 3 Google Chromes, if not more, while we can't even afford to store one.
Basically, try to implement your analysis in such a way that it keeps as little data in memory as possible, ideally no more than one subroutine. Also try to drop unused data as soon as possible, i.e., once you have a subroutine ready, push it back to the project, store the cache, and keep going with the next one.
Please, reimplement it in such a way that at no point in time your algorithm needs to hold more than one subroutine in memory. It should be O(1) in terms of memory consumption with respect to the size of the program in number of subroutines.
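A hedged sketch of that per-subroutine shape: each subroutine is optimized (or restored from the cache) and pushed back immediately, so at most one optimized copy is alive at a time. `Optimization_data`, `digest_of_sub`, `optimize`, and `apply` are illustrative names, not the PR's code:

```ocaml
let run proj =
  let prog = Term.map sub_t (Project.program proj) ~f:(fun sub ->
      let digest = digest_of_sub sub in   (* include the protected set here *)
      match Optimization_data.Cache.load digest with
      | Some data -> apply data sub
      | None ->
        let data = optimize sub in
        Optimization_data.Cache.save digest data;
        apply data sub) in
  Project.with_program proj prog
```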
This PR reduces memory usage by applying the analysis to each subroutine separately, so there are no longer multiple copies of the same subroutine in memory, as there were before.
end)#map_exp

let changes_in_def d1 d2 =
let different = not (phys_equal d1 d2) && Exp.(Def.rhs d1 <> Def.rhs d2) in
You shall drop the index before the comparison, otherwise they will all be different.
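For example, right-hand sides could be compared modulo SSA indices, reusing an index-dropping normalizer like the `unssa` sketched earlier (the `same_rhs` name is an assumption):

```ocaml
let same_rhs d1 d2 =
  Exp.equal (unssa (Def.rhs d1)) (unssa (Def.rhs d2))
```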

let changes_in_def d1 d2 =
let different = not (phys_equal d1 d2) && Exp.(Def.rhs d1 <> Def.rhs d2) in
if different then
Why does it return an option? You can just return an empty map.
let touch_physical_registers level = level > 2
let touch_flags level = level > 1

let is_optimization_allowed is_flag level var =
this function doesn't make sense
Jmp.with_kind j kind
| _ -> assert false in
Term.map blk_t sub ~f:(fun b ->
Term.filter_map def_t b ~f:apply_to_def |>
Can you join these two applies and filter_map a block only once?
];
let level =
let doc =
"An integer that a level of optimization, i.e. what variables
Specifies the optimization level. The higher the value, the more aggressive (and less safe) the optimizations that are applied.
On level 1, we optimize only the synthetic code that was generated by the lifter. Since such code can't leave the scope of an instruction, it is not affected by the imprecision of the control flow graph. On level 2, we also remove and optimize processor flags. This removes a significant amount of code, simplifies the program, and is a fair compromise between safety and performance (since flags are rarely used non-locally). Finally, on level 3, we extend our analysis to all variables.
plugins/ida/bap_ida_service.ml
Outdated
@@ -95,7 +95,7 @@ let cleanup_minidump () =
Unix.lockf lock Unix.F_LOCK 0;
protect ~f:(fun () ->
List.iter files ~f:Sys.remove)
~finally:(fun () -> Unix.lockf lock Unix.F_ULOCK 0)
~finally:(fun () -> Unix.lockf lock Unix.F_ULOCK 0; Unix.close lock)
👍
let's move it from this PR.
else None

let changes_in_jmp j1 j2 =
let different = not (phys_equal j1 j2) && Jmp.(j1 <> j2) in
Do not compare terms directly; this will involve the comparison of all attributes, which could be very heavy. Compare only those parts that the algorithm could change.
Force-pushed from 121d1f8 to 1b8806b
The sub name wasn't taken into account, so the digest could be the same for different (but similar) subroutines.
ok, now we're close.
lib/bap/bap.mli
Outdated
@@ -7240,6 +7240,13 @@ module Std : sig
?skip:[`phi | `def | `jmp] list -> (** defaults to [[]] *)
t -> f:(exp -> exp) -> t

(** [map_elt ?phi ?def ?jmp blk] applies specified functions to every
corresponded subterm of [blk]. *)
applies provided functions to the terms of corresponding classes.
All functions default to the identity function.
plugins/cache/cache_main.ml
Outdated
@@ -103,7 +103,7 @@ module Index = struct
remove_files index' index;
Sexp.save_hum file (sexp_of_index index);
data)
~finally:(fun () -> Unix.lockf lock Unix.F_ULOCK 0)
~finally:(fun () -> Unix.lockf lock Unix.F_ULOCK 0; Unix.close lock)
can you move it into a separate PR? (the one which optimizes caching)
plugins/ida/bap_ida_service.ml
Outdated
@@ -95,7 +95,7 @@ let cleanup_minidump () =
Unix.lockf lock Unix.F_LOCK 0;
protect ~f:(fun () ->
List.iter files ~f:Sys.remove)
~finally:(fun () -> Unix.lockf lock Unix.F_ULOCK 0)
~finally:(fun () -> Unix.lockf lock Unix.F_ULOCK 0; Unix.close lock)
let's move it from this PR.
let can_touch_physicals level = level > 2
let can_touch_flags level = level > 1

let is_optimization_allowed is_flag level var =
Please encode the following rules:
O0: can touch nothing (no terms are removed, no values are propagated, but some constant folding may occur)
O1: optimize virtual variables (remove, propagate, fold)
O2: includes O1, and optimizes flags
O3: includes O2, and optimizes all registers
The general rule: each consecutive optimization level includes the optimizations from all previous levels.
Note, we will later add O4 (based on the constant folding plugin from bap-plugins), which will propagate constants through memory; there will also be multiple other opportunities, so we can quickly get to O-many. So we need to keep this predicate straight and clear.
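A hedged sketch of predicates encoding these rules, where each level includes all optimizations of the lower levels (the predicate names are illustrative):

```ocaml
let can_fold_consts level = level >= 0       (* O0 and above           *)
let can_touch_virtuals level = level >= 1    (* O1: virtual variables  *)
let can_touch_flags level = level >= 2       (* O2: O1 + flags         *)
let can_touch_registers level = level >= 3   (* O3: O2 + all registers *)

let is_optimization_allowed is_flag level var =
  if Var.is_virtual var then can_touch_virtuals level
  else if is_flag var then can_touch_flags level
  else can_touch_registers level
```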
"; | ||
`S "DESCRIPTION"; | ||
|
||
`P "An autorun pass that conservatively removes dead code. The |
needs an update

`S "ALGORITHM";

`P "To make analysis inter procedural, we first compute an
not only in the description, but in the code also, of course :)
doesn't remove variables that are stored in memory, only registers
are considered";

`S "ALGORITHM";
The algorithm description is outdated. (It is also wrong... so my suggestion is to rewrite it from scratch without even reading whatever is there.) Example:
Applies constant folding, dead code elimination, and constant propagation in a loop until the fixed point is reached. The algorithm is interprocedural; however, it is not call-graph sensitive, as instead of considering the call graph we use an over-approximation that any function can call any other, thus any variable that occurs free in any function is considered to be non-constant. The algorithm is, however, flow sensitive on the control flow graph level and uses the SSA form to encode data facts. Since it is not always safe to rely on control flow integrity and CFG precision, by default we optimize only flags and virtual variables, under the presumption that those two kinds of data points are rarely used non-locally. See the $(b,--optimization-level) parameter for the list of available optimization levels and their consequences.
Seq.fold (Term.enum cls t) ~init:Var.Set.empty
~f:(fun acc x -> acc ++ f x) in
let sub_free sub = Sub.free_vars sub |> Set.filter ~f:Var.is_physical in
let sub_args sub =
Please add the right-hand side of the arg term to the set of free vars, just in case.
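A hedged sketch of what that could look like, collecting both sides of every arg term into the free-variable set (this exact shape is an assumption, not the PR's code):

```ocaml
let sub_args sub =
  Seq.fold (Term.enum arg_t sub) ~init:Var.Set.empty ~f:(fun vars arg ->
      Set.add (Set.union vars (Exp.free_vars (Arg.rhs arg))) (Arg.lhs arg))
```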
Force-pushed from b601e74 to 11006f4
This PR implements the following optimizations of the dead-code-elimination plugin.