-
Notifications
You must be signed in to change notification settings - Fork 275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adds mangling of duplicated subroutines #813
Conversation
Implemented like an iteration over existed subroutines
lib/bap_types/bap_ir.ml
Outdated
Term.get_attr s Bap_attributes.address | ||
|
||
let mangle_name addr tid name = | ||
let mname = match addr with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is an above zero chance that we will have intersections between the set of text representations of addresses and the set of text representation of tids, especially if we are working with objects, where the address space of the code sections starts from zero, the same as tids.
That's why we need to disjoin them with some prefix notation, e.g., malloc@deadbeef
and malloc%deadbeef
.
lib/bap_types/bap_ir.ml
Outdated
let need_to_mangle s names = | ||
let tid = Term.tid s in | ||
let sub_names = Ir_sub.name s :: sub_aliases s in | ||
List.for_all sub_names ~f:(fun n -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should be exists
not for_all. With for_all
you will be satisfied even if first name is unique and the rest won't be checked.
lib/bap_types/bap_ir.ml
Outdated
| Some tid' -> not (Tid.equal tid tid')) | ||
|
||
let fix_names names subs = | ||
let may_mangle addr tid name = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just name it mangle
lib/bap_types/bap_ir.ml
Outdated
let sub = Term.set_attr sub Ir_sub.aliases als in | ||
let sub = Ir_sub.with_name sub nam in | ||
(i, sub) :: acc) in | ||
List.iter ~f:(fun (i, s) -> Vec.set subs i s) subs' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't like using vector here... it is abusing.
lib/bap_types/bap_ir.ml
Outdated
module Builder = struct | ||
type t = tid option * sub term vector | ||
type t = tid option * tid String.Table.t * sub term vector |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You shall not do this on the builder interface.
You shall enforce this invariant on the program data structure, so every time a value of type program is created you need to check it (non-invasive) and if the invariant holds, then return the program unchanged otherwise rename the offending subroutines. I would suggest you to make the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me cite myself from #808
Next, we need a function fix_names: ?old_names:string Tid.Map.t -> program -> program that will enforce the desired predicates. . When a conflict is detected, i.e., we have a name, that is already assigned an address, then we need to rename one of the two names. We need to give a preference to one or another, trying to preserve the new name and rename the old one. The name is old if it is in the old_names and has the same name. Otherwise it is new one. If both names are new, then pick the rename a name that corresponds to a tid that is smaller.
So, please, implement this priority clause. In the setter you have both the old program and the new program. If a term is renamed then the fresh name should be chosen. In other words there should be the following invariant, that if you update a program with a subroutine s
with name n
then it will contain the subroutine s
with name n
(i.e., only other subroutines will be renamed).
forall p:program, s:sub, n:name, x:string, (p[s][n] <- x)[s][n] = x
lib/bap_types/bap_ir.ml
Outdated
Hashtbl.change names name ~f:(function | ||
| None -> Some tid | ||
| Some tid' -> | ||
if Tid.(tid' < tid) then Some tid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is called Some (Tid.max tid' tid)
lib/bap_types/bap_ir.ml
Outdated
Array.iter subs ~f:(fun s -> | ||
let name = s.self.name in | ||
let tid = s.tid in | ||
Hashtbl.change names name ~f:(function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use Hashtbl.update
if you're not going to remove an element.
lib/bap_types/bap_ir.ml
Outdated
let names = String.Table.create () in | ||
Array.iter subs ~f:(fun s -> | ||
let name = s.self.name in | ||
let tid = s.tid in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it really necessary to introduce this let bindings? They just clobber the code, I think.
lib/bap_types/bap_ir.ml
Outdated
Hashtbl.change names name ~f:(function | ||
| None -> Some tid | ||
| Some tid' -> | ||
if Tid.(tid' < tid) then Some tid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this policy prioritizes a subroutine with a smaller tid, though it doesn't give the priority to the last chosen name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, rewrite your implementation without relying on hashtables and tid ordering. Try to write it in a functional style. The underlying task is expressible without doing all this adding/removing and mutating lots of state.
lib/bap_types/bap_ir.ml
Outdated
let names_of_subs subs = | ||
let names = String.Table.create () in | ||
Array.iter subs ~f:(fun s -> | ||
Hashtbl.change names s.self.name ~f:(function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is the add_multi
function for this.
lib/bap_types/bap_ir.ml
Outdated
let self = {s.self with name} in | ||
{s with self} | ||
|
||
let names_of_subs subs = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you using hashtables? We prefer to use pure functional data types, i.e., map, and I so far do not see any justifications to use mutable tables here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Justification is in elements order in an array of subroutines. If I use map, I would
have to revert a result of Array.fold
, but definitely not to use Array.map
lib/bap_types/bap_ir.ml
Outdated
|
||
let fix_names ?(old_names=String.Table.create ()) subs = | ||
let names = names_of_subs subs in | ||
let conflicts = Hashtbl.count names ~f:(fun x -> List.length x > 1) in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fun x -> List.length x > 1
is in general very inefficient pattern. You do not need to know the full length of the list, to check that it is greater than 1, just do match xs with | [] | [_] -> true | _ -> false
and you will go from linear to O(1) algorithm.
lib/bap_types/bap_ir.ml
Outdated
let conflicts = Hashtbl.count names ~f:(fun x -> List.length x > 1) in | ||
if conflicts = 0 then subs | ||
else | ||
let remove name tid = match Hashtbl.find names name with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is a bad style to put a nested function in the else branch. Are those function has to defined only if the precondition holds? Please, move them out the function and define at the top-level. If you're concerned with a namespace pollution, then put them in a module.
lib/bap_types/bap_ir.ml
Outdated
let conflicts = Hashtbl.count names ~f:(fun x -> List.length x > 1) in | ||
if conflicts = 0 then subs | ||
else | ||
let remove name tid = match Hashtbl.find names name with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of using find
/set
you can use change
. Though, this code already is too bad to be discussed at all...
lib/bap_types/bap_ir.ml
Outdated
| tids -> | ||
if is_old sub then mangle sub | ||
else | ||
let max_tid = Option.value_exn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should not be any dependence on the tid ordering. The order relation between two tids has nothing to do with a corresponding term freshness or anything. We may switch later to uuid as tids and assign them randomly. So please, remove any dependencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me quote you:
If both names are new, then pick the rename a name that corresponds to a tid that is smaller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't change program if not needed.
lib/bap_types/bap_ir.ml
Outdated
if is_new sub.tid sub.self.name then | ||
keep_name tids sub.self.name sub.tid | ||
else tids) in | ||
Array.map news ~f:(fun sub -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Array.map
will create a new array and copy the contents of the old array even if the f
function is ident
:
let xs = Array.init 8 ~f:ident;;
val xs : int Core_kernel.Array.t = [|0; 1; 2; 3; 4; 5; 6; 7|]
# let ys = Array.map xs ~f:ident;;
val ys : int Core_kernel.Array.t = [|0; 1; 2; 3; 4; 5; 6; 7|]
# phys_equal xs ys;;
- : bool = false
In our case, it will result in double copying of the whole program even if the change was benign and didn't break the naming injectivity.
So you need to prefix the map function with the check if Array.length news <> Map.length tids then Array.map ... else news
fix #808
We track every addition of a subroutine to a program by storing names and tids in a table. Every addition replaces previous entries with the same name. When someone requests a result, it's easy to tell if a subroutine name needs in demangling: if a
tid
in the table for this name is differ from a subroutine's one, then other subroutine started to use this name, and we need to demangle current name. The same is true for subroutine aliases.