Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adds mangling of duplicated subroutines #813

Merged
merged 10 commits into from
Apr 10, 2018

Conversation

gitoleg
Copy link
Contributor

@gitoleg gitoleg commented Apr 4, 2018

fix #808

We track every addition of a subroutine to a program by storing names and tids in a table. Every addition replaces previous entries with the same name. When someone requests a result, it's easy to tell if a subroutine name needs in demangling: if a tid in the table for this name is differ from a subroutine's one, then other subroutine started to use this name, and we need to demangle current name. The same is true for subroutine aliases.

gitoleg added 2 commits April 4, 2018 14:10
Implemented like an iteration over existed subroutines
@gitoleg gitoleg requested a review from ivg April 4, 2018 20:30
Term.get_attr s Bap_attributes.address

let mangle_name addr tid name =
let mname = match addr with
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is an above zero chance that we will have intersections between the set of text representations of addresses and the set of text representation of tids, especially if we are working with objects, where the address space of the code sections starts from zero, the same as tids.

That's why we need to disjoin them with some prefix notation, e.g., malloc@deadbeef and malloc%deadbeef.

let need_to_mangle s names =
let tid = Term.tid s in
let sub_names = Ir_sub.name s :: sub_aliases s in
List.for_all sub_names ~f:(fun n ->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be exists not for_all. With for_all you will be satisfied even if first name is unique and the rest won't be checked.

| Some tid' -> not (Tid.equal tid tid'))

let fix_names names subs =
let may_mangle addr tid name =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just name it mangle

let sub = Term.set_attr sub Ir_sub.aliases als in
let sub = Ir_sub.with_name sub nam in
(i, sub) :: acc) in
List.iter ~f:(fun (i, s) -> Vec.set subs i s) subs'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't like using vector here... it is abusing.

module Builder = struct
type t = tid option * sub term vector
type t = tid option * tid String.Table.t * sub term vector
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shall not do this on the builder interface.

@ivg ivg self-assigned this Apr 4, 2018
@ivg
Copy link
Member

ivg commented Apr 4, 2018

You shall enforce this invariant on the program data structure, so every time a value of type program is created you need to check it (non-invasive) and if the invariant holds, then return the program unchanged otherwise rename the offending subroutines.

I would suggest you to make the program type private to ensure that it is not created anywhere without a check (that means that you need to hide it inside a module). Of course, this should all be done in the implementation, nothing should be changed in the interface.

Copy link
Member

@ivg ivg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me cite myself from #808

Next, we need a function fix_names: ?old_names:string Tid.Map.t -> program -> program that will enforce the desired predicates. . When a conflict is detected, i.e., we have a name, that is already assigned an address, then we need to rename one of the two names. We need to give a preference to one or another, trying to preserve the new name and rename the old one. The name is old if it is in the old_names and has the same name. Otherwise it is new one. If both names are new, then pick the rename a name that corresponds to a tid that is smaller.

So, please, implement this priority clause. In the setter you have both the old program and the new program. If a term is renamed then the fresh name should be chosen. In other words there should be the following invariant, that if you update a program with a subroutine s with name n then it will contain the subroutine s with name n (i.e., only other subroutines will be renamed).

forall p:program, s:sub, n:name, x:string, (p[s][n] <- x)[s][n] = x

Hashtbl.change names name ~f:(function
| None -> Some tid
| Some tid' ->
if Tid.(tid' < tid) then Some tid
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is called Some (Tid.max tid' tid)

Array.iter subs ~f:(fun s ->
let name = s.self.name in
let tid = s.tid in
Hashtbl.change names name ~f:(function
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use Hashtbl.update if you're not going to remove an element.

let names = String.Table.create () in
Array.iter subs ~f:(fun s ->
let name = s.self.name in
let tid = s.tid in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it really necessary to introduce this let bindings? They just clobber the code, I think.

Hashtbl.change names name ~f:(function
| None -> Some tid
| Some tid' ->
if Tid.(tid' < tid) then Some tid
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this policy prioritizes a subroutine with a smaller tid, though it doesn't give the priority to the last chosen name.

Copy link
Member

@ivg ivg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, rewrite your implementation without relying on hashtables and tid ordering. Try to write it in a functional style. The underlying task is expressible without doing all this adding/removing and mutating lots of state.

let names_of_subs subs =
let names = String.Table.create () in
Array.iter subs ~f:(fun s ->
Hashtbl.change names s.self.name ~f:(function
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is the add_multi function for this.

let self = {s.self with name} in
{s with self}

let names_of_subs subs =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you using hashtables? We prefer to use pure functional data types, i.e., map, and I so far do not see any justifications to use mutable tables here.

Copy link
Contributor Author

@gitoleg gitoleg Apr 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Justification is in elements order in an array of subroutines. If I use map, I would
have to revert a result of Array.fold , but definitely not to use Array.map


let fix_names ?(old_names=String.Table.create ()) subs =
let names = names_of_subs subs in
let conflicts = Hashtbl.count names ~f:(fun x -> List.length x > 1) in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fun x -> List.length x > 1 is in general very inefficient pattern. You do not need to know the full length of the list, to check that it is greater than 1, just do match xs with | [] | [_] -> true | _ -> false and you will go from linear to O(1) algorithm.

let conflicts = Hashtbl.count names ~f:(fun x -> List.length x > 1) in
if conflicts = 0 then subs
else
let remove name tid = match Hashtbl.find names name with
Copy link
Member

@ivg ivg Apr 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is a bad style to put a nested function in the else branch. Are those function has to defined only if the precondition holds? Please, move them out the function and define at the top-level. If you're concerned with a namespace pollution, then put them in a module.

let conflicts = Hashtbl.count names ~f:(fun x -> List.length x > 1) in
if conflicts = 0 then subs
else
let remove name tid = match Hashtbl.find names name with
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of using find/set you can use change. Though, this code already is too bad to be discussed at all...

| tids ->
if is_old sub then mangle sub
else
let max_tid = Option.value_exn
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should not be any dependence on the tid ordering. The order relation between two tids has nothing to do with a corresponding term freshness or anything. We may switch later to uuid as tids and assign them randomly. So please, remove any dependencies.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me quote you:

If both names are new, then pick the rename a name that corresponds to a tid that is smaller.

Copy link
Member

@ivg ivg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't change program if not needed.

if is_new sub.tid sub.self.name then
keep_name tids sub.self.name sub.tid
else tids) in
Array.map news ~f:(fun sub ->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Array.map will create a new array and copy the contents of the old array even if the f function is ident:

let xs = Array.init 8 ~f:ident;;
val xs : int Core_kernel.Array.t = [|0; 1; 2; 3; 4; 5; 6; 7|]
# let ys = Array.map xs ~f:ident;;
val ys : int Core_kernel.Array.t = [|0; 1; 2; 3; 4; 5; 6; 7|]
# phys_equal xs ys;;
- : bool = false

In our case, it will result in double copying of the whole program even if the change was benign and didn't break the naming injectivity.

So you need to prefix the map function with the check if Array.length news <> Map.length tids then Array.map ... else news

@ivg ivg merged commit 6dbaa2d into BinaryAnalysisPlatform:master Apr 10, 2018
@gitoleg gitoleg deleted the sub-mangling branch April 17, 2018 16:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Term naming is non-injective
2 participants