Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalidate variables based on annotations of asm statements #715

Closed
wants to merge 4 commits into from

Conversation

bottbenj
Copy link

@bottbenj bottbenj commented May 1, 2022

Instead of simply ignoring asm statements use their annotations to invalidate variables that are potentially modified by them.

  • e3af268 are the changes required to make this compile after the changes in Add support for parsing asm goto cil#92 that are required for this analysis. This commit will have to be replace in the future after i get to properly creating the CFG for asm goto statements.
  • 92b89d5 introduces an event to invalidate variables and makes the two existing invalidates in base use it.
  • e3af268 the actual code to collect which variables to invalidate for an asm

TODO

@@ -675,7 +675,7 @@ struct
| Entry f -> tf_entry var edge prev_node f
| Ret (r,fd) -> tf_ret var edge prev_node r fd
| Test (p,b) -> tf_test var edge prev_node p b
| ASM (_, _, _) -> tf_asm var edge prev_node (* TODO: use ASM fields for something? *)
| ASM _ -> tf_asm var edge prev_node (* TODO: use ASM fields for something? *)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The relevant data should be taken from the ASM constructor and passed as actual arguments instead of looking at ctx.edge in the transfer function implementation.

@michael-schwarz
Copy link
Member

Sorry we haven't provided feedback yet, we have an upcoming paper deadline on Friday AOE and will do so after for all your PRs 😄

@sim642 sim642 self-requested a review May 8, 2022 07:17
@@ -42,7 +42,7 @@ let pretty () = function
| Entry (f) -> Pretty.text "(body)"
| Ret (Some e,f) -> Pretty.dprintf "return %a" dn_exp e
| Ret (None,f) -> Pretty.dprintf "return"
| ASM (_,_,_) -> Pretty.text "ASM ..."
| ASM _ -> Pretty.text "ASM ..."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we actually do something better than ignoring asm, it might be a good idea to also have the edge printout reflect the actual data.


asm("nop");

assert(x == 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test is named "discard-local", but in what sense is the local being discarded if it isn't invalidated here.

: "=r"(x));
}

assert(x == 1); // UNKNOWN
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike the UNKNOWN in previous tests, which are simply because we don't analyze the asm instructions and assume the worst about nop, if I understand this correctly, it's actually unknowable, not just due to our imprecision. Therefore:

Suggested change
assert(x == 1); // UNKNOWN
assert(x == 1); // UNKNOWN!

If that's the case, then the same applies to following tests as well.

@@ -9,6 +9,7 @@ type t =
| AssignSpawnedThread of lval * ThreadIdDomain.Thread.t (** Assign spawned thread's ID to lval. *)
| Access of {var_opt: CilType.Varinfo.t option; write: bool} (** Access varinfo (unknown if None). *)
| Assign of {lval: CilType.Lval.t; exp: CilType.Exp.t} (** Used to simulate old [ctx.assign]. *)
| Invalidate of {addr_list: PreValueDomain.AD.bucket BatMap.Int.t list}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing this is the actual type without inlining the implementation detail about buckets:

Suggested change
| Invalidate of {addr_list: PreValueDomain.AD.bucket BatMap.Int.t list}
| Invalidate of {addr_list: PreValueDomain.AD.t list}

Comment on lines +1117 to +1122
"enabled": {
"title": "sem.asm.enabled",
"description": "",
"type": "boolean",
"default": true
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the use case for ever disabling asm semantics (and whatever that would mean)?
Wouldn't enabling the other options, which make additional safety assumptions about asm statements already give the old behavior, where possible changes from asm statements were completely ignored?

Comment on lines +1123 to +1134
"basic-preserve-globals": {
"title": "sem.asm.basic-preserve-globals",
"description": "If enabled, do preserve globals across basic asm statements",
"type": "boolean",
"default": false
},
"memory-preserve-globals": {
"title": "sem.asm.memory-preserve-globals",
"description": "If enabled, do preserve globals across advanced asm statements with memory clobber annotation",
"type": "boolean",
"default": false
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency, it might make sense to negate these options since the pre-existing sem.unknown_function.invalidate.globals is about invalidating as opposed to preserving.

M.tracel "invalidate" "Setting addresses [%a] to values [%a]\n" (d_list ", " AD.pretty) addrs (d_list ", " VD.pretty) vs
);
set_many ?ctx ask gs st invalids'
invalidate_core ~ctx false (List.map (fun exp -> (exp, InvalidateTransitive)) exps);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inv_globs argument for invalidate_core should probably be named, otherwise it's confusing what this false means here.

Comment on lines +2440 to +2453
| Events.Invalidate {addr_list} ->
let invalidate_address a =
let t = AD.get_type a in
let v = get ask ctx.global st a None in (* None here is ok, just causes us to be a bit less precise *)
let nv = VD.invalidate_value ask t v in
(a, t, nv)
in
let invalids = List.map invalidate_address addr_list in
if M.tracing && addr_list <> [] then (
let addrs = List.map (Tuple3.first) invalids in
let vs = List.map (Tuple3.third) invalids in
M.tracel "invalidate" "Setting addresses [%a] to values [%a]\n" (d_list ", " AD.pretty) addrs (d_list ", " VD.pretty) vs
);
set_many ~ctx ask ctx.global ctx.local invalids
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for converting this into an event instead of simply extracting the functions for all globals invalidation etc inside the base analysis itself?
All the calls to invalidate_core look awkward, because they have to delegate outside to the events system to perform these after the actual transfer function completes, but in all three cases the transfer function just returns ctx.local as the updated state. This seems like an unnecessarily roundabout way to extract common logic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for converting this into an event instead of simply extracting the functions for all globals invalidation etc inside the base analysis itself?

The idea of making invalidation an event based mechanism was that other analyses could be informed and then do some proper handling. One could e.g. imagine that an relational analysis that handles referenced variables might need to know when some variables are potentially invalidated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just afraid that in light of #719 with the need to distinguish non-transitive vs transitive accesses also for reads/frees and writes other than invalidation, this won't be general enough.

It's also quite base analysis specific by using its ValueDomain.Addr. Other analyses like apron or var_eq, which currently also do similar invalidation, aren't using base analysis's address abstraction, but have their own ways of working with invalidated lvalues/rvalues.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just afraid that in light of #719 with the need to distinguish non-transitive vs transitive accesses also for reads/frees and writes other than invalidation, this won't be general enough.

The idea for invalidate events would be that the Base analysis collects all lvals that have to be invalidated. This way, other analyses would not have to implement the collection of transitively reachable lvals.

It's also quite base analysis specific by using its ValueDomain.Addr

One could add a utility function that takes elements from the ValueDomain.Addr and translates them back to the lval they are on (without offset information, in case the offset is not concrete).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea for invalidate events would be that the Base analysis collects all lvals that have to be invalidated. This way, other analyses would not have to implement the collection of transitively reachable lvals.

They don't currently reimplement it either, but just use the ReachableFrom query where needed. Since #590 those queries are cached, so multiple analyses asking for the same reachability comes at no extra cost.

We will probably have to see how this plays together with the general non-transitive vs transitive accessing changes from #719 and #720.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this more and how it compares to #720 and the Access event. It might make sense to have the Invalidate event after all because:

  1. Access event is emitted for all writes, not just invalidating ones. Although this could be changed by adding an extra field to that event.
  2. Access event is emitted only during multithreaded mode, because it's all about race detection. But these invalidations also need to happen in singlethreaded mode. Although this could also be changed (and accesses from singlethreaded mode wouldn't race with anything).

But given that access analysis (and its event) is really about race detection, it might be better to keep it that way.

@@ -1899,34 +1899,34 @@ struct
in
List.concat_map do_exp exps

let invalidate ?ctx ask (gs:glob_fun) (st:store) (exps: exp list): store =
type invalidate_mode = InvalidateSelf | InvalidateTransitive | InvalidateSelfAndTransitive [@@deriving show]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more of a note to us than you, but the notion of transitive vs non-transitive invalidation gets a lot more general in light of #719 because it doesn't apply to just invalidation but every form of access.

Comment on lines +1906 to +1907
let lvals_self = List.filter_map (function (_, InvalidateTransitive) -> None | (Lval lval, _) -> Some lval | _ -> None) inv_exps in
let exps_self = List.filter_map (function (_, InvalidateTransitive) | (Lval _, _) -> None | (exp, _) -> Some exp) inv_exps in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's with the special Lval cases here in addition to the invalidate_mode based ones? Because matching the outermost lval expression is kind of fragile: it might also be a cast expression of an lval (or any number of casts). Seems like there's a more general concept here at play.

@michael-schwarz michael-schwarz requested a review from jerhard May 10, 2022 13:18
{
asm("nop");

assert(x == 0); // UNKNOWN
Copy link
Member

@jerhard jerhard Jun 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense to have additional tests for the configurable options you implement, e.g. you might add a test for what's currently sem.asm.basic-preserve-global.

Comment on lines +30 to +31
asm_volatile_goto("1:"
".byte " __stringify(BYTES_NOP5) "\n\t" JUMP_TABLE_ENTRY
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section might benefit from better indentation.

@@ -0,0 +1,45 @@
// see https://github.com/goblint/cil/issues/81
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this test currently only about parsing and not crashing Goblint? Maybe add an explicit comment about that, and the current limitations.

@michael-schwarz michael-schwarz self-assigned this Mar 24, 2023
@michael-schwarz michael-schwarz removed their assignment Jul 14, 2023
@michael-schwarz
Copy link
Member

Unassigning myself as I currently do not have the bandwith to invest time here.

@michael-schwarz
Copy link
Member

Thank you again for this PR. One of the student teams is now working on implementing this more generally for the Goblint practical. Therefore, I am closing this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants