Handle renaming of local variables in incremental analysis (AST) #731

Merged: 40 commits, Jul 11, 2022

@TimOrtel TimOrtel commented May 9, 2022

This pull request implements two things for incremental analysis using AST:

  1. Detection of renamed local variables in functions in incremental analysis.
  2. Unified output of varinfos. Varinfo names that have been changed are output with their changed names rather than their original names.

How it works:
Detection of renamed local variables: In compareAST.ml, an additional parameter rename_mapping has been added to all functions that compare (parts of) AST nodes. This rename mapping holds assumptions about renamed local variables and is carried through all calls that compare the ASTs. If the assumptions of rename_mapping are never broken and nothing has changed apart from names, the function is guaranteed to be unchanged.
The assumptions of rename_mapping hold if all occurrences of variables in the old AST match the occurrences in the new AST, and every renamed variable appears under its new name in all occurrences in the new AST.

The assumptions of rename_mapping are constructed in compareCIL.ml (eqF) by looking at the locals before any comparison is performed.
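
To make the idea concrete, here is a minimal sketch of a rename-aware comparison. It is not the code from this PR: the `expr` and `stmt` types, `build_mapping`, and `eq_fun` are hypothetical stand-ins for CIL's AST and the functions in compareAST.ml, and the mapping is built by pairing locals positionally, which is an assumption of this sketch.

```ocaml
(* Simplified stand-in for a CIL function body; not Goblint's real types. *)
type expr =
  | Var of string
  | Const of int
  | Add of expr * expr

type stmt = Assign of string * expr

module StringMap = Map.Make (String)

(* Build rename assumptions by pairing old and new locals positionally.
   Returns None if the number of locals differs. *)
let build_mapping (old_locals : string list) (new_locals : string list) =
  if List.length old_locals <> List.length new_locals then None
  else
    Some
      (List.fold_left2
         (fun m o n -> StringMap.add o n m)
         StringMap.empty old_locals new_locals)

(* A variable matches if the name assumed for it equals the new name.
   Variables without an assumption must keep their name. *)
let eq_var mapping o n =
  match StringMap.find_opt o mapping with
  | Some expected -> String.equal expected n
  | None -> String.equal o n

(* The mapping is passed down through every recursive call. *)
let rec eq_expr mapping a b =
  match a, b with
  | Var o, Var n -> eq_var mapping o n
  | Const x, Const y -> x = y
  | Add (a1, a2), Add (b1, b2) ->
    eq_expr mapping a1 b1 && eq_expr mapping a2 b2
  | _ -> false

let eq_stmt mapping (Assign (ol, oe)) (Assign (nl, ne)) =
  eq_var mapping ol nl && eq_expr mapping oe ne

(* A function is a pair of (locals, body) in this sketch. *)
let eq_fun (old_locals, old_body) (new_locals, new_body) =
  match build_mapping old_locals new_locals with
  | None -> false
  | Some mapping ->
    List.length old_body = List.length new_body
    && List.for_all2 (eq_stmt mapping) old_body new_body
```

With this sketch, a body that consistently uses the new name compares as equal, while a body that still mentions the old name of a renamed local breaks the assumption and compares as changed.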

Why is rename_mapping a function parameter and not a global variable?
It may be beneficial once rename detection for globals (functions, global variables) is added.

Unified output of varinfos:
The name of the old varinfo is now replaced by the name of the new varinfo when a local variable was renamed during an incremental run.

Additional note: Currently, in renameMapping.ml dn_obj is copied from Cil.dn_obj as Cil does not export dn_obj. This has to be changed in a pull request for Cil.

CFG based comparisons are currently not supported, meaning that the behavior for CFG has not changed.

Please comment on this pull request if you need more information and if you have feedback for me. I will try to work it in as soon as possible.

@sim642 sim642 added feature performance Analysis time, memory usage labels May 9, 2022
Member

@sim642 sim642 left a comment

I'm extremely on the fence about this RenameMapping and how it inevitably has to spread everywhere. As you might see from my browsing of the changes, there's an unknowable amount of printing/output that still bypasses this mechanism. Therefore, I really don't see this being reliable and maintainable.

If a rename is detected, why not simply modify the vname field of the old varinfo record? It is mutable after all. Since the identity of varinfos is not based on vname at all, but rather just vid, then there should be absolutely no harm, or is there?
I would really hope not, because that approach would avoid all the RenameMapping hassle all over the place.
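
A minimal sketch of that suggestion, assuming a cut-down `varinfo` record with only the two fields that matter here (CIL's real record has many more) and a hypothetical helper `apply_rename`:

```ocaml
(* Hypothetical, cut-down varinfo: CIL's real record has many more fields. *)
type varinfo = { vid : int; mutable vname : string }

(* Identity is decided by vid only, so renaming does not affect it. *)
let equal_varinfo a b = a.vid = b.vid

(* After a rename has been detected, overwrite the old name in place.
   Every reference to the old record sees the new name afterwards. *)
let apply_rename (old_v : varinfo) (new_v : varinfo) =
  if not (String.equal old_v.vname new_v.vname) then
    old_v.vname <- new_v.vname
```

Because the record is updated in place, any code that prints `vname` picks up the new name without needing a separate lookup table.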

src/analyses/apron/apronAnalysis.apron.ml (outdated, resolved)
src/analyses/base.ml (outdated, resolved)
src/analyses/base.ml (outdated, resolved)
src/analyses/spec.ml (outdated, resolved)
src/framework/analyses.ml (outdated, resolved)
src/incremental/compareCIL.ml (outdated, resolved)
src/incremental/compareCIL.ml (outdated, resolved)
src/incremental/compareCIL.ml (outdated, resolved)
src/incremental/compareCIL.ml (outdated, resolved)
scripts/test-incremental-multiple.sh (resolved)
Member

sim642 commented May 9, 2022

Also have a look at the CI failures. Semgrep complains in two places, and the incremental test group should be renumbered to a number that is not already in use (I suppose we created such a group ourselves in the meantime).

@michael-schwarz
Member

Thank you for this PR, I think further enhancing our incrementality in this direction is a very useful thing!

type method_rename_assumption = {original_method_name: string; new_method_name: string; parameter_renames: (string, string) Hashtbl.t}
type method_rename_assumptions = (string, method_rename_assumption) Hashtbl.t

(*rename_mapping is carried through the stack when comparing the AST. Holds a list of rename assumptions.*)
Member

Why does it need to be carried through the stack? Maybe we can just make it mutable?

Member

Any global state will make it impossible to parallelize parsing/merging/comparing, so not introducing new global state into these parts would be good, if we hope some day to be able to speed up our preprocessing.

There is another (typ * typ) list structure being passed around in many of the comparison functions though, so with this change there would be two. Since most recursion here just passes them around, they could be packaged together into a single record type, which is also easier to extend in the future with other structures.
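
A rough sketch of what such a record could look like. The `typ` type, the `context` record, and all function names are made up for illustration and do not come from compareAST.ml:

```ocaml
(* Stand-ins for CIL's typ and for the rename assumptions; both are
   assumptions made for this sketch. *)
type typ = TInt | TPtr of typ | TNamed of string

type context = {
  typ_pairs : (typ * typ) list;      (* type pairs already assumed equal *)
  renames : (string * string) list;  (* old name -> new name *)
}

let empty_context = { typ_pairs = []; renames = [] }

(* Adding an assumption returns a new context; nothing is mutated, so
   no global state is introduced. *)
let assume_types a b ctx = { ctx with typ_pairs = (a, b) :: ctx.typ_pairs }
let assume_rename o n ctx = { ctx with renames = (o, n) :: ctx.renames }

let rec eq_typ ctx a b =
  List.mem (a, b) ctx.typ_pairs
  ||
  match a, b with
  | TInt, TInt -> true
  | TPtr x, TPtr y -> eq_typ ctx x y
  | TNamed x, TNamed y -> String.equal x y
  | _ -> false

let eq_name ctx o n =
  match List.assoc_opt o ctx.renames with
  | Some expected -> String.equal expected n
  | None -> String.equal o n

(* One record is threaded through instead of two separate arguments. *)
let eq_decl ctx (otyp, oname) (ntyp, nname) =
  eq_typ ctx otyp ntyp && eq_name ctx oname nname
```

Adding another structure later would then only mean adding a field to `context`, without touching the signature of every comparison function.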

Contributor Author

The carrying through the stack is required for the detection of renamed functions and global variables which I am also working on. As such I am currently working on making the whole AST comparison fully functional without side effects.

@TimOrtel TimOrtel requested a review from sim642 June 17, 2022 12:47
@TimOrtel
Contributor Author

I have removed RenameMapping and implemented other improvements. Maybe you could give me even more feedback regarding my changes.

Member

sim642 commented Jun 20, 2022

> I have removed RenameMapping

Looking at the changes here on GitHub, it still shows RenameMapping being added and RenameMapping.show_varinfo being used in other places, so I'm confused.

Contributor Author

TimOrtel commented Jun 20, 2022

@sim642 I only see it now that you mention it. It must have been a mistake when I was merging my branches.

@TimOrtel
Contributor Author

Now it should actually be removed.

Member

@sim642 sim642 left a comment

I didn't test whether this works (I'll let @stilscher confirm that), but if it does, then it's certainly nice enough now without RenameMapping!

A note to ourselves: given that the commits in this PR have been back and forth changing many places for RenameMapping and then undoing it all, once it comes to merging I think it'd be best to squash merge this to avoid those back-and-forths from cluttering the git history.

scripts/test-incremental-multiple.sh (resolved)
src/cdomains/baseDomain.ml (outdated, resolved)
tests/incremental/04-var-rename/00-unused_rename.c (outdated, resolved)
tests/incremental/04-var-rename/diffs/00-unused_rename.c (outdated, resolved)
@sim642 sim642 requested a review from stilscher June 22, 2022 07:51
Member

@stilscher stilscher left a comment

I tested the support for detecting renamed local variables on a couple of regression tests (with renamings in the incremental run). The number of changed functions as well as the output and analysis results look good in many cases (e.g. renaming in recursive functions, renaming multiple parameters, pointers, renaming successively, swapping names (which is not detected as a rename but is treated as a change)).
Things that still need to be fixed:

  • outdated names of formal parameters should no longer be output
  • the implementation should be changed such that it does not interfere with the cfg comparison (I describe this in more detail in the comment below).

When renaming static local variables, the rename detection does not help, which is to be expected since these are global variables in CIL. I also noticed that there are problems when renaming functions that also have a declaration. This is due to the missing grouping of functions and their corresponding declarations, as described in #627. My suggestion is to fix this not within this PR but as part of the linked issue.

src/incremental/compareAST.ml (outdated, resolved)
src/incremental/updateCil.ml (outdated, resolved)
src/incremental/compareCIL.ml (outdated, resolved)
let sizeEqual, local_rename = rename_mapping_aware_compare a.slocals b.slocals headerRenameMapping in
let rename_mapping: rename_mapping = (local_rename, global_rename_mapping) in

let sameDef = unchangedHeader && sizeEqual in
Member

The renaming-aware comparison is used when comparing the parameters and local variables of a function. When using the cfg comparison, however, the body is compared with an empty rename mapping. I think this is inconsistent. In most cases this does not directly cause a problem, because it is usually hidden by CIL creating unique variable names within a function, or by a merge error in the case of undeclared variables (when only the declaration is renamed).
An example where this does cause a problem is when an unused formal parameter or local variable is renamed (without any other changes). The function headers of the two versions will be considered equivalent (because a valid rename mapping exists) and no change will be detected in the function's body (because the renamed variable never appears). The function is considered unchanged and is not reanalyzed, so the output still contains the old variable name instead of the updated one.
I see two options for solving this:

  1. The basic approach: support rename detection only for the AST comparison, but make sure it does not break the cfg comparison. This would require using an empty rename map during the header comparison of functions as well. When the cfg comparison is turned on, the construction of the rename map could even be skipped completely.
  2. The nicer approach: also support rename detection of variables within functions for the cfg comparison. As far as I can tell, this would require passing the constructed rename mapping through to phase 1 of the cfg comparison and subsequently to eq_node and eq_edge. In updateCil.ml, the old names of the formal parameters and local variables would need to be overwritten with the new names for the partially changed functions (in reset_changed_function) to obtain correct output.
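
To illustrate option 1, here is a small sketch. The function `compare_locals` and its `~cfg_comparison` flag are hypothetical and only model locals as a list of names; the real comparison in compareCIL.ml works on varinfos.

```ocaml
(* Hypothetical helper: compare the locals of two function versions.
   Returns whether they are considered equal, plus the rename assumptions.
   With [~cfg_comparison:true] no rename assumptions are made, so any
   renamed local makes the comparison fail. *)
let compare_locals ~cfg_comparison
    (old_locals : string list) (new_locals : string list)
  : bool * (string * string) list =
  if List.length old_locals <> List.length new_locals then (false, [])
  else if cfg_comparison then
    (* Empty rename map: names must match exactly. *)
    (List.for_all2 String.equal old_locals new_locals, [])
  else
    (* Rename-aware: pair locals positionally and record differing names. *)
    let renames =
      List.combine old_locals new_locals
      |> List.filter (fun (o, n) -> not (String.equal o n))
    in
    (true, renames)
```

In this sketch, renaming an unused local from `b` to `d` is accepted as a rename for the AST comparison, but makes the locals differ when the cfg comparison is on, so the function would be treated as changed and the new name would appear in the output.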

Contributor Author

I think the second approach is a nice idea. If I still find the time, I will definitely implement it. However, because the second approach comes with a lot of extra work in testing and verifying, I have implemented the first approach for now.

Member

@stilscher stilscher Jul 6, 2022

I created issue #777 for implementing the second approach later.

Member

However, your implementation of the first approach still does not work correctly. The local variables are still compared with the rename-aware comparison even if the cfg comparison is activated. As a result, renamed and unused local variables in a partially changed function are not shown with their updated names in the output. As an example, take a look at

#include<assert.h>

int main () {
  int a = 3;
  int b; // rename to d
  int c;
  c = a + 2; // change to a + 3
  assert(a == 3);
  return 0;
}

In the incremental run, the old name b will still be used in the output.

Member

@stilscher stilscher Jul 6, 2022

I actually think that implementing approach 2 would not be much more work. But I think it is ok to postpone it and implement it in a new PR.

Contributor Author

Ok, the locals are now checked again for cfg runs.

Contributor Author

TimOrtel commented Jul 5, 2022

Thank you for taking the time to look at my PR @stilscher. I have implemented your feedback.

@sim642 sim642 merged commit 015966e into goblint:master Jul 11, 2022
TimOrtel added a commit to TimOrtel/analyzer that referenced this pull request Jul 12, 2022
@sim642 sim642 added this to the v2.0.0 milestone Aug 12, 2022
Labels
feature performance Analysis time, memory usage student-job