-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle renaming of local variables in incremental analysis (AST) #731
Conversation
…nt statements with those in many places
# Conflicts: # src/incremental/compareCIL.ml # src/util/server.ml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm extremely on the fence about this RenameMapping
and how it inevitably has to spread everywhere. As you might see from my browsing of the changes, there's an unknowable amount of printing/output that still bypasses this mechanism. Therefore, I really don't see this being reliable and maintainable.
If a rename is detected, why not simply modify the vname
field of the old varinfo
record? It is mutable
after all. Since the identity of varinfo
s is not based on vname
at all, but rather just vid
, then there should be absolutely no harm, or is there?
I would really hope not, because that approach would avoid all the RenameMapping
hassle all over the place.
Also have a look at the CI failures. Semgrep complains in two places and the incremental test group should be renumbered to something which already doesn't exist (I suppose we might have created such group in the meanwhile ourselves). |
Thank you for this PR, I think further enhancing our incrementality in this direction is a very useful thing! |
type method_rename_assumption = {original_method_name: string; new_method_name: string; parameter_renames: (string, string) Hashtbl.t} | ||
type method_rename_assumptions = (string, method_rename_assumption) Hashtbl.t | ||
|
||
(*rename_mapping is carried through the stack when comparing the AST. Holds a list of rename assumptions.*) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does it need to be carried through the stack? Maybe we can just make it mutable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any global state will make it impossible to parallelize parsing/merging/comparing, so not introducing new global state into these parts would be good, if we hope some day to be able to speed up our preprocessing.
There is another (typ * typ) list
structure being passed around in many of the comparison functions though, so with this change there would be two. Since most recursion here just passes them around, they could be packaged together into a single record type, which is also easier to extend in the future with other structures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The carrying through the stack is required for the detection of renamed functions and global variables which I am also working on. As such I am currently working on making the whole AST comparison fully functional without side effects.
I have removed RenameMapping and implemented other improvements. Maybe you could give me even more feedback regarding my changes. |
Looking at the changes here on GitHub, it still shows |
@sim642 I just see it now that you mention it. It must have been an error when I was merging my branches. |
Now it should actually be removed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't test whether this works (I'll let @stilscher confirm that), but if it does, then it's certainly nice enough now without RenameMapping
!
A note to ourselves: given that the commits in this PR have been back and forth changing many places for RenameMapping
and then undoing it all, once it comes to merging I think it'd be best to squash merge this to avoid those back-and-forths from cluttering the git history.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested the functionality of the support for detecting renamed local variables on a couple of regression tests (with renamings in the incremental run). The number of changed functions as well as the output and analysis result look good in many cases (f.e. renaming in recursive functions, renaming multiple parameters, pointers, renaming successively, swapping names (is not detected but considered as changed)).
Things that still need to be fixed, are
- not outputting outdated names of formal parameters
- and the implementation should be changed such that it does not hinder when using the cfg comparison (I described this in more detail in the comment below).
When renaming static local variables the renaming detection does not help, which is to be expected since these are global variables in CIL. I also noticed, that there are problems when renaming functions that also have a declaration. This is due to the missing grouping of functions and corresponding declarations as described in #627. My suggestion is, to fix this not within this PR but as part of the linked issue.
src/incremental/compareCIL.ml
Outdated
let sizeEqual, local_rename = rename_mapping_aware_compare a.slocals b.slocals headerRenameMapping in | ||
let rename_mapping: rename_mapping = (local_rename, global_rename_mapping) in | ||
|
||
let sameDef = unchangedHeader && sizeEqual in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The renaming-aware comparison is used when comparing the parameters and local variables of a function. When using the cfg comparison, the body is however compared with an empty rename mapping. I think this is inconsistent. In most cases this does not directly cause a problem, because it is usually hidden by cil creating unique variable names within a function or a merge error in case of undeclared variables (when one only renames the declaration).
An example where this causes a problem, is whenever an unused formal parameter or local variable is renamed (without any other changes). The function headers of the two versions will be equivalent (due to the existence of a valid rename mapping) and no change will be detected in the functions body (because the renamed variable never appears). The function is considered unchanged, is not reanalyzed, and so the output still contains the old version of the renamed variable name instead of the updated one.
I see two options how to solve this:
- the basic approach: support rename detection only for the ast comparison, but make sure it does not break the cfg comparison. This would require to use an empty rename map also during the header comparison of functions. When the cfg comparison is turned on, the construction of the rename map could even be skipped completely.
- the nicer approach: support rename detection of variables within functions for the cfg comparison also. As far as I can tell, this would require to hand-through the constructed rename mapping to phase 1 of the cfg comparison and
eq_node
andeq_edge
subsequently. InupdateCil.ml
the old names of the formal parameters and local variables would need to be overwritten with the new names for the partially changed functions (inreset_changed_function
) to obtain a correct output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the second approach is a nice idea. If I still find the time to do it, I will definitely implement it. However, because implementing the second approach comes with a lot of extra work in testing and verifying I implemented the first version for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created issue #777 for implementing the second approach later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, your implementation for the first approach does still not work correctly. The comparison of the local variables is still done with the rename-aware comparison even if the cfg comparison is activated. This leads to renamed and unused local variables in a partially changed function not being shown with the updated name in the output. As an example you can take a look at
#include<assert.h>
int main () {
int a = 3;
int b; // rename to d
int c;
c = a + 2; // change to a + 3
assert(a == 3);
return 0;
}
In the incremental run, the old name b
will still be used in the output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think, that implementing approach 2 would not be much more work. But I think it is ok, to postpone it and implement it in a new PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, the locals are now checked again for cfg runs.
Thank you for taking the time to look at my PR @stilscher. I have implemented your feedback. |
…s rename_mapping.
This pull requests implements two things for incremental analysis using AST:
How it works:
Detection of renamed local variables: In compareAST.ml an additional parameter rename_mapping has been added to all functions comparing (parts of) AST nodes. This rename mapping holds assumptions about renamed local variables and is carried through all calls that compare the ASTs. If the assumptions of rename_mapping are never broken and apart from names being modified nothing has changed, the function is guaranteed to not being changed.
The assumptions of rename_mapping are not broken, if all occurrences of variables in the old AST match the occurrences in the new AST, and all variables that have been renamed have the new name in all occurrences in the new AST.
The assumptions of rename_mapping are constructed in compareCil.ml/eqF by looking at the locals before any comparing is performed.
Why is rename_mapping a function attribute and not a global var?
It may be beneficial when rename detection of globals (functions, global vars) is added.
Unified output of varinfos:
The name of the old varinfo is now replaced by the name of the new varinfo when a local variable was renamed during an incremental run.
Additional note: Currently, in renameMapping.ml dn_obj is copied from Cil.dn_obj as Cil does not export dn_obj. This has to be changed in a pull request for Cil.
CFG based comparisons are currently not supported, meaning that the behavior for CFG has not changed.
Please comment on this pull request if you need more information and if you have feedback for me. I will try to work it in as soon as possible.