loop normal form properties #7518
The way the pre-header node was explained in the recent meeting appeared to require the use of a labelled skip instruction. It is worth noting that there is an existing normalisation pass in the cbmc code base which is called from tens of places called |
Are header nodes of nested loops allowed to coincide? |
Agreed, we can always add these nodes as needed afterwards. I think the same observation could apply to exit nodes. We would like exit nodes to have only incoming edges coming from loop instructions, so that the header of the loop dominates the exit nodes and so that any instrumentation we may add on exit nodes is only executed when the loop has been entered. This may require the insertion of a skip node to separate incoming edges from the loop from incoming edges coming from somewhere else. I am not sure, however, that reintroducing these skip nodes as needed after the fact is as easy as in the pre-header node case. |
I'd like to rephrase the definition a bit to make it clearer that a natural loop is specific to a back edge: A back-edge is an edge from a node So by that definition it seems like two distinct natural loops (i.e. loops that have different back edges and are each a minimum-size SCC) can have the same header node. It seems like the pre-header node requirement would reject sharing though. It would be nice to have a checker function that can optionally check for the presence of pre-header and dominated exit nodes (since having these properties will really make loop contracts instrumentation easier). |
Is it worth considering some sort of requirement for basic blocks to be maximal as part of this definition? For example if we have 2 single entry single exit blocks of instructions A and B where block A ends with a GOTO which jumps to block B then the two blocks should be combined and the GOTO instruction removed, in order to reach normal form. I am thinking about resolving issues such as the one seen here - #7506 I'd also like to ask how closely does the existing code for detecting natural loops in |
Thanks @remi-delmas-3000, this is a good foundation. I don't think any of this is wrong but here are a load of pedantic questions with the aim of tightening things up:
Good point. I was implicitly assuming loop normal form properties as an addition to the core GOTO fragment documented here : #7505
Could you provide an example ? The only case I can think of is that of
Not necessarily, as I am not really certain what
We want this node to be unique.
Yes this depends on choosing the header. For a CFG that was successfully put in loop-normal-form, we would like to have natural loop back-edges be the only instruction-sequence back-edges.
Doesn't having (by definition) the header node dominate all nodes of the loop rule out multiple entrance nodes ?
if (thing) { In that case we would need to distinguish the exit node of the loop from the jump target of
I note that there are a couple of differences with this definition and LLVM's one. Your header is not required to be unique, or an entry node. Also I note that there is no maximality condition so: A -> B
@martin-cs have you read this addendum? There's a minimality condition which I added to account for nested loops: if two natural loops are nested, the minimality of each SCC should make the nested loops emerge naturally.
|
Yes. For now what we are asking for loop contracts is a function that will tell us if a program is in loop normal form or not, and a normalisation function that would succeed on goto programs generated from structured C programs that use if-then-else, while-loops, for-loops and do-while-loops with break and continue statements.
Yes. That's why this normalisation pass would be opt-in and invoked when needed for loop contracts, and optionally on programs coming from kani.
Yes that would be really good. So maybe the loop normal form checking function, instead of failing on awkward loops, could return a collection of loop descriptors and the "normal form/awkward" flag is just part of that loop descriptor. I don't think tagging the actual loop instructions is a good idea, since normal form properties can be invalidated by program instrumentation/transformation passes.
Example input C program
unsigned int factorial(unsigned int number)
{
  unsigned int factorial = 1;
  for(unsigned int i=0; ++i<=number;)
  {
    factorial = factorial*i;
  }
  return factorial;
}
Example labelled loop GOTO program
factorial /* factorial */
// 0 file ./factorial.c line 4 function factorial
DECL factorial::1::factorial : unsignedbv[32]
// 1 file ./factorial.c line 4 function factorial
ASSIGN factorial::1::factorial := cast(1, unsignedbv[32])
// 2 file ./factorial.c line 5 function factorial
DECL factorial::1::1::i : unsignedbv[32]
// 3 file ./factorial.c line 5 function factorial
ASSIGN factorial::1::1::i := cast(0, unsignedbv[32])
// 4 file ./factorial.c line 5 function factorial
// Labels: __CPROVER_loop1_header
1: ASSIGN factorial::1::1::i := factorial::1::1::i + 1
// 5 file ./factorial.c line 5 function factorial
// Labels: __CPROVER_loop1_latch
IF ¬(factorial::1::1::i ≤ factorial::number) THEN GOTO 2
// 6 file ./factorial.c line 7 function factorial
ASSIGN factorial::1::factorial := factorial::1::factorial * factorial::1::1::i
// 7 file ./factorial.c line 5 function factorial
GOTO 1
// 8 file ./factorial.c line 8 function factorial
// Labels: __CPROVER_loop1_exit
2: DEAD factorial::1::1::i
// 9 file ./factorial.c line 9 function factorial
SET RETURN VALUE factorial::1::factorial
// 10 file ./factorial.c line 9 function factorial
DEAD factorial::1::factorial
// 11 file ./factorial.c line 10 function factorial
END_FUNCTION
The latch node in that example seems to be instruction 7 |
|
> I have questions about the applications of the normal form we are
> discussing in this ticket. We need to consider how both normalised
> and awkward loop cases are handled. Awkward examples can always be
> valid goto-programs because they can be written by hand in C. CBMC
> should ideally be able to analyse any of these awkward cases. This
> would imply that an input which fails the loop checking function is
> still valid for analysis. I acknowledge there is an ideal to
> normalise awkward inputs. However until we have a working
> normalisation implementation for all possible inputs, I think that
> goto-programs containing both normalised loops and awkward loops must
> both be considered valid.
Agreed. We need algorithms that are robust, or, at least, aware of
awkward loops.
> So should we be aiming to add labels to the goto program or some
> other data structure such that the normalised loops can be marked and
> subsequently processed by loop-specific passes and the other awkward
> loops bypassed?
I think it will need to be an auxiliary data structure because labels
are not sufficiently "brittle". Some of the proposed properties can be
easily broken by things like remove_skip and other passes. If you
label then you will need to explicitly invalidate the labels. If it is
an auxiliary data structure then you will have to directly maintain the
link or recompute when the goto program changes.
> The specification proposed allows for multiple exit nodes with a
> single latch node. But doesn't multiple exit nodes imply multiple
> latch nodes as well?
Is your thinking that these will all be of the form "exit or go back to
the top"? It is a fair point but the "one latch edge" is possible by
creating a single SKIP node and redirecting all latch nodes to that.
This is an example of the previous "normal form properties that are
fragile to program transformations".
|
> Is it worth considering some sort of requirement for basic blocks
> to be maximal as part of this definition?
yes block maximality would be good to have. For #7506 the core problem is
having an edge that goes back in the instruction sequence but that is
not the back edge of an actual natural loop.
Note that remove_skip will merge basic blocks if they are sequential.
Also I was sure that somewhere we had the "jump forwarding" pass that
does something like:
```
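if (it->is_goto() && it->get_target()->is_goto()) {
  // `target` is the GOTO instruction that `it` jumps to
  const auto target = it->get_target();
  if (target->condition().is_true()) {
    // always taken: jump straight to its destination instead
    it->set_target(target->get_target());
  } else if (target->condition().is_false()) {
    // never taken: jump to the instruction that follows it
    it->set_target(std::next(target));
  }
}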
```
which might also help reduce basic block size. It seems
src/goto-programs/ensure_one_backedge_per_target.cpp
has half of this.
|
@thomasspriggs do you think the requirements are sufficiently well defined now? |
My team has re-prioritised some of the work. I now have other documentation to write first. So I am not actively working on this issue for the moment. |
This is not a blocker for our GOTO standardization anymore, so we can remove the high severity flag. We have also worked around corner cases in the GOTO instrumentation for loop contracts. |
@thomasspriggs @martin-cs Here is a summary of the properties of loop normal form for goto programs:
Viewing a GOTO program both as the sequence of its instructions and as the control flow graph induced by the sequence structure and its GOTO statements: each instruction is a node, there's an edge n1->n2 iff n2 is the successor of n1 in the sequence, or if n1 is a GOTO instruction with n2 as jump target. The entry point of the graph is the node of the first instruction.
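As a rough illustration of this graph view, here is a minimal self-contained sketch. `Instr`, `Cfg` and `build_cfg` are hypothetical names for a toy model and are not CBMC's `goto_programt` API. The sketch also assumes that an unconditional GOTO and END_FUNCTION have no fall-through edge, which refines the definition above rather than restating it.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical toy instruction, not CBMC's goto_programt::instructiont.
struct Instr {
  std::optional<std::size_t> target; // set iff the instruction is a GOTO
  bool unconditional = false;        // GOTO whose guard is `true`
  bool end_function = false;         // END_FUNCTION has no successor
};

// succ[n] lists the successors of instruction n; node 0 is the entry point.
using Cfg = std::vector<std::vector<std::size_t>>;

Cfg build_cfg(const std::vector<Instr>& prog) {
  Cfg succ(prog.size());
  for (std::size_t n = 0; n < prog.size(); ++n) {
    const Instr& i = prog[n];
    if (i.target) {
      assert(*i.target < prog.size()); // jump targets must be in range
      succ[n].push_back(*i.target);    // edge induced by the GOTO
    }
    // Assumption: no fall-through after an unconditional GOTO or END_FUNCTION.
    const bool falls_through =
        !(i.target && i.unconditional) && !i.end_function;
    if (falls_through && n + 1 < prog.size())
      succ[n].push_back(n + 1);        // edge induced by the sequence
  }
  return succ;
}
```

On the factorial program earlier in this thread, instruction 5 (the conditional GOTO) gets the two successors 8 and 6, and instruction 7 (`GOTO 1`) gets the single successor 4.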
A loop in the CFG is a set of strongly connected nodes. The loop is natural iff there is a node in the loop, the header, that dominates the other nodes of the loop. An edge going from an instruction of the loop to the header node is called a back-edge. A node that has a back edge is called a latch node.
A node is an exiting node if it has at least one successor that is not in the loop. That successor outside of the loop is called an exit node of the loop.
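To make these definitions concrete, here is a minimal sketch that finds back-edges, latch nodes, loop bodies and exit nodes on a successor-list CFG. It uses the textbook construction (one natural loop per back-edge `latch -> header` where the header dominates the latch), which is an assumption on my side and not necessarily what the existing CBMC code does. `Loop`, `dominators` and `natural_loops` are hypothetical names, and all nodes are assumed reachable from node 0.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

using Node = std::size_t;
using Cfg = std::vector<std::vector<Node>>; // successor lists, entry is node 0

std::vector<std::vector<Node>> predecessors(const Cfg& succ) {
  std::vector<std::vector<Node>> pred(succ.size());
  for (Node n = 0; n < succ.size(); ++n)
    for (Node s : succ[n]) pred[s].push_back(n);
  return pred;
}

// dom[n] is the set of nodes dominating n (iterative data-flow fixpoint).
std::vector<std::set<Node>> dominators(const Cfg& succ) {
  assert(!succ.empty());
  const auto pred = predecessors(succ);
  std::set<Node> all;
  for (Node n = 0; n < succ.size(); ++n) all.insert(n);
  std::vector<std::set<Node>> dom(succ.size(), all);
  dom[0] = {0};
  for (bool changed = true; changed;) {
    changed = false;
    for (Node n = 1; n < succ.size(); ++n) {
      std::set<Node> meet = all; // intersect dom[p] over all predecessors p
      for (Node p : pred[n]) {
        std::set<Node> tmp;
        for (Node d : meet)
          if (dom[p].count(d)) tmp.insert(d);
        meet = std::move(tmp);
      }
      meet.insert(n); // every node dominates itself
      if (meet != dom[n]) { dom[n] = std::move(meet); changed = true; }
    }
  }
  return dom;
}

struct Loop {
  Node header, latch;
  std::set<Node> body;  // includes header and latch
  std::set<Node> exits; // successors of loop nodes that are outside the loop
};

std::vector<Loop> natural_loops(const Cfg& succ) {
  const auto dom = dominators(succ);
  const auto pred = predecessors(succ);
  std::vector<Loop> loops;
  for (Node n = 0; n < succ.size(); ++n)
    for (Node h : succ[n]) {
      if (!dom[n].count(h)) continue; // n->h is not a back-edge
      Loop loop{h, n, {h}, {}};
      std::vector<Node> work{n};      // walk backwards from the latch
      while (!work.empty()) {
        const Node m = work.back();
        work.pop_back();
        if (!loop.body.insert(m).second) continue; // already visited
        for (Node p : pred[m]) work.push_back(p);
      }
      for (Node m : loop.body)
        for (Node s : succ[m])
          if (!loop.body.count(s)) loop.exits.insert(s);
      loops.push_back(std::move(loop));
    }
  return loops;
}
```

On the CFG of the factorial program earlier in this thread, this reports a single loop with header 4, latch 7, body {4, 5, 6, 7} and exit node 8.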
The properties of a normalised natural loop are:
We say that a natural loop is densely packed in the goto program iff the sub-sequence of instructions starting at the loop header instruction and ending at the loop latch instruction only contains instructions of the loop and iff the preheader node is right before the header node in the sequence.
A goto-program is in loop normal form iff all the loops it contains are natural and densely packed, and the only edges that jump to an instruction with a lower index in the sequence are back-edges of natural loops.
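A sketch of how the two sequence-level conditions could be checked, given loop descriptors computed elsewhere. `LoopInfo`, `densely_packed` and `in_loop_normal_form` are hypothetical names, not existing CBMC functions. The sketch assumes the descriptors passed in cover all loops of the program and that each one is already known to be natural, so it only checks dense packing and the backward-jump condition.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

using Node = std::size_t;
using Cfg = std::vector<std::vector<Node>>; // successor lists, entry is node 0

// Hypothetical descriptor, assumed to come from a natural-loop analysis.
struct LoopInfo {
  Node preheader, header, latch;
  std::set<Node> body; // includes header and latch
};

// Every instruction in [header, latch] belongs to the loop, and the
// preheader sits immediately before the header in the sequence.
bool densely_packed(const LoopInfo& l) {
  if (l.header > l.latch) return false;
  for (Node n = l.header; n <= l.latch; ++n)
    if (!l.body.count(n)) return false;
  return l.preheader + 1 == l.header;
}

bool in_loop_normal_form(const Cfg& succ, const std::vector<LoopInfo>& loops) {
  for (const LoopInfo& l : loops) {
    assert(l.header < succ.size() && l.latch < succ.size());
    if (!densely_packed(l)) return false;
  }
  for (Node n = 0; n < succ.size(); ++n)
    for (Node s : succ[n]) {
      if (s >= n) continue; // only jumps to a lower index are constrained
      bool is_back_edge = false;
      for (const LoopInfo& l : loops)
        if (l.latch == n && l.header == s) is_back_edge = true;
      if (!is_back_edge) return false; // backward jump that is not a back-edge
    }
  return true;
}
```

For the factorial program earlier in this thread, the descriptor would be preheader 3, header 4, latch 7, body {4, 5, 6, 7}, and the check succeeds. Adding a stray backward jump, for example from instruction 9 to instruction 2, makes it fail.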
These notions are captured in two functions: