Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

loop normal form properties #7518

Open
remi-delmas-3000 opened this issue Feb 3, 2023 · 14 comments
Open

loop normal form properties #7518

remi-delmas-3000 opened this issue Feb 3, 2023 · 14 comments
Assignees
Labels
aws Bugs or features of importance to AWS CBMC users feature request

Comments

@remi-delmas-3000
Copy link
Collaborator

@thomasspriggs @martin-cs Here is a summary of the properties of loop normal form for goto programs:

Viewing a GOTO program as both the sequence of its instructions as the control flow graph induced by the sequence structure and its GOTO statements: each instruction is a node, there's an edge n1->n2 iff n2 is the successor of n1 in the sequence, or if n1 is a GOTO instruction with n2 as jump target. The entry point of the graph is the node of the first instruction.

A loop in the CFG is a set of strongly connected nodes. The loop is natural iff there is a node in the loop, the header, that dominates the other nodes of the loop. An edge going from an instruction of the loop to the header node is called a back-edge. A node that has a back edge is called a latch node.

A node is an exiting node if it has at least one successor that is not in the loop. That successor outside of the loop is called an exit node of the loop.

The properties of a normalised natural loop are:

  • The header node is the only node with predecessors that are not in the loop
  • The header node has a unique predecessor called the pre-header node
  • The predecessors of exit nodes are all in the loop (i.e. control can only reach an exit node without actually going through the loop instructions first)
  • The loop has a single latch node

We say that a natural loop is densely packed in the goto program iff the sub-sequence of instructions starting at the loop header instruction and ending at the loop latch instruction only contains instructions of the loop and iff the preheader node is right before the header node in the sequence.

A goto-program is loop normal form iff all the loops it contains are natural, densely packed and if the only edges that jump to an instruction with a lower index in the sequence are back-edges of natural loops.

These notions are captured in two functions:

  • a checker function that checks if the loop normal form properties holds on a goto program, and returns either success or an error value describing why it does not hold, e.g. "exit node 'n' of loop 'l' has a predecessor that is not in 'l'", "there is a backwards jump in the sequence that is not a natural loop back-edge")
  • a normalisation function that rewrites a goto program's instruction sequence to loop normal form, and returns either success if the operation succeeds, or an error value describing the reason for failure.
  • the normalisation function should be idempotent, i.e. should not modify a program that is already in loop normal form (it can use the checker function to decide wether there is work to do or not).
@remi-delmas-3000 remi-delmas-3000 added aws Bugs or features of importance to AWS CBMC users aws-high feature request labels Feb 3, 2023
@remi-delmas-3000 remi-delmas-3000 self-assigned this Feb 3, 2023
@thomasspriggs
Copy link
Collaborator

The way the pre-header node was explained in the recent meeting appeared to require the use of a labelled skip instruction. It is worth noting that there is an existing normalisation pass in the cbmc code base which is called from tens of places called remove_skip. As the name implies it will remove skip instructions from goto programs. It can move labels around in order to remove skip instructions. Therefore it could potentially combine the pre-header node from this initial suggested form with the header node. Therefore think that we should not make the notion of a mandatory pre-header node part of our definition.

@peterschrammel
Copy link
Member

Are header nodes of nested loops allowed to coincide?

@feliperodri
Copy link
Collaborator

#7506
#7517

@remi-delmas-3000
Copy link
Collaborator Author

remi-delmas-3000 commented Feb 6, 2023

Therefore think that we should not make the notion of a mandatory pre-header node part of our definition.

Agreed, we can always add these nodes as needed afterwards.

I think the same observation could apply to exit nodes. We would like exit nodes to have only incoming edges coming from loop instructions so that the header of the loop dominates the exit nodes and that any instrumentation we may add on exit nodes is only executed when the loop has been entered once.

This may require the insertion of a skip node to separate incoming edges from the loop from incoming edges coming from somewhere else.

I am not sure however that reintroduce these skip nodes as needed after the fact is as easy as in the pre-header node case.

@remi-delmas-3000
Copy link
Collaborator Author

Are header nodes of nested loops allowed to coincide?

I'd like to rephrase the definition a bit to make it clearer that a natural loop is specific to a back edge:

A back-edge is an edge from a node l to another node h of the loop such that h dominates l from the entry node of the CFG. The dominance relation h dom l implies the existence of a path from h to l, the back edge that of a connected component in the CFG that contains l and h. The natural loop of the back-edge l->h is the smallest subset of nodes the CFG that contains both l and h and that is strongly connected.

So by that definition it seems like two distinct natural loops (i.e that have different back edges and are each a minimum size SCC) can have the same header node.

It seems like the pre-header node requirement would reject sharing though.

It would be nice to have a checker function that can optionally check for the presence of pre-header and dominated exit nodes (since having these properties will really make loop contracts instrumentation easier).

@thomasspriggs
Copy link
Collaborator

Is it worth considering some sort of requirement for basic blocks to be maximal as part of this definition? For example if we have 2 single entry single exit blocks of instructions A and B where block A ends with a GOTO which jumps to block B then the two blocks should be combined and the GOTO instruction removed, in order to reach normal form. I am thinking about resolving issues such as the one seen here - #7506
Cases like these, where code superficially looks like a loop due to the backwards GOTO, but no loop of any kind is actually present.

I'd also like to ask how closely does the existing code for detecting natural loops in src/analyses/natural_loops.h fit with the natural loop definition, which we are aiming for in this issue?

@martin-cs
Copy link
Collaborator

martin-cs commented Feb 11, 2023

Thanks @remi-delmas-3000, this is a good foundation.

I don't think any of this is wrong but here are a load of pedantic questions with the aim of tightening things up:

there's an edge

Assuming that THROW instructions have been removed?

Good point. I was implicitly assuming loop normal form properties as an addition to the core GOTO fragment documented here : #7505

The entry point of the graph is the node of the first instruction.

First in terms of sequence of instructions? In pathological cases this is not guaranteed to be the first instruction executed.

Could you provide an example ? The only case I can think of is that of __CPROVER_preconditions that are pulled out of functions at the call site.

The loop is natural

For clarity, are you expecting this definition to be equivalent to the one in src/analyses/natural_loops.h ? I am not convinced that one is correct or really captures what the code does.

Not necessarily, as I am not really certain what src/analyses/natural_loops.h computes. A first step could be to give a spec to natural_loops.h.

the header, that dominates the other nodes of the loop.

This is not unique is it? You can, and often will, have multiple nodes with this property?

We want this node to be unique.

An edge going from an instruction of the loop to the header node
is called a back-edge.

So which things are back edges depends on the choice of header? And these may not co-incide with instruction-sequence or CFG back-edges?

Yes this depends on choosing the header. For a CFG that was successfully put in loop-normal-form, we would like to have natural loop back-edges be the only instruction-sequence back-edges.

A node is an exiting node ...

Why not have entrance nodes defined in the same way?

Doesn't having (by definition) the header node dominate all nodes of the loop rule out multiple entrance nodes ?
Having a single entry (the header) and a pre-header node is important for loop contract transformation.

... properties of a normalised natural loop are ...
This seems pretty reasonable. It is slightly more general than what we were using in ASVAT as it allows multiple exits.

The predecessors of an exit node are all in the loop...
I can see why you want this but it might be a little bit fiddly. because they could also be merge targets, for example:

if (thing) {
while (other_thing) {
stuff();
}
}

I that case we would need to distinguish the exit node of the loop from the jump target of if(thing) { .. }.

... single latch node ...
I can see why this is useful and the definition works but be careful with how nested loops work.

I note that there are a couple of differences with this definition and LLVM's one. Your header is not required to be unique, or an entry node. Also I note that there is no maximality condition so:

A -> B
A -> C
B -> D
C -> D
D -> A

A, B, D is a loop? I guess the problem with maximality is that it makes nested loops hard to handle.
I'd say no according to the "for all loop nodes, all predecessors are in the loop" since D has C as predecessor and C is not in A, B, D.

@martin-cs have you read this addendum (there's a minimality condition which I added to account for nested loops (if two natural loops are nested the minimality of each SCC should make nested loops emerge naturally).

A back-edge is an edge from a node l to another node h of the loop such that h dominates l from the entry node of the CFG. The dominance relation h dom l implies the existence of a path from h to l, the back edge that of a connected component in the CFG that contains l and h. The natural loop of the back-edge l->h is the smallest subset of nodes the CFG that contains both l and h and that is strongly connected.

@thomasspriggs
Copy link
Collaborator

thomasspriggs commented Feb 13, 2023

I have questions about the applications of the normal form we are discussing in this ticket.

  • the main application is as a precondition loop contract transformation, where the densely packed sequence of instructions of a loop is replaced by a nondeterministic loop free check base + havoc + check step instruction sequence. This is an opt-in transformation, and we plan to invoke loop normalisation + loop normal form check on a goto program only if the user wants to apply loop contracts to that function. Most of the problems we have with loop contracts today will be solved by having having the sequence densely packed, having a unique pre-header node, and having dominated dedicated exit nodes.
  • the second application is to avoid asking the user for unwinding directives when there is no natural loop involved. We get this through the "the only instruction sequence back edges are natural loop back-edges". In addition, if this one day becomes an actual invariant of a core GOTO programs, symex could also rely on it to handle SSA counters.

We need to consider how both normalised and awkward loop cases are handled. Awkward examples can always be valid goto-programs because they can be written by hand in C. CBMC should ideally be able to analyse any of these awkward cases.

Yes. For now what we are asking for loop contracts is a function that will tell us if a program is in loop normal form or not, and a normalisation function that would succeed when goto programs from structured C programs programs that use if-then-else, while-loops, for-loops and do-while-loops with break and continue statements.

This would imply that an input which fails the loop checking function is still valid for analysis. I acknowledge there is an ideal to normalise awkward inputs. However until we have a working normalisation implementation for all possible inputs, I think that goto-programs containing both normalised loops and awkward loops must both be considered valid.

Yes. That's why this normalisation pass would be opt-in and invoked when needed for loop contracts, and optionally on programs coming from kani.

So should we be aiming to add labels to the goto program or some other data structure such that the normalised loops can be marked and subsequently processed by loop-specific passes and the other awkward loops bypassed?

Yes that would be really good. So maybe the loop normal form checking function, instead of failing on awkward loops, could return a collection of loop descriptors and the "normal form/awkward" flag is just part of that loop descriptor. I dont' think tagging the actual loop instructions is a good idea, since normal form properties can be invalidated by program instrumentation/transformation passes.

Example input C program
unsigned int factorial(unsigned int number)
{
  unsigned int factorial = 1;
  for(unsigned int i=0; ++i<=number;)
  {
    factorial = factorial*i;    
  }
  return factorial;
}
Example labelled loop GOTO program
factorial /* factorial */
        // 0 file ./factorial.c line 4 function factorial
        DECL factorial::1::factorial : unsignedbv[32]
        // 1 file ./factorial.c line 4 function factorial
        ASSIGN factorial::1::factorial := cast(1, unsignedbv[32])
        // 2 file ./factorial.c line 5 function factorial
        DECL factorial::1::1::i : unsignedbv[32]
        // 3 file ./factorial.c line 5 function factorial
        ASSIGN factorial::1::1::i := cast(0, unsignedbv[32])
        // 4 file ./factorial.c line 5 function factorial
        // Labels: __CPROVER_loop1_header
     1: ASSIGN factorial::1::1::i := factorial::1::1::i + 1
        // 5 file ./factorial.c line 5 function factorial
        // Labels: __CPROVER_loop1_latch
        IF ¬(factorial::1::1::ifactorial::number) THEN GOTO 2
        // 6 file ./factorial.c line 7 function factorial
        ASSIGN factorial::1::factorial := factorial::1::factorial * factorial::1::1::i
        // 7 file ./factorial.c line 5 function factorial
        GOTO 1
        // 8 file ./factorial.c line 8 function factorial
        // Labels: __CPROVER_loop1_exit
     2: DEAD factorial::1::1::i
        // 9 file ./factorial.c line 9 function factorial
        SET RETURN VALUE factorial::1::factorial
        // 10 file ./factorial.c line 9 function factorial
        DEAD factorial::1::factorial
        // 11 file ./factorial.c line 10 function factorial
        END_FUNCTION

The specification proposed allows for multiple exit nodes with a single latch node. But doesn't multiple exit nodes imply multiple latch nodes as well?

The latch node in that example seems to be instruction 7 GOTO 1. Instruction 5 IF ¬(factorial::1::1::i ≤ factorial::number) THEN GOTO 2 is an exiting node with corresponding exit node instruction 8. Instruction 7 is not an exiting node since its only successor is the header node. Multiple exit nodes start appearing when shortcutting conjunction or disjunction is used in loop guards or when break or continue statements are used in the loop body.

@remi-delmas-3000
Copy link
Collaborator Author

@thomasspriggs

Is it worth considering some sort of requirement for basic blocks to be maximal as part of this definition?
yes block maximality would be good to have. For #7506 the core problem is having an edge that goes back in the instruction sequence but that is not the back edge of an actual natural loop.

@martin-cs
Copy link
Collaborator

martin-cs commented Feb 13, 2023 via email

@martin-cs
Copy link
Collaborator

martin-cs commented Feb 13, 2023 via email

@remi-delmas-3000
Copy link
Collaborator Author

@thomasspriggs do you think the requirements sufficiently well defined now ?

@thomasspriggs
Copy link
Collaborator

My team has re-prioritised some of the work. I now have other documentation to write first. So I am not actively working on this issue for the moment.

@feliperodri
Copy link
Collaborator

feliperodri commented Mar 9, 2023

This is not a blocker for our GOTO standardization anymore, so we can remove the high severity flag. We have also worked around corner cases in the GOTO instrumentation for loop contracts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aws Bugs or features of importance to AWS CBMC users feature request
Projects
None yet
Development

No branches or pull requests

5 participants