88\section goto-programs-overview Overview
99Goto programs are the intermediate representation of the CPROVER tool
1010chain. They are language independent and similar to many of the compiler
11- intermediate languages. Section \ref goto-programs describes the
12- ` goto_programt ` and ` goto_functionst ` data structures in detail. However
13- it useful to understand some of the basic concepts. Each function is a
14- list of instructions, each of which has a type (one of 18 kinds of
11+ intermediate languages. Each function is a
12+ list of instructions, each of which has a type (one of 19 kinds of
1513instruction), a code expression, a guard expression and potentially some
1614targets for the next instruction. They are not natively in static
1715single-assign (SSA) form. Transitions are nondeterministic (although in
1816practise the guards on the transitions normally cover form a disjoint
1917cover of all possibilities). Local variables have non-deterministic
20- values if they are not initialised. Variables and data within the
21- program is commonly one of three types (parameterised by width):
22- ` unsignedbv_typet ` , ` signedbv_typet ` and ` floatbv_typet ` , see
23- ` util/std_types.h ` for more information. Goto programs can be serialised
18+ values if they are not initialised. Goto programs can be serialised
2419in a binary (wrapped in ELF headers) format or in XML (see the various
2520` _serialization ` files).
2621
27- The ` cbmc ` option ` – show-goto-programs` is often a good starting point
22+ The ` cbmc ` option ` -- show-goto-programs` is often a good starting point
2823as it outputs goto-programs in a human readable form. However there are
2924a few things to be aware of. Functions have an internal name (for
3025example ` c::f00 ` ) and a ‘pretty name’ (for example ` f00 ` ) and which is
3126used depends on whether it is internal or being presented to the user.
32- The ` main ` method is the ‘logical’ main which is not necessarily the
33- main method from the code. In the output ` NONDET ` is use to represent a
34- nondeterministic assignment to a variable. Likewise ` IF ` as a beautified
35- ` GOTO ` instruction where the guard expression is used as the condition.
36- ` RETURN ` instructions may be dropped if they precede an ` END_FUNCTION `
37- instruction. The comment lines are generated from the ` locationt ` field
27+ ` NONDET(some_type) ` is use to represent a nondeterministic value.
28+ ` IF guard GOTO x ` represents a GOTO instruction with a guard, not a distinct
29+ ` IF ` instruction.
30+ The comment lines are generated from the ` source_location ` field
3831of the ` instructiont ` structure.
3932
40- ` goto-programs/ ` is one of the few places in the CPROVER codebase that
41- templates are used. The intention is to allow the general architecture
42- of program and functions to be used for other formalisms. At the moment
43- most of the templates have a single instantiation; for example
44- ` goto_functionst ` and ` goto_function_templatet ` and ` goto_programt ` and
45- ` goto_program_templatet ` .
46-
4733\section data_structures Data Structures
4834
49- FIXME: This text is partially outdated.
50-
51- The common starting point for working with goto-programs is the
52- ` read_goto_binary ` function which populates an object of
53- ` goto_functionst ` type. This is defined in ` goto_functions.h ` and is an
54- instantiation of the template ` goto_functions_templatet ` which is
55- contained in ` goto_functions_template.h ` . They are wrappers around a map
56- from strings to ` goto_programt ` ’s and iteration macros are provided.
57- Note that ` goto_function_templatet ` (no ` s ` ) is defined in the same
58- header as ` goto_functions_templatet ` and is gives the C type for the
59- function and Boolean which indicates whether the body is available
60- (before linking this might not always be true). Also note the slightly
61- counter-intuitive naming; ` goto_functionst ` instances are the top level
35+ A \ref goto_functionst object contains a set of GOTO programs. Note the
36+ counter-intuitive naming: ` goto_functionst ` instances are the top level
6237structure representing the program and contain ` goto_programt ` instances
63- which represent the individual functions. At the time of writing
64- ` goto_functionst ` is the only instantiation of the template
65- ` goto_functions_templatet ` but other could be produced if a different
66- data-structures / kinds of models were needed for functions.
67-
68- ` goto_programt ` is also an instantiation of a template. In a similar
69- fashion it is ` goto_program_templatet ` and allows the types of the guard
70- and expression used in instructions to be parameterised. Again, this is
71- currently the only use of the template. As such there are only really
72- helper functions in ` goto_program.h ` and thus ` goto_program_template.h `
73- is probably the key file that describes the representation of (C)
74- functions in the goto-program format. It is reasonably stable and
75- reasonably documented and thus is a good place to start looking at the
76- code.
77-
78- An instance of ` goto_program_templatet ` is effectively a list of
79- instructions (and inner template called ` instructiont ` ). It is important
38+ which represent the individual functions.
39+
40+ An instance of \ref goto_programt is effectively a list of
41+ instructions (a nested class called \ref goto_programt::instructiont).
42+ It is important
8043to use the copy and insertion functions that are provided as iterators
8144are used to link instructions to their predecessors and targets and
82- careless manipulation of the list could break these. Likewise there are
83- helper macros for iterating over the instructions in an instance of
84- ` goto_program_templatet ` and the use of these is good style and strongly
85- encouraged.
86-
87- Individual instructions are instances of type ` instructiont ` . They
88- represent one step in the function. Each has a type, an instance of
89- ` goto_program_instruction_typet ` which denotes what kind of instruction
45+ careless manipulation of the list could break these.
46+
47+ Individual instructions are instances of type \ref goto_programt::instructiont.
48+ They represent one step in the function. Each has a type, an instance of
49+ \ref goto_program_instruction_typet which denotes what kind of instruction
9050it is. They can be computational (such as ` ASSIGN ` or ` FUNCTION_CALL ` ),
9151logical (such as ` ASSUME ` and ` ASSERT ` ) or informational (such as
92- ` LOCATION ` and ` DEAD ` ). At the time of writing there are 18 possible
52+ ` LOCATION ` and ` DEAD ` ). At the time of writing there are 19 possible
9353values for ` goto_program_instruction_typet ` / kinds of instruction.
9454Instructions also have a guard field (the condition under which it is
9555executed) and a code field (what the instruction does). These may be
96- empty depending on the kind of instruction. In the default
97- instantiations these are of type ` exprt ` and ` codet ` respectively and
98- thus covered by the previous discussion of ` irept ` and its descendents.
99- The next instructions (remembering that transitions are guarded by
56+ empty depending on the kind of instruction.
57+ These are of type \ref exprt and \ref codet respectively.
58+ The next instructions (remembering that transitions may be
10059non-deterministic) are given by the list ` targets ` (with the
10160corresponding list of labels ` labels ` ) and the corresponding set of
10261previous instructions is get by ` incoming_edges ` . Finally ` instructiont `
103- have informational ` function ` and ` location ` fields that indicate where
104- they are in the code.
62+ has informational ` function ` and ` source_location ` fields that indicate where
63+ they are in the source code.
10564
10665\section goto-conversion Goto Conversion
10766
@@ -124,8 +83,12 @@ digraph G {
12483}
12584\enddot
12685
127- At this stage, CBMC constructs a goto-program from a symbol table. It
128- does not use the parse tree or the source file at all for this step. This
86+ At this stage, CBMC constructs a goto-program from a symbol table.
87+ Each symbol in the symbol table with function type (that is, the symbol's ` type `
88+ field contains a \ref code_typet) will be converted to a corresponding GOTO
89+ program.
90+
91+ It does not use the parse tree or the source file at all for this step. This
12992may seem surprising, because the symbols are stored in a hash table and
13093therefore have no intrinsic order; nevertheless, every \ref symbolt is
13194associated with a \ref source_locationt, allowing CBMC to figure out the
@@ -282,18 +245,6 @@ This is not the final form of the goto-functions, since the lists of
282245instructions will be 'normalized' in the next step (Instrumentation),
283246which removes some instructions and adds targets to others.
284247
285- Note that goto_programt and goto_functionst are each template
286- instantiations; they are currently the * only* specialization of
287- goto_program_templatet and goto_functions_templatet, respectively. This
288- means that the generated Doxygen documentation can be somewhat obtuse
289- about the actual types of things, and is unable to generate links to the
290- correct classes. Note that the
291- \ref goto_programt::instructiont::code "code" member of a
292- goto_programt's instruction has type \ref codet (its type in the
293- goto_program_templatet documentation is given as "codeT", as this is the
294- name of the template's type parameter); similarly, the type of a guard
295- of an instruction is \ref guardt.
296-
297248---
298249\section instrumentation Instrumentation
299250
@@ -324,8 +275,15 @@ previous stage:
324275 are littered with \ref code_skipt "skip" statements. The
325276 instrumentation stage removes the majority of these.
326277
327- * Function pointers are removed. They are turned into switch statements
328- (but see the next point; switch statements are further transformed).
278+ * Variable lifespan implied by \ref code_declt instructions and lexical scopes
279+ described by \ref code_blockt nodes is replaced by ` DECL ` and corresponding
280+ ` DEAD ` instructions. There are therefore no lexical scopes in GOTO programs
281+ (not even local variable death on function exit is enforced).
282+
283+ * Expressions with side-effects are explicitly ordered so that there is one
284+ effect per instruction (apart from function call instructions, which can
285+ still have many). For example, ` y = f() + x++ ` will have become something like
286+ ` tmp1 = f(); y = tmp1 + x; x = x + 1; `
329287
330288* Compound blocks are eliminated. There are several subclasses of
331289 \ref codet that count as 'compound blocks;' therefore, later stages in
@@ -362,13 +320,7 @@ previous stage:
362320* \ref code_returnt "return statements" are transformed into
363321 (unconditional) GOTOs whose target is the \ref END_FUNCTION
364322 instruction. Each goto_programt should have precisely one such
365- instruction. Note the presence of \ref code_deadt, which has a
366- \ref code_deadt::symbol() "symbol()" member. Deads mark symbols that
367- have just gone out of scope; typically, a GOTO that jumps to an
368- END_FUNCTION instruction is preceded by a series of deads. Deads also
369- follow sequences of instructions that were part of the body of a
370- block (loop, conditional etc.) if there were symbols declared in that
371- block.
323+ instruction.
372324
373325This stage concludes the * analysis-independent* program transformations.
374326
0 commit comments