|
10 | 10 | This section discusses some of the key data-structures used in the |
11 | 11 | CPROVER codebase. |
12 | 12 |
|
13 | | -\subsection irept Irept Data Structure |
| 13 | +\subsection irept_section Irept Data Structure |
14 | 14 |
|
15 | | -There are a large number of kinds of tree structured or tree-like data in |
16 | | -CPROVER. `irept` provides a single, unified representation for all of |
17 | | -these, allowing structure sharing and reference counting of data. As |
18 | | -such `irept` is the basic unit of data in CPROVER. Each `irept` |
19 | | -contains[^1] a basic unit of data (of type `dt`) which contains four |
20 | | -things: |
| 15 | +See \ref irept for more information. |
21 | 16 |
|
22 | | -* `data`: A string[^2], which is returned when the `id()` function is |
23 | | - used. |
| 17 | +\subsection irep_idt_section Irep_idt and Dstringt |
24 | 18 |
|
25 | | -* `named_sub`: A map from `irep_namet` (a string) to `irept`. This |
26 | | - is used for named children, i.e. subexpressions, parameters, etc. |
27 | | - |
28 | | -* `comments`: Another map from `irep_namet` to `irept` which is used |
29 | | - for annotations and other ‘non-semantic’ information |
30 | | - |
31 | | -* `sub`: A vector of `irept` which is used to store ordered but |
32 | | - unnamed children. |
33 | | - |
34 | | -The `irept::pretty` function outputs the contents of an `irept` directly |
35 | | -and can be used to understand and debug problems with `irept`s. |
36 | | - |
37 | | -On their own `irept`s do not “mean” anything; they are effectively |
38 | | -generic tree nodes. Their interpretation depends on the contents of |
39 | | -result of the `id` function (the `data`) field. `util/irep_ids.txt` |
40 | | -contains the complete list of `id` values. During the build process it |
41 | | -is used to generate `util/irep_ids.h` which gives constants for each id |
42 | | -(named `ID_`). These can then be used to identify what kind of data |
43 | | -`irept` stores and thus what can be done with it. |
44 | | - |
45 | | -To simplify this process, there are a variety of classes that inherit |
46 | | -from `irept`, roughly corresponding to the ids listed (i.e. `ID_or` |
47 | | -(the string `"or”`) corresponds to the class `or_exprt`). These give |
48 | | -semantically relevant accessor functions for the data; effectively |
49 | | -different APIs for the same underlying data structure. None of these |
50 | | -classes add fields (only methods) and so static casting can be used. The |
51 | | -inheritance graph of the subclasses of `irept` is a useful starting |
52 | | -point for working out how to manipulate data. |
53 | | - |
54 | | -There are three main groups of classes (or APIs); those derived from |
55 | | -`typet`, `codet` and `exprt` respectively. Although all of these inherit |
56 | | -from `irept`, these are the most abstract level that code should handle |
57 | | -data. If code is manipulating plain `irept`s then something is wrong |
58 | | -with the architecture of the code. |
59 | | - |
60 | | -Many of the key descendent of `exprt` are declared in `std_expr.h`. All |
61 | | -expressions have a named subfield / annotation which gives the type of |
62 | | -the expression (slightly simplified from C/C++ as `unsignedbv_typet`, |
63 | | -`signedbv_typet`, `floatbv_typet`, etc.). All type conversions are |
64 | | -explicit with an expression with `id() == ID_typecast` and an ‘interface |
65 | | -class’ named `typecast_exprt`. One key descendent of `exprt` is |
66 | | -`symbol_exprt` which creates `irept` instances with the id of “symbol”. |
67 | | -These are used to represent variables; the name of which can be found |
68 | | -using the `get_identifier` accessor function. |
69 | | - |
70 | | -`codet` inherits from `exprt` and is defined in `std_code.h`. It |
71 | | -represents executable code; statements in C rather than expressions. In |
72 | | -the front-end there are versions of these that hold whole code blocks, |
73 | | -but in goto-programs these have been flattened so that each `irept` |
74 | | -represents one sequence point (almost one line of code / one |
75 | | -semi-colon). The most common descendents of `codet` are `code_assignt` |
76 | | -so a common pattern is to cast the `codet` to an assignment and then |
77 | | -recurse on the expression on either side. |
78 | | - |
79 | | -[^1]: Or references, if reference counted data sharing is enabled. It is |
80 | | - enabled by default; see the `SHARING` macro. |
81 | | - |
82 | | -[^2]: Unless `USE_STD_STRING` is set, this is actually |
83 | | -a `dstring` and thus an integer which is a reference into a string table |
84 | | - |
85 | | -\subsection irep_idt Irep_idt and Dstringt |
86 | | - |
87 | | -Inside `irept`s, strings are stored as `irep_idt`s, or `irep_namet`s for |
88 | | -keys to `named_sub` or `comments`. If `USE_STD_STRING` is set then both |
89 | | -`irep_idt` and `irep_namet` are `typedef`ed to `std::string`, but by default |
90 | | -it is not set and they are `typedef`ed to `dstringt`. `dstringt` has one |
91 | | -field, an unsigned integer which is an index into a static table of strings. |
92 | | -This makes it expensive to create a new string (because you have to look |
93 | | -through the whole table to see if it is already there, and add it if it |
94 | | -isn't) but very cheap to compare strings (just compare the two integers). It |
95 | | -also means that when you have lots of copies of the same string you only have |
96 | | -to store the whole string once, which saves space. |
| 19 | +Inside \ref irept, strings are stored as irep_idts, or irep_namets for |
| 20 | +keys to named_sub or comments. By default both irep_idt and irep_namet |
| 21 | +are typedefed to \ref dstringt, unless USE_STD_STRING is set, in which |
| 22 | +case they are typedefed to std::string (this can be used for debugging |
| 23 | +purposes). |
97 | 24 |
|
98 | 25 | \dot |
99 | 26 | digraph G { |
|
0 commit comments