33
44# Folder util
55
6- \author Martin Brain
6+ \author Martin Brain, Owen Jones
77
88\section data_structures Data Structures
99
@@ -12,17 +12,17 @@ CPROVER codebase.
1212
1313\subsection irept Irept Data Structure
1414
15- There are a large number of kind of tree structured or tree-like data in
15+ There are a large number of kinds of tree structured or tree-like data in
1616CPROVER. ` irept ` provides a single, unified representation for all of
1717these, allowing structure sharing and reference counting of data. As
1818such ` irept ` is the basic unit of data in CPROVER. Each ` irept `
19- contains[ ^ 2 ] a basic unit of data (of type ` dt ` ) which contains four
19+ contains[ ^ 1 ] a basic unit of data (of type ` dt ` ) which contains four
2020things:
2121
22- * ` data ` : A string[ ^ 3 ] , which is returned when the ` id() ` function is
22+ * ` data ` : A string[ ^ 2 ] , which is returned when the ` id() ` function is
2323 used.
2424
25- * ` named_sub ` : A map from ` irep_namet ` (a string) to an ` irept ` . This
25+ * ` named_sub ` : A map from ` irep_namet ` (a string) to ` irept ` . This
2626 is used for named children, i.e. subexpressions, parameters, etc.
2727
2828* ` comments ` : Another map from ` irep_namet ` to ` irept ` which is used
@@ -32,7 +32,7 @@ things:
3232 unnamed children.
3333
3434The ` irept::pretty ` function outputs the contents of an ` irept ` directly
35- and can be used to understand an debug problems with ` irept ` s.
35+ and can be used to understand and debug problems with ` irept ` s.
3636
3737On their own ` irept ` s do not “mean” anything; they are effectively
3838generic tree nodes. Their interpretation depends on the contents of
@@ -67,21 +67,34 @@ class’ named `typecast_exprt`. One key descendent of `exprt` is
6767These are used to represent variables; the name of which can be found
6868using the ` get_identifier ` accessor function.
6969
70- ` codet ` inherits from ` exprt ` and is defined in ` std_code.h ` . They
71- represent executable code; statements in C rather than expressions. In
70+ ` codet ` inherits from ` exprt ` and is defined in ` std_code.h ` . It
71+ represents executable code; statements in C rather than expressions. In
7272the front-end there are versions of these that hold whole code blocks,
7373but in goto-programs these have been flattened so that each ` irept `
7474represents one sequence point (almost one line of code / one
7575semi-colon). The most common descendents of ` codet ` are ` code_assignt `
7676so a common pattern is to cast the ` codet ` to an assignment and then
7777recurse on the expression on either side.
7878
79- [ ^ 2 ] : Or references, if reference counted data sharing is enabled. It is
79+ [ ^ 1 ] : Or references, if reference counted data sharing is enabled. It is
8080 enabled by default; see the ` SHARING ` macro.
8181
82- [ ^ 3 ] : Unless ` USE_STD_STRING ` is set, this is actually
82+ [ ^ 2 ] : Unless ` USE_STD_STRING ` is set, this is actually
8383a ` dstring ` and thus an integer which is a reference into a string table
8484
85+ \subsection irep_idt Irep_idt and Dstringt
86+
87+ Inside ` irept ` s, strings are stored as ` irep_idt ` s, or ` irep_namet ` s for
88+ keys to ` named_sub ` or ` comments ` . If ` USE_STD_STRING ` is set then both
89+ ` irep_idt ` and ` irep_namet ` are ` typedef ` ed to ` std::string ` , but by default
90+ it is not set and they are ` typedef ` ed to ` dstringt ` . ` dstringt ` has one
91+ field, an unsigned integer which is an index into a static table of strings.
92+ This makes it expensive to create a new string (because you have to look
93+ through the whole table to see if it is already there, and add it if it
94+ isn't) but very cheap to compare strings (just compare the two integers). It
95+ also means that when you have lots of copies of the same string you only have
96+ to store the whole string once, which saves space.
97+
8598\dot
8699digraph G {
87100 node [ shape=box] ;
0 commit comments