Skip to content

Analyze a Simple C Program

Yulei Sui edited this page Jul 2, 2024 · 45 revisions

An Example

We use an example to go through each component of SVF: (1) Memory Model including PAG and Constraint Graph, (2) Pointer Analysis including both flow-insensitive and flow-sensitive analyses, and (3) Value-Flow Construction.

1. C Code

void swap(char **p, char **q){
  char* t = *p; 
       *p = *q; 
       *q = t;
}
int main(){
      char a1, b1; 
      char *a = &a1;
      char *b = &b1;
      swap(&a,&b);
}

2. LLVM IR after the mem2Reg option is turned on (A project or a C file can also be compiled to generate bc using wllvm)

clang -S -c -Xclang -disable-O0-optnone -fno-discard-value-names -emit-llvm swap.c -o swap.ll
opt -S -p=mem2reg swap.ll -o swap.ll
; ModuleID = 'example.ll'
source_filename = "example.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

; Function Attrs: noinline nounwind uwtable
define dso_local void @swap(ptr noundef %p, ptr noundef %q) #0 {
entry:
  %0 = load ptr, ptr %p, align 8
  %1 = load ptr, ptr %q, align 8
  store ptr %1, ptr %p, align 8
  store ptr %0, ptr %q, align 8
  ret void
}

; Function Attrs: noinline nounwind uwtable
define dso_local i32 @main() #0 {
entry:
  %a1 = alloca i8, align 1
  %a = alloca ptr, align 8
  %b1 = alloca i8, align 1
  %b = alloca ptr, align 8
  store ptr %a1, ptr %a, align 8
  store ptr %b1, ptr %b, align 8
  call void @swap(ptr noundef %a, ptr noundef %b)
  ret i32 0
}

attributes #0 = { noinline nounwind uwtable "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }

!llvm.module.flags = !{!0, !1, !2, !3, !4}
!llvm.ident = !{!5}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{i32 8, !"PIC Level", i32 2}
!2 = !{i32 7, !"PIE Level", i32 2}
!3 = !{i32 7, !"uwtable", i32 2}
!4 = !{i32 7, !"frame-pointer", i32 2}
!5 = !{!"clang version 16.0.0"}

3. Call Graph

wpa -ander -dump-callgraph swap.ll

Callgraph

4. Interprocedural Control-Flow Graph

wpa -type -dump-icfg swap.ll

ICFG

5. Program Assignment Graph (PAG)

wpa -nander -dump-pag swap.ll

PAG

The different kinds of PAG nodes are denoted by Graphviz shapes. Here is the PAG node type hierarchy and the corresponding shapes:

  • PAGNode, the abstract base class
    • ValVar (ValPN), pointer value base class, shape=box
      • GepValVar (GepValPN), getelementptr node, represents an offset from a base pointer, shape=hexagon
      • DummyValVar (DummyValPN), used for special nodes not associated with an LLVM value, shape=diamond
    • ObjVar (ObjPN), base class of memory objects, shape=component
      • GepObjVar (GepObjPN), represents memory at an offset (a field or array element), shape=doubleoctagon
      • FIObjVar (FIObjPN), field-insensitive object, shape=box3d
      • DummyObjVar (DummyObjPN), special memory, shape=tab
    • RetPN, for unified function return, shape=Mrecord (box with rounded corners)
    • VarArgPN, represents the variable parameters in a call, shape=octagon

An edge between two nodes represents a constraint:

6. ConstraintGraph

wpa -nander -dump-pag -dump-constraint-graph swap.ll

A Constraint Graph is used when Andersen's flow-insensitive analysis is performed. The following rules are used in resolving the constraints in the program during such an inclusion-based pointer analysis:

andersen

During the analysis, new copy edges (solid black arrows) will be often added incrementally to the constraint graph until a fixed point has been reached. The final constraint graph is:

consG

The nodes of the constraint graph are in one-to-one correspondence with the PAG nodes and use the same shape. The graph shown here is the brief version. [Coming soon: If -brief-constraint-graph=false is specified then the nodes will also show the instructions.]

7. Andersen's Points-to Results

wpa -nander -print-pts swap.ll

NodeID 4 		PointsTo: { 5 }


NodeID 7 		PointsTo: { 22 }


NodeID 8 		PointsTo: { 24 }


NodeID 9 		PointsTo: { 18 20 }


NodeID 10 		PointsTo: { 18 20 }


NodeID 14 		PointsTo: { 15 }


NodeID 17 		PointsTo: { 18 }


NodeID 19 		PointsTo: { 20 }


NodeID 21 		PointsTo: { 22 }


NodeID 23 		PointsTo: { 24 }

8. Andersen's Analysis Statistics

wpa -ander -stat swap.ll

****Andersen Pointer Analysis Statistics****
################ (program : )###############
TotalPointers       18
TotalObjects        8
TotalFieldObjects   6
MaxStructSize       0
TotalEdges          30
FunctionObjs        2
GlobalObjs          0
HeapObjs            0
StackObjs           4
FIObjNum            0
FSObjNum            6
VarStructObj        0
VarArrayObj         0
ConstStructObj      0
ConstArrayObj       0
NonPtrObj           4
AddrsNum            6
LoadsNum            2
StoresNum           4
CopysNum            1
GepsNum             0
CallsNum            2
ReturnsNum          0
IndCallSites        0
LocalVarInRecur     0
BitCastNumber       0
BBWith2Succ         0
BBWith3Succ         0
-------------------------------------------------------
CollapseTime        0
TotalTime           0.001
SCCDetectTime       0.001
SCCMergeTime        0
LoadStoreTime       0
CopyGepTime         0
UpdateCGTime        0
AvgPtsSetSize       0.533333
AvgTopLvlPtsSize    1.2
CGNodeNum           27
PointsToConstPtr    0
PointsToBlkPtr      0
TotalPointers       18
TotalObjects        14
TotalEdges          11
AddrsNum            4
LoadsNum            2
StoresNum           4
CopysNum            5
GepsNum             0
AddrProcessed       6
LoadProcessed       2
StoreProcessed      4
CopyProcessed       8
GepProcessed        0
Pointers            18
DYFieldPtrs         0
MemObjects          8
DYFieldObjs         6
MaxPtsSetSize       2
Iterations          2
IndCallSites        0
IndEdgeSolved       0
NumOfSCCDetect      2
TotalCycleNum       1
TotalPWCCycleNum    0
NodesInCycles       4
MaxNodesInSCC       4
NullPointer         0
#######################################################

9. Flow-Sensitive Points-to Result

wpa -fspta -print-pts swap.ll

NodeID 4 		PointsTo: { 5 }


NodeID 7 		PointsTo: { 22 }


NodeID 8 		PointsTo: { 24 }


NodeID 9 		PointsTo: { 18 }


NodeID 10 		PointsTo: { 20 }


NodeID 14 		PointsTo: { 15 }


NodeID 17 		PointsTo: { 18 }


NodeID 19 		PointsTo: { 20 }


NodeID 21 		PointsTo: { 22 }


NodeID 23 		PointsTo: { 24 }

10. Flow-Sensitive Analysis Statistics

wpa -fspta -stat swap.ll

****Flow-Sensitive Pointer Analysis Statistics****
################ (program : )###############
TotalPointers       18
TotalObjects        8
TotalFieldObjects   6
MaxStructSize       0
TotalEdges          30
FunctionObjs        2
GlobalObjs          0
HeapObjs            0
StackObjs           4
FIObjNum            0
FSObjNum            6
VarStructObj        0
VarArrayObj         0
ConstStructObj      0
ConstArrayObj       0
NonPtrObj           4
AddrsNum            6
LoadsNum            2
StoresNum           4
CopysNum            1
GepsNum             0
CallsNum            2
ReturnsNum          0
IndCallSites        0
LocalVarInRecur     0
BitCastNumber       0
BBWith2Succ         0
BBWith3Succ         0
-------------------------------------------------------
SolveTime           0
SCCTime             0
ProcessTime         0
PropagationTime     0
DirectPropaTime     0
IndirectPropaTime   0
UpdateTime          0
AddrTime            0
CopyGepTime         0
LoadTime            0
StoreTime           0
UpdateCGTime        0
AvgPtsSize          1
AvgTopLvlPtsSize    1
AvgAddrTakenVarPts  1
AvgINPtsSize        1
AvgOUTPtsSize       1
AverageSCCSize      0
TotalTime           0
PointsToConstPtr    0
PointsToBlkPtr      0
StrongUpdates       4
SNodesHaveIN        6
SNodesHaveOUT       4
FI_SNodesHaveIN     0
FI_SNodesHaveOUT    0
FO_SNodesHaveIN     0
FO_SNodesHaveOUT    0
AI_SNodesHaveIN     0
AI_SNodesHaveOUT    0
AO_SNodesHaveIN     2
AO_SNodesHaveOUT    0
LD_SNodesHaveIN     2
LD_SNodesHaveOUT    0
ST_SNodesHaveIN     2
ST_SNodesHaveOUT    4
PHI_SNodesHaveIN    0
PHI_SNodesHaveOUT   0
VarHaveIN           6
VarHaveOUT          4
VarHaveIN_FI        0
VarHaveOUT_FI       0
VarHaveIN_FO        0
VarHaveOUT_FO       0
VarHaveIN_AI        0
VarHaveOUT_AI       0
VarHaveIN_AO        2
VarHaveOUT_AO       0
VarHaveIN_LD        2
VarHaveOUT_LD       0
VarHaveIN_ST        2
VarHaveOUT_ST       4
VarHaveIN_PHI       0
VarHaveOUT_PHI      0
MaxPtsSize          1
MaxTopLvlPtsSize    1
MaxINPtsSize        1
MaxOUTPtsSize       1
NumOfAddrTakenVar   4
MaxAddrTakenVarPts  1
ProcessedAddr       6
ProcessedCopy       2
ProcessedGep        0
ProcessedLoad       4
ProcessedStore      8
ProcessedPhi        4
ProcessedAParam     0
ProcessedFRet       0
ProcessedMSSANode   4
NumOfNodesInSCC     0
MaxSCCSize          1
NumOfSCC            0
TotalPointers       18
TotalObjects        14
StoresNum           4
CopysNum            1
Pointers            18
DYFieldPtrs         0
MemObjects          8
DYFieldObjs         6
Iterations          1
IndEdgeSolved       0
NullPointer         0
#######################################################

11. Memory SSA

The memory SSA representation of a program is built by adding MU and CHI functions to the LLVM IR of the program.

wpa -ander -svfg -dump-mssa swap.ll

==========FUNCTION: swap==========
2V_1 = ENCHI(MR_2V_0) 	pts{22 }
4V_1 = ENCHI(MR_4V_0) 	pts{24 }
entry

LDMU(MR_2V_1) 	pts{22 }
  %0 = load i8** %p, align 8

LDMU(MR_4V_1) 	pts{24 }
  %1 = load i8** %q, align 8
  store i8* %1, i8** %p, align 8
2V_2 = STCHI(MR_2V_1) 	pts{22 }

  store i8* %0, i8** %q, align 8
4V_2 = STCHI(MR_4V_1) 	pts{24 }

  ret void
RETMU(MR_2V_2) 	pts{22 }
RETMU(MR_4V_2) 	pts{24 }

==========FUNCTION: main==========
2V_1 = ENCHI(MR_2V_0) 	pts{22 }
4V_1 = ENCHI(MR_4V_0) 	pts{24 }
entry
  %a1 = alloca i8, align 1
  %b1 = alloca i8, align 1
  %a = alloca i8*, align 8
  %b = alloca i8*, align 8
  store i8* %a1, i8** %a, align 8
2V_2 = STCHI(MR_2V_1) 	pts{22 }

  store i8* %b1, i8** %b, align 8
4V_2 = STCHI(MR_4V_1) 	pts{24 }

CALMU(MR_2V_2) 	pts{22 }
CALMU(MR_4V_2) 	pts{24 }
  call void @swap(i8** %a, i8** %b) CallSite:   call void @swap(i8** %a, i8** %b)
2V_3 = CALCHI(MR_2V_2) 	pts{22 }
4V_3 = CALCHI(MR_4V_2) 	pts{24 }

  ret i32 0
RETMU(MR_2V_3) 	pts{22 }
RETMU(MR_4V_3) 	pts{24 }

12. Value-Flow Graph

An inter-procedural sparse value-flow graph (SVFG) for a program is a directed graph that captures the def-use chains of both top-level pointers and address-taken objects.

A SVFG Node can be

(1) a statement (SVFStmt),

(2) a parameter or

(3) a memory region (a set of abstract objects)

A green rectangle stands for an address PAG edge (AddrStmt), a red rectangle stands for a load PAG edge (LoadStmt), and a blue rectangle stands for a store PAG edge (StoreStmt).

A yellow rectangle represents a parameter (e.g., SVFG NodeID 14, which is an actual parameter corresponding to "a" PAG NodeID 21) or a memory region at the entry/exit of a function, a callsite or a load/store. For example, SVFG NodeID 27 represents the memory object PAG NodeID 24 that is indirectly read inside the callee function "swap" via the callsite ID 1.

Call and return value-flow edges are marked in red and blue colours. Direct and indirect value-flow edges are represented in solid and dashed arrows.

wpa -ander -svfg -dump-vfg swap.ll

svfg

A SVFG can be made more compact by eliminating unnecessary nodes such as ActualParm, AcutalIn and FormalRet, FormalOut using SVFG Optimizer. Note that the "opt-svfg" option means unoptimized if true, optimized if false (contrary to how one might think).

wpa -ander -svfg -dump-vfg -opt-svfg=false swap.ll

svfg_opt