Skip to content

Analyze a Simple C Program

Yulei Sui edited this page Oct 22, 2018 · 45 revisions

An Example

We use an example to go through each component of SVF: (1) Memory Model including PAG and Constraint Graph, (2) Pointer Analysis including both flow-insensitive and flow-sensitive analyses, and (3) Value-Flow Construction.

1. C Code

void swap(char **p, char **q){
  char* t = *p; 
       *p = *q; 
       *q = t;
}
int main(){
      char a1, b1; 
      char *a = &a1;
      char *b = &b1;
      swap(&a,&b);
}

2. LLVM IR after the mem2Reg option is turned on

clang -c -Xclang -disable-O0-optnone -emit-llvm swap.c -o swap.bc
opt -mem2reg swap.bc -o swap.opt
define void @swap(i8** %p, i8** %q) #0 {
entry:
  %0 = load i8** %p, align 8
  %1 = load i8** %q, align 8
  store i8* %1, i8** %p, align 8
  store i8* %0, i8** %q, align 8
  ret void
}
define i32 @main() #0 {
entry:
  %a1 = alloca i8, align 1
  %b1 = alloca i8, align 1
  %a = alloca i8*, align 8
  %b = alloca i8*, align 8
  store i8* %a1, i8** %a, align 8
  store i8* %b1, i8** %b, align 8
  call void @swap(i8** %a, i8** %b) 
  ret i32 0
}

3. Call Graph

wpa -ander -dump-callgraph swap.opt

Callgraph

4. PAG

wpa -nander -dump-pag swap.opt

PAG

As shown in the above figure, A node in PAG denotes either a pointer or an abstract memory object. A node shown as a circle represents a pointer (ValPN) while a node shown as an octagon represents an abstract memory object (ObjPN).

An edge between two nodes represents a constraint: a green edge for a memory allocation (AddrPE), a blue edge for a store (StorePE), a red edge for a load (LoadPE), and a dotted black edge for parameter passing (CallPE).

5. ConstraintGraph

wpa -nander -dump-pag -dump-consG swap.opt

A Constraint Graph is used when Andersen's flow-insensitive analysis is performned. The following rules are used in resolving the constraints in the program during such an inclusion-based pointer analysis:

andersen

During the analysis, new copy edges (solid black arrows) will be often added incrementally to the constraint graph until a fixed point has been reached. The final constraint graph is:

consG

6. Andersen's Points-to Results

wpa -nander -print-pts swap.opt

NodeID 4 		PointsTo: { 5 }


NodeID 7 		PointsTo: { 22 }


NodeID 8 		PointsTo: { 24 }


NodeID 9 		PointsTo: { 18 20 }


NodeID 10 		PointsTo: { 18 20 }


NodeID 14 		PointsTo: { 15 }


NodeID 17 		PointsTo: { 18 }


NodeID 19 		PointsTo: { 20 }


NodeID 21 		PointsTo: { 22 }


NodeID 23 		PointsTo: { 24 }

7. Andersen's Analysis Statistics

wpa -ander -stat swap.opt

****Andersen Pointer Analysis Statistics****
################ (program : )###############
TotalPointers       18
TotalObjects        8
TotalFieldObjects   6
MaxStructSize       0
TotalEdges          30
FunctionObjs        2
GlobalObjs          0
HeapObjs            0
StackObjs           4
FIObjNum            0
FSObjNum            6
VarStructObj        0
VarArrayObj         0
ConstStructObj      0
ConstArrayObj       0
NonPtrObj           4
AddrsNum            6
LoadsNum            2
StoresNum           4
CopysNum            1
GepsNum             0
CallsNum            2
ReturnsNum          0
IndCallSites        0
LocalVarInRecur     0
BitCastNumber       0
BBWith2Succ         0
BBWith3Succ         0
-------------------------------------------------------
CollapseTime        0
TotalTime           0.001
SCCDetectTime       0.001
SCCMergeTime        0
LoadStoreTime       0
CopyGepTime         0
UpdateCGTime        0
AvgPtsSetSize       0.533333
AvgTopLvlPtsSize    1.2
CGNodeNum           27
PointsToConstPtr    0
PointsToBlkPtr      0
TotalPointers       18
TotalObjects        14
TotalEdges          11
AddrsNum            4
LoadsNum            2
StoresNum           4
CopysNum            5
GepsNum             0
AddrProcessed       6
LoadProcessed       2
StoreProcessed      4
CopyProcessed       8
GepProcessed        0
Pointers            18
DYFieldPtrs         0
MemObjects          8
DYFieldObjs         6
MaxPtsSetSize       2
Iterations          2
IndCallSites        0
IndEdgeSolved       0
NumOfSCCDetect      2
TotalCycleNum       1
TotalPWCCycleNum    0
NodesInCycles       4
MaxNodesInSCC       4
NullPointer         0
#######################################################

8. Flow-Sensitive Points-to Result

wpa -fspta -print-pts swap.opt

NodeID 4 		PointsTo: { 5 }


NodeID 7 		PointsTo: { 22 }


NodeID 8 		PointsTo: { 24 }


NodeID 9 		PointsTo: { 18 }


NodeID 10 		PointsTo: { 20 }


NodeID 14 		PointsTo: { 15 }


NodeID 17 		PointsTo: { 18 }


NodeID 19 		PointsTo: { 20 }


NodeID 21 		PointsTo: { 22 }


NodeID 23 		PointsTo: { 24 }

9. Flow-Sensitive Analysis Statistics

wpa -fspta -stat swap.opt

****Flow-Sensitive Pointer Analysis Statistics****
################ (program : )###############
TotalPointers       18
TotalObjects        8
TotalFieldObjects   6
MaxStructSize       0
TotalEdges          30
FunctionObjs        2
GlobalObjs          0
HeapObjs            0
StackObjs           4
FIObjNum            0
FSObjNum            6
VarStructObj        0
VarArrayObj         0
ConstStructObj      0
ConstArrayObj       0
NonPtrObj           4
AddrsNum            6
LoadsNum            2
StoresNum           4
CopysNum            1
GepsNum             0
CallsNum            2
ReturnsNum          0
IndCallSites        0
LocalVarInRecur     0
BitCastNumber       0
BBWith2Succ         0
BBWith3Succ         0
-------------------------------------------------------
SolveTime           0
SCCTime             0
ProcessTime         0
PropagationTime     0
DirectPropaTime     0
IndirectPropaTime   0
UpdateTime          0
AddrTime            0
CopyGepTime         0
LoadTime            0
StoreTime           0
UpdateCGTime        0
AvgPtsSize          1
AvgTopLvlPtsSize    1
AvgAddrTakenVarPts  1
AvgINPtsSize        1
AvgOUTPtsSize       1
AverageSCCSize      0
TotalTime           0
PointsToConstPtr    0
PointsToBlkPtr      0
StrongUpdates       4
SNodesHaveIN        6
SNodesHaveOUT       4
FI_SNodesHaveIN     0
FI_SNodesHaveOUT    0
FO_SNodesHaveIN     0
FO_SNodesHaveOUT    0
AI_SNodesHaveIN     0
AI_SNodesHaveOUT    0
AO_SNodesHaveIN     2
AO_SNodesHaveOUT    0
LD_SNodesHaveIN     2
LD_SNodesHaveOUT    0
ST_SNodesHaveIN     2
ST_SNodesHaveOUT    4
PHI_SNodesHaveIN    0
PHI_SNodesHaveOUT   0
VarHaveIN           6
VarHaveOUT          4
VarHaveIN_FI        0
VarHaveOUT_FI       0
VarHaveIN_FO        0
VarHaveOUT_FO       0
VarHaveIN_AI        0
VarHaveOUT_AI       0
VarHaveIN_AO        2
VarHaveOUT_AO       0
VarHaveIN_LD        2
VarHaveOUT_LD       0
VarHaveIN_ST        2
VarHaveOUT_ST       4
VarHaveIN_PHI       0
VarHaveOUT_PHI      0
MaxPtsSize          1
MaxTopLvlPtsSize    1
MaxINPtsSize        1
MaxOUTPtsSize       1
NumOfAddrTakenVar   4
MaxAddrTakenVarPts  1
ProcessedAddr       6
ProcessedCopy       2
ProcessedGep        0
ProcessedLoad       4
ProcessedStore      8
ProcessedPhi        4
ProcessedAParam     0
ProcessedFRet       0
ProcessedMSSANode   4
NumOfNodesInSCC     0
MaxSCCSize          1
NumOfSCC            0
TotalPointers       18
TotalObjects        14
StoresNum           4
CopysNum            1
Pointers            18
DYFieldPtrs         0
MemObjects          8
DYFieldObjs         6
Iterations          1
IndEdgeSolved       0
NullPointer         0
#######################################################

10. Memory SSA

The memory SSA representation of a program is built by adding MU and CHI functions to the LLVM IR of the program.

wpa -ander -svfg -dump-mssa swap.opt

==========FUNCTION: swap==========
2V_1 = ENCHI(MR_2V_0) 	pts{22 }
4V_1 = ENCHI(MR_4V_0) 	pts{24 }
entry

LDMU(MR_2V_1) 	pts{22 }
  %0 = load i8** %p, align 8

LDMU(MR_4V_1) 	pts{24 }
  %1 = load i8** %q, align 8
  store i8* %1, i8** %p, align 8
2V_2 = STCHI(MR_2V_1) 	pts{22 }

  store i8* %0, i8** %q, align 8
4V_2 = STCHI(MR_4V_1) 	pts{24 }

  ret void
RETMU(MR_2V_2) 	pts{22 }
RETMU(MR_4V_2) 	pts{24 }

==========FUNCTION: main==========
2V_1 = ENCHI(MR_2V_0) 	pts{22 }
4V_1 = ENCHI(MR_4V_0) 	pts{24 }
entry
  %a1 = alloca i8, align 1
  %b1 = alloca i8, align 1
  %a = alloca i8*, align 8
  %b = alloca i8*, align 8
  store i8* %a1, i8** %a, align 8
2V_2 = STCHI(MR_2V_1) 	pts{22 }

  store i8* %b1, i8** %b, align 8
4V_2 = STCHI(MR_4V_1) 	pts{24 }

CALMU(MR_2V_2) 	pts{22 }
CALMU(MR_4V_2) 	pts{24 }
  call void @swap(i8** %a, i8** %b) CallSite:   call void @swap(i8** %a, i8** %b)
2V_3 = CALCHI(MR_2V_2) 	pts{22 }
4V_3 = CALCHI(MR_4V_2) 	pts{24 }

  ret i32 0
RETMU(MR_2V_3) 	pts{22 }
RETMU(MR_4V_3) 	pts{24 }

11. Value-Flow Graph

An inter-procedural sparse value-flow graph (SVFG) for a program is a directed graph that captures the def-use chains of both top-level pointers and address-taken objects.

A SVFG Node can be

(1) a statement (PAGEdge),

(2) a parameter or

(3) a memory region (a set of abstract objects)

A green rectangle stands for an address PAG edge (AddrPE), a red rectangle stands for a load PAG edge (LoadPE), and a blue rectangle stands for a store PAG edge (StorePE).

A yellow rectangle represents a parameter (e.g., SVFG NodeID 14, which is an actual parameter corresponding to "a" PAG NodeID 21) or a memory region at the entry/exit of a function, a callsite or a load/store. For example, SVFG NodeID 27 represents the memory object PAG NodeID 24 that is indirectly read inside the callee function "swap" via the callsite ID 1.

wpa -ander -svfg -dump-svfg swap.opt

svfg

A SVFG can be made more compact by eliminating unnecessary nodes such as ActualParm, AcutalIn and FormalRet, FormalOut using SVFG Optimizer.

svfg_opt