-
Notifications
You must be signed in to change notification settings - Fork 435
Analyze a Simple C Program
We use an example to go through each component of SVF: (1) Memory Model including PAG and Constraint Graph, (2) Pointer Analysis including both flow-insensitive and flow-sensitive analyses, and (3) Value-Flow Construction.
void swap(char **p, char **q){
char* t = *p;
*p = *q;
*q = t;
}
int main(){
char a1, b1;
char *a = &a1;
char *b = &b1;
swap(&a,&b);
}
clang -c -Xclang -disable-O0-optnone -emit-llvm swap.c -o swap.bc
opt -mem2reg swap.bc -o swap.opt
define void @swap(i8** %p, i8** %q) #0 {
entry:
%0 = load i8** %p, align 8
%1 = load i8** %q, align 8
store i8* %1, i8** %p, align 8
store i8* %0, i8** %q, align 8
ret void
}
define i32 @main() #0 {
entry:
%a1 = alloca i8, align 1
%b1 = alloca i8, align 1
%a = alloca i8*, align 8
%b = alloca i8*, align 8
store i8* %a1, i8** %a, align 8
store i8* %b1, i8** %b, align 8
call void @swap(i8** %a, i8** %b)
ret i32 0
}
wpa -ander -dump-callgraph swap.opt
wpa -nander -dump-pag swap.opt
As shown in the above figure, A node in PAG denotes either a pointer or an abstract memory object. A node shown as a circle represents a pointer (ValPN) while a node shown as an octagon represents an abstract memory object (ObjPN).
An edge between two nodes represents a constraint: a green edge for a memory allocation (AddrPE), a blue edge for a store (StorePE), a red edge for a load (LoadPE), and a dotted black edge for parameter passing (CallPE).
wpa -nander -dump-pag -dump-consG swap.opt
A Constraint Graph is used when Andersen's flow-insensitive analysis is performned. The following rules are used in resolving the constraints in the program during such an inclusion-based pointer analysis:
During the analysis, new copy edges (solid black arrows) will be often added incrementally to the constraint graph until a fixed point has been reached. The final constraint graph is:
wpa -nander -print-pts swap.opt
NodeID 4 PointsTo: { 5 }
NodeID 7 PointsTo: { 22 }
NodeID 8 PointsTo: { 24 }
NodeID 9 PointsTo: { 18 20 }
NodeID 10 PointsTo: { 18 20 }
NodeID 14 PointsTo: { 15 }
NodeID 17 PointsTo: { 18 }
NodeID 19 PointsTo: { 20 }
NodeID 21 PointsTo: { 22 }
NodeID 23 PointsTo: { 24 }
wpa -ander -stat swap.opt
****Andersen Pointer Analysis Statistics****
################ (program : )###############
TotalPointers 18
TotalObjects 8
TotalFieldObjects 6
MaxStructSize 0
TotalEdges 30
FunctionObjs 2
GlobalObjs 0
HeapObjs 0
StackObjs 4
FIObjNum 0
FSObjNum 6
VarStructObj 0
VarArrayObj 0
ConstStructObj 0
ConstArrayObj 0
NonPtrObj 4
AddrsNum 6
LoadsNum 2
StoresNum 4
CopysNum 1
GepsNum 0
CallsNum 2
ReturnsNum 0
IndCallSites 0
LocalVarInRecur 0
BitCastNumber 0
BBWith2Succ 0
BBWith3Succ 0
-------------------------------------------------------
CollapseTime 0
TotalTime 0.001
SCCDetectTime 0.001
SCCMergeTime 0
LoadStoreTime 0
CopyGepTime 0
UpdateCGTime 0
AvgPtsSetSize 0.533333
AvgTopLvlPtsSize 1.2
CGNodeNum 27
PointsToConstPtr 0
PointsToBlkPtr 0
TotalPointers 18
TotalObjects 14
TotalEdges 11
AddrsNum 4
LoadsNum 2
StoresNum 4
CopysNum 5
GepsNum 0
AddrProcessed 6
LoadProcessed 2
StoreProcessed 4
CopyProcessed 8
GepProcessed 0
Pointers 18
DYFieldPtrs 0
MemObjects 8
DYFieldObjs 6
MaxPtsSetSize 2
Iterations 2
IndCallSites 0
IndEdgeSolved 0
NumOfSCCDetect 2
TotalCycleNum 1
TotalPWCCycleNum 0
NodesInCycles 4
MaxNodesInSCC 4
NullPointer 0
#######################################################
wpa -fspta -print-pts swap.opt
NodeID 4 PointsTo: { 5 }
NodeID 7 PointsTo: { 22 }
NodeID 8 PointsTo: { 24 }
NodeID 9 PointsTo: { 18 }
NodeID 10 PointsTo: { 20 }
NodeID 14 PointsTo: { 15 }
NodeID 17 PointsTo: { 18 }
NodeID 19 PointsTo: { 20 }
NodeID 21 PointsTo: { 22 }
NodeID 23 PointsTo: { 24 }
wpa -fspta -stat swap.opt
****Flow-Sensitive Pointer Analysis Statistics****
################ (program : )###############
TotalPointers 18
TotalObjects 8
TotalFieldObjects 6
MaxStructSize 0
TotalEdges 30
FunctionObjs 2
GlobalObjs 0
HeapObjs 0
StackObjs 4
FIObjNum 0
FSObjNum 6
VarStructObj 0
VarArrayObj 0
ConstStructObj 0
ConstArrayObj 0
NonPtrObj 4
AddrsNum 6
LoadsNum 2
StoresNum 4
CopysNum 1
GepsNum 0
CallsNum 2
ReturnsNum 0
IndCallSites 0
LocalVarInRecur 0
BitCastNumber 0
BBWith2Succ 0
BBWith3Succ 0
-------------------------------------------------------
SolveTime 0
SCCTime 0
ProcessTime 0
PropagationTime 0
DirectPropaTime 0
IndirectPropaTime 0
UpdateTime 0
AddrTime 0
CopyGepTime 0
LoadTime 0
StoreTime 0
UpdateCGTime 0
AvgPtsSize 1
AvgTopLvlPtsSize 1
AvgAddrTakenVarPts 1
AvgINPtsSize 1
AvgOUTPtsSize 1
AverageSCCSize 0
TotalTime 0
PointsToConstPtr 0
PointsToBlkPtr 0
StrongUpdates 4
SNodesHaveIN 6
SNodesHaveOUT 4
FI_SNodesHaveIN 0
FI_SNodesHaveOUT 0
FO_SNodesHaveIN 0
FO_SNodesHaveOUT 0
AI_SNodesHaveIN 0
AI_SNodesHaveOUT 0
AO_SNodesHaveIN 2
AO_SNodesHaveOUT 0
LD_SNodesHaveIN 2
LD_SNodesHaveOUT 0
ST_SNodesHaveIN 2
ST_SNodesHaveOUT 4
PHI_SNodesHaveIN 0
PHI_SNodesHaveOUT 0
VarHaveIN 6
VarHaveOUT 4
VarHaveIN_FI 0
VarHaveOUT_FI 0
VarHaveIN_FO 0
VarHaveOUT_FO 0
VarHaveIN_AI 0
VarHaveOUT_AI 0
VarHaveIN_AO 2
VarHaveOUT_AO 0
VarHaveIN_LD 2
VarHaveOUT_LD 0
VarHaveIN_ST 2
VarHaveOUT_ST 4
VarHaveIN_PHI 0
VarHaveOUT_PHI 0
MaxPtsSize 1
MaxTopLvlPtsSize 1
MaxINPtsSize 1
MaxOUTPtsSize 1
NumOfAddrTakenVar 4
MaxAddrTakenVarPts 1
ProcessedAddr 6
ProcessedCopy 2
ProcessedGep 0
ProcessedLoad 4
ProcessedStore 8
ProcessedPhi 4
ProcessedAParam 0
ProcessedFRet 0
ProcessedMSSANode 4
NumOfNodesInSCC 0
MaxSCCSize 1
NumOfSCC 0
TotalPointers 18
TotalObjects 14
StoresNum 4
CopysNum 1
Pointers 18
DYFieldPtrs 0
MemObjects 8
DYFieldObjs 6
Iterations 1
IndEdgeSolved 0
NullPointer 0
#######################################################
The memory SSA representation of a program is built by adding MU and CHI functions to the LLVM IR of the program.
wpa -ander -svfg -dump-mssa swap.opt
==========FUNCTION: swap==========
2V_1 = ENCHI(MR_2V_0) pts{22 }
4V_1 = ENCHI(MR_4V_0) pts{24 }
entry
LDMU(MR_2V_1) pts{22 }
%0 = load i8** %p, align 8
LDMU(MR_4V_1) pts{24 }
%1 = load i8** %q, align 8
store i8* %1, i8** %p, align 8
2V_2 = STCHI(MR_2V_1) pts{22 }
store i8* %0, i8** %q, align 8
4V_2 = STCHI(MR_4V_1) pts{24 }
ret void
RETMU(MR_2V_2) pts{22 }
RETMU(MR_4V_2) pts{24 }
==========FUNCTION: main==========
2V_1 = ENCHI(MR_2V_0) pts{22 }
4V_1 = ENCHI(MR_4V_0) pts{24 }
entry
%a1 = alloca i8, align 1
%b1 = alloca i8, align 1
%a = alloca i8*, align 8
%b = alloca i8*, align 8
store i8* %a1, i8** %a, align 8
2V_2 = STCHI(MR_2V_1) pts{22 }
store i8* %b1, i8** %b, align 8
4V_2 = STCHI(MR_4V_1) pts{24 }
CALMU(MR_2V_2) pts{22 }
CALMU(MR_4V_2) pts{24 }
call void @swap(i8** %a, i8** %b) CallSite: call void @swap(i8** %a, i8** %b)
2V_3 = CALCHI(MR_2V_2) pts{22 }
4V_3 = CALCHI(MR_4V_2) pts{24 }
ret i32 0
RETMU(MR_2V_3) pts{22 }
RETMU(MR_4V_3) pts{24 }
An inter-procedural sparse value-flow graph (SVFG) for a program is a directed graph that captures the def-use chains of both top-level pointers and address-taken objects.
A SVFG Node can be
(1) a statement (PAGEdge),
(2) a parameter or
(3) a memory region (a set of abstract objects)
A green rectangle stands for an address PAG edge (AddrPE), a red rectangle stands for a load PAG edge (LoadPE), and a blue rectangle stands for a store PAG edge (StorePE).
A yellow rectangle represents a parameter (e.g., SVFG NodeID 14, which is an actual parameter corresponding to "a" PAG NodeID 21) or a memory region at the entry/exit of a function, a callsite or a load/store. For example, SVFG NodeID 27 represents the memory object PAG NodeID 24 that is indirectly read inside the callee function "swap" via the callsite ID 1.
wpa -ander -svfg -dump-svfg swap.opt
A SVFG can be made more compact by eliminating unnecessary nodes such as ActualParm, AcutalIn and FormalRet, FormalOut using SVFG Optimizer.