DiffKemp is a framework for automatic static analysis of semantic differences between different versions of projects written in C, with main focus on the Linux kernel.
The main use-case of DiffKemp is to compare selected functions and configuration options in two versions of a project and to report any discovered semantic differences.
Warning
DiffKemp is incomplete in its nature, hence may provide false positive results (i.e. claiming that functions are not equivalent even though they are). This especially happens with complex refactorings.
You can install DiffKemp:
-
By building it manually from source
-
From a prepared RPM package that can be installed from our Copr repository:
# Enabling the DiffKemp repository dnf install -y dnf-plugins-core dnf copr enable -y viktormalik/diffkemp # Installing DiffKemp dnf install -y diffkemp
---
# This code renders an image showing the relation of the commands.
# The image does not show on the GitHub app, use a browser to see the image.
config:
theme: base
themeCSS: "
/* redefinition of subgraphs labels */
.cluster-label {
font-family: monospace;
font-size: 16px;
}
"
themeVariables:
nodeBorder: black
mainBkg: white
clusterBkg: white
clusterBorder: white
edgeLabelBackground: lightgrey
---
flowchart LR
%% Wrapping the whole thing in a subgraph so it has a white background even
%% in a dark mode.
subgraph G[" "]
direction LR
%% nodes
pv(project versions)
sl(symbol list)
subgraph sgSubgraph[diffkemp build\ndiffkemp build-kernel\ndiffkemp llvm-to-snapshot]
sg(Snapshot\ngeneration)
end
subgraph scSubgraph[diffkemp compare]
sc(Semantic\ncomparison)
end
subgraph rvSubgraph[diffkemp view]
rv(Result viewer)
end
report[report for\nnot equal symbols]
neq(✗ not equal)
eq(✓ equal)
%% invisible node for making pv node more aligned with other nodes
tmp(" ")
%% edges
%% for nodes alignment
tmp ~~~ sg
pv -- old --> sg
pv -- new --> sg
sl --> sg
sg -- old snapshot --> sc
%% invisible edge to make old and new snapshot edges more symetric
sg ~~~ sc
sg -- new snapshot --> sc
sc --- report
report .-> rv
report --> neq
sc --> eq
%% styles for edges
%% style for `sc --- report` edge
linkStyle 7 stroke:darkred;
%% style for `report .-> rv` edge
linkStyle 8 stroke:darkred;
%% style for `report --> neq` edge
linkStyle 9 stroke:darkred;
%% style for `sc --> eq` edge
linkStyle 10 stroke:darkgreen;
%% node style definitions
classDef noBorderClass stroke:white;
classDef invisibleNodeClass display:none;
%% node style application
style report fill:lightgrey,stroke:lightgray;
style neq color:darkred;
style eq color:darkgreen;
class tmp invisibleNodeClass;
class pv,sl,eq,neq noBorderClass;
end
DiffKemp runs in three phases:
-
Snapshot generation takes symbols (functions) that you want to compare and project versions. It compiles the versions into LLVM IR (which it uses for the comparison) and creates so-called snapshots which contain the relevant LLVM IR files and additional metadata. (DiffKemp needs the analysed project to be compiled with debugging information in order to work properly.)
There are several options for snapshot generation:
-
is the default snapshot generation command for
diffkemp build PROJ_DIR SNAPSHOT_DIR [SYMBOL_LIST]
make
-based projects. -
is a command specialized for building snapshots from the Linux kernel.
diffkemp build-kernel KERNEL_DIR SNAPSHOT_DIR SYMBOL_LIST
-
can be used if the project is already compiled into a single LLVM IR file.
diffkemp llvm-to-snapshot PROJ_DIR LLVM_FILE SNAPSHOT_DIR SYMBOL_LIST
In any case, the command should be run twice, once for each of the compared versions.
-
-
Semantic comparison takes two snapshots, compares them for semantic equality, and saves a report about symbols that were compared as semantically different. It is invoked via:
diffkemp compare SNAPSHOT_DIR_1 SNAPSHOT_DIR_2 -o COMPARE_OUTPUT_DIR
-
Additionally, you can run result viewer to get a visualisation of the found differences:
diffkemp view COMPARE_OUTPUT_DIR
See the Usage reference to learn more about how to use individual commands.
If you want to learn how you can use DiffKemp, you can read the following examples:
- Simple program: Example of using DiffKemp on a very simple program (contains a few lines of code) to understand how to use DiffKemp, what it does, and what it gives us as a result.
- Library: Example of using DiffKemp
on a more complex program, specifically on the musl C library. We will learn
how to compare complex
make
-based projects and how to use the result viewer. - Linux kernel: Example of using DiffKemp on the Linux kernel.
- Maintaining semantic stability: There are functions whose behaviour should not change between versions.
- Efficiency in code review: DiffKemp can reduce the amount of code that you will need to review manually by eliminating changes that do not impact the behaviour of the program.
- Detection of unintended side effects: Comparing programs on a semantic level can detect subtle changes that might alter the program's behavior.
- Focus on critical functions: You can specify a list of functions which you want to compare.
- Support for large and complex C projects like Linux kernel, C standard libraries, cryptographic libraries, ...
The main focus of DiffKemp is high scalability, such that it can be applied to large-scale projects containing a lot of code. To achieve that, the analysed functions are first compiled into LLVM IR, then several code transformations are applied, and finally the comparison itself is performed.
Wherever possible, DiffKemp tries to compare instruction-by-instruction (at the level of LLVM IR) which is typically sufficient for most of the code. When not sufficient, DiffKemp tries to apply one of the built-in or user-supplied semantics-preserving patterns. If no pattern can be applied, the relevant diffs are reported to the user.
The list of code and non-code contributors to this project, in pseudo-random order:
- Viktor Malík
- Tomáš Glozar
- Tomáš Vojnar
- Petr Šilling
- Pavol Žáčik
- František Nečas
- Tomáš Kučma
- Lukáš Petr
- Tatiana Malecová
- Jakub Rozek
There is a number of publications and talks related to DiffKemp:
- ICST'21 paper
and talk:
V. Malík and T. Vojnar, "Automatically Checking Semantic Equivalence between Versions of Large-Scale C Projects," 2021 14th IEEE Conference on Software Testing, Verification and Validation (ICST), 2021, pp. 329-339. - DevConf.CZ'19 talk.
- NETYS'22
paper and
talk:
Malík, V., Šilling, P., Vojnar, T. (2022). Applying Custom Patterns in Semantic Equality Analysis. In: Koulali, MA., Mezini, M. (eds) Networked Systems. NETYS 2022.