This repository contains the code of the master thesis of Tim Zeitz Engineering Distributed Graph Clustering using MapReduce and the follow up paper Distributed Graph Clustering using Modularity and Map Equation accepted to Euro-Par 2018.
The algorithms were implemented using the experimental Thrill framework, though we maintained our own fork of it.
Most of the updates from the fork have now been merged back into upstream, but some discrepencies remain.
The code uses C++14
, so you will need a recent gcc
or clang
to build.
We performed an extensive evaluation of this implementation. See the evaluation repository for jupyter notebooks, our raw result data, the graphs used etc.
- clone the git repository
- run
git submodule update --init --recursive
to fetch dependencies - create a folder for build output
mkdir build
andcd
into it - run
cmake ..
- this will create a makefile for a debug build - append-DCMAKE_BUILD_TYPE=Release
for a optimized build without asserts - run
make
to actually build everything - run one of the binaries, eg
./dlslm_map_eq somegraph.gr
Currently the CMakeLists.txt
file blindly passes -mavx2 -mfma
without checking if the system supports them.
If you get Illegal Instruction
errors when running the binaries remove these flags (or switch to -march=native
), they are not required and just make stuff faster.
The repo contains the following folders:
lib
- dependencies as git submodulessrc
- actual source codescripts
- scripted stuff to run our experiements, put moab jobs into a queue, postprocess experimental results for evaluation, etc.
When building everything (among others) the following binaries are build:
dlslm
- Distributed Synchronous Local Moving with Modularitydlslm_map_equation
- Distributed Synchronous Local Moving with Map equationpreprocess
- Translate an arbitrary input graph into our custom binary format and perform some fixes along the waystreaming_clustering_analyser
- a utility to calculate map equation and modularity scores of clusterings on very large graphsseq_louvain
- Graph Clustering using the original Louvain algorithminfomap
- a wrapper binary around infomap to make the interface compatible with the on ofseq_louvain
Our programs can read DIMACs graphs, SNAP Edge List graphs and our own custom binary format.
For optimal performance preprocess all graphs using the preprocess
tool.
seq_louvain
and infomap
will output a help about command line arguments, for the other binaries you will have to refer to the source code or the scripts to see what arguments can be passed.
The analysis scripts make use of Networkit which you can also use to generate some test graphs. To be able to read our binary graphs you will need to use the thrill_support branch of this fork of networkit.
This work was partially supported by the DFG under grants WA654/19-2 and WA654/22-2.
The code in this repository is released under the BSD-2 License - except for the parts which use Infomap which are released under GNU Affero General Public License version 3. For these files (src/infomap.cpp, src/infomap_directed.cpp), the licence is stated explicitely at the top of the file.