LaganLighter supports the following graph formats:
-
CSR/CSC graph in textual format, for testing. This format has 4 lines:
- The number of vertices (|V|),
- The number of edges (|E|),
- |V| space-separated numbers showing offsets of the vertices, and
- |E| space-separated numbers indicating edges.
-
Compressed CSR/CSC graphs in WebGraph format are supported by integrating ParaGrapher as a submodule.
Environment variables may be passed to make
to specify the input graph:
-
LL_INPUT_GRAPH_PATH
: path to the graph. The default value isdata/test_csr.txt
-
LL_INPUT_GRAPH_TYPE
: type of the graph which can betext
,PARAGRAPHER_CSX_WG_400_AP
(graphs with 4 Bytes vertex IDs and no weights),PARAGRAPHER_CSX_WG_404_AP
(graphs with 4 Bytes vertex IDs and 4 Bytesuint
edge weights), orPARAGRAPHER_CSX_WG_800_AP
(graphs with 8 Bytes vertex IDs and no weights).
The default value is
text
.Please refer to ParaGrapher Documentation for more details about ParaGrapher types.
-
LL_INPUT_GRAPH_IS_SYMMETRIC
: with a value of0
or1
, specifies if the input graph is symmetric. The default value is 0. Using this variable for symmetric graphs will help algorithms the dataset to be passed directly (and without symmetrization) to algorithms that require a symmetric graph input. -
LL_INPUT_GRAPH_BATCH_ORDER
: is used bylauncher.sh
script to inform the program the batch order of current
dataset. Default is 0. -
LL_STORE_INPUT_GRAPH_IN_SHM
: with a value of0
or1
, specifies if it is required to store a copy of the graph as a shared memory object (i.e. in/dev/shm
). The default value is 0. It is useful when the size of input graph(s) is large and some experiments are repeated multiple times on the graphs. In this case and by storing the graphs as shared memory objects, it is not required to load them from the storage. -
LL_OUTPUT_REPORT_PATH
: specifies the path to the report file, if it is required. It is used bylauncher.sh
script to aggregate results for all processed datasets. Default value isNULL
.
To run a single algorithm, it is enough to call make alg...
, e.g., make alg1_sapco_sort
.
It runs the algorithm for the default options (stated in the above). To run the algorithm for a particular graph,
you may need to pass the above variables.
E.g., LL_INPUT_GRAPH_PATH=data/cnr-2000 LL_INPUT_GRAPH_TYPE=PARAGRAPHER_CSX_WG_400_AP make alg1_sapco_sort
calls the algorithm
for the data/cnr-2000
.
The following variables may be set when calling make
:
threads=t
sets the number of concurrent threads tot
. Without this, threads number is set to to number of logical threads.no_ht=1
disables hyperthreading whenthreads=t
is absent.debug=1
enables debugging (-g of gcc)wait_passive=1
setsOMP_WAIT_POLICY
topassive
instead of its default value which isactive
,energy=1
activates energy measurement.
E.g., make alg1_sapco_sort no_ht=1 wait_passive=1
.
For textual graphs (formatted above), it is required to call get_ll_400_txt_graph()
.
To load the graphs in WebGraph format (using ParaGrapher), functions get_ll_400_webgraph()
and get_ll_404_webgraph()
should be called. These 3 functions have been defined in graph.c
and
load the graph in the following steps:
-
Checking if the graph has been stored as a shared memory object. In that case the graph is returned to the user by calling
get_shm_ll_400_graph()
andget_shm_ll_404_graph()
. This graph should be released by callingrelease_shm_ll_400_graph()
orrelease_shm_ll_404_graph()
. -
If the graph is not found in the shared memory, the graph is loaded/decompressed from the secondary storage and a NUMA-interleaved memory is allocated for the graph. This graph should be released by calling
release_numa_interleaved_ll_400_graph()
orrelease_numa_interleaved_ll_404_graph()
.Note: The graphs are loaded before calling
initialize_omp_par_env()
which initializes OpenMP threads (to prevent busy wait of the OpenMP threads initialized byOMP_WAIT_POLICY=active
, by default). Theget_ll_400_txt_graph()
function reads textual graphs sequentially andget_ll_40X_webgraph()
functions call the ParaGrapher library which uses thepthread
for parallelization. -
When the graph is loaded/decompressed from the secondary storage, the OS caches some contents of the graph in memory. This cached data by OS may impact the performance of algorithms especially when a large percentage of the memory is used. To prevent this, by the end of graph loading, the
flush_os_cache()
functions is used to run th eflushcache.sh
script. If the user hassudo
access, a script can be added for this purpose (Please refer to the comments on the first lines offlushcache.sh
).
You may find WebGraphs in:
- https://law.di.unimi.it/datasets.php
- http://madata.bib.uni-mannheim.de/50/ [DOI: 10.7801/50]
- https://docs.softwareheritage.org/devel/swh-dataset/graph/dataset.htm
- https://blogs.qub.ac.uk/DIPSA/MS-BioGraphs [DOI: 10.21227/gmd9-1534]
Please also refer to https://blogs.qub.ac.uk/DIPSA/graphs-list-2024.
Please refer to the Launcher Doc.