Skip to content

Latest commit

 

History

History
110 lines (80 loc) · 6.31 KB

0.2-loading.md

File metadata and controls

110 lines (80 loc) · 6.31 KB

LaganLighter Docs: Graph Types, Loading Graphs & Running Algorithms

Graph Types

LaganLighter supports the following graph formats:

  • CSR/CSC graph in textual format, for testing. This format has 4 lines:

    1. The number of vertices (|V|),
    2. The number of edges (|E|),
    3. |V| space-separated numbers showing offsets of the vertices, and
    4. |E| space-separated numbers indicating edges.
  • Compressed CSR/CSC graphs in WebGraph format are supported by integrating ParaGrapher as a submodule.

Running Algorithms

Environment variables may be passed to make to specify the input graph:

  • LL_INPUT_GRAPH_PATH: path to the graph. The default value is data/test_csr.txt

  • LL_INPUT_GRAPH_TYPE: type of the graph which can be

    • text,
    • PARAGRAPHER_CSX_WG_400_AP (graphs with 4 Bytes vertex IDs and no weights),
    • PARAGRAPHER_CSX_WG_404_AP (graphs with 4 Bytes vertex IDs and 4 Bytes uint edge weights), or
    • PARAGRAPHER_CSX_WG_800_AP (graphs with 8 Bytes vertex IDs and no weights).

    The default value is text.

    Please refer to ParaGrapher Documentation for more details about ParaGrapher types.

  • LL_INPUT_GRAPH_IS_SYMMETRIC: with a value of 0 or 1, specifies if the input graph is symmetric. The default value is 0. Using this variable for symmetric graphs will help algorithms the dataset to be passed directly (and without symmetrization) to algorithms that require a symmetric graph input.

  • LL_INPUT_GRAPH_BATCH_ORDER: is used by launcher.sh script to inform the program the batch order of current
    dataset. Default is 0.

  • LL_STORE_INPUT_GRAPH_IN_SHM: with a value of 0 or 1, specifies if it is required to store a copy of the graph as a shared memory object (i.e. in /dev/shm). The default value is 0. It is useful when the size of input graph(s) is large and some experiments are repeated multiple times on the graphs. In this case and by storing the graphs as shared memory objects, it is not required to load them from the storage.

  • LL_OUTPUT_REPORT_PATH: specifies the path to the report file, if it is required. It is used by launcher.sh script to aggregate results for all processed datasets. Default value is NULL.

To run a single algorithm, it is enough to call make alg..., e.g., make alg1_sapco_sort. It runs the algorithm for the default options (stated in the above). To run the algorithm for a particular graph, you may need to pass the above variables. E.g., LL_INPUT_GRAPH_PATH=data/cnr-2000 LL_INPUT_GRAPH_TYPE=PARAGRAPHER_CSX_WG_400_AP make alg1_sapco_sort calls the algorithm for the data/cnr-2000.

The following variables may be set when calling make:

  • threads=t sets the number of concurrent threads to t. Without this, threads number is set to to number of logical threads.
  • no_ht=1 disables hyperthreading when threads=t is absent.
  • debug=1 enables debugging (-g of gcc)
  • wait_passive=1 sets OMP_WAIT_POLICY to passive instead of its default value which is active,
  • energy=1 activates energy measurement.

E.g., make alg1_sapco_sort no_ht=1 wait_passive=1.

How Does LaganLighter Load a Graph?

For textual graphs (formatted above), it is required to call get_ll_400_txt_graph(). To load the graphs in WebGraph format (using ParaGrapher), functions get_ll_400_webgraph() and get_ll_404_webgraph() should be called. These 3 functions have been defined in graph.c and load the graph in the following steps:

  • Checking if the graph has been stored as a shared memory object. In that case the graph is returned to the user by calling get_shm_ll_400_graph() and get_shm_ll_404_graph(). This graph should be released by calling release_shm_ll_400_graph() or release_shm_ll_404_graph().

  • If the graph is not found in the shared memory, the graph is loaded/decompressed from the secondary storage and a NUMA-interleaved memory is allocated for the graph. This graph should be released by calling release_numa_interleaved_ll_400_graph() or release_numa_interleaved_ll_404_graph().

    Note: The graphs are loaded before calling initialize_omp_par_env() which initializes OpenMP threads (to prevent busy wait of the OpenMP threads initialized by OMP_WAIT_POLICY=active, by default). The get_ll_400_txt_graph() function reads textual graphs sequentially and get_ll_40X_webgraph() functions call the ParaGrapher library which uses the pthread for parallelization.

  • When the graph is loaded/decompressed from the secondary storage, the OS caches some contents of the graph in memory. This cached data by OS may impact the performance of algorithms especially when a large percentage of the memory is used. To prevent this, by the end of graph loading, the flush_os_cache() functions is used to run th e flushcache.sh script. If the user has sudo access, a script can be added for this purpose (Please refer to the comments on the first lines of flushcache.sh).

Downloading Public Graphs

You may find WebGraphs in:

Please also refer to https://blogs.qub.ac.uk/DIPSA/graphs-list-2024.

Executing an Algorithm For Multiple Graphs

Please refer to the Launcher Doc.