Add Leiden community detection algorithm docs #1014

Open
wants to merge 1 commit into
base: memgraph-2-21
1 change: 1 addition & 0 deletions pages/advanced-algorithms/available-algorithms.md
@@ -30,6 +30,7 @@ library](/advanced-algorithms/install-mage).
| [bipartite_matching](/advanced-algorithms/available-algorithms/bipartite_matching) | C++ | Algorithm for calculating maximum bipartite matching, where matching is a set of nodes chosen in such a way that no two edges share an endpoint. |
| [bridges](/advanced-algorithms/available-algorithms/bridges) | C++ | A bridge is an edge, which when deleted, increases the number of connected components. The goal of this algorithm is to detect edges that are bridges in a graph. |
| [community_detection](/advanced-algorithms/available-algorithms/community_detection) | C++ | The Louvain method for community detection is a greedy method for finding communities with maximum modularity in a graph. Runs in _O_(*n*log*n*) time. |
| [leiden_community_detection](/advanced-algorithms/available-algorithms/leiden_community_detection) | C++ | The Leiden method for community detection is an improvement on the Louvain method, designed to find communities with maximum modularity in a graph while addressing issues of disconnected communities. Runs in _O_(*L* *m*) time, where *L* is the number of iterations of the algorithm and *m* is the number of edges. |

**Review comment (Contributor):** Not in alphabetical order

| [cycles](/advanced-algorithms/available-algorithms/cycles) | C++ | Algorithm for detecting cycles on graphs. |
| [degree_centrality](/advanced-algorithms/available-algorithms/degree_centrality) | C++ | The basic measurement of centrality that refers to the number of edges adjacent to a node. |
| [distance_calculator](/advanced-algorithms/available-algorithms/distance_calculator) | C++ | Module for finding the geographical distance between two points defined with 'lng' and 'lat' coordinates. |
@@ -0,0 +1,160 @@
---
title: leiden_community_detection
description: Explore Memgraph's Leiden community detection capabilities and learn how to analyze the structure of complex networks. Access tutorials and comprehensive documentation to enhance your understanding of the Leiden community detection algorithm.
---

import { Steps } from 'nextra/components'
import { Callout } from 'nextra/components'
import { Card, Cards } from 'nextra/components'
import GitHub from '/components/icons/GitHub'

# leiden_community_detection

Communities in graphs mirror real-world communities, such as social circles. In
a graph, a community is a set of nodes. M. Girvan and M. E. J. Newman note that
nodes in a community connect more densely with each other than with outside
nodes.

This module employs the [Leiden
algorithm](https://en.wikipedia.org/wiki/Leiden_algorithm) for community detection,
based on the paper [*From Louvain to Leiden: guaranteeing well-connected communities*](https://arxiv.org/abs/1810.08473).
The Leiden algorithm is a hierarchical clustering algorithm that recursively merges communities into single nodes by greedily optimizing modularity, then repeats the process on the condensed graph.
It improves on the Louvain algorithm, which can produce communities that are poorly connected or even disconnected.
Leiden addresses this by periodically subdividing communities into smaller, well-connected groups.
With an $\mathcal{O}(Lm)$ runtime for $m$ edges and $L$ iterations, it suits large graphs.
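
For reference, the quantity being optimized can be written out explicitly. The formula below is the standard resolution-adjusted modularity from the community detection literature; it is a sketch for orientation and not a statement of how this module normalizes the value internally. The symbols $L_c$ and $K_c$ are introduced here for illustration only.

$$
Q = \sum_{c} \left[ \frac{L_c}{m} - \gamma \left( \frac{K_c}{2m} \right)^2 \right]
$$

Here $m$ is the number of relationships, $L_c$ is the number of relationships inside community $c$, $K_c$ is the sum of the degrees of the nodes in $c$, and $\gamma$ is the resolution parameter. As a worked example, take two triangles joined by a single relationship ($m = 7$) and treat each triangle as one community. Each community then has $L_c = 3$ and $K_c = 7$, so with $\gamma = 1$:

$$
Q = 2 \left[ \frac{3}{7} - \left( \frac{7}{14} \right)^2 \right] = 2 \cdot \frac{5}{28} = \frac{5}{14} \approx 0.357
$$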

<Cards>
<Card
icon={<GitHub />}
title="Source code"
href="https://github.com/memgraph/mage/blob/main/cpp/leiden_community_detection_module/leiden_community_detection_module.cpp"
/>
</Cards>

| Trait | Value |
| ------------------------ | --------------------- |
| **Module type** | algorithm |
| **Implementation** | C++ |
| **Graph direction** | undirected |
| **Relationship weights** | weighted / unweighted |
| **Parallelism** | parallel |

## Procedures

<Callout type="info">
You can execute this algorithm on [graph projections, subgraphs or portions of the graph](/advanced-algorithms/run-algorithms#run-procedures-on-subgraph).
</Callout>

### `get()`

Computes graph communities using the Leiden algorithm.

{<h4> Input: </h4>}

- `subgraph: Graph` (**OPTIONAL**) ➡ A specific subgraph, which is an [object of type Graph](/advanced-algorithms/run-algorithms#run-procedures-on-subgraph) returned by the `project()` function, on which the algorithm is run.

**Review comment (Contributor):** How does the algorithm behave when subgraph is NOT provided?

- `weight: string (default=null)` ➡ Specifies the default relationship weight. If not set, the algorithm uses the `weight` relationship attribute when present and otherwise treats the graph as unweighted.

**Review comment (Contributor):** Why is this a string and not a float? Does it refer to a property name?

**Review comment (Contributor):** Relationship implies "edge". Is that what you meant?

- `gamma: double (default=1.0)` ➡ Resolution parameter used when computing the modularity. Internally the value is divided by the number of relationships for an unweighted graph, or the sum of weights of all relationships otherwise.

**Review comment (Contributor):** Why is gamma divided by some kind of weight? Please explain.

- `theta: double (default=0.01)` ➡ Controls the randomness while breaking a community into smaller ones.
- `resolution_parameter: double (default=0.01)` ➡ Minimum change in modularity that must be achieved when merging nodes within the same community.
- `max_iterations: int (default=inf)` ➡ Maximum number of iterations the algorithm will perform. If set to infinity, the algorithm will run until convergence is reached.
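
As a sketch of the default weight handling described in the list above, the queries below store a `weight` property on every relationship and then call `get()` without arguments. The `Node` and `Relation` labels are borrowed from the example later on this page, the weight values are arbitrary, and any storage configuration needed for relationship properties is not covered here.

```cypher
// Assumption: the graph already contains (:Node)-[:Relation]->(:Node) data.
// Give every relationship a weight of 1.0.
MATCH (:Node)-[r:Relation]->(:Node)
SET r.weight = 1.0;

// Weaken the single relationship between nodes 2 and 3 (arbitrary value).
MATCH (:Node {id: 2})-[r:Relation]->(:Node {id: 3})
SET r.weight = 0.1;

// No weight argument is passed, so per the description above the
// `weight` relationship attribute is used when it is present.
CALL leiden_community_detection.get()
YIELD node, community_id
RETURN node.id AS node_id, community_id
ORDER BY node_id;
```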

{<h4> Output: </h4>}

- `node: Vertex` ➡ Graph node.

**Review comment (Contributor):** Can we give this a more descriptive label? What is special about this single node that is returned? Is it some kind of centroid for the community?

- `community_id: int` ➡ Community ID. Defaults to $-1$ if the node does not belong to any community.
- `communities: list` ➡ List of intermediate communities that a node has been part of across iterations.

**Review comment (Contributor):** Does this indicate the community hierarchy?

{<h4> Usage: </h4>}

Use the following query to detect communities:

```cypher
CALL leiden_community_detection.get()
YIELD node, community_id, communities;
```
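
To restrict the run to part of the graph, the optional `subgraph` argument can be supplied from a projection. The sketch below assumes nodes labeled `Node` connected by `Relation` relationships, as in the example further down this page, and uses the `project()` function mentioned in the input list.

```cypher
// Build a projection from the matched paths and pass it as the first argument.
MATCH p=(a:Node)-[:Relation]->(b:Node)
WITH project(p) AS subgraph
CALL leiden_community_detection.get(subgraph)
YIELD node, community_id
RETURN node.id AS node_id, community_id
ORDER BY node_id;
```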

### `get_subgraph()`

Computes graph communities over a subgraph using the Louvain method.

**Review comment (Contributor):** Why is Louvain on the Leiden page? I'm confused.

{<h4> Input: </h4>}

- `subgraph: Graph` (**OPTIONAL**) ➡ A specific subgraph, which is an [object of type Graph](/advanced-algorithms/run-algorithms#run-procedures-on-subgraph) returned by the `project()` function, on which the algorithm is run.
- `subgraph_nodes: List[Node]` ➡ List of nodes in the subgraph.
- `subgraph_relationships: List[Relationship]` ➡ List of relationships in the subgraph.
- `weight: string (default=null)` ➡ Specifies the default relationship weight. If not set, the algorithm uses the `weight` relationship attribute when present and otherwise treats the graph as unweighted.

**Review comment (Contributor):** Does Leiden require the `--storage-properties-on-edges=true` configuration?

- `gamma: double (default=1.0)` ➡ Resolution parameter used when computing the modularity. Internally the value is divided by the number of relationships for an unweighted graph, or the sum of weights of all relationships otherwise.
- `theta: double (default=0.01)` ➡ Controls the randomness while breaking a community into smaller ones.
- `resolution_parameter: double (default=0.01)` ➡ Minimum change in modularity that must be achieved when merging nodes within the same community.
- `max_iterations: int (default=inf)` ➡ Maximum number of iterations the algorithm will perform. If set to infinity, the algorithm will run until convergence is reached.

{<h4> Output: </h4>}

- `node: Vertex` ➡ Graph node.
- `community_id: int` ➡ Community ID. Defaults to $-1$ if the node does not belong to any community.
- `communities: list` ➡ List of intermediate communities that a node has been part of across iterations.

{<h4> Usage: </h4>}

Use the following query to compute communities in a subgraph:

```cypher
MATCH (a)-[e]-(b)
WITH COLLECT(a) AS nodes, COLLECT (e) AS relationships
CALL leiden_community_detection.get_subgraph(nodes, relationships)
YIELD node, community_id, communities;
```
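
Because the undirected pattern `(a)-[e]-(b)` matches every relationship once per direction, the collected lists in the query above contain repeated entries. Whether the procedure tolerates duplicates is not stated here, so a defensive variant deduplicates before the call. The `Node` and `Relation` labels are assumed from the example below.

```cypher
// DISTINCT removes the repeats produced by matching each relationship in both directions.
MATCH (a:Node)-[e:Relation]-(b:Node)
WITH COLLECT(DISTINCT a) AS nodes, COLLECT(DISTINCT e) AS relationships
CALL leiden_community_detection.get_subgraph(nodes, relationships)
YIELD node, community_id
RETURN node.id AS node_id, community_id
ORDER BY node_id;
```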

## Example

<Steps>

{<h3> Database state </h3>}

The database contains the following data:

![](/pages/advanced-algorithms/available-algorithms/community_detection/community-detection-1.png)

Created with the following Cypher queries:

```cypher
MERGE (a: Node {id: 0}) MERGE (b: Node {id: 1}) CREATE (a)-[r: Relation]->(b);
MERGE (a: Node {id: 0}) MERGE (b: Node {id: 2}) CREATE (a)-[r: Relation]->(b);
MERGE (a: Node {id: 1}) MERGE (b: Node {id: 2}) CREATE (a)-[r: Relation]->(b);
MERGE (a: Node {id: 2}) MERGE (b: Node {id: 3}) CREATE (a)-[r: Relation]->(b);
MERGE (a: Node {id: 3}) MERGE (b: Node {id: 4}) CREATE (a)-[r: Relation]->(b);
MERGE (a: Node {id: 3}) MERGE (b: Node {id: 5}) CREATE (a)-[r: Relation]->(b);
MERGE (a: Node {id: 4}) MERGE (b: Node {id: 5}) CREATE (a)-[r: Relation]->(b);
```

{<h3> Detect communities </h3>}

Get communities using the following query:

```cypher
CALL leiden_community_detection.get()
YIELD node, community_id, communities
RETURN node.id AS node_id, community_id, communities
ORDER BY node_id;
```

**Review comment (Contributor):** It would be great to see an example where a node is a member of more than one hierarchical community

Results show which nodes belong to community 0 and which to community 1:

```plaintext
+--------------+--------------+--------------+
| node_id      | community_id | communities  |
+--------------+--------------+--------------+
| 0            | 0            | [0]          |
| 1            | 0            | [0]          |
| 2            | 0            | [0]          |
| 3            | 1            | [1]          |
| 4            | 1            | [1]          |
| 5            | 1            | [1]          |
+--------------+--------------+--------------+
```
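
The per-node rows can also be summarized per community. The queries below are a minimal sketch: the first groups node ids by `community_id`, and the second stores the result on each node under a property name, `community`, that is chosen here purely for illustration.

```cypher
// One row per community, with its member ids and size.
CALL leiden_community_detection.get()
YIELD node, community_id
RETURN community_id, collect(node.id) AS members, count(node) AS size
ORDER BY community_id;

// Persist the assignment on the nodes (hypothetical property name).
CALL leiden_community_detection.get()
YIELD node, community_id
SET node.community = community_id;
```

Given the results shown above, the first query would report two communities of three nodes each.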

</Steps>