Python implementation of MTCOV algorithm described in:
- [1] Contisciani M., Power E. & De Bacco C. (2020). Community detection with node attributes in multilayer networks, Scientific Reports 10, 15736 (2020).
This is a new generative algorithm that incorporates both the topology of interactions and node attributes to extract overlapping communities in directed and undirected multilayer networks.
If you use this code please cite [1].
The paper can be found here (Published version, open access) or here (preprint).
Copyright (c) 2020 Martina Contisciani and Caterina De Bacco.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON INFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
code
: Contains the Python implementation of MTCOV algorithm, the code for performing the cross-validation procedure and the code for generating synthetic data.data/input
: Contains an example of directed multilayer network, including the adjacency tensor and the design matrix. These are synthetic data.data/output
: Contains some results for testing the code.
The project has been developed using Python with the packages contained in requirements.txt. We suggest to create a conda environment with
conda create --name MTCOV python=3.9.16 --no-default-packages
, activate it with conda activate MTCOV
, and install all the dependencies by running (inside MTCOV
directory):
pip install -r requirements.txt
You can run tests to reproduce results contained in data/output
by running (inside code
directory):
python -m unittest test.py
python -m unittest test_cv.py
To test the program on the given example file, type:
cd code
python main.py
It will use the sample network contained in ./data/input
. The adjacency tensor adj.csv represents a directed, unweighted network with N=300 nodes and L=4 layers. The design matrix X.csv contains the attribute Metadata with Z=2 categories used in the analysis. The algorithm runs with C=2 communities and gamma=0.5.
- -f : Path of the input folder, (default='../data/input/').
- -j : Input file name of the adjacency tensor, (default='adj.csv').
- -c : Input file name of the design matrix, (default='X.csv').
- -o : Name of the source of the edge, (default='source').
- -r : Name of the target of the edge, (default='target').
- -x : Name of the column with node labels, (default='Name').
- -a : Name of the attribute to consider in the analysis, (default='Metadata').
- -C : Number of communities, (default=2).
- -g : Scaling parameter gamma, (default=0.5).
- -u : Flag to call the undirected network, (default=False).
- -d : Flag to force a dense transformation of the adjacency tensor, (default=False).
- -F : Flag to choose the convergence procedure, (default='log'). If 'log' the convergence is based on the loglikelihood values; if 'deltas' the convergence is based on the differences in the parameters values. The latter is suggested when the dataset is big (N > 1000 ca.).
- -b : Size of the batch used to compute the likelihood, (default=None).
The multilayer network (A) should be stored in a CSV file. An example of row is
node1 node2 0 0 1
where the first and second columns are the source and target nodes of the edge, respectively;
Note: if the network is undirected, you only need to input each edge once. You then need to specify to the algorithm that you are considering the undirected case by giving as a command line input parameter -u=True.
The design matrix (X) should be stored in a CSV file, where one column (Name) indicates the node labels.
The MTCOV returns a compressed file inside the data/output/test
folder. To load and print the out-going membership matrix:
theta = np.load('theta_test_GT.npz')
.
print(theta['u'])
theta contains the two