Skip to content

Latest commit

 

History

History
328 lines (233 loc) · 13.5 KB

README.md

File metadata and controls

328 lines (233 loc) · 13.5 KB

MultiVERSE

Embedding of Monoplex, Multiplex, Heterogeneous, Multiplex-Heterogeneous and full Multiplex-Heterogeneous Networks.

You can find in this repository the necessary files to use MultiVERSE for multiplex end multiplex-heterogeneous network embedding. You can find the corresponding article on arxiv here: http://arxiv.org/abs/2008.10085

In order to use MultiVERSE, you need the networks to be in extended edgelist format:

        edge_type source target weight
          r1         n1    n2    1
          r2         n2    n3    1        

Requirements

Python 3:

  • rpy2
  • gensim (fast_version enabled)
  • networkx=2.2
  • numba=0.50.1
  • scikit-learn=0.21.3 (for evaluation)
  • pandas

R>=3.6.1:

  • devtools
  • igraph
  • mclust
  • kernlab
  • R.matlab
  • bc3net
  • optparse
  • tidyverse

A good way to set up the appropriate environement is to create one with anaconda:

conda create --name LP rpy2 gensim r numba spyder networkx=2.2 pandas scikit-learn=0.21.3 joblib matplotlib

Folders

  • Dataset: You will find here the datasets of multiplex and multiplex-heterogeneous networks.
  • evalne: This is the folder ot the evalne toolbox used for evaluation of the embeddings (https://evalne.readthedocs.io/en/latest/index.html) with the add of the cosine embedding operator.
  • Generated_graphs: The different types of graphs generated by MultiVERSE are stored in this folder.
  • RWR: You will find here the R files of RWR-M(H) (https://www.ncbi.nlm.nih.gov/pubmed/30020411)
  • ResultsMultiVERSE: MultiVERSE will store its outputs: embeddings (a dictionary with index as keys and embeddings of the corresponding node as value) and results of evaluation.
  • ResultsRWR: The matrix of similarity generated by RWR-M(H) will be stored in this folder.

MultiVERSE on multiplex network:

MultiVERSE_M.py

This program allows to apply MultiVERSE on a multiplex network.

The usage is the following:

python3 MultiVERSE_M.py [options]

      Options:
             -k NUMERIC
               Value of the position of the networks in the list 'Test_Networks'

To use the example, you can write in a terminal the following command:

python3 MultiVERSE_M.py -m ./Dataset/Multiplex/CKM-Physicians-Innovation_multiplex.edges

The output of this command is the embedding 'embedding_M.npy' in the directory ResultsMultiVERSE. The embedding is a dictionary with the index as key and the corresponding embedding as value.

Eval_MultiVERSE_M.py

This program allows to apply link prediction as evaluation of the multiplex embedding.

The usage is the following:

python3 Eval_MultiVERSE_M.py [options]

      Options:
             -m CHARACTER
               Path to the multiplex

To use the example, you can write in a terminal the following command:

python3 Eval_MultiVERSE_M.py -m ./Dataset/Multiplex/CKM-Physicians-Innovation_multiplex.edges

The output of this command is the embedding 'embedding_M_eval.npy' in the directory ResultsMultiVERSE. The embedding is a dictionary with the index as key and the corresponding embedding as value. The command will also generate the results of the evaluation in a .txt file 'Result_Linkpred_Multiplex_Test_Eval.txt'.

MultiVERSE on multiplex-heterogeneous network:

MultiVERSE_MH.py

This program allows to apply MultiVERSE on a multiplex-heterogeneous network.

The usage is the following:

python3 MultiVERSE_MH.py [options]

      Options:

  		 -n CHARACTER
      Path to the first multiplex network to be used as Input.
      It should be a space separated four column file containing
      the fields: edge_type, source, target and weight.
        edge_type source target weight
         r1        n1    n2    1
         r2        n2    n3    1

   	-m CHARACTER
      Path to the second multiplex network to be used as Input.
      It should be a space separated four column file containing
      the fields: edge_type, source, target and weight.
        edge_type source target weight
         r1        n1    n2    1
         r2        n2    n3    1

   	-b CHARACTER
      Path to the bipartite network to be used as Input.
      It should be a space separated four column file containing
      the fields: edge_type, source, target and weight. Source Nodes
      should be the ones from the first multiplex and target nodes
      from the second.
        edge_type source target weight
         r1        n1    n2    1
         r2        n2    n3    1

To use the example, you can write in a terminal the following command:

python3 MultiVERSE_MH.py -m ./Dataset/Multiplex_Het/Multiplex_1.txt -n ./Dataset/Multiplex_Het/Multiplex_2.txt -b ./Dataset/Multiplex_Het/bipartite.txt

If you don't have enough memory, you can use the toy example by changing the parameters in MultiVERSE_MH.py and by using the following command:

python3 MultiVERSE_MH.py -n ./Dataset/Multiplex_Het/M1_toy.txt -m ./Dataset/Multiplex_Het/M2_toy.txt -b ./Dataset/Multiplex_Het/bipartite_toy.txt

The output of this command is the embedding 'embedding_MH.npy' in the directory ResultsMultiVERSE. The embedding is a dictionary with the index as key and the corresponding embedding as value.

Eval_MultiVERSE_MH.py

This program allows to apply link prediction as evaluation of the multiplex-heterogeneous embedding.

The usage is the following:

python3 Eval_MultiVERSE_MH.py [options]

      Options:

  		 -n CHARACTER
      Path to the first multiplex network to be used as Input.
      It should be a space separated four column file containing
      the fields: edge_type, source, target and weight.
        edge_type source target weight
         r1        n1    n2    1
         r2        n2    n3    1

   	-m CHARACTER
      Path to the second multiplex network to be used as Input.
      It should be a space separated four column file containing
      the fields: edge_type, source, target and weight.
        edge_type source target weight
         r1        n1    n2    1
         r2        n2    n3    1

   	-b CHARACTER
      Path to the bipartite network to be used as Input.
      It should be a space separated four column file containing
      the fields: edge_type, source, target and weight. Source Nodes
      should be the ones from the first multiplex and target nodes
      from the second.
        edge_type source target weight
         r1        n1    n2    1
         r2        n2    n3    1

To use the example, you can write in a terminal the following command:

python3 Eval_MultiVERSE_MH.py -m ./Dataset/Multiplex_Het/Multiplex_1.txt -n ./Dataset/Multiplex_Het/Multiplex_2.txt -b ./Dataset/Multiplex_Het/bipartite.txt

If you don't have enough memory, you can use the toy example by changing the parameters in Eval_MultiVERSE_MH.py and by using the following command:

python3 Eval_MultiVERSE_MH.py -n ./Dataset/Multiplex_Het/M1_toy.txt -m ./Dataset/Multiplex_Het/M2_toy.txt -b ./Dataset/Multiplex_Het/bipartite_toy.txt

The output of this command is the embedding 'embedding_MH_eval.npy' in the directory ResultsMultiVERSE. The embedding is a dictionary with the index as key and the corresponding embedding as value. The command will also generate the results of the evaluation in a .txt file 'Result_LinkpredMultiplexHet_Test_Eval.txt'.

Usage of the RWR files:

In the RWR folder, you will find:

  • GenerateSimMatrix.R: Script that computes RWR scores taking as a seed every individual node of the input network. These scores are used to build a N x N matrix,where N is the number of nodes of the input network. The goal is to apply network embedding methods on this matrix. To apply for monoplex and multiplex networks. For additional information about RWR on multiplex network, we refer to: https://www.ncbi.nlm.nih.gov/pubmed/30020411
  • GenerateSimMatrix_MH.R: This script is similar to the previous one but it can applied to a heterogeneous, a multiplex-heterogeneous or a full multiplex-heterogeneous network. The RWR scores are used to build a (N+M) x (N+M) matrix, where N and M are the number of nodes of the first and second network respectively. The goal is to apply network embedding methods on this matrix. For additional information about RWR on heterogeneous and multiplex-heterogeneous network, we refer to: https://www.ncbi.nlm.nih.gov/pubmed/30020411
  • EdgelistToMultiplex.R: Script that takes as input monoplex networks on edgelist format and transforms then to the multiplex format required as input in the GenerateSimMatrix.R script.
  • Functions_RWRMH.R: An R file containing the functions to perform RWR on multiplex networks used by the GenerateSimMatrix.R script.
  • Networks: A folder contaning some input networks in edgelist format and others with the format required as input for the GenerateSimMatrix.R script.

1.- GenerateSimMatrix.R

Type on the command line the following command:

Rscript GenerateSimMatrix.R [options]

Options:
   -n CHARACTER, --network=CHARACTER
      Path to the multiplex network to be used as Input. It should be a  
      space separated four column file containing the fields:
      edge_type, source, target and weight:

      edge_type source target weight
         r1         n1    n2    1
         r2         n2    n3    1

   -r NUMERIC, --restart=NUMERIC
     Value of the restart parameter ranging from 0 to 1. [default= 0.7]

   -o CHARACTER, --out=CHARACTER
       Name for the output file (The resulting Similarity Matrix saved as an
       rds file. rds extension will be attached to the name provided here).

   -c INTEGER, --cores=INTEGER
   Number of cores to be used for the Random Walk Calculation. [default= 1]

   -h, --help
       Show this help message and exit

Example: We compute the similarity matrix of the multiplex network generated in the example described in section 2 (below). In this case, we used 4 cores:

Rscript GenerateSimMatrix.R -n ./Networks/ppi_path_RWRMHvignette.txt -o ./Results/SimMatrixTest -c 4 -r 0.5

2.- GenerateSimMatrix_MH.R

Type on the command line the following command:

Rscript GenerateSimMatrix_MH.R [options]

Options:
   -n CHARACTER, --network1=CHARACTER
      Path to the first multiplex network to be used as Input.
      It should be a space separated four column file containing
      the fields: edge_type, source, target and weight.
        edge_type source target weight
         r1        n1    n2    1
         r2        n2    n3    1

   -m CHARACTER, --network2=CHARACTER
      Path to the second multiplex network to be used as Input.
      It should be a space separated four column file containing
      the fields: edge_type, source, target and weight.
        edge_type source target weight
         r1        n1    n2    1
         r2        n2    n3    1

   -b CHARACTER, --bipartite=CHARACTER
      Path to the bipartite network to be used as Input.
      It should be a space separated four column file containing
      the fields: edge_type, source, target and weight. Source Nodes
      should be the ones from the first multiplex and target nodes
      from the second.
        edge_type source target weight
         r1        n1    n2    1
         r2        n2    n3    1

   -r NUMERIC, --restart=NUMERIC
     Value of the restart parameter ranging from 0 to 1. [default= 0.7]

   -o CHARACTER, --out=CHARACTER
     Name for the output file (The resulting Similarity Matrix saved as an
     rds file. rds extension will be attached to the name provided here).

   -c INTEGER, --cores=INTEGER
   Number of cores to be used for the Random Walk Calculation. [default= 1]

   -h, --help
       Show this help message and exit

Example: We compute the similarity matrix of the full-multiplex heterogeneous toy network displayed in the figure. The input network files can be found in the networks directory.

Rscript GenerateSimMatrix_MH.R -n Networks/m1_toy.txt -m Networks/m2_toyMulti.txt -b Networks/bipartite_toy.txt -o example

3.- EdgelistToMultiplex.R

Type on the command line the following command:

Rscript EdgelistToMultiplex.R [options]

Options:
    -n CHARACTER, --networks=CHARACTER
        Path to the monoplex networks to be used as Inputs. They should be
        in edgelist format, with two or three column depending if edges are
        weighted : source node, target node, weight (optional). If more than
        one network is provided, they should be separated by commas:

     source target weight
         n1    n2    1
         n2    n3    1

   -e CHARACTER, --edgeTypes=CHARACTER
       Names for the different layers (Usually the type of edge described by
       the interactions). If more than one network is provided, they should
       be separated by commas.

   -o CHARACTER, --out=CHARACTER
       Name for the output file (The resulting multiplex network)

   -h, --help
       Show this help message and exit

Example: Use PPI and Pathway network from the RandomWalkRestartMH package (http://bioconductor.org/packages/release/bioc/html/RandomWalkRestartMH.html) to generate a 2-layers multiplexnetwork output file

Rscript EdgelistToMultiplex.R -n ./Networks/PPI_RWR_Vignette.txt,./Networks/Pathway_RWR_Vignette.txt -o ./Networks/ppi_path_RWRMHvignette.txt -e ppi,path