Install the package by navigating to the parent folder of this one and running
>R CMD INSTALL SNFtool
After the installation is complete, load the package to use its functions. Here is an example session.
library(SNFtool)
## First, set all the parameters:
K = 20; # number of neighbors, usually (10~30)
alpha = 0.5; # hyperparameter, usually (0.3~0.8)
T = 10; # Number of Iterations, usually (10~20)
## Data1 is of size n x d_1, where n is the number of patients and d_1 is the number of features of the first data type (e.g. genes).
## Data2 is of size n x d_2, where n is the number of patients and d_2 is the number of features of the second data type (e.g. methylation).
data(Data1)
data(Data2)
## The simulated data (Data1, Data2) contain two data types that are complementary to each other and have the same number of points. The first half of the points belongs to the first cluster; the rest belong to the second cluster.
truelabel = c(matrix(1,100,1),matrix(2,100,1)); ## the ground truth of the simulated data
## Calculate the distance matrices (here we use Euclidean distance; you can use other distances, e.g. correlation).
## If the data are all continuous values, we recommend standard normalization before using SNF, though it is optional and depends on the data.
# Data1 = standardNormalization(Data1);
# Data2 = standardNormalization(Data2);
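## A minimal sketch of what such a standardization does, assuming (this is an
## assumption, not a statement about the package internals) that it means
## scaling every column to mean 0 and standard deviation 1. It uses only base R
## on a made-up matrix; the names toyData and toyScaled are hypothetical.
toyData = matrix(c(1, 2, 3, 4, 10, 20, 30, 40), nrow = 4); # 4 points, 2 features
toyScaled = scale(toyData, center = TRUE, scale = TRUE); # column-wise z-scores
colMeans(toyScaled); # column means are numerically 0
apply(toyScaled, 2, sd); # column standard deviations are 1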
## Calculate the pairwise distances. If the data are continuous, we recommend the function "dist2" as follows (note that it returns squared Euclidean distances); if the data are discrete, we recommend the users to use ""
Dist1 = dist2(as.matrix(Data1),as.matrix(Data1));
Dist2 = dist2(as.matrix(Data2),as.matrix(Data2));
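## For reference, a small base-R check of what a matrix of squared Euclidean
## distances looks like, on three made-up 2-d points. It uses stats::dist
## rather than dist2, so it illustrates the quantity only; the names
## toyPoints, toyDist and toySqDist are hypothetical.
toyPoints = rbind(c(0, 0), c(3, 4), c(6, 8));
toyDist = as.matrix(dist(toyPoints, method = "euclidean")); # plain Euclidean distances
toySqDist = toyDist^2; # squared Euclidean distances
toySqDist[1, 2]; # 25, since 3^2 + 4^2 = 25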
## next, construct similarity graphs
W1 = affinityMatrix(Dist1, K, alpha)
W2 = affinityMatrix(Dist2, K, alpha)
## These similarity graphs have complementary information about clusters.
displayClusters(W1,truelabel);
displayClusters(W2,truelabel);
## Next, fuse all the graphs into one overall matrix with similarity network fusion (SNF):
W = SNF(list(W1,W2), K, T)
## With this unified graph W of size n x n, you can do either spectral clustering or kernel NMF. If you need help with further clustering, please let us know.
## For example, spectral clustering:
C = 2 # number of clusters
group = spectralClustering(W, C); # the final subtypes information
## You can evaluate the quality of the obtained clustering by calculating the normalized mutual information (NMI): an NMI close to 1 indicates that the obtained clustering is very close to the "true" cluster information; an NMI close to 0 indicates that it is not similar to the "true" cluster information.
displayClusters(W, group);
SNFNMI = calNMI(group, truelabel)
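## A minimal base-R sketch of one common definition of NMI: the mutual
## information divided by the geometric mean of the two entropies. This is an
## illustration only and is not claimed to match calNMI in every detail; the
## names toyEntropy and toyNMI are hypothetical.
toyEntropy = function(p) { p = p[p > 0]; -sum(p * log(p)) } # Shannon entropy of a probability vector
toyNMI = function(x, y) {
  joint = table(x, y) / length(x); # joint distribution of the two labelings
  px = rowSums(joint); py = colSums(joint); # marginal distributions
  mi = toyEntropy(px) + toyEntropy(py) - toyEntropy(as.vector(joint)); # I(X;Y) = H(X) + H(Y) - H(X,Y)
  mi / sqrt(toyEntropy(px) * toyEntropy(py))
}
toyNMI(c(1, 1, 2, 2), c(2, 2, 1, 1)); # same partition up to relabeling: 1 (up to rounding)
toyNMI(c(1, 1, 2, 2), c(1, 2, 1, 2)); # independent partitions: 0 (up to rounding)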
## You can also find the concordance between each individual network and the fused network (C is the number of clusters set above):
ConcordanceMatrix = concordanceNetworkNMI(list(W, W1,W2), C);
################################################################################
# We also provide an example using label propagation to predict the labels of new data points below.
# How to use SNF with multiple views
# Load views into list "dataL" and the cluster assignment into vector "label"
# load("Digits.RData")
data(Digits)
# Set the other parameters
K = 20 # number of neighbours
alpha = 0.5 # hyperparameter in affinityMatrix
T = 20 # number of iterations of SNF
# Normalize the features in each of the views (optional)
# dataL = lapply(dataL, standardNormalization)
# Calculate the distances for each view
distL = lapply(dataL, function(x) dist2(x, x))
# Construct the similarity graphs
affinityL = lapply(distL, function(x) affinityMatrix(x, K, alpha))
################################################################################
# An example of how to use concordanceNetworkNMI
Concordance_matrix = concordanceNetworkNMI(affinityL, 3);
## The output, Concordance_matrix, shows the concordance between the fused network and each individual network.
################################################################################
# Example of how to use SNF to perform subtyping
# Construct the fused network
W = SNF(affinityL, K, T)
# perform clustering on the fused network.
clustering = spectralClustering(W,3);
# use NMI to measure the goodness of the obtained labels.
NMI = calNMI(clustering, label);
################################################################################
# Example of predicting the labels of new data points with label propagation
# Load views into list "dataL" and the cluster assignment into vector "label"
data(Digits)
# Create the training and test data
n = floor(0.8*length(label)) # number of training cases
trainSample = sample.int(length(label), n)
train = lapply(dataL, function(x) x[trainSample, ]) # Use a random 80% of the samples for training
test = lapply(dataL, function(x) x[-trainSample, ]) # Test on the rest of the data set
groups = label[trainSample]
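# A minimal sketch of the same split on made-up data, with set.seed added so
# that the random split can be reproduced. The toy objects (toyLabel, toyViews,
# toyN, toyIdx, toyTrain, toyTest) and the seed value are hypothetical and only
# illustrate the indexing.
set.seed(1)
toyLabel = rep(1:2, each = 5) # 10 samples in 2 classes
toyViews = list(matrix(rnorm(30), nrow = 10), matrix(rnorm(20), nrow = 10)) # two views of the same 10 samples
toyN = floor(0.8 * length(toyLabel)) # 8 training cases
toyIdx = sample.int(length(toyLabel), toyN) # random training indices
toyTrain = lapply(toyViews, function(x) x[toyIdx, ]) # 8 rows per view
toyTest = lapply(toyViews, function(x) x[-toyIdx, ]) # remaining 2 rows per view
sapply(toyTrain, nrow) # 8 8
sapply(toyTest, nrow) # 2 2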
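# Set the other parameters
K = 20 # number of neighbours
alpha = 0.5 # hyperparameter in affinityMatrix
t = 20 # number of iterations of SNF
method = TRUE # TRUE selects label propagation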
# Apply the prediction function to the data
newLabel = groupPredict(train,test,groups,K,alpha,t,method)
# Compute the prediction accuracy on the test samples
accuracy = sum(label[-trainSample] == newLabel[-c(1:n)])/(length(label) - n)
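# A toy illustration of the accuracy formula above, assuming (as the indexing
# newLabel[-c(1:n)] suggests) that the predicted vector lists the training
# labels first and the test predictions after them. All toy* names are
# hypothetical and the numbers are made up.
toyTruth = c(1, 1, 2, 2, 1, 2) # true labels of 6 samples
toyTrainIdx = c(1, 3, 4, 6) # indices used for training
toyPred = c(1, 2, 2, 2, 1, 2) # 4 training labels followed by 2 test predictions
toyNTrain = length(toyTrainIdx)
toyAccuracy = sum(toyTruth[-toyTrainIdx] == toyPred[-c(1:toyNTrain)]) / (length(toyTruth) - toyNTrain)
toyAccuracy # 0.5: one of the two test predictions is correct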
################################################################################
# References:
# B Wang, A Mezlini, F Demir, M Fiume, Z Tu, M Brudno, B Haibe-Kains, A Goldenberg (2014) Similarity Network Fusion: a fast and effective method to aggregate multiple data types on a genome wide scale. Nature Methods. Online. Jan 26, 2014
# Website: http://compbio.cs.toronto.edu/SNF/SNF/Software.html