References
----------
Prem K. Gopalan, David M. Blei. Efficient discovery of overlapping communities
in massive networks. To appear in the Proceedings of the National Academy of
Sciences, 2013 (published ahead of print August 15, 2013, doi:10.1073/pnas.1221839110).
Article: http://www.pnas.org/content/early/2013/08/14/1221839110.full.pdf
SI: http://www.pnas.org/content/early/2013/08/14/1221839110/suppl/DCSupplemental
Installation
------------
Required libraries: GSL (gsl and gslcblas) and pthread
On Linux/Unix, run:

    ./configure
    make; make install

SVINET has not been tested on a Mac. Please run it on Linux.
The binary 'svinet' will be installed in /usr/local/bin unless a
different prefix is provided to configure. (See INSTALL.)
Tutorial
--------
1. Prepare your network data as a tab-separated file (e.g., network.txt).
2. Run the following command to find the overlapping communities:

    svinet -file network.txt -n 17903 -k 20 -link-sampling

3. Run the following command to visualize the communities. It changes into
the output directory written by step 2 and uses the same -n and -k values:

    cd <output-dir>; svinet -file ../network.txt -n 17903 -k 20 -gml

In step 2, "-n" specifies the number of nodes in the network, "-k" the
number of communities to fit, and "-link-sampling" selects link sampling
as the sampling method (see "Other sampling methods" below).
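The -n value is the number of distinct node ids in the network. Here is a minimal sketch for preparing the input file and computing that count. It assumes each line of the raw edge list holds one undirected edge as two whitespace-separated node ids. The helper names to_tsv and count_nodes are hypothetical and not part of SVINET, as is the choice to skip '#' comment lines.

```shell
#!/bin/sh
# Sketch only. Assumes one undirected edge per input line, given as two
# node ids separated by spaces and/or tabs.

# to_tsv IN OUT (hypothetical helper): rewrite the edge list as
# "node1<TAB>node2" lines, skipping lines with fewer than two fields and
# lines whose first field starts with '#'.
to_tsv() {
  awk 'NF >= 2 && $1 !~ /^#/ { print $1 "\t" $2 }' "$1" > "$2"
}

# count_nodes FILE (hypothetical helper): print the number of distinct
# node ids in the first two columns, a candidate value for -n.
count_nodes() {
  awk 'NF >= 2 { seen[$1] = 1; seen[$2] = 1 }
       END { n = 0; for (id in seen) n++; print n }' "$1"
}
```

For example, with a hypothetical raw file edges.txt:

    to_tsv edges.txt network.txt
    svinet -file network.txt -n "$(count_nodes network.txt)" -k 20 -link-sampling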

Step 2 writes the communities to communities.txt, the model fit to
gamma.txt and lambda.txt, and the mixed memberships to groups.txt, all in
its output directory.

Step 3 writes a GML file (network.gml) that can be loaded into a tool such
as Gephi to visualize the communities. Each node is colored by its most
likely community.
Some advanced tips
------------------
1. *Estimating the number of communities*
Run the following command, setting the number of communities equal
to the number of nodes:

    svinet -file network.txt -n 10000 -k 10000 -findk

Then estimate the number of communities by counting the lines of the
communities.txt file it writes:

    wc -l n10000-k10000-mmsb-findk-uniform/communities.txt

Use this count as the -k value in step 2 of the tutorial.
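The count can also be passed to step 2 directly. This is a minimal sketch that assumes the -findk output directory named above. The helpers estimate_k and run_with_estimated_k, and the SVINET override variable (default "svinet"; set it to "echo" for a dry run), are hypothetical and not part of SVINET.

```shell
#!/bin/sh
# Sketch only. SVINET is a hypothetical override variable that defaults to
# "svinet"; SVINET=echo prints the arguments instead of running svinet.

# estimate_k DIR (hypothetical helper): print the line count of
# DIR/communities.txt, or fail if the file is missing.
estimate_k() {
  f="$1/communities.txt"
  [ -f "$f" ] || { echo "missing $f" >&2; return 1; }
  # Reading from stdin keeps wc from printing the file name; tr removes
  # the padding spaces some wc implementations add.
  wc -l < "$f" | tr -d ' '
}

# run_with_estimated_k NETWORK N DIR (hypothetical helper): run the
# tutorial's step 2 with -k set to the estimate from DIR.
run_with_estimated_k() {
  k=$(estimate_k "$3") || return 1
  "${SVINET:-svinet}" -file "$1" -n "$2" -k "$k" -link-sampling
}
```

For example:

    run_with_estimated_k network.txt 10000 n10000-k10000-mmsb-findk-uniform

Note that wc -l counts newline characters. If the last line of communities.txt has no trailing newline, the count comes out one short.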
2. *Comparing communities to the ground truth*
If you have a text file with ground-truth community labels, you can pass
it to the command in step 2 of the tutorial to compute the normalized
mutual information (NMI) score between the true and the inferred
communities:

    svinet -file network.txt -n 10000 -k 20 -link-sampling -nmi community.txt

Each line of the ground-truth file lists a node followed by the
communities it belongs to:

    node1 <list of communities node1 is a member of>

For example, the line

    65 17 22 43 54

says that the node with id 65 is a member of communities 17, 22, 43 and
54. Community ids are arbitrary labels.
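If your labels are stored with one membership per line, they can be regrouped into this format. This is a minimal sketch that assumes a hypothetical two-column input file where each line is a "node community" pair. The helper to_ground_truth is also hypothetical, and its space-separated output mirrors the example line above.

```shell
#!/bin/sh
# Sketch only. Input (assumed): one "node community" pair per line.
# Output: one line per node, "node c1 c2 ...", sorted numerically by node.

# to_ground_truth IN OUT (hypothetical helper)
to_ground_truth() {
  awk 'NF >= 2 {
         # Append the community in column 2 to the list for the node in column 1.
         if ($1 in m) m[$1] = m[$1] " " $2
         else m[$1] = $2
       }
       END { for (node in m) print node, m[node] }' "$1" | sort -n > "$2"
}
```

For example, `to_ground_truth pairs.txt community.txt` produces a file that can be passed to -nmi. Here pairs.txt is a hypothetical input name. Duplicate pairs in the input are not removed.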
3. *Comparing communities to results from other methods*
The authors recommend running svinet with two settings of the
**link threshold**:

    svinet -file network.txt -n 10000 -k 20 -link-sampling -link-thresh 0.5
    svinet -file network.txt -n 10000 -k 20 -link-sampling -link-thresh 0.9

For further details, see detailed_readme.txt or email the authors.
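Both recommended runs can be scripted as a loop. This is a minimal sketch. The function run_link_thresholds and the SVINET override variable (default "svinet"; set it to "echo" for a dry run) are hypothetical, and the -n and -k values are passed in as arguments.

```shell
#!/bin/sh
# Sketch only. SVINET is a hypothetical override variable that defaults to
# "svinet"; SVINET=echo prints each command's arguments instead.

# run_link_thresholds NETWORK N K (hypothetical helper): run link sampling
# once per recommended link threshold, stopping at the first failure.
run_link_thresholds() {
  for t in 0.5 0.9; do
    "${SVINET:-svinet}" -file "$1" -n "$2" -k "$3" \
      -link-sampling -link-thresh "$t" || return 1
  done
}
```

For example, `run_link_thresholds network.txt 10000 20` runs the two commands above. Prefixing it with `SVINET=echo` only prints their arguments.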
4. *Other sampling methods*
See detailed_readme.txt.