AccNET V1.2 - Accessory Constellation Network.
Last update: 12/16/2016
Developed by: Val F. Lanza. (valfernandez.vf@gmail.com)
DESCRIPTION
AccNET is a comparative genomics tool for accessory genome analysis using
bipartite networks. The software has been designed to be compatible with
most network analysis software (e.g. Cytoscape, Gephi or R).
AccNET has been developed in Perl and is designed for Linux
platforms. Please read the DEPENDENCIES section for more details. The
software builds a bipartite network made up of two kinds of nodes:
"Genomic Units (GU)" and "Homologous Proteins Clusters (HPC)". A GU can be a
single element such as a chromosome or plasmid, or a complex set such as a
genome, a pangenome or even an environmental proteome.
INPUT DATA
AccNET works with proteomes. Each proteome must be in a single file.
AccNET does not work with DNA data. A proteome can be a single element
(chromosome, plasmid, phage, etc.) or a complex element (a genome
with a mix of chromosomes and plasmids), but in any case each element is
defined by its file.
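Because AccNET expects protein sequences only, it can help to screen the input
files before a run. The following is a minimal sketch in Python, not part of
AccNET: the function names, the 0.9 cutoff and the example sequences are all
hypothetical, and the check is only a rough heuristic based on the share of
nucleotide letters in each record.

```python
# Hypothetical pre-flight check, not part of AccNET.
NUCLEOTIDE_LETTERS = set("ACGTUN")

def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def looks_like_dna(sequence, cutoff=0.9):
    """Heuristic: True when at least `cutoff` of the letters are nucleotide codes."""
    letters = [c for c in sequence.upper() if c.isalpha()]
    if not letters:
        return False
    hits = sum(1 for c in letters if c in NUCLEOTIDE_LETTERS)
    return hits / len(letters) >= cutoff

def dna_like_records(text):
    """Return the headers of records that look like DNA rather than protein."""
    return [header for header, seq in read_fasta(text) if looks_like_dna(seq)]

# Made-up example records.
example_protein = ">prot_1\nMKTAYIAKQRQISFVKSHFSRQ\n"
example_dna = ">contig_1\nATGCGTACGTTAGCATGCA\n"
print(dna_like_records(example_protein))
print(dna_like_records(example_dna))
```

Any file for which the list is not empty probably contains nucleotide
sequences and should be translated to proteins before it is passed to --in.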
OUTPUT DATA
-Network.csv: This is the network definition and includes four
columns: "Source", "Target", "Weight" and "Type".
-Table.csv: This file includes all node attribute information.
-Representatives.faa: FASTA file with the representative amino acid sequence
of each cluster (HPC).
-Cluster.csv: (Optional) Table with the node clusters (GU and HPC)
at different thresholds and methods.
To load these files, please read the VISUALIZATION section.
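The edge table can also be read outside Gephi or Cytoscape. The following is a
minimal sketch in Python under several assumptions: the file is tab-delimited
(as the import steps in VISUALIZATION suggest), the header uses the column
names listed above, and "Source" holds one kind of node while "Target" holds
the other. The node names and "Type" values in the example are made up.

```python
import csv
import io
from collections import defaultdict

def load_network(handle, delimiter="\t"):
    """Read an edge table with a header row into {source: {target: weight}}."""
    adjacency = defaultdict(dict)
    for row in csv.DictReader(handle, delimiter=delimiter):
        adjacency[row["Source"]][row["Target"]] = float(row["Weight"])
    return dict(adjacency)

def shared_targets(adjacency, first, second):
    """Targets connected to both source nodes, in sorted order."""
    return sorted(set(adjacency[first]) & set(adjacency[second]))

# Made-up edge table in the assumed layout.
example = (
    "Source\tTarget\tWeight\tType\n"
    "GU_A\tHPC_1\t1.0\tUndirected\n"
    "GU_A\tHPC_2\t0.5\tUndirected\n"
    "GU_B\tHPC_2\t1.0\tUndirected\n"
)

network = load_network(io.StringIO(example))
print(shared_targets(network, "GU_A", "GU_B"))
```

For a real run, replace the StringIO object with
open("Network.csv", newline="") and adjust the column names if the header
differs.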
EXAMPLES
Accessory network for genomes.
Simple: accnet.pl --in *.faa
Advanced: accnet.pl --in *.faa --threshold 0.8 \
    --kp '-s 1.5 -e 1e-8 -c 0.8' \
    --out Network_example.csv \
    --tblout Table_example.csv \
    --fast yes --clustering yes
Whole genomes. Only recommended for plasmids or inter-species comparison.
accnet.pl --in *.faa --threshold 1.1
VISUALIZATION
#Gephi visualization (https://gephi.org/).
  -Open Gephi.
  -Make a new Project. (File -> New Project)
  -Import spreadsheet (File -> Import spreadsheet...)
    -Select "Network.csv" as "Edges Table"
  -Import spreadsheet (File -> Import spreadsheet...)
    -Select "Table.csv" as "Nodes Table"
  (Optional)
  -Import spreadsheet (File -> Import spreadsheet...)
    -Select "Cluster.csv" as "Nodes Table"
#Cytoscape visualization (http://www.cytoscape.org/)
  -version 2.8.x
    -Import Network file (File -> Import -> Network from Table)
      -Select "Network.csv"
      -Remove 1st line ("Show Text File Import Options"
       -> "Transfer first line as attribute names")
      -Select delimiter "Tab"
      -Select 1st column as "Source Interaction"
      -Select 2nd column as "Target Interaction"
      -Check "Weight" column to import.
      -Import.
    -Import Node Attributes (File -> Import -> Attributes from Table)
      -Select "Table.csv" file
      -Select delimiter "Tab"
      -Import column headers ("Show Text File Import Options"
       -> "Transfer first line as attribute names")
      -Import
    (Optional)
    -Import Node Attributes (File -> Import -> Attributes from Table)
      -Select "Cluster.csv" file
      -Select delimiter "Tab"
      -Import column headers ("Show Text File Import Options"
       -> "Transfer first line as attribute names")
      -Import
  -version 3.x
    -Import Network file (File -> Import -> Network -> File)
      -Select "Network.csv"
      -Remove 1st line ("Show Text File Import Options"
       -> "Transfer first line as attribute names")
      -Select delimiter "Tab"
      -Select 1st column as "Source Interaction"
      -Select 2nd column as "Target Interaction"
      -Check "Weight" column to import.
      -Import.
    -Import Node Attributes (File -> Import -> Table -> File)
      -Select "Table.csv" file
      -Select delimiter "Tab"
      -Import column headers ("Show Text File Import Options"
       -> "Transfer first line as attribute names")
      -Import
    (Optional)
    -Import Node Attributes (File -> Import -> Table -> File)
      -Select "Cluster.csv" file
      -Select delimiter "Tab"
      -Import column headers ("Show Text File Import Options"
       -> "Transfer first line as attribute names")
      -Import
NETWORK CLUSTERING
Since AccNET v1.2, a network clustering step has been added to the project.
It performs a clustering analysis that finds both GU and HPC clusters based
on the network adjacency matrix. The clustering step is written in R and
requires the packages dplyr, tidyr, cluster and mclust. GU clusters are
calculated by two methods: first with mclust (Gaussian Mixture Modelling for
Model-Based Clustering, Classification, and Density Estimation) and second by
hierarchical clustering. For HPC, the clusters are calculated by hierarchical
clustering only.
Both methods, hierarchical and model-based (mclust), use a distance matrix as
input data. This distance matrix is calculated with the binary distance
method. For GU clustering, the GUs are taken as objects and the HPCs as
variables, and vice versa for HPC clustering. For hierarchical clustering,
the tree is cut at several heights to create the clusters. The cut points are
the 75th, 85th, 90th, 95th and 99th percentiles of the tree heights. The
resulting output file is a tab-delimited file that can be loaded in Gephi or
Cytoscape.
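The two numerical steps described above can be illustrated with a small sketch
in Python. This is not the R code shipped with AccNET. It assumes that the
binary distance is the share of positions where exactly one of two
presence/absence profiles is 1, among the positions where at least one is 1,
and that percentiles are obtained by linear interpolation between sorted
values. The profiles and tree heights are made up.

```python
def binary_distance(x, y):
    """Binary distance between two 0/1 vectors of equal length.

    Positions where both vectors are 0 are ignored. The result is the
    share of the remaining positions where exactly one vector is 1.
    """
    either = sum(1 for a, b in zip(x, y) if a or b)
    if either == 0:
        return 0.0  # convention chosen for this sketch
    mismatched = sum(1 for a, b in zip(x, y) if bool(a) != bool(b))
    return mismatched / either

def percentile(values, p):
    """p-th percentile (0 to 100), interpolating linearly between sorted values."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile() needs at least one value")
    position = (len(ordered) - 1) * p / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction

def cut_points(heights, percentiles=(75, 85, 90, 95, 99)):
    """Cut heights for a clustering tree, one per requested percentile."""
    return {p: percentile(heights, p) for p in percentiles}

# Made-up presence/absence profiles: GUs as objects, HPCs as variables.
profiles = {
    "GU_A": [1, 1, 0, 0],
    "GU_B": [1, 0, 1, 0],
    "GU_C": [0, 0, 0, 1],
}
print(binary_distance(profiles["GU_A"], profiles["GU_B"]))

# Made-up merge heights from a hierarchical clustering tree.
heights = [0.0, 0.25, 0.5, 0.75, 1.0]
print(cut_points(heights))
```

Swapping the roles of rows and columns in the profiles gives the HPC case,
where HPCs are the objects and GUs the variables.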
Installing dependencies:
Open R and type:
install.packages("dplyr")
install.packages("tidyr")
install.packages("cluster")
install.packages("mclust")
DEPENDENCIES
Since AccNET 1.2:
- R software
  - dplyr
  - tidyr
  - cluster
  - mclust
- Perl package dependencies:
  - List::Util (core module)
  - Getopt::Long (core module)
  - Statistics::R (Installation:
      sudo apt-get install libstatistics-r-perl
    or
      sudo yum install libstatistics-r-perl)