-
Notifications
You must be signed in to change notification settings - Fork 6
/
mfannot.1
198 lines (163 loc) · 8.6 KB
/
mfannot.1
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
MFANNOT(1) ORGANELLAR SEQUENCE ANNOTATION UTILITIES MFANNOT(1)
NAME
mfannot - automatically annotated a genomic sequence
SYNOPSIS
mfannot [options] masterfile
DESCRIPTION
Mfannot is a tool to automatically annotate genome data stored in
masterfile format. Mfannot uses a number of methods to annotate the given
genome. First it identifies ORFs using Flip. A value for a minimum size of
these ORFs is changeable. These ORFs are then blasted using blastp against
peplibraries and/or pepfiles (the default contain mitochondrial proteins).
Matches to the same protein which satify predetermined and user modifiable
cutoffs and are located within close proximity are grouped together. After
looking for a potential initiation codon, the genes are characterized.
Exonerate is subsequently used to predict a internal gene structure by
aligning the predicted gene and the reference protein (from the peplibrary).
Another part of the programme uses RNAweasel and HMMweasel to detect RNA genes.
The resulting annotation is stored into an outputfile in Masterfile format.
OPTIONS
--blast Blast e-value cutoff
This option allows the user to set the minimum signifiant e-value
for ORF threshold.
Default: 1e-10
-d, --debug
Use debugging mode. Print debugging information
--el, --emptyorflen empty ORF minimum length
This option allows the users to supply the cutoff value for an
ORF (in nuceotide) they must be multiple of 3, which is found
not to correspond to a gene.
It will instead appear in the Masterfile as ncorf (non
corresponding ORF).
Default: 300
--ext_config
It's a file used in order to call external programs.
Use for annotation of different genes like rnpB, rns and rnl.
Default: '/share/analysis/control/mfannot/external_programs.conf'.
--ext_select
A list of names used to select or unselect which external programs
to run (as specified in the --ext_config option above).
The different names should be separated by a comma as in 'rnpB,rns',
in this case ONLY the external programs needed to annotate 'rnpB' and
'rns' will be executed. If you want unselect an external programs you
must add no before the name of external programs : 'nornpB'; in this
case all programs run except those specified with a 'no' prefix.
Specifying a mixed set of 'no' and non-'no' programs has the same
effect as specifying only the programs with 'no'.
-f, --force
This option allows more than one copy of the same gene in
annotation.
-g, --genetic genetic code
Genetic code used.
---------------------------------
1 => Standard (default)
2 => Vertebrate Mitochondrial,
3 => Yeast Mitochondrial,
4 => Mold Mitochondrial; Protozoan Mitochondrial; Coelenterate
Mitochondrial; Mycoplasma; Spiroplasma,
5 => Invertebrate Mitochondrial,
6 => Ciliate Nuclear; Dasycladacean Nuclear; Hexamita Nuclear,
9 => Echinoderm Mitochondrial; Flatworm Mitochondrial,
10 => Euplotid Nuclear,
11 => Bacterial and Plant Plastid
12 => Alternative Yeast Nuclear
13 => Ascidian Mitochondrial
14 => Alternative Flatworm Mitochondrial
15 => Blepharisma Macronuclear
16 => Chlorophycean Mitochondrial
21 => Trematode Mitochondrial
22 => Scenedesmus Obliquus Mitochondrial
23 => Thraustochytrium Mitochondrial
---------------------------------
-h, --help
Display a help and usage, with syntax and options of the
programme.
-i, --insertion
This option allows the user to change the length of insertion in order
to report them.
-l, --logfile logfile
This option allows the user to log the state of mfannot into a user
chosen file.
--matrix This allows the user to choose which alignment matrix is used
during the blast.
Available matrix is this usually used by BLAST:
BLOSUM45, BLOSUM62 (default), BLOSUM 80.
PAM30, PAM70.
--maxis, --maxintronsize Maximum intron size
This option allows the user to modify the default maximum intron
size (in nucleotides). During annotation, ORFs are grouped
together to form a hypothetical protein. When this size is bigger,
the gap can not be considered as an intron and the multiple ORFs
form multiple hypothetical proteins.
Default: 3000
--mines, --minexonsize Minimum exon size
This option allows the user to modify the default minimum exon
size (in nucleotides) that mfannot uses during internal structure
prediction. This value should be an integer.
Default: 10
--minis, --minintronsize Minimum intron size
This option allows the user to modify the default minimum intron
size (in nucleotides) that mfannot uses during internal structure
prediction. This value should be an integer.
Default: 142
--minorflen minumum ORF length
This option allows the user to choose the size of the minumum ORFs
(in amino acids) that are produced using Flip. This value must be
an integer.
Default: 40
-o, --outputfile outputfile
This is name of the new masterfile to be created. The default
output file will use the append '.new' to the end of the input
file name. Should this file already exist, an integer will be
appended to the end of the file, incremented by 1 from the
largest number appended to existing file.
--oc, --overlapcut Overlapping cutoff
This value is a cutoff that represents the permitted overlapping
proportion of a non-corresponding ORF that overlaps a predicted
gene in nucleotide.
Default : 30
-p, --pepdirectory
Path to the peptide directory containing peptide files. By
default mfannot uses a collection of default peptide libraries.
Peptide files must contain the suffix '.pep'.
--part_introns
Default 1 : run RNASpinner in order to defined intron type
for all introns.
If value is 0 : run RNASpinner only for a part of introns.
--partial
Must be used when the genome his known to be partial or incomplete;
this will cause mfannot to only run a subset of all its built-in analysis.'
MORE
Masterfile:
The file given for input should be in masterfile format (ogmp web site)
however, a file in Fasta format will also suffice. It should be noted
that the output file will be in masterfile format. Mfannot is able to
work with multiple contigs in the same masterfile.
The input peplibraries or pepfiles:
The user can use several pepfiles and peplibraries. These should be at
fasta format, and the files should be named in the following format;
eg F.asco.Podospora.anserina.pep
| | | | |
| | | | |
Division Phylum Genus Species Pepfile Extension
By using peplibraries, files inside must end with the .pep extension.
Within the peptide file, the headers of each sequence must be named
appropriately, as follows;
'>(Genus species) (name_of_protein_without_space); comments'
If the protein name (in one piece) is not followed by ;, it will not be
recognized.
SEE ALSO
blastall(1), flip(1), formatdb(1), mf2xml(1)
Bioperl: http://www.bioperl.org/
Exonerate: http://www.ebi.ac.uk/~guy/exonerate/
RNAweasel: http://megasun.bch.umontreal.ca/RNAweasel/
Bioperl: http://www.bioperl.org/
pirobject: http://pirobject.sourceforge.net/
VERSION
1.1
AUTHORS
Natacha BECK,
Pierre RIOUX,
David TO,
Thomas HOELLINGER.
Franz LANG (Project Manager),