pharokka.py
v1.4.0 allows you to input a custom HMM profile database using the --custom_hmm
parameter.
The easiest way to do this is with the create_custom_hmm.py
script that comes with pharokka
as follows:
create_custom_hmm.py -i <input directory of MSAs> -o <outdir> -p <prefix>
The resulting output you need to feed into pharokka.py
with the --custom_hmm
parameter will be outdir/prefix.h3m
create_custom_hmm.py
creates HMM profiles for each file in the input directory. It takes the name as the custom annotation for that HMM. It requires FASTA format MSAs (multiple sequence alignments) e.g.
>gene1
--------------------MERCMYDLSHFSFMTGRIGQLQTLTCIPVVSGDSFSVNFQGVFRLSPLRRNMVIDCMVDLFAFYVPYRHVYG-----DDWENFMLQGTDETVTFGTAD-FTTN-AVEYLGSQ-----R-TTGVAPLWRLASYNQIWNRYFRVPSDTSAILSNTALE---A-GVNKNKWGRYCARPKVIWSTGVDTTTD--AAD--REVTVTANQLDITDLDAVKGRYASEQRRDWFG-QRYNDILKNQFGGGAGTDADERPTLIMRKQHFLSGYDVDGTGDATLGQYSGKSAAVCDLQIPRKWFPEHGSL-WIMALLRFPTIHEEEIHYLFKKSQPTYDEISGDPDIWERKAPIEHQVQDFFTESTDTTSLGFMPYGQWYRYHPNVVHNQFTEVTGFSFVSA-RPT-----SKDTARYIQDDEYTPTFQTTSLGHWQSQARVDVEAVRPIPGPLKSIFAGV------------
>gene2
MSKSSGYATTSKPQEVVTRDIERKPYDLSHWSFKTGHIGRLQTFSVIPVVAGDSIELNMSAVLRLSPLRHFMYLDAVVDLFAFYVPHRHVYG-----SNWTAFLKAGVDEGSTLGTTT-FTGSNRPECLGVY-----FAAGQVLPSWSVHPYTMIWNNYFRDPQVDADAKTDATYLIGLTYPDPILKYGLPVCHLKRSWNTGVASGLS--TAD--YSLALSGGEVDLYQMSQLKGRLQTEQARDWFAFARYRDILKYTWDSEVNIDADQRPELIMRSTAWLSGKDVDGTDDATLGSYTGKSTGIFNLSFPSRFFAEHGSI-WIMGVVRFPPIVEAEAGYLNSKSEPTYKEISGDPSIIKHEPPISLNANEQFQG-ASSVDLGKIPYGQWYRESINMLSADYLEVSGHPFLLKSSIT-----SRATGVYVVPSQYDTVFSDTLMGHWNSQAFVDVRAKRFIPAPEDSIFAGTI-----------
>gene3
----------------MKGTDTRCLYDLDHWSHVSGNIGSLQTLSAIPVVAGDSMELNFTSLFRLSPLRRNLYLDAMVDLFAFYVPYRHVYG-----DTWIDFIKEGYDESQTLGTYT-IAAGEFINCTGAY-----LESQAVIPKWSIASYTRIWNRYFRHPTDSNEKDDDDLI----TASSGNQIYGYNCCHMKNIWSTGVDSELT--DDD--HKVDVTASKVDLLEIIQKKARFKTERQREWFG-QRYTDILDSVWGSNVNIDADERPELIMRTSTWLSGYDVDGTTETNIGTFSGKAFTTARLQFPMKYFNEHGTI-YIVALVRFPTIHAFERHYLFGKSEPTYKEIAGDPNVIRNEPPQAINPTDYIEN-ATATDVGLQPYAQWYRTHPSYTSLGFSNLDGHPFLQK-IIA-----NTDDAVYVDSNDYDEIFQTQQLQQWQSQGYVGLKAKRNIPDPRKSIFAGVK-----------
For example, if my input directory was called my_tail_msa
and contained 3 MSA files names tail_family_1.fasta
, tail_family_2.fasta
and tail_family_3.fasta
, the following command will create HMM profiles for all 3 MSA files, and combine them together into a h3m
file with the prefix tail
:
create_custom_hmm.py -i my_tail_msa -o tail_hmms -p tail
The resulting file to be used with --custom_hmm
is tail_hmms/tail.h3m
So on a genome, you would run pharokka
as follows:
pharokka.py -i <input fasta file> -o <output folder> -d <path/to/database_dir> -t <threads> --custom_hmm tail_hmms/tail.h3m
A gene hit to e.g. the tail_family_2 HMM profile found by PyHMMER will be indicated in the custom_hmm_id
column of the pharokka_cds_final_merged_output.tsv
output file as tail_family_2
, with information about the custom_hmm_bitscore
custom_hmm_evalue
for bitscore and evalue from PyHMMER. The annotations will supplement any PHROG annotation found and will be included in the pharokka.gff
and pharokka.gbk
files as 'custom_annotation=tail_family_2' in this example.
usage: create_custom_hmm.py [-h] [-i INDIR] [-o OUTDIR] [-p PREFIX] [-f] [-V]
create_custom_hmm.py: Creates HMMs from FASTA formatted MSAs with PyHMMER for use with Pharokka v1.4.0 and higher.
options:
-h, --help show this help message and exit
-i INDIR, --indir INDIR
Input directory containing FASTA formatted Multiple Sequence Alignments.
-o OUTDIR, --outdir OUTDIR
Output directory to store HMM profiles.
-p PREFIX, --prefix PREFIX
Prefix used to name HMMs. The relevant file be 'prefix'.h3m
-f, --force Overwrites the output directory.
-V, --version Print pharokka Version