Merge pull request #321 from gbouras13/dev
v1.6.0
gbouras13 authored Jan 10, 2024
2 parents 368c83c + 223ee0c commit 2db12bf
Showing 22 changed files with 10,332 additions and 182 deletions.
7 changes: 7 additions & 0 deletions HISTORY.md
@@ -1,6 +1,13 @@
History
=======

+1.6.0 (2024-01-11)
+------------------
+
+* Fixes a variety of bugs (#300 `pharokka_proteins.py` crashing if it found VFDB hits, #303 errors in the `.tbl` format, #316 errors with types and where custom HMM dbs had identical scored hits, #317 types and #320 deprecated GC function)
+* Adds `--mash_distance` and `--minced_args` as parameters (#299 thanks @iferres).
+
+
1.5.1 (2023-10-26)
------------------

17 changes: 14 additions & 3 deletions README.md
@@ -33,6 +33,7 @@ If you are looking for rapid standardised annotation of bacterial genomes, pleas
- [Paper](#paper)
- [Pharokka with Galaxy Europe Webserver](#pharokka-with-galaxy-europe-webserver)
- [Brief Overview](#brief-overview)
+- [Pharokka v 1.6.0 Update (11 January 2024)](#pharokka-v-160-update-11-january-2024)
- [Pharokka v 1.5.0 Update (20 September 2023)](#pharokka-v-150-update-20-september-2023)
- [Pharokka v 1.4.0 Update (27 August 2023)](#pharokka-v-140-update-27-august-2023)
- [Pharokka v 1.3.0 Update](#pharokka-v-130-update)
@@ -96,6 +97,11 @@ So if you can't get `pharokka` to install on your machine for whatever reason or

`pharokka` uses [PHANOTATE](https://github.com/deprekate/PHANOTATE), the only gene prediction program tailored to bacteriophages, as the default program for gene prediction. [Prodigal](https://github.com/hyattpd/Prodigal) implemented with [pyrodigal](https://github.com/althonos/pyrodigal) and [Prodigal-gv](https://github.com/apcamargo/prodigal-gv) implemented with [pyrodigal-gv](https://github.com/althonos/pyrodigal-gv) are also available as alternatives. Following this, functional annotations are assigned by matching each predicted coding sequence (CDS) to the [PHROGs](https://phrogs.lmge.uca.fr), [CARD](https://card.mcmaster.ca) and [VFDB](http://www.mgc.ac.cn/VFs/main.htm) databases using [MMseqs2](https://github.com/soedinglab/MMseqs2). As of v1.4.0, `pharokka` will also match each CDS to the PHROGs database using more sensitive Hidden Markov Models using [PyHMMER](https://github.com/althonos/pyhmmer). Pharokka's main output is a GFF file suitable for using in downstream pangenomic pipelines like [Roary](https://sanger-pathogens.github.io/Roary/). `pharokka` also generates a `cds_functions.tsv` file, which includes counts of CDSs, tRNAs, tmRNAs, CRISPRs and functions assigned to CDSs according to the PHROGs database. See the full [usage](#usage) and check out the full [documentation](https://pharokka.readthedocs.io) for more details.

+## Pharokka v 1.6.0 Update (11 January 2024)
+
+* Fixes a variety of bugs (#300 `pharokka_proteins.py` crashing if it found VFDB hits, #303 errors in the `.tbl` format, #316 errors with types and where custom HMM dbs had identical scored hits, #317 types and #320 deprecated GC function)
+* Adds `--mash_distance` and `--minced_args` as parameters (#299 thanks @iferres).
+
## Pharokka v 1.5.0 Update (20 September 2023)

* Adds support for `pyrodigal-gv` implementing `prodigal-gv` as a gene predictor for alternate genetic codes ([pyrodigal-gv](https://github.com/althonos/pyrodigal-gv) and [prodigal-gv](https://github.com/apcamargo/prodigal-gv)). This can be specified with `-g prodigal-gv` and is recommended for metagenomic input datasets. Thanks to @[althonos](https://github.com/althonos) and @[apcamargo](https://github.com/apcamargo) for making this possible, and to @[asierFernandezP](https://github.com/asierFernandezP) for raising this as an issue in the first place [here](https://github.com/gbouras13/pharokka/issues/290).
@@ -280,9 +286,10 @@ For a full explanation of all arguments, please see [usage](docs/run.md).
pharokka defaults to 1 thread.

```
-usage: pharokka.py [-h] [-i INFILE] [-o OUTDIR] [-d DATABASE] [-t THREADS] [-f] [-p PREFIX] [-l LOCUSTAG] [-g GENE_PREDICTOR] [-m] [-s] [-c CODING_TABLE] [-e EVALUE] [--fast] [--mmseqs2_only]
-                   [--meta_hmm] [--dnaapler] [--custom_hmm CUSTOM_HMM] [--genbank] [--terminase] [--terminase_strand TERMINASE_STRAND] [--terminase_start TERMINASE_START]
-                   [--skip_extra_annotations] [--skip_mash] [-V] [--citation]
+usage: pharokka.py [-h] [-i INFILE] [-o OUTDIR] [-d DATABASE] [-t THREADS] [-f] [-p PREFIX] [-l LOCUSTAG] [-g GENE_PREDICTOR] [-m] [-s]
+                   [-c CODING_TABLE] [-e EVALUE] [--fast] [--mmseqs2_only] [--meta_hmm] [--dnaapler] [--custom_hmm CUSTOM_HMM] [--genbank]
+                   [--terminase] [--terminase_strand TERMINASE_STRAND] [--terminase_start TERMINASE_START] [--skip_extra_annotations]
+                   [--skip_mash] [--minced_args MINCED_ARGS] [--mash_distance MASH_DISTANCE] [-V] [--citation]
pharokka: fast phage annotation program
@@ -331,6 +338,10 @@ options:
   --skip_extra_annotations
                         Skips tRNAscan-se, MINced and Aragorn.
   --skip_mash           Skips running mash to find the closest match for each contig in INPHARED.
+  --minced_args MINCED_ARGS
+                        extra commands to pass to MINced (please omit the leading hyphen for the first argument). You will need to use quotation marks e.g. --minced_args "minNR 2 -minRL 21"
+  --mash_distance MASH_DISTANCE
+                        mash distance for the search against INPHARED. Defaults to 0.2.
   -V, --version         Print pharokka Version
   --citation            Print pharokka Citation
```
46 changes: 25 additions & 21 deletions bin/create_custom_hmm.py
@@ -118,28 +118,32 @@ def main():

     # loop over each PHROG
     for file in file_list:
-        # check if MSA
-        if is_fasta_msa(f"{MSA_dir}/{file}"):
-            # read in each msa
-            with pyhmmer.easel.MSAFile(
-                f"{MSA_dir}/{file}", digital=True, alphabet=alphabet
-            ) as msa_file:
-                msa = msa_file.read()
-            # split the file into root and suffix
-            root, _ = os.path.splitext(file)
-            name = root
-            # convert to bytes
-            msa.name = name.encode("utf-8")
-            # build the HMM
-            builder = pyhmmer.plan7.Builder(alphabet)
-            background = pyhmmer.plan7.Background(alphabet)
-            hmm, _, _ = builder.build_msa(msa, background)
-            with open(f"{HMM_dir}/{name}.hmm", "wb") as output_file:
-                hmm.write(output_file)
+        # check if hidden - skip
+        if file.startswith("."):
+            continue
         else:
-            logger.warning(
-                f"{MSA_dir}/{file} does not seem to be a FASTA formatted MSA. Skipping."
-            )
+            # check if MSA
+            if is_fasta_msa(f"{MSA_dir}/{file}"):
+                # read in each msa
+                with pyhmmer.easel.MSAFile(
+                    f"{MSA_dir}/{file}", digital=True, alphabet=alphabet
+                ) as msa_file:
+                    msa = msa_file.read()
+                # split the file into root and suffix
+                root, _ = os.path.splitext(file)
+                name = root
+                # convert to bytes
+                msa.name = name.encode("utf-8")
+                # build the HMM
+                builder = pyhmmer.plan7.Builder(alphabet)
+                background = pyhmmer.plan7.Background(alphabet)
+                hmm, _, _ = builder.build_msa(msa, background)
+                with open(f"{HMM_dir}/{name}.hmm", "wb") as output_file:
+                    hmm.write(output_file)
+            else:
+                logger.warning(
+                    f"{MSA_dir}/{file} does not seem to be a FASTA formatted MSA. Skipping."
+                )

     # to concatenate all hmms

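The hunk above skips hidden files by adding a guard at the top of the loop and nesting the existing logic under `else:`. The same filtering could also be done before the loop starts. The following is only a sketch of that alternative, assuming a hypothetical helper `visible_files` that is not part of pharokka; the extra `os.path.isfile` check is likewise an addition that the commit does not make.

```python
import os
import tempfile
from typing import List


def visible_files(directory: str) -> List[str]:
    """Return sorted regular file names in `directory`, skipping dot-prefixed entries.

    Hypothetical helper for illustration; the commit itself filters inside the loop.
    """
    return sorted(
        name
        for name in os.listdir(directory)
        if not name.startswith(".") and os.path.isfile(os.path.join(directory, name))
    )


# Small demonstration with a throwaway directory and made-up file names.
with tempfile.TemporaryDirectory() as msa_dir:
    for name in ("phrog_1.msa", ".hidden_file", "phrog_2.msa"):
        with open(os.path.join(msa_dir, name), "w") as handle:
            handle.write(">seq1\nMKL\n")
    names = visible_files(msa_dir)
    print(names)
```

Filtering up front would keep the loop body at its original indentation, at the cost of a second pass over the directory listing.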
2 changes: 1 addition & 1 deletion bin/custom_db.py
@@ -55,7 +55,7 @@ def run_custom_pyhmmer(custom_hmm, out_dir, threads, gene_predictor, evalue):
best_results[result.protein] = result
keep_protein.add(result.protein)
elif result.bitscore == previous_bitscore:
-if best_results[result.protein].custom_hmm_id != hit.custom_hmm_id:
+if best_results[result.protein].custom_hmm_id != hit.name.decode():
keep_protein.remove(result.protein)
else:
best_results[result.protein] = result
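The hunk above touches the branch that handles two hits with an identical bitscore for the same protein (issue #316 in the changelog). The general idea, keeping the best-scoring hit per protein and discarding proteins whose top score is tied between different HMMs, can be sketched in isolation. This is a simplified illustration and not pharokka's implementation: the `Hit` tuple and the function `best_unambiguous_hits` are hypothetical, and the field names only mirror those visible in the diff.

```python
from collections import namedtuple
from typing import Dict, Iterable

# Hypothetical record type; field names mirror those seen in the diff.
Hit = namedtuple("Hit", ["protein", "custom_hmm_id", "bitscore"])


def best_unambiguous_hits(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Keep the top-scoring hit per protein; drop proteins whose top score is tied
    between two different HMMs."""
    best: Dict[str, Hit] = {}
    tied = set()
    for hit in hits:
        current = best.get(hit.protein)
        if current is None or hit.bitscore > current.bitscore:
            # New best hit: any earlier tie at a lower score no longer matters.
            best[hit.protein] = hit
            tied.discard(hit.protein)
        elif hit.bitscore == current.bitscore and hit.custom_hmm_id != current.custom_hmm_id:
            # Same score from a different HMM: the assignment is ambiguous.
            tied.add(hit.protein)
    return {protein: hit for protein, hit in best.items() if protein not in tied}


example_hits = [
    Hit("p1", "hmm_A", 50.0),
    Hit("p1", "hmm_B", 50.0),  # tie between different HMMs, so p1 is dropped
    Hit("p2", "hmm_A", 30.0),
    Hit("p2", "hmm_B", 42.0),  # strictly better, so p2 maps to hmm_B
]
result = best_unambiguous_hits(example_hits)
print(result)
```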
12 changes: 12 additions & 0 deletions bin/input_commands.py
@@ -149,6 +149,18 @@ def get_input():
         help="Skips running mash to find the closest match for each contig in INPHARED.",
         action="store_true",
     )
+    parser.add_argument(
+        "--minced_args",
+        help='extra commands to pass to MINced (please omit the leading hyphen for the first argument). You will need to use quotation marks e.g. --minced_args "minNR 2 -minRL 21"',
+        default="",
+        type=str,
+    )
+    parser.add_argument(
+        "--mash_distance",
+        help="mash distance for the search against INPHARED. Defaults to 0.2.",
+        default=0.2,
+        type=float,
+    )
     parser.add_argument(
         "-V",
         "--version",
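The two options added above can be tried out on their own. The sketch below builds a parser containing only these two arguments, with the defaults and types copied from the hunk, and uses a hypothetical helper `minced_extra_tokens` to restore the leading hyphen that the help text asks users to omit. How pharokka itself forwards the string to MINced is not shown in this diff, so the helper is an assumption rather than the project's method.

```python
import argparse
import shlex
from typing import List


def build_parser() -> argparse.ArgumentParser:
    # Only the two options added in this commit; the real parser defines many more.
    parser = argparse.ArgumentParser(prog="pharokka.py")
    parser.add_argument("--minced_args", default="", type=str)
    parser.add_argument("--mash_distance", default=0.2, type=float)
    return parser


def minced_extra_tokens(minced_args: str) -> List[str]:
    """Hypothetical helper: re-add the omitted leading hyphen and split into tokens."""
    stripped = minced_args.strip()
    if not stripped:
        return []
    return shlex.split("-" + stripped)


args = build_parser().parse_args(
    ["--minced_args", "minNR 2 -minRL 21", "--mash_distance", "0.1"]
)
tokens = minced_extra_tokens(args.minced_args)
print(tokens, args.mash_distance)
```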
55 changes: 16 additions & 39 deletions bin/pharokka.py
@@ -9,42 +9,21 @@
 from custom_db import run_custom_pyhmmer
 from databases import check_db_installation
 from hmm import run_pyhmmer
-from input_commands import (
-    check_dependencies,
-    get_input,
-    instantiate_dirs,
-    instantiate_split_output,
-    validate_and_extract_genbank,
-    validate_custom_hmm,
-    validate_fasta,
-    validate_gene_predictor,
-    validate_meta,
-    validate_terminase,
-    validate_threads,
-)
+from input_commands import (check_dependencies, get_input, instantiate_dirs,
+                            instantiate_split_output,
+                            validate_and_extract_genbank, validate_custom_hmm,
+                            validate_fasta, validate_gene_predictor,
+                            validate_meta, validate_terminase,
+                            validate_threads)
 from loguru import logger
 from post_processing import Pharok, remove_post_processing_files
-from processes import (
-    concat_phanotate_meta,
-    concat_trnascan_meta,
-    convert_gff_to_gbk,
-    reorient_terminase,
-    run_aragorn,
-    run_dnaapler,
-    run_mash_dist,
-    run_mash_sketch,
-    run_minced,
-    run_mmseqs,
-    run_phanotate,
-    run_phanotate_fasta_meta,
-    run_phanotate_txt_meta,
-    run_pyrodigal,
-    run_pyrodigal_gv,
-    run_trna_scan,
-    run_trnascan_meta,
-    split_input_fasta,
-    translate_fastas,
-)
+from processes import (concat_phanotate_meta, concat_trnascan_meta,
+                       convert_gff_to_gbk, reorient_terminase, run_aragorn,
+                       run_dnaapler, run_mash_dist, run_mash_sketch,
+                       run_minced, run_mmseqs, run_phanotate,
+                       run_phanotate_fasta_meta, run_phanotate_txt_meta,
+                       run_pyrodigal, run_pyrodigal_gv, run_trna_scan,
+                       run_trnascan_meta, split_input_fasta, translate_fastas)
 from util import count_contigs, get_version


@@ -181,9 +160,7 @@ def main():
)

if dnaapler_success == True:
-input_fasta = os.path.join(
-out_dir, "dnaapler/dnaapler_reoriented.fasta"
-)
+input_fasta = os.path.join(out_dir, "dnaapler/dnaapler_reoriented.fasta")
destination_file = os.path.join(
out_dir, f"{prefix}_dnaapler_reoriented.fasta"
)
@@ -321,7 +298,7 @@ def main():
logger.info("Starting tRNA-scanSE.")
run_trna_scan(input_fasta, args.threads, out_dir, logdir)
# run minced and aragorn
-run_minced(input_fasta, out_dir, prefix, logdir)
+run_minced(input_fasta, out_dir, prefix, args.minced_args, logdir)
run_aragorn(input_fasta, out_dir, prefix, logdir)

# running mmseqs2 on the 3 databases
@@ -460,7 +437,7 @@ def main():
logger.info("Finding the closest match for each contig in INPHARED using mash.")
# in process.py
run_mash_sketch(input_fasta, out_dir, logdir)
-run_mash_dist(out_dir, db_dir, logdir)
+run_mash_dist(out_dir, db_dir, args.mash_distance, logdir)
# part of the class
pharok.inphared_top_hits()
else:
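The hunk above threads `args.mash_distance` into `run_mash_dist`, but the body of that function is not part of the visible diff. One plausible way such a threshold could reach the command line is through the `-d` (maximum distance to report) option of `mash dist`. The sketch below only assembles an argument list under that assumption; the function `mash_dist_command` and the sketch file names are hypothetical, and nothing is executed.

```python
from typing import List


def mash_dist_command(
    reference_sketch: str, query_sketch: str, max_distance: float = 0.2
) -> List[str]:
    """Hypothetical helper: assemble a `mash dist` call with a maximum reported distance."""
    # Mash distances lie between 0 and 1, so reject anything outside that range.
    if not 0.0 <= max_distance <= 1.0:
        raise ValueError(f"mash distance must be between 0 and 1, got {max_distance}")
    return ["mash", "dist", "-d", str(max_distance), reference_sketch, query_sketch]


# File names below are placeholders, not pharokka's actual paths.
cmd = mash_dist_command("inphared.msh", "input_mash_sketch.msh", 0.2)
print(" ".join(cmd))
```

Validating the range before building the command gives an early, readable error rather than a failure inside the external tool.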
16 changes: 4 additions & 12 deletions bin/pharokka_proteins.py
@@ -6,20 +6,12 @@
 from pathlib import Path

 from databases import check_db_installation
-from input_commands import (
-    check_dependencies,
-    instantiate_dirs,
-    validate_fasta,
-    validate_threads,
-)
+from input_commands import (check_dependencies, instantiate_dirs,
+                            validate_fasta, validate_threads)
 from loguru import logger
 from post_processing import remove_directory, remove_file
-from proteins import (
-    Pharok_Prot,
-    get_input_proteins,
-    run_mmseqs_proteins,
-    run_pyhmmer_proteins,
-)
+from proteins import (Pharok_Prot, get_input_proteins, run_mmseqs_proteins,
+                      run_pyhmmer_proteins)
 from util import get_version


2 changes: 1 addition & 1 deletion bin/plot.py
@@ -586,4 +586,4 @@ def create_plot(
fig.savefig(outfile, dpi=dpi)

# Save the image as an SVG
-fig.savefig(svg_plot_file, format='svg', dpi=dpi)
+fig.savefig(svg_plot_file, format="svg", dpi=dpi)