This tool computes the completeness of each KEGG pathway module for a given set of KEGG orthologues (KOs) based on their presence/absence. The current version of this tool has 482 KEGG modules (updated 02/07/2024).
Please read the Theory section at the bottom of this README for a detailed explanation.
- per-contig annotation with KOs (ideally from an hmmscan annotation (see instructions)); or
- a list of KOs.
- *.summary.kegg_pathways.tsv (example) contains module pathway completeness calculated for all KOs in the given input file.
- *.summary.kegg_contigs.tsv (example) contains module pathway completeness calculated per contig (the first column contains the contig name) if contig annotations were provided with -i.
Optional:
- pathways_plots/ (example) folder containing PNG representations and graphs generated with the --plot-pathways argument.
- with_weights.*.tsv (example) output generated with the --include-weights argument. Each KO has a weight in brackets.
Check more examples of different output files here.
This tool is published on PyPI and Bioconda.
A Docker container is available on DockerHub and Quay.
pip install kegg-pathways-completeness
Follow bioconda instructions
docker pull quay.io/biocontainers/kegg-pathways-completeness
conda create --name kegg-env
conda activate kegg-env
pip3 install -r requirements.txt
# for list of KOs
give_pathways -l {INPUT_LIST}
# test example:
# give_pathways -l 'tests/fixtures/give_pathways/test_kos.txt' -o test_list_kos
# per contig annotation with KOs
give_pathways -i {INPUT_FILE}
# test example:
# give_pathways -i 'tests/fixtures/give_pathways/test_pathway.txt' -o test_pathway
Required arguments:
input file:
An input file is required and must be supplied with one of the following options:
- input table (-i/--input): hmmsearch table (example) that was run on KEGG profiles DB with annotated sequences (preferable). If you don't have this table, follow these instructions to generate it.
- file with KOs list (-l/--input-list): comma-separated file with a list of KOs (example).
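As an illustration of the KOs list input, here is a minimal sketch of reading a comma-separated list of KOs. It is not the tool's own parser: the function name parse_ko_list is hypothetical, and the sketch assumes that KO identifiers have the usual KEGG form (K followed by five digits) and that anything else can be ignored.

```python
import re

# Assumption: a KO identifier is "K" followed by exactly five digits.
KO_PATTERN = re.compile(r"^K\d{5}$")


def parse_ko_list(text):
    """Return the sorted, de-duplicated KO identifiers found in comma-separated text."""
    kos = set()
    for line in text.splitlines():
        for token in line.split(","):
            token = token.strip()
            if KO_PATTERN.match(token):
                kos.add(token)
    return sorted(kos)


if __name__ == "__main__":
    example = "K00942,K01937, K00001\nK00942"
    print(parse_ko_list(example))
```

A check like this can be useful for spotting malformed entries before passing the file to -l/--input-list.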
Optional arguments:
- output prefix (-o/--outname): prefix for output tables (-o test_kos in example)
- add weight information to output files (-w/--include-weights). The output table will contain the weight of each KO edge in the pathway graph; for example, K00942(0.25) means that the KO has 0.25 importance in the given pathway. Example of output
- plot present KOs in pathways (-p/--plot-pathways): generates a PNG containing a schematic representation of the pathway. Present KOs are marked with red edges. Example: M00002
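To work with the weight annotations downstream, entries such as K00942(0.25) can be split into a KO and a number. The sketch below is only an illustration: the function name parse_weighted_kos is hypothetical, and the exact layout of the weighted column in the output table (separators, surrounding text) is an assumption, so the regular expression simply picks out every KO(weight) pair it finds in a string.

```python
import re

# Assumption: weights appear directly after the KO as "K#####(number)".
WEIGHTED_KO = re.compile(r"(K\d{5})\(([0-9]*\.?[0-9]+)\)")


def parse_weighted_kos(cell):
    """Map each KO found in the string to its weight as a float."""
    return {ko: float(weight) for ko, weight in WEIGHTED_KO.findall(cell)}


if __name__ == "__main__":
    print(parse_weighted_kos("K00942(0.25),K01937(1)"))
```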
pathways data: modules information and graphs
This repository contains a set of pre-generated files. Module information files can be found in pathways_data. The repository also contains module pathways pre-parsed into graph format. To generate the graphs, all pathways were parsed with the NetworkX library. The graph for every module is provided in .png format in the png folder and in .dot format in the dots folder. The pathway and the weight of each KO can be easily checked in the .png image.
There is no need to re-generate these files in order to run the tool. The instructions for re-generating the graphs and the module pathway information are provided for updates and for understanding the process.
modules information:
- list of KEGG modules in KOs notation (-a/--pathways) (latest all_pathways.txt)
- list of classes of KEGG modules (-c/--classes) (latest all_pathways_class.txt)
- list of names of KEGG modules (-n/--names) (latest all_pathways_names.txt)
graphs:
- graphs constructed from each module (-g/--graphs) (latest graphs.pkl)
NOTE: please make sure you have graphviz installed
You can also run the plotting script separately:
plot_completeness_graphs.py -i output_with_pathways_completeness
More examples for test data here
KEGG provides a representation of each pathway as a specific expression of KOs. Example: A ((B,C) D,E) (A+F), where:
- A, B, C, D, E, F are KOs
- space == AND
- comma == OR
- plus == essential component
- minus == optional component
- minus minus == missing optional component (replaced with K0000 with weight 0 (example))
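To make the notation concrete, here is a minimal sketch that evaluates such an expression as a plain yes/no question: "is the module fully satisfiable by the given KOs?". This is a simplification and not what the tool computes (the tool reports a percentage based on a weighted graph). The function name is_satisfied is hypothetical. The sketch assumes that a comma binds more loosely than a space inside a group (which is how the example above reads), treats + as "both components required", ignores components attached with a single minus, and does not handle the minus minus case.

```python
import re

TOKEN = re.compile(r"\w+|[()+,\- ]")


def is_satisfied(expression, present):
    """Boolean evaluation of a KO expression: space=AND, comma=OR, plus=AND, minus=optional."""
    # Collapse runs of whitespace so that each separator is a single space token.
    tokens = TOKEN.findall(re.sub(r"\s+", " ", expression.strip()))
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():  # alternatives separated by commas
        result = parse_and()
        while peek() == ",":
            take()
            rhs = parse_and()  # always parse before combining
            result = result or rhs
        return result

    def parse_and():  # consecutive steps separated by spaces
        result = parse_complex()
        while peek() == " ":
            take()
            rhs = parse_complex()
            result = result and rhs
        return result

    def parse_complex():  # components joined with + or -
        result = parse_atom()
        while peek() in ("+", "-"):
            op = take()
            rhs = parse_atom()
            if op == "+":  # components after "-" are optional and ignored
                result = result and rhs
        return result

    def parse_atom():
        tok = take()
        if tok == "(":
            result = parse_or()
            take()  # consume the closing parenthesis
            return result
        return tok in present

    return parse_or()


if __name__ == "__main__":
    expr = "A ((B,C) D,E) (A+F)"
    print(is_satisfied(expr, {"A", "E", "F"}))
    print(is_satisfied(expr, {"A", "B", "D"}))
```

In the second call the last group (A+F) fails because F is absent, so the whole expression is not satisfied even though the first two steps are.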
Each expression was converted into a directed graph using NetworkX. The first node is node 0 and the last one is node 1. Each edge corresponds to a KO.
To compute pathway completeness, each edge in the graph is weighted. The default weight of each edge is 0.
Given a set of predicted KOs, if a KO is present in the pathway, the corresponding edge is assigned weight = 1 (or 0 if the edge is optional, or another value if the edge is connected by +). The script then searches for the most relevant path from node 0 to node 1 by graph_weight. max_graph_weight is then calculated under the assumption that all KOs are present.
completeness = graph_weight/max_graph_weight * 100%
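As a worked illustration of the formula, the sketch below computes completeness for a tiny hand-written graph. It is not the tool's implementation: it uses plain Python instead of NetworkX to stay dependency-free, the edge list for the expression "A (B,C)" and the 0.5 weights are made up for the example, and it assumes that the "most relevant path" is the path with the largest total weight in an acyclic graph.

```python
from functools import lru_cache

# Hypothetical graph for the expression "A (B,C)".
# Each edge: (from_node, to_node, KO, weight if the KO is present).
EDGES = [
    (0, 2, "A", 0.5),
    (2, 1, "B", 0.5),
    (2, 1, "C", 0.5),
]


def best_path_weight(edges, present, start=0, end=1):
    """Largest total weight over all paths start -> end; absent KOs contribute 0."""
    outgoing = {}
    for u, v, ko, weight in edges:
        outgoing.setdefault(u, []).append((v, weight if ko in present else 0.0))

    @lru_cache(maxsize=None)
    def best_from(node):
        if node == end:
            return 0.0
        options = [w + best_from(v) for v, w in outgoing.get(node, [])]
        # A dead end can never reach `end`, so it is ruled out with -inf.
        return max(options, default=float("-inf"))

    return best_from(start)


def completeness(edges, present):
    """graph_weight / max_graph_weight * 100, with max taken over all KOs present."""
    all_kos = {ko for _, _, ko, _ in edges}
    graph_weight = best_path_weight(edges, present)
    max_graph_weight = best_path_weight(edges, all_kos)
    return graph_weight / max_graph_weight * 100


if __name__ == "__main__":
    print(completeness(EDGES, {"A"}))       # only the first step is present
    print(completeness(EDGES, {"A", "C"}))  # both steps are covered
```

With only A present, the best path collects 0.5 out of a possible 1.0, giving 50% completeness; adding either B or C completes the module.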