DOC: update paper
Vini2 committed Sep 16, 2024
1 parent a1f93b3 commit c4c3966
Showing 1 changed file with 19 additions and 20 deletions.
39 changes: 19 additions & 20 deletions paper/paper.md
@@ -69,89 +69,88 @@ A user can start the analysis by running the `metacoag` subcommand to bin a meta

The following inputs are required to run the `metacoag` subcommand.

* Contigs
* Contigs file
* Assembly graph file(s)
* Coverage of contigs - can be obtained by running a coverage calculation tool such as CoverM [https://github.com/wwood/CoverM](https://github.com/wwood/CoverM) or Koverage [@Roach:2024]
* A delimited file containing the contig identifier and its average read coverage for each contig - can be obtained by running a read coverage calculation tool such as CoverM [https://github.com/wwood/CoverM](https://github.com/wwood/CoverM) or Koverage [@Roach:2024]

The assembly graph files can vary depending on the assembler used to generate the contigs. The metaSPAdes version requires the assembly graph file in `.gfa` format and the paths file in `.paths` format. The MEGAHIT version requires the assembly graph file in `.gfa` format. The metaFlye version requires the assembly graph file `assembly_graph.gfa` and the paths file `assembly_info.txt`.
The assembly graph files can vary depending on the assembler used to generate the contigs. The metaSPAdes version requires the assembly graph file in `.gfa` format and the paths file in `.paths` format. The MEGAHIT version requires the assembly graph file in `.gfa` format. The metaFlye version requires the assembly graph file `assembly_graph.gfa` and the paths file `assembly_info.txt` from the final assembly output.
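The assembler-specific requirements described above can be summarised as a small lookup table. The following is only an illustrative sketch: the dictionary name, its keys and the helper function are hypothetical and are not part of GraphBin-Tk; only the file types are taken from the paragraph above.

```python
# Graph-related inputs per assembler, transcribed from the paragraph above.
# REQUIRED_GRAPH_FILES and required_graph_files are hypothetical names used
# for illustration only; they are not GraphBin-Tk identifiers.
REQUIRED_GRAPH_FILES = {
    "metaspades": ["assembly graph (.gfa)", "contig paths (.paths)"],
    "megahit": ["assembly graph (.gfa)"],
    "metaflye": ["assembly_graph.gfa", "assembly_info.txt"],
}


def required_graph_files(assembler):
    """Return the graph-related files needed for the given assembler."""
    key = assembler.strip().lower()
    if key not in REQUIRED_GRAPH_FILES:
        raise ValueError(f"Unsupported assembler: {assembler!r}")
    return REQUIRED_GRAPH_FILES[key]
```

For example, `required_graph_files("metaFlye")` returns the two metaFlye output files named above.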

### Outputs

The following outputs will be generated by the `metacoag` subcommand.

* A delimited text file containing the contig identifier and bin identifier for each binned contig
* `.fasta` files for each bin
* `.fasta` files of the identified bins


## `prepare`

### Tool/processing function

If a delimited text file is not available, the MetaCoAG binning result can be formatted using the `prepare` subcommand into a delimited text file that represents each contig and its bin identifier.
If a delimited text file is not available, the initial binning result can be formatted using the `prepare` subcommand into a delimited text file that represents each contig and its bin identifier. This function allows users to format binning results from any existing metagenomic binning tool.

### Inputs

The directory containing the initial binning is required to run the `prepare` subcommand.

### Outputs

The `prepare` subcommand will generate a delimited text file such as `.csv` or `.tsv` containing contig identifier and bin identifier for the binning result.

The `prepare` subcommand will generate a delimited text file such as `.csv` or `.tsv` containing the contig identifier and bin identifier for each contig in the binning result.
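To make the transformation concrete, the sketch below converts a directory of per-bin FASTA files into a delimited contig-to-bin table. It is not the GraphBin-Tk implementation. It assumes that each bin is stored as one FASTA file, that the file name (without extension) serves as the bin identifier, and that the contig identifier is the first whitespace-delimited token of each header line. The function name and the accepted extensions are hypothetical.

```python
import csv
from pathlib import Path

# Assumed FASTA extensions; adjust to match the binning tool's output.
FASTA_SUFFIXES = {".fasta", ".fa", ".fna"}


def bins_to_table(bin_dir, out_csv):
    """Write one 'contig identifier,bin identifier' row per contig found in bin_dir."""
    rows = []
    for fasta in sorted(Path(bin_dir).iterdir()):
        if fasta.suffix.lower() not in FASTA_SUFFIXES:
            continue  # ignore anything that is not a bin FASTA file
        bin_id = fasta.stem  # assumption: file name identifies the bin
        with fasta.open() as handle:
            for line in handle:
                if line.startswith(">"):
                    fields = line[1:].split()
                    if fields:  # skip empty header lines
                        rows.append((fields[0], bin_id))
    with open(out_csv, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    return rows
```

Passing `delimiter="\t"` to `csv.writer` would produce a `.tsv` file instead of a `.csv` file.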

## `graphbin`

### Tool/processing function

This formatted binning result from MetaCoAG can be improved by providing to GraphBin [@Mallawaarachchi1:2020] using the subcommand `graphbin` (\autoref{fig1}).
The formatted initial binning result from the `prepare` subcommand can be improved by providing to GraphBin [@Mallawaarachchi1:2020] using the subcommand `graphbin` (\autoref{fig1}).

### Inputs

The following inputs are required to run the `graphbin` subcommand.

* Contigs file
* Assembly graph file(s) - can vary depending on the assembler used to generate the contigs (refer to inputs under 'metacoag')
* Assembly graph file(s) - can vary depending on the assembler used to generate the contigs (refer to inputs under `metacoag`)
* A delimited text file containing the initial binning result
* Coverage of contigs - can be obtained by running a coverage calculation tool such as CoverM [https://github.com/wwood/CoverM](https://github.com/wwood/CoverM) or Koverage [@Roach:2024]

### Outputs

The following outputs will be generated by the `graphbin` subcommand.

* A delimited text file containing the contig identifier and bin identifier for each binned contig
* `.fasta` files for each bin
* `.fasta` files of the refined bins

## `graphbin2`

This formatted binning result can be improved by providing to GraphBin2 [@Mallawaarachchi2:2020; @Mallawaarachchi:2021] using the subcommand `graphbin2` (\autoref{fig1}).
The formatted initial binning result from the `prepare` subcommand can be improved by providing to GraphBin2 [@Mallawaarachchi2:2020; @Mallawaarachchi:2021] using the subcommand `graphbin2` (\autoref{fig1}).

### Inputs

The following inputs are required to run the `graphbin2` subcommand.

* Contigs file
* Assembly graph file(s) - can vary depending on the assembler used to generate the contigs (refer to inputs under 'metacoag')
* Assembly graph file(s) - can vary depending on the assembler used to generate the contigs (refer to inputs under `metacoag`)
* A delimited text file containing the initial binning result
* A delimited file containing the contig identifier and its average read coverage for each contig

### Outputs

The following outputs will be generated by the `graphbin2` subcommand.

* A delimited text file containing the contig identifier and bin identifier for each binned contig
* `.fasta` files for each bin
* `.fasta` files of the refined bins


## `visualise`

### Tool/processing function

The initial MetaCoAG binning result and the refined binning result can be visualised on the assembly graph using the `visualise` subcommand (\autoref{fig1}). Users can generate images in different formats such as `png`, `eps`, `pdf` and `svg`, and customise the dimensions of the images.
The initial binning result and the refined binning result can be visualised on the assembly graph using the `visualise` subcommand (\autoref{fig1}). Users can generate images in different formats such as `png`, `eps`, `pdf` and `svg`, and customise the dimensions of the images.

### Inputs

The following inputs are required to run the `visualise` subcommand.

* Contigs file
* Assembly graph file(s) - can vary depending on the assembler used to generate the contigs (refer to inputs under 'metacoag')
* Assembly graph file(s) - can vary depending on the assembler used to generate the contigs (refer to inputs under `metacoag`)
* A delimited text file containing the initial binning result
* A delimited text file containing the refined binning result

@@ -164,13 +163,13 @@ The following outputs will be generated by the `visualise` subcommand.

An example is shown in \autoref{fig2} for the Sim-5G+metaSPAdes dataset [@Mallawaarachchi2:2020; @Mallawaarachchi:2021] containing five bacterial species.

![Visualisation of the assembly graph with the initial binning result from MetaCoAG (left) and final binning result from GraphBin (right) for the Sim-5G+metaSPAdes dataset. The five colours represent the five bins and the white nodes represent unbinned contigs.\label{fig2}](visualisation.svg){width=100%}
![Visualisation of the assembly graph with the initial binning result from MetaCoAG (left) and final binning result from GraphBin (right) for the Sim-5G+metaSPAdes dataset. The vertices represent contigs and edges represent connections in the assembly graph. The five colours represent the five bins and the white vertices represent unbinned contigs.\label{fig2}](visualisation.svg){width=100%}

## `evaluate`

### Tool/processing function

The produced binning results can be evaluated using the `evaluate` subcommand by providing the ground truth bins of contigs (\autoref{fig1}). This evaluation is possible only for simulated or mock metagenomes where the ground truth genomes of contigs are known. GraphBin-Tk uses the four common metrics 1) precision, 2) recall, 3) F1-score and 4) Adjusted Rand Index (ARI) that have been used in previous binning studies [@Alneberg:2014; @Meyer:2018; @Mallawaarachchi1:2020]. These metrics are calculates as follows. The binning result is denoted as a $K \times S$ matrix with $K$ number of bins and $S$ number of ground truth taxa. In this matrix, the element $a_{ks}$ denotes the number of contigs binned to the $k^{th}$ bin and belongs to the $s^{th}$ taxa. $U$ denotes the number of unbinned contigs and $N$ denotes the total number of contigs. Following are the equations used to calculate the evaluation metrics.
The produced binning results can be evaluated using the `evaluate` subcommand by providing the ground truth bins of contigs (\autoref{fig1}). This evaluation is possible only for simulated or mock metagenomes where the ground truth genomes of contigs are known. GraphBin-Tk uses the four common metrics 1) precision, 2) recall, 3) F1-score and 4) Adjusted Rand Index (ARI) that have been used in previous binning studies [@Alneberg:2014; @Meyer:2018; @Mallawaarachchi1:2020]. These metrics are calculated as follows. The binning result is denoted as a $K \times S$ matrix with $K$ number of bins and $S$ number of ground truth taxa. In this matrix, the element $a_{ks}$ denotes the number of contigs binned to the $k^{th}$ bin and belongs to the $s^{th}$ taxa. $U$ denotes the number of unbinned contigs and $N$ denotes the total number of contigs. Following are the equations used to calculate the evaluation metrics.

__Precision__ = $\frac{\sum_{k}max_s \{a_{ks}\}}{\sum_{k}\sum_{s}a_{ks}}$
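As a worked illustration of the precision equation above, the following sketch evaluates it on a small toy matrix. The function name and the nested-list representation of the $K \times S$ matrix are assumptions made for this example; this is not the GraphBin-Tk code.

```python
def precision(matrix):
    """Precision for a K x S matrix where matrix[k][s] is a_ks.

    For each bin k, take the count of its most common taxon, sum these
    over all bins, and divide by the number of binned contigs.
    """
    total_binned = sum(sum(row) for row in matrix)
    if total_binned == 0:
        raise ValueError("The matrix contains no binned contigs")
    return sum(max(row) for row in matrix if row) / total_binned


# Toy example with K = 2 bins and S = 2 taxa (made-up counts).
a = [
    [8, 2],  # bin 1: 8 contigs from taxon 1, 2 from taxon 2
    [1, 9],  # bin 2: 1 contig from taxon 1, 9 from taxon 2
]
print(precision(a))  # (8 + 9) / 20 = 0.85
```

Unbinned contigs ($U$) do not appear in this equation because the denominator sums only over binned contigs.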

@@ -184,8 +183,8 @@ __ARI__ = $\frac{\sum_{k,s}\binom{a_{ks}}{2}-t_3}{\frac{1}{2}(t_1+t_2)-t_3}$ $wh

The following inputs are required to run the `evaluate` subcommand.

* Delimited text file for the ground truth
* Delimited text file for the binning result
* A delimited text file containing the ground truth
* A delimited text file containing the binning result
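The two input files can be combined into the counts $a_{ks}$ used by the metrics. The sketch below shows one possible way to do this and is not the GraphBin-Tk implementation. It assumes two-column files without a header row (contig identifier, then bin or taxon identifier), and it assumes that a contig present in the ground truth but absent from the binning result counts as unbinned. All function names are hypothetical.

```python
import csv
from collections import Counter


def read_assignments(path, delimiter=","):
    """Read a two-column delimited file into a {contig: label} dictionary."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        return {row[0]: row[1] for row in reader if len(row) >= 2}


def contingency(ground_truth, binning):
    """Return (counts, unbinned).

    counts[(bin, taxon)] corresponds to a_ks, and unbinned corresponds to U.
    Assumption: contigs missing from `binning` are treated as unbinned.
    """
    counts = Counter()
    unbinned = 0
    for contig, taxon in ground_truth.items():
        if contig in binning:
            counts[(binning[contig], taxon)] += 1
        else:
            unbinned += 1
    return counts, unbinned
```

Under these assumptions, the total number of contigs $N$ equals `sum(counts.values()) + unbinned`.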

### Outputs
