Merge pull request #245 from gp201/expanded_pathogens_docs
Add documentation for Running Freyja on other pathogens
joshuailevy authored Jul 30, 2024
2 parents 5edf2a0 + caed8b4 commit aee67d0
Showing 4 changed files with 82 additions and 0 deletions.
Binary file added docs/data/test.sorted.bam
Binary file not shown.
1 change: 1 addition & 0 deletions docs/index.rst
@@ -37,4 +37,5 @@ To ensure reproducibility of results, we provide old (timestamped) barcodes and
src/wiki/lineage_barcode_extract
src/wiki/read_analysis_tutorial
src/wiki/terra_workflow
src/wiki/expanded_pathogens

1 change: 1 addition & 0 deletions docs/src/wiki/command_line_workflow.rst
@@ -1,3 +1,4 @@
.. _command-line-workflow:
Command Line Workflow
-------------------------------------------------------------------------------

80 changes: 80 additions & 0 deletions docs/src/wiki/expanded_pathogens.rst
@@ -0,0 +1,80 @@
Running Freyja on other pathogens
-------------------------------------------------------------------------------

This guide provides instructions for analyzing non-SARS-CoV-2 pathogens such as
influenza or MPox using Freyja. The process is similar to SARS-CoV-2 analysis,
but with some key differences.

Data Availability
^^^^^^^^^^^^^^^^^

Data for various pathogens can be found in the following repository:
`Freyja Barcodes <https://github.com/gp201/Freyja-barcodes>`_

Folders are organized by pathogen, with each subfolder named after the date the
barcode was generated, using the format ``YYYY-MM-DD``. Barcode files are named
``barcode.csv``, and reference genome files are named ``reference.fasta``.
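
Because folder names in the ``YYYY-MM-DD`` format sort chronologically, the newest barcode set for a pathogen can be located with a short shell function. The following is a minimal sketch that assumes a local clone of the repository; ``latest_barcode_dir`` is a hypothetical helper name and is not part of Freyja.

```shell
# Hypothetical helper (not part of Freyja): print the most recent
# YYYY-MM-DD subfolder of a pathogen folder.
# Usage: latest_barcode_dir path/to/pathogen_folder
latest_barcode_dir() {
    pathogen_dir=$1
    latest=""
    for dir in "$pathogen_dir"/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/; do
        # Skip the unexpanded pattern when no dated folder exists.
        [ -d "$dir" ] || continue
        # Glob results are sorted and ISO dates sort chronologically,
        # so the last match is the newest folder.
        latest=${dir%/}
    done
    # Fail if no dated folder was found.
    [ -n "$latest" ] || return 1
    printf '%s\n' "$latest"
}
```

For example, ``latest_barcode_dir Freyja-barcodes/MPX`` would print the path of the newest dated folder under ``MPX``, which is expected to contain ``barcode.csv`` and ``reference.fasta``.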

.. note::

    Influenza barcodes are available upon request.

Required Files
^^^^^^^^^^^^^^

To perform these analyses, you will need the following files for the MPox pathogen:

* `test.sorted.bam <https://github.com/andersen-lab/Freyja/blob/main/docs/data/test.sorted.bam>`_: Aligned, trimmed, and sorted BAM file
* `reference.fasta <https://github.com/gp201/Freyja-barcodes/blob/main/MPX/2024-07-24/reference.fasta>`_: Reference genome file
* `barcode.csv <https://github.com/gp201/Freyja-barcodes/blob/main/MPX/2024-07-24/barcode.csv>`_: Barcode file


Setting Up Output Directories
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Since you will likely be working with multiple wastewater samples, it is
advisable to create directories for storing output files:

.. code-block:: sh

    mkdir variants_files depth_files demix_files

Analysis Steps
^^^^^^^^^^^^^^

The first step is to generate a variant file. Use the following command to
perform this step:

.. code-block:: sh

    freyja variants test.sorted.bam --ref reference.fasta --variants variants_files/test.tsv --depths depth_files/test.depth

Please note that you will be passing the reference genome file provided in the
pathogen folder as the ``--ref`` argument. In cases where multiple reference
genomes are present in the reference fasta, you can specify the name of the
desired reference genome with ``--refname [name-of-reference]``.
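
To see which names are available for ``--refname``, it can help to list the sequence headers in the reference FASTA. The sketch below assumes that the name is the header text after ``>`` up to the first whitespace, which is the usual FASTA convention; ``list_reference_names`` is a hypothetical helper, not a Freyja command.

```shell
# Hypothetical helper (not part of Freyja): print the name of each sequence
# in a FASTA file, one per line. The name is assumed to be the header text
# after '>' up to the first whitespace.
list_reference_names() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1"
}
```

Running ``list_reference_names reference.fasta`` prints one name per line; one of these can then be passed as the value of ``--refname``.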

Once the variant file is generated, proceed to the de-mixing step with the
following command:

.. code-block:: sh

    freyja demix variants_files/test.tsv depth_files/test.depth --barcodes barcode.csv --output demix_files/test.output

Please note that you will be passing the barcode file provided in the pathogen
folder as the ``--barcodes`` argument.
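
When several samples need processing, the two steps above can be chained in a loop. This is a minimal sketch rather than an official workflow: it assumes every input file ends in ``.sorted.bam``, and the ``run_all_samples`` function and the ``FREYJA`` variable are hypothetical names introduced here. Only the ``freyja variants`` and ``freyja demix`` options shown above are used.

```shell
# Command used to invoke Freyja. FREYJA is a hypothetical variable for this
# sketch; set FREYJA=echo to print the commands instead of running them.
FREYJA=${FREYJA:-freyja}

# Run the variants and demix steps for every *.sorted.bam file in a folder.
# Usage: run_all_samples bam_dir reference.fasta barcode.csv
run_all_samples() {
    bam_dir=$1 ref=$2 barcodes=$3
    mkdir -p variants_files depth_files demix_files
    for bam in "$bam_dir"/*.sorted.bam; do
        # Skip the unexpanded pattern when the folder has no BAM files.
        [ -e "$bam" ] || continue
        # Derive the sample name by stripping the assumed suffix.
        sample=$(basename "$bam" .sorted.bam)
        "$FREYJA" variants "$bam" --ref "$ref" \
            --variants "variants_files/$sample.tsv" \
            --depths "depth_files/$sample.depth" || return 1
        "$FREYJA" demix "variants_files/$sample.tsv" "depth_files/$sample.depth" \
            --barcodes "$barcodes" \
            --output "demix_files/$sample.output" || return 1
    done
}
```

The function stops at the first failing command, so a problem with one sample is not silently skipped.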

Once you have run demix on multiple samples, you can aggregate all of
the output files with the following command:

.. code-block:: sh

    freyja aggregate demix_files/ --output bunch_of_files.tsv

From there, it’s easy to view the output files in any standard TSV viewer
(Excel, Numbers, LibreOffice Calc, etc.). You should see something like this:

.. code-block::

              summarized                      lineages           abundances             resid               coverage
    test.tsv  [('Other', 0.999999999530878)]  MPX-A.3 MPX-A.2.2  0.79798000 0.20202000  7.5952064496123075  99.94117915510955
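
The ``lineages`` and ``abundances`` columns hold parallel space-separated lists, so they can be paired up on the command line. The sketch below assumes that the aggregated file is tab-separated with its columns in the order shown above; ``list_lineage_abundances`` is a hypothetical helper name.

```shell
# Hypothetical helper (not part of Freyja): print one line per lineage as
# "sample<TAB>lineage<TAB>abundance". Assumes tab-separated columns in the
# order: sample name, summarized, lineages, abundances, resid, coverage.
list_lineage_abundances() {
    awk -F '\t' 'NR > 1 {
        # Columns 3 and 4 are space-separated lists of equal length.
        n = split($3, lineage, " ")
        split($4, abundance, " ")
        for (i = 1; i <= n; i++)
            printf "%s\t%s\t%s\n", $1, lineage[i], abundance[i]
    }' "$1"
}
```

Calling ``list_lineage_abundances bunch_of_files.tsv`` yields a long-format listing that is easier to filter or sort than the nested lists.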
