Merge pull request #224 from andersen-lab/sphinx-docs
Sphinx docs
dylanpilz authored Mar 27, 2024
2 parents f6c1bda + c7388f6 commit 73de69a
Showing 41 changed files with 1,370 additions and 491 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/python-package-conda.yml
@@ -51,4 +51,4 @@ jobs:
- name: lint
run: |
pip install -q flake8
make lint
make lint
39 changes: 39 additions & 0 deletions .github/workflows/update_docs.yml
@@ -0,0 +1,39 @@
name: docs

on:
  push:
    branches:
      - main
    paths:
      - 'docs/**'
      - 'freyja/_cli.py'

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout
        uses: actions/checkout@v2

      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r docs/requirements.txt
      - name: Build docs
        run: |
          cd docs
          make html
      - name: Deploy to GH Pages
        uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: docs/_build/html
          force_orphan: true
3 changes: 1 addition & 2 deletions .gitignore
@@ -69,9 +69,8 @@ instance/
# Scrapy stuff:
.scrapy

# Sphinx documentation
# Sphinx build
docs/_build/

# PyBuilder
.pybuilder/
target/
140 changes: 6 additions & 134 deletions README.md

Large diffs are not rendered by default.

20 changes: 20 additions & 0 deletions docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
32 changes: 32 additions & 0 deletions docs/conf.py
@@ -0,0 +1,32 @@
# Configuration file for the Sphinx documentation builder.
#
# For the full list of built-in configuration values, see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Project information -----------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information
import sys
import os
project = 'Freyja'
copyright = '2024, Andersen Lab'
author = 'Andersen Lab'
version = 'v1.5.0'

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

extensions = ['sphinx_click', 'sphinx_rtd_theme']
templates_path = ['_templates']
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']


# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output

html_theme = 'sphinx_rtd_theme'
html_logo = 'src/freyja-logo.png'
# html_static_path = ['_build/html/_static']


# -- Setup for click -------------------------------------------------------
sys.path.insert(0, os.path.abspath('..'))
36 changes: 36 additions & 0 deletions docs/index.rst
@@ -0,0 +1,36 @@
Freyja Documentation
==================================
Freyja is a tool to recover relative lineage abundances from mixed SARS-CoV-2 samples from a sequencing dataset (BAM aligned to the Hu-1 reference). The method uses lineage-determining mutational "barcodes" derived from the UShER global phylogenetic tree as a basis set to solve the constrained (unit sum, non-negative) de-mixing problem.

Freyja is intended as a post-processing step after primer trimming and variant calling in `iVar (Grubaugh and Gangavarapu et al., 2019) <https://github.com/andersen-lab/ivar>`_. From measurements of SNV frequency and sequencing depth at each position in the genome, Freyja returns an estimate of the true lineage abundances in the sample.

To ensure reproducibility of results, we provide old (timestamped) barcodes and metadata in the separate `Freyja-data <https://github.com/andersen-lab/Freyja-data>`_ repository. Barcode version can be checked using the ``freyja demix --version`` command.
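
As a toy illustration of the constrained (unit sum, non-negative) de-mixing problem described above, the following sketch recovers the mixing fraction of two lineages by brute-force search. The barcodes, the observed frequencies, and the function name ``demix_two_lineages`` are invented for illustration. Freyja itself works with many lineages and a weighted problem, so this grid search only illustrates the constraints and is not the method Freyja uses.

```python
def demix_two_lineages(barcode_a, barcode_b, observed, steps=1000):
    """Grid-search the abundance x of lineage A (lineage B gets 1 - x).

    Restricting x to [0, 1] enforces both non-negativity and unit sum.
    Returns the (x, 1 - x) pair minimizing the sum of absolute deviations.
    """
    best_x, best_err = 0.0, float("inf")
    for i in range(steps + 1):
        x = i / steps
        err = sum(
            abs(x * a + (1 - x) * b - f)
            for a, b, f in zip(barcode_a, barcode_b, observed)
        )
        if err < best_err:
            best_x, best_err = x, err
    return best_x, 1 - best_x


# Hypothetical barcodes: 1 means the lineage carries the mutation at that site.
lineage_a = [1, 1, 0, 0]
lineage_b = [0, 1, 1, 0]
# Hypothetical observed SNV frequencies for a 70/30 mixture.
frequencies = [0.7, 1.0, 0.3, 0.0]

abundances = demix_two_lineages(lineage_a, lineage_b, frequencies)
print(abundances)
```

With only two lineages the simplex is a line segment, so a one-dimensional search is enough; a real problem with many lineages would need a proper constrained solver.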

.. toctree::
   :maxdepth: 2
   :caption: Usage:

   src/installation
   src/usage/demix
   src/usage/variants
   src/usage/update
   src/usage/boot
   src/usage/aggregate
   src/usage/plot
   src/usage/dash
   src/usage/relgrowthrate
   src/usage/extract
   src/usage/filter
   src/usage/covariants
   src/usage/plot-covariants

.. toctree::
   :maxdepth: 2
   :caption: Wiki:

   src/wiki/command_line_workflow
   src/wiki/cryptic_variants
   src/wiki/custom_plotting_tutorial
   src/wiki/lineage_barcode_extract
   src/wiki/read_analysis_tutorial
   src/wiki/terra_workflow
18 changes: 18 additions & 0 deletions docs/requirements.txt
@@ -0,0 +1,18 @@
sphinx-click @ git+https://github.com/dylanpilz/sphinx-click.git
sphinx_rtd_theme
sphinx
pandas
pyyaml
seaborn
matplotlib
pysam
biopython
cvxpy
numpy
click
tqdm
matplotlib
joblib
plotly
requests
scipy
Binary file added docs/src/freyja-logo.png
22 changes: 22 additions & 0 deletions docs/src/installation.rst
@@ -0,0 +1,22 @@
Installation
-------------------------------------------------------------------------------

Freyja is written entirely in Python 3, but requires preprocessing by tools like iVar and `samtools <https://github.com/samtools/samtools>`_ mpileup to generate the required input data. We recommend using Python 3.7, but Freyja has been tested on Python versions up to 3.10.

Install via Conda::

   conda install -c bioconda freyja


Local build from source::

   git clone https://github.com/andersen-lab/Freyja.git
   cd Freyja
   pip install -e .

Docker::

   docker pull staphb/freyja
   docker run --rm -it staphb/freyja [command]

27 changes: 27 additions & 0 deletions docs/src/usage/aggregate.rst
@@ -0,0 +1,27 @@
.. click:: freyja._cli:aggregate
   :prog: freyja aggregate
   :nested: full
   :commands: aggregate
------------

**Example Usage:**

For rapid visualization of results, we also offer two utility methods
for manipulating the “demixed” output files. The first is an aggregation
method:

::

   freyja aggregate [directory-of-output-files] --output [aggregated-filename.tsv]

By default, the minimum genome coverage is set at 60 percent. To adjust
this, use the ``--mincov`` option (e.g. ``--mincov 75``). The user can
also specify a file extension of their choosing with the ``--ext``
option (for example, for ``demix`` outputs called ``X.output``):

::

   freyja aggregate [directory-of-output-files] --output [aggregated-filename.tsv] --ext output
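
When aggregation is scripted, the command line above can be assembled programmatically. The sketch below is one possible way to do this, using only the options described in this section; the helper name ``build_aggregate_command`` and the example paths are hypothetical.

```python
import shlex


def build_aggregate_command(results_dir, output_tsv, mincov=None, ext=None):
    """Assemble the argument list for ``freyja aggregate``.

    Optional flags are added only when explicitly set, so the tool's
    own defaults stay in effect otherwise.
    """
    cmd = ["freyja", "aggregate", results_dir, "--output", output_tsv]
    if mincov is not None:
        cmd += ["--mincov", str(mincov)]
    if ext is not None:
        cmd += ["--ext", ext]
    return cmd


# Hypothetical paths, for illustration only.
command = build_aggregate_command(
    "demix_outputs/", "aggregated.tsv", mincov=75, ext="output"
)
print(shlex.join(command))
# To actually run it (requires freyja on PATH):
#   import subprocess; subprocess.run(command, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with unusual file names.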


27 changes: 27 additions & 0 deletions docs/src/usage/boot.rst
@@ -0,0 +1,27 @@
.. click:: freyja._cli:boot
   :prog: freyja boot
   :nested: full
   :commands: boot
------------

**Example Usage:**

We provide a fast bootstrapping method for freyja, which can be run
using the command

::

   freyja boot [variants-file] [depth-file] --nt [number-of-cpus] --nb [number-of-bootstraps] --output_basename [base-name]

which results in two output files: ``base-name_lineages.csv`` and
``base-name_summarized.csv``, which contain the 0.025, 0.05, 0.25, 0.5
(median), 0.75, 0.95, and 0.975 quantiles for each lineage and
WHO-designated VOI/VOC, respectively, as obtained via the bootstrap. A
custom lineage hierarchy file can be provided using the ``--lineageyml``
option. If the ``--rawboots`` option is used, two additional output
files are returned, ``base-name_lineages_boot.csv`` and
``base-name_summarized_boot.csv``, which contain the bootstrap estimates
(rather than summary statistics). The ``--eps``, ``--barcodes``, and
``--meta`` options are available as in ``freyja demix``. A
``--boxplot`` option is also provided, which should be specified in the
form ``--boxplot pdf`` if you want the boxplot in pdf format.
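
To make the summary statistics concrete, the sketch below reduces a set of bootstrap estimates for a single lineage to the seven quantile levels listed above. It is an illustration only: the replicate values are simulated, the function names are hypothetical, and the linear-interpolation rule is an assumption rather than a statement about how ``freyja boot`` computes its output.

```python
import random

QUANTILE_LEVELS = [0.025, 0.05, 0.25, 0.5, 0.75, 0.95, 0.975]


def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list (0 <= q <= 1)."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def summarize_bootstraps(estimates):
    """Map each quantile level to its value across bootstrap replicates."""
    ordered = sorted(estimates)
    return {q: quantile(ordered, q) for q in QUANTILE_LEVELS}


# Simulated bootstrap estimates of one lineage's abundance, clipped to [0, 1].
rng = random.Random(0)
replicates = [min(1.0, max(0.0, rng.gauss(0.6, 0.05))) for _ in range(1000)]
summary = summarize_bootstraps(replicates)
for level, value in summary.items():
    print(f"{level}: {value:.3f}")
```

The interval between the 0.025 and 0.975 quantiles can then be read as an approximate 95 percent bootstrap interval for that lineage.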
30 changes: 30 additions & 0 deletions docs/src/usage/covariants.rst
@@ -0,0 +1,30 @@
.. click:: freyja._cli:covariants
   :prog: freyja covariants
   :nested: full
   :commands: covariants
------------

**Example Usage:**

In many cases, it can be useful to study covariant mutations
(i.e. mutations co-occurring on the same read pair). The command outputs
a tsv file that includes the mutations present in each
set of covariants, their absolute counts (the number of read pairs with
the mutations), their coverage ranges (the minimum and maximum position
for read pairs with the mutations), their “maximum” counts (the number
of read pairs that span the positions in the mutations), and their
frequencies (the absolute count divided by the maximum count). Should
the user wish to only consider read pairs that span the entire genomic
region defined by (min_site, max_site), they may include the
``--spans_region`` flag. By default, the covariant patterns are sorted
in descending order by count; they can also be sorted in
descending order by frequency by setting the ``--sort_by`` option to
“freq”, or sorted sequentially by mutation site by setting the
``--sort_by`` option to “site”. The ``--ref-genome`` argument defaults
to ``freyja/data/NC_045512_Hu-1.fasta``. If you are using a different
build to perform alignment, it is important to pass that file to
``--ref-genome`` instead. Optionally, a gff file
(e.g. ``freyja/data/NC_045512_Hu-1.gff``) may be included via the
``--gff-file`` option to output amino acid mutations alongside
nucleotide mutations. Inclusion thresholds for read-mapping quality and
the number of observed instances of a set of covariants can be set using
``--min_quality`` and ``--min_count``, respectively.
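
The relationship between the counts, the frequency, and the three sort orders can be sketched as follows. The class, its field names (including ``first_site``), and the mutation labels are hypothetical and do not reflect the actual column names of the tsv output; ascending order for the “site” sort is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class CovariantPattern:
    mutations: tuple   # made-up mutation labels
    first_site: int    # smallest genomic position among the mutations
    count: int         # read pairs carrying all mutations in the set
    max_count: int     # read pairs spanning all positions in the set

    @property
    def freq(self):
        """Absolute count divided by maximum count."""
        return self.count / self.max_count


def sort_patterns(patterns, sort_by="count"):
    """Sort by count or freq (descending) or by site (ascending)."""
    if sort_by == "count":
        return sorted(patterns, key=lambda p: p.count, reverse=True)
    if sort_by == "freq":
        return sorted(patterns, key=lambda p: p.freq, reverse=True)
    if sort_by == "site":
        return sorted(patterns, key=lambda p: p.first_site)
    raise ValueError(f"unknown sort key: {sort_by}")


patterns = [
    CovariantPattern(("A100T", "G150C"), 100, 40, 50),
    CovariantPattern(("C300T",), 300, 90, 300),
    CovariantPattern(("G200A", "T260C"), 200, 10, 10),
]

for key in ("count", "freq", "site"):
    print(key, [p.first_site for p in sort_patterns(patterns, key)])
```

Note that a pattern with a high count can still have a low frequency when many more read pairs span its positions, which is why the two sort orders can disagree.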
45 changes: 45 additions & 0 deletions docs/src/usage/dash.rst
@@ -0,0 +1,45 @@
.. click:: freyja._cli:dash
   :prog: freyja dash
   :nested: full
   :commands: dash
------------

**Example Usage:**

Freyja can rapidly prepare a dashboard web page directly from
aggregated freyja output. This can be done with the command

::

   freyja dash [aggregated-filename-tsv] [sample-metadata.csv] [dashboard-title.txt] [introContent.txt] --output [outputname.html] --lineageyml [path-to-lineage.yml-file]

where the metadata file should have this
`form <freyja/data/sweep_metadata.csv>`__. See example
`title <freyja/data/title.txt>`__ and
`intro-text <freyja/data/introContent.txt>`__ files as well. For samples
taken the same day, we average the freyja outputs by default. However,
the averaging can instead take viral loads into account using
the ``--scale_by_viral_load`` flag. The header and body colors can be
changed with the ``--headerColor [mycolorname/hexcolor]`` and
``--bodyColor [mycolorname/hexcolor]`` options, respectively. The
``--mincov`` option is also available, as in ``plot``. The resulting
dashboard will look like
`this <https://htmlpreview.github.io/?https://github.com/andersen-lab/Freyja/blob/main/freyja/data/test0.html>`__.
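
The difference between plain and viral-load-scaled averaging of same-day samples can be sketched as a weighted mean. This is a minimal illustration under assumed data structures (the dictionary keys, lineage names, and function name are hypothetical), not the dashboard's actual implementation.

```python
from collections import defaultdict


def average_by_day(samples, scale_by_viral_load=False):
    """Average lineage abundances across samples collected on the same day.

    Each sample is a dict with keys "date", "viral_load", and "abundances"
    (a lineage -> abundance mapping). With scale_by_viral_load=True each
    sample is weighted by its viral load; otherwise all weights are equal.
    """
    totals = defaultdict(lambda: defaultdict(float))
    weights = defaultdict(float)
    for sample in samples:
        w = sample["viral_load"] if scale_by_viral_load else 1.0
        weights[sample["date"]] += w
        for lineage, abundance in sample["abundances"].items():
            totals[sample["date"]][lineage] += w * abundance
    return {
        date: {lin: total / weights[date] for lin, total in lineages.items()}
        for date, lineages in totals.items()
    }


# Two hypothetical samples from the same day with different viral loads.
samples = [
    {"date": "2024-03-01", "viral_load": 100.0, "abundances": {"X.1": 1.0}},
    {"date": "2024-03-01", "viral_load": 300.0, "abundances": {"Y.2": 1.0}},
]

plain = average_by_day(samples)
scaled = average_by_day(samples, scale_by_viral_load=True)
print(plain)
print(scaled)
```

In this example the plain average gives each lineage 0.5, whereas scaling by viral load shifts the estimate toward the sample that carries more virus.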

The plot can be configured using the
``--config [path-to-plot-config-file]`` option. The `plot config
file <freyja/data/plot_config.yml>`__ is a yaml file; more information
about its contents can be found in the `sample config
file <freyja/data/plot_config.yml>`__. By default, the dashboard uses the
lineage hierarchy information present in ``freyja/dash/lineages.yml``,
but a custom hierarchy can be supplied using the
``--lineageyml [path-to-hierarchy-file]`` option. The
``--keep_plot_files`` option can be used to keep the intermediate html
for the core plot (by default, it is deleted after being incorporated
into the main html output).

A CSV file containing the relative growth rates for each lineage is
also created along with the html dashboard. Lineages are grouped
together based on the ``Lineages`` key specified in the config file, if
provided.
48 changes: 48 additions & 0 deletions docs/src/usage/demix.rst
@@ -0,0 +1,48 @@
.. click:: freyja._cli:demix
   :prog: freyja demix
   :nested: full
   :commands: demix
------------

**Example Usage:**

After running ``freyja variants`` we can run:
``freyja demix [variants-file] [depth-file] --output [output-file]``

This outputs to a tsv file that includes the lineages present, their
corresponding abundances, and summarization by constellation. This
method also includes the ``--eps`` option, which enables the user to
define the minimum lineage abundance returned
(e.g. ``--eps 0.0001``). A custom barcode file can be provided using the
``--barcodes [path-to-barcode-file]`` option. By default, freyja uses
the lineage hierarchy file located in the ``freyja/data`` directory,
which is updated every time the ``freyja update`` command is run. The
user, however, can define a custom lineage hierarchy file
using ``--lineageyml [path-to-lineage-file]``. Historic
``lineage.yml`` files are available in the Freyja-data GitHub repository
`here <https://github.com/andersen-lab/Freyja-data/tree/main/history_lineage_hierarchy>`_.
As the UShER tree now includes proposed lineages, we offer the
``--confirmedonly`` flag, which removes unconfirmed lineages from the
analysis. For additional flexibility and reproducibility of analyses, a
custom lineage-to-constellation mapping metadata file can be provided
using the ``--meta`` option. A coverage depth minimum can be specified
using the ``--depthcutoff`` option, which excludes sites with coverage
less than the specified value. An example output should have the format

+-------------+------------------------------------------------------+
| | filename |
+=============+======================================================+
| summarized | [('Delta', 0.65), ('Other', 0.25), ('Alpha', 0.1)] |
+-------------+------------------------------------------------------+
| lineages | ['B.1.617.2' 'B.1.2' 'AY.6' 'Q.3'] |
+-------------+------------------------------------------------------+
| abundances | "[0.5 0.25 0.15 0.1]" |
+-------------+------------------------------------------------------+
| resid | 3.14159 |
+-------------+------------------------------------------------------+
| coverage | 95.8 |
+-------------+------------------------------------------------------+

Where ``summarized`` denotes a sum of all lineage abundances in a particular WHO designation (e.g. B.1.617.2 and AY.6 abundances are summed in the above example); lineages without a WHO designation are grouped into "Other". The ``lineages`` array lists the identified lineages in descending order of abundance, and ``abundances`` contains the corresponding abundance estimates. Using the ``--depthcutoff`` option may result in some distinct lineages now having identical barcodes, which are grouped into the format ``[lineage]-like(num)`` (based on their shared phylogeny) in the output. A summary of this lineage grouping is written to ``[output-file]_collapsed_lineages.yml``. The value of ``resid`` corresponds to the residual of the weighted least absolute deviation problem used to estimate lineage abundances. The ``coverage`` value provides the 10x coverage estimate (percent of sites with 10 or more reads; 10 is the default but can be modified using the ``--covcut`` option in ``demix``). If there is a solver error during the ``demix`` step (generally associated with poor data quality), an error message will be returned, along with an output with empty ``summarized``, ``lineages``, and ``abundances`` fields and ``resid = -1``.
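
The relationship between ``lineages``, ``abundances``, and ``summarized`` in the example table can be reproduced with a short sketch. The lineage and abundance values come from the table above; the lineage-to-designation mapping and the function name ``summarize`` are assumptions made for illustration, not Freyja's own metadata or code.

```python
from collections import defaultdict


def summarize(lineages, abundances, designation):
    """Sum lineage abundances per designation; unmapped lineages go to "Other"."""
    totals = defaultdict(float)
    for lineage, abundance in zip(lineages, abundances):
        totals[designation.get(lineage, "Other")] += abundance
    # Sort in descending order of abundance, as in the example table.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Values from the example table; the mapping is an illustrative assumption.
lineages = ["B.1.617.2", "B.1.2", "AY.6", "Q.3"]
abundances = [0.5, 0.25, 0.15, 0.1]
who_designation = {"B.1.617.2": "Delta", "AY.6": "Delta", "Q.3": "Alpha"}

summarized = summarize(lineages, abundances, who_designation)
print(summarized)
```

Here B.1.2 has no entry in the mapping, so its abundance is reported under "Other", matching the grouping shown in the table.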

**NOTE**: The ``freyja variants`` output is stable in time, and does not need to be re-run to incorporate updated lineage designations/corresponding mutational barcodes, whereas the outputs of ``freyja demix`` will change as barcodes are updated (and thus ``demix`` should be re-run as new information is made available).