TensorSignatures

TensorSignatures is a tensor factorisation framework for mutational signature analysis. In contrast to other methods, it deciphers mutational processes not only in terms of mutational spectra but also assesses their properties with respect to various genomic variables, allows the inclusion of different mutation types, and integrates a robust noise model to perform the inference.

TensorSignatures is a young project and breaking changes are to be expected. We keep a changelog in which possible breakage is clearly documented.

Quick install

TensorSignatures makes use of the TensorFlow 1.5.x framework, which requires a separate package to enable GPU support, i.e. tensorflow-gpu instead of tensorflow. We highly recommend installing TensorSignatures into an environment with tensorflow-gpu, as the tensor computations benefit greatly from GPU acceleration.

Via GitHub

To obtain the most recent version of TensorSignatures, we recommend downloading the repository directly from GitHub and installing the package into a virtual environment. To get started, clone the repository by executing the following command in your terminal:

$ git clone https://github.com/gerstung-lab/tensorsignatures.git && cd tensorsignatures

Then, create a new virtual environment and install all dependencies. If you have access to a GPU with CUDA support, use requirements-gpu.txt instead of requirements.txt.

$ python -m venv env
$ source env/bin/activate
$ pip install --upgrade pip setuptools wheel && pip install -r requirements.txt

Finally, install TensorSignatures.

$ python setup.py install

Via PyPI

To install tensorsignatures via PyPI, simply type

$ pip install tensorsignatures

into your shell.

Via Docker (& Jupyter)

To run TensorSignatures within a Docker environment, clone the repository

$ git clone https://github.com/gerstung-lab/tensorsignatures.git
$ cd tensorsignatures

and spin up the container using docker-compose

$ docker-compose up --build

This starts a Jupyter server, including notebooks with tutorials, at http://localhost:8889.

Free software: MIT license
Documentation: https://tensorsignatures.readthedocs.io.

Getting started

Step 1: Data preparation

Running TensorSignatures involves three steps: preparing the input data, i.e. creating the mutation count tensor as well as the mutation count matrix, computing a trinucleotide normalisation to account for differences in the nucleotide composition of different genomic regions, and running TensorSignatures.

Preparing input data using docker

We provide a Docker image that contains all R and Bioconductor dependencies needed to create the variant tensor and the other mutation type matrix. To use it, pull the image. Note that the image is approximately 5 GB in size.

$ docker pull sagar87/tensorsignatures-data:latest

To use the image, switch into the folder containing your VCF data. Then run the image using the following command, supplying the VCF files followed by the name of the hdf5 output file, which must be the last argument.

$ docker run -v $PWD:/usr/src/app/mount sagar87/tensorsignatures-data <vcf1.vcf> <vcf2.vcf> ... <vcfn.vcf> <output.h5>
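When many VCF files are involved, it can be convenient to assemble this call programmatically. The following is a minimal sketch, not part of TensorSignatures: the helper name build_docker_command and the example file names and directory are hypothetical, and only the image name, the mount point and the argument order are taken from the command above.

```python
import os
import shlex


def build_docker_command(vcf_names, output_name, workdir=None):
    """Assemble the `docker run` argument list; the hdf5 output goes last.

    Hypothetical helper for illustration only.
    """
    if not vcf_names:
        raise ValueError("at least one VCF file is required")
    # Python does not expand $PWD, so resolve the directory explicitly.
    workdir = os.path.abspath(workdir or os.getcwd())
    return [
        "docker", "run",
        "-v", f"{workdir}:/usr/src/app/mount",  # same bind mount as above
        "sagar87/tensorsignatures-data",
        *vcf_names,
        output_name,  # must be the last argument
    ]


# Example file names and directory are placeholders.
cmd = build_docker_command(["a.vcf", "b.vcf"], "output.h5", workdir="/data/vcfs")
print(shlex.join(cmd))
# To execute the command: subprocess.run(cmd, check=True)
```

Building the call as a list rather than a single string avoids shell quoting problems with unusual file names.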

Then continue with Step 2.

Preparing input data using a custom installation

Make sure you have R 3.4.x (!) and the packages VariantAnnotation and rhdf5 installed. If necessary, you can install them by executing

$ Rscript -e "source('https://bioconductor.org/biocLite.R'); biocLite('VariantAnnotation')"

and

$ Rscript -e "source('https://bioconductor.org/biocLite.R'); biocLite('rhdf5')"

from your command line.

To get started, download the following files and place them in the same directory:

Constants.RData (contains GRanges objects that annotate transcription/replication orientation, nucleosomal and epigenetic states)

mutations.R (all required functions to partition SNVs, MNVs and indels)

processVcf.R (loads VCF files and creates the SNV count tensor and the MNV and indel count matrix; may need custom modification to make the script run on your VCFs)

genome.zip

To obtain the SNV count tensor and the matrices containing other mutation types, execute processVcf.R and pass the VCF files you want to convert, as well as a name for an output hdf5 file as command line arguments, e.g.

$ Rscript processVcf.R <vcf1.vcf> <vcf2.vcf> ... <vcfn.vcf> <output.h5>

In case of errors, please check whether you have correctly specified the paths in lines 6-8. Also, take a look at the readVcfSave function and adjust it if it fails.

Step 2: Computing trinucleotide normalisation

TensorSignatures requires a trinucleotide normalisation constant to account for differences in the nucleotide composition of genomic states. To compute it, invoke the prep subroutine of TensorSignatures and pass the hdf5 file from Step 1 as well as the path for the output file as positional arguments to the programme.

$ tensorsignatures prep <output.h5> <tsdata.h5>
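To illustrate why such a normalisation is needed, the sketch below compares the trinucleotide composition of a genomic region with that of a reference sequence. It is an assumption-laden toy example and not the computation performed by the prep subroutine: the function names and the short sequences are hypothetical, and the actual constant may be defined differently.

```python
from collections import Counter


def trinucleotide_counts(sequence):
    """Count overlapping trinucleotides, skipping any with non-ACGT bases."""
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - 2):
        tri = sequence[i:i + 3]
        if set(tri) <= set("ACGT"):
            counts[tri] += 1
    return counts


def relative_composition(region_seq, reference_seq):
    """Ratio of trinucleotide frequencies in a region to those in a reference.

    Illustrative only; trinucleotides absent from the region are omitted.
    """
    region = trinucleotide_counts(region_seq)
    reference = trinucleotide_counts(reference_seq)
    n_region = sum(region.values())
    n_reference = sum(reference.values())
    return {
        tri: (region[tri] / n_region) / (reference[tri] / n_reference)
        for tri in reference
        if region[tri] > 0
    }


# Toy sequences (placeholders, not real genomic data).
ratios = relative_composition("ACGACG", "ACGTACGT")
print(ratios)
```

A ratio above one means the trinucleotide is over-represented in the region relative to the reference, so more mutations in that context would be expected there by composition alone.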

Step 3: Run TensorSignatures

There are two ways to run TensorSignatures: the refit option, which fits the exposures of a set of pre-defined signatures extracted from the PCAWG cohort to your dataset, or the train subroutine, which performs a de novo extraction of tensor signatures. Refitting tensor signatures is computationally fast but cannot discover new signatures, whereas extracting new signatures from scratch is computationally intensive (GPU required) and ideally requires a larger number of samples. For most use cases with a small number of samples, we advise using the refit option:

$ tensorsignatures --verbose refit tsData.h5 refit.pkl -n

To run a de novo extraction, use

$ tensorsignatures --verbose train tsData.h5 denovo.pkl <rank> -k <size> -n -ep <epochs>

where rank specifies the decomposition rank, size controls the dispersion of the model, and epochs sets the number of epochs used to fit the model. TensorSignatures outputs the value of the objective function (log likelihood) that is minimised during training as well as the change of the objective during an epoch interval (delta). When deciding on the number of epochs, ensure that it is large enough for the objective function to converge, i.e. the delta value is close to, or fluctuates around, zero. For more information on how to run TensorSignatures in a practical setting, see the documentation. Running TensorSignatures yields a pickle dump, which can subsequently be inspected using the tensorsignatures package.
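The convergence criterion described above can be sketched as a small check on recorded objective values. This is a minimal illustration under stated assumptions: the function names, the tolerance, and the objective values are hypothetical, and the sketch does not read the output of TensorSignatures itself.

```python
def epoch_deltas(objective, interval):
    """Change of the objective over consecutive windows of `interval` epochs."""
    return [
        objective[i] - objective[i - interval]
        for i in range(interval, len(objective), interval)
    ]


def has_converged(objective, interval, tolerance, n_last=3):
    """True if the last `n_last` deltas all lie within +/- tolerance of zero."""
    deltas = epoch_deltas(objective, interval)
    if len(deltas) < n_last:
        return False
    return all(abs(d) <= tolerance for d in deltas[-n_last:])


# Made-up objective values for illustration.
objective = [100.0, 60.0, 40.0, 30.0, 25.0, 25.2, 24.9, 25.1, 25.0]
print(has_converged(objective[:5], interval=1, tolerance=0.5))
print(has_converged(objective, interval=1, tolerance=0.5))
```

Requiring several consecutive small deltas, rather than a single one, guards against stopping on a value that happens to fluctuate through zero while the objective is still decreasing.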

Features

Run tensorsignatures on your dataset using the TensorSignature class provided by the package or via the command line tool.
Compute percentile based bootstrap confidence intervals for inferred parameters.
Basic plotting tools to visualise tensor signatures and inferred parameters.
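As a generic illustration of the percentile bootstrap mentioned in the feature list, the sketch below computes a confidence interval for the mean of a small sample using only the standard library. It does not use the tensorsignatures API; the function name, the default settings and the data are hypothetical, and the package's own procedure may differ, for example in how parameters are re-estimated for each resample.

```python
import random
import statistics


def percentile_bootstrap_ci(values, statistic=statistics.mean,
                            n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `statistic` of `values`."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(values)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(
        statistic(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    # Read the interval bounds off the sorted bootstrap estimates.
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Made-up values for illustration.
data = [2.1, 2.5, 1.9, 2.3, 2.8, 2.0, 2.4, 2.6]
lower, upper = percentile_bootstrap_ci(data)
print(lower, upper)
```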

Credits

Harald Vöhringer and Moritz Gerstung
