Merge pull request #208 from MaxUlysse/dev
Getting dev up to date
maxulysse authored May 26, 2020
2 parents e047b7d + 6da920a commit 25dd584
Showing 22 changed files with 349 additions and 267 deletions.
21 changes: 18 additions & 3 deletions CHANGELOG.md
@@ -4,7 +4,22 @@ All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

## [2.6dev] - Piellorieppe
## [dev]

### Added

### Changed

- [#208](https://github.com/nf-core/sarek/pull/208) - Merge changes from the release PR
- [#208](https://github.com/nf-core/sarek/pull/208) - Bump version to `3.0dev`

### Fixed

### Deprecated

### Removed

## [2.6] - Piellorieppe

Piellorieppe is one of the main massif in the Sarek National Park.

@@ -63,8 +78,8 @@ Piellorieppe is one of the main massif in the Sarek National Park.
- [#164](https://github.com/nf-core/sarek/pull/164) - Update `gatk4-spark` from `4.1.4.1` to `4.1.6.0`
- [#180](https://github.com/nf-core/sarek/pull/180), [#195](https://github.com/nf-core/sarek/pull/195) - Improve minimal setting
- [#183](https://github.com/nf-core/sarek/pull/183), [#204](https://github.com/nf-core/sarek/pull/204) - Update `input.md` documentation
- [#197](https://github.com/nf-core/sarek/pull/197) - Output directory `DuplicateMarked` is now replaced by`DuplicatesMarked`
- [#204](https://github.com/nf-core/sarek/pull/204) - Output directory `controlFREEC` is now replaced by`Control-FREEC`
- [#197](https://github.com/nf-core/sarek/pull/197) - Output directory `DuplicateMarked` is now replaced by `DuplicatesMarked`
- [#204](https://github.com/nf-core/sarek/pull/204) - Output directory `controlFREEC` is now replaced by `Control-FREEC`

### Fixed

4 changes: 2 additions & 2 deletions Dockerfile
@@ -7,7 +7,7 @@ COPY environment.yml /
RUN conda env create -f /environment.yml && conda clean -a

# Add conda installation dir to PATH (instead of doing 'conda activate')
ENV PATH /opt/conda/envs/nf-core-sarek-2.6dev/bin:$PATH
ENV PATH /opt/conda/envs/nf-core-sarek-3.0dev/bin:$PATH

# Dump the details of the installed packages to a file for posterity
RUN conda env export --name nf-core-sarek-2.6dev > nf-core-sarek-2.6dev.yml
RUN conda env export --name nf-core-sarek-3.0dev > nf-core-sarek-3.0dev.yml
11 changes: 8 additions & 3 deletions README.md
@@ -18,7 +18,9 @@

## Introduction

Sarek is a workflow designed to run analyses on whole genome or targeted sequencing data from regular samples or tumour / normal pairs and could include additional relapses.
Sarek is a workflow designed to detect variants on whole genome or targeted sequencing data.
Initially designed for human and mouse, it can work on any species with a reference genome.
Sarek can also handle tumour / normal pairs and could include additional relapses.

It's built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner.
It comes with docker containers making installation trivial and results highly reproducible.
@@ -93,16 +95,19 @@ Helpful contributors:
* [Francesco L](https://github.com/nibscles)
* [Friederike Hanssen](https://github.com/FriederikeHanssen)
* [Gisela Gabernet](https://github.com/ggabernet)
* [Harshil Patel](https://github.com/drpatelh)
* [James A. Fellows Yates](https://github.com/jfy133)
* [Jesper Eisfeldt](https://github.com/J35P312)
* [Johannes Alneberg](https://github.com/alneberg)
* [Tobias Koch](https://github.com/KochTobi)
* [Lucia Conde](https://github.com/lconde-ucl)
* [Malin Larsson](https://github.com/malinlarsson)
* [Marcel Martin](https://github.com/marcelm)
* [Nilesh Tawari](https://github.com/nilesh-tawari)
* [Olga Botvinnik](https://github.com/olgabot)
* [Phil Ewels](https://github.com/ewels)
* [Sabrina Krakau](https://github.com/skrakau)
* [Sebastian-D](https://github.com/Sebastian-D)
* [Tobias Koch](https://github.com/KochTobi)
* [Winni Kretzschmar](https://github.com/winni2k)
* [arontommi](https://github.com/arontommi)
* [bjornnystedt](https://github.com/bjornnystedt)
@@ -130,7 +135,7 @@ For further information or help, don't hesitate to get in touch on [Slack](https
## Citation

If you use `nf-core/sarek` for your analysis, please cite the `Sarek` article as follows:
> Garcia M, Juhos S, Larsson M et al. **Sarek: A portable workflow for whole-genome sequencing analysis of germline and somatic variants [version 1; peer review: 2 approved]** *F1000Research* 2020, 9:63 [doi: 10.12688/f1000research.16665.1](https://f1000research.com/articles/9-63/v1).
> Garcia M, Juhos S, Larsson M et al. **Sarek: A portable workflow for whole-genome sequencing analysis of germline and somatic variants [version 1; peer review: 2 approved]** *F1000Research* 2020, 9:63 [doi: 10.12688/f1000research.16665.1](http://dx.doi.org/10.12688/f1000research.16665.1).
You can cite the sarek zenodo record for a specific version using the following [doi: 10.5281/zenodo.3476426](https://zenodo.org/badge/latestdoi/184289291)

14 changes: 9 additions & 5 deletions bin/scrape_software_versions.py
@@ -8,6 +8,8 @@
'ASCAT': ['v_ascat.txt', r"Version: (\S+)"],
'bcftools': ['v_bcftools.txt', r"bcftools (\S+)"],
'BWA': ['v_bwa.txt', r"Version: (\S+)"],
'CNVkit': ['v_cnvkit.txt', r"(\S+)"],
'Control-FREEC': ['v_controlfreec.txt', r"Control-FREEC\s(\S+)"],
'FastQC': ['v_fastqc.txt', r"FastQC v(\S+)"],
'FreeBayes': ['v_freebayes.txt', r"version: v(\d\.\d\.\d+)"],
'GATK': ['v_gatk.txt', r"Version:(\S+)"],
@@ -17,31 +19,33 @@
'MultiQC': ['v_multiqc.txt', r"multiqc, version (\S+)"],
'Nextflow': ['v_nextflow.txt', r"(\S+)"],
'nf-core/sarek': ['v_pipeline.txt', r"(\S+)"],
'Qualimap': ['v_qualimap.txt', r"QualiMap v.(\S+)"],
'QualiMap': ['v_qualimap.txt', r"QualiMap v.(\S+)"],
'R': ['v_r.txt', r"R version (\S+)"],
'samtools': ['v_samtools.txt', r"samtools (\S+)"],
'SnpEff': ['v_snpeff.txt', r"version SnpEff (\S+)"],
'SnpEff': ['v_snpeff.txt', r"SnpEff\s(\S+)"],
'Strelka': ['v_strelka.txt', r"([0-9.]+)"],
'TIDDIT': ['v_tiddit.txt', r"TIDDIT-(\S+)"],
'Trim Galore': ['v_trim_galore.txt', r"version (\S+)"],
'vcftools': ['v_vcftools.txt', r"([0-9.]+)"],
'VEP': ['v_vep.txt', r"ensembl-vep : (\S+)"],
'VEP': ['v_vep.txt', r"ensembl-vep : (\S+)"]
}
results = OrderedDict()
results['nf-core/sarek'] = '<span style="color:#999999;\">N/A</span>'
results['Nextflow'] = '<span style="color:#999999;\">N/A</span>'
results['AlleleCount'] = '<span style="color:#999999;\">N/A</span>'
results['ASCAT'] = '<span style="color:#999999;\">N/A</span>'
results['AlleleCount'] = '<span style="color:#999999;\">N/A</span>'
results['bcftools'] = '<span style="color:#999999;\">N/A</span>'
results['BWA'] = '<span style="color:#999999;\">N/A</span>'
results['CNVkit'] = '<span style="color:#999999;\">N/A</span>'
results['Control-FREEC'] = '<span style="color:#999999;\">N/A</span>'
results['FastQC'] = '<span style="color:#999999;\">N/A</span>'
results['FreeBayes'] = '<span style="color:#999999;\">N/A</span>'
results['GATK'] = '<span style="color:#999999;\">N/A</span>'
results['htslib'] = '<span style="color:#999999;\">N/A</span>'
results['Manta'] = '<span style="color:#999999;\">N/A</span>'
results['msisensor'] = '<span style="color:#999999;\">N/A</span>'
results['MultiQC'] = '<span style="color:#999999;\">N/A</span>'
results['Qualimap'] = '<span style="color:#999999;\">N/A</span>'
results['QualiMap'] = '<span style="color:#999999;\">N/A</span>'
results['R'] = '<span style="color:#999999;\">N/A</span>'
results['samtools'] = '<span style="color:#999999;\">N/A</span>'
results['SnpEff'] = '<span style="color:#999999;\">N/A</span>'
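
The table above pairs each tool with a version file and a regular expression whose first capture group is the version string. A minimal sketch of how such a table could be applied is shown below. It is not the script's actual code: the helper `extract_versions`, the in-memory `file_contents` mapping (standing in for reading the `v_*.txt` files from disk), the plain `N/A` placeholder, and the sample tool outputs are all assumptions made for illustration.

```python
import re
from collections import OrderedDict

# Subset of the patterns shown in the diff: tool -> [version file, regex].
regexes = {
    'CNVkit': ['v_cnvkit.txt', r"(\S+)"],
    'Control-FREEC': ['v_controlfreec.txt', r"Control-FREEC\s(\S+)"],
    'QualiMap': ['v_qualimap.txt', r"QualiMap v.(\S+)"],
    'SnpEff': ['v_snpeff.txt', r"SnpEff\s(\S+)"],
}

# Simplified placeholder; the real script uses an HTML span here.
NOT_AVAILABLE = 'N/A'


def extract_versions(regexes, file_contents):
    """Return tool -> version, keeping the placeholder when nothing matches.

    file_contents is a hypothetical mapping of file name -> text that
    stands in for opening each version file.
    """
    results = OrderedDict((name, NOT_AVAILABLE) for name in regexes)
    for name, (filename, pattern) in regexes.items():
        text = file_contents.get(filename)
        if text is None:
            continue  # tool was not run, so no version file exists
        match = re.search(pattern, text)
        if match:
            results[name] = match.group(1)
    return results


# Made-up tool outputs, for illustration only.
sample = {
    'v_cnvkit.txt': "0.9.6\n",
    'v_qualimap.txt': "QualiMap v.2.2.2-dev\n",
    'v_snpeff.txt': "SnpEff 4.3t 2017-11-24\n",
}

versions = extract_versions(regexes, sample)
for tool, version in versions.items():
    print("{}\t{}".format(tool, version))
```

Pre-filling every entry with the placeholder, as the `results[...]` lines in the diff do, means a tool whose version file is missing still appears in the report instead of silently disappearing.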
2 changes: 1 addition & 1 deletion conf/base.config
@@ -15,7 +15,7 @@ process {
time = {check_resource(24.h * task.attempt)}
shell = ['/bin/bash', '-euo', 'pipefail']

errorStrategy = {task.exitStatus in [143,137,104,134,139] ? 'retry' : 'finish' }
errorStrategy = {task.exitStatus in [143,137,104,134,139] ? 'retry' : 'finish'}
maxErrors = '-1'
maxRetries = 3
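
The settings in this hunk can be read as a small decision rule: a task is retried only when it exits with one of the listed status codes, and the requested time grows with each attempt. The Python sketch below mirrors that rule for illustration only. The function names are hypothetical, and the cap applied by `check_resource` is not modelled because its definition is not part of this diff.

```python
# Exit statuses listed in the errorStrategy closure of conf/base.config.
RETRY_EXIT_CODES = {143, 137, 104, 134, 139}
MAX_RETRIES = 3


def error_strategy(exit_status):
    """Mirror of the ternary: retry on the listed codes, otherwise finish."""
    return 'retry' if exit_status in RETRY_EXIT_CODES else 'finish'


def requested_hours(attempt, base_hours=24):
    """Mirror of `24.h * task.attempt`, without the check_resource cap."""
    return base_hours * attempt


# Walk through the first attempt plus the allowed retries for a task
# that keeps exiting with status 137.
for attempt in range(1, MAX_RETRIES + 2):
    print(attempt, requested_hours(attempt), error_strategy(137))
```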

7 changes: 5 additions & 2 deletions conf/test.config
@@ -23,15 +23,18 @@ params {
igenomes_ignore = true
genome = 'smallGRCh37'
genomes_base = "https://raw.githubusercontent.com/nf-core/test-datasets/sarek/reference"
snpeff_db = 'WBcel235.86'
species = 'caenorhabditis_elegans'
vep_cache_version = '99'
}

process {
withName:Snpeff {
container = 'nfcore/sareksnpeff:dev.GRCh37'
container = 'nfcore/sareksnpeff:dev.WBcel235'
maxForks = 1
}
withLabel:VEP {
container = 'nfcore/sarekvep:dev.GRCh37'
container = 'nfcore/sarekvep:dev.WBcel235'
maxForks = 1
}
}
2 changes: 0 additions & 2 deletions conf/test_annotation.config
@@ -10,7 +10,5 @@
includeConfig 'test.config'

params {
igenomes_ignore = false
genome = 'WBcel235'
input = 'https://raw.githubusercontent.com/nf-core/test-datasets/sarek/testdata/vcf/Strelka_1234N_variants.vcf.gz'
}
4 changes: 2 additions & 2 deletions containers/snpeff/Dockerfile
@@ -9,7 +9,7 @@ COPY environment.yml /
RUN conda env create -f /environment.yml && conda clean -a

# Add conda installation dir to PATH (instead of doing 'conda activate')
ENV PATH /opt/conda/envs/nf-core-sarek-snpeff-2.6dev/bin:$PATH
ENV PATH /opt/conda/envs/nf-core-sarek-snpeff-3.0dev/bin:$PATH

# Setup default ARG variables
ARG GENOME=GRCh38
@@ -19,4 +19,4 @@ ARG SNPEFF_CACHE_VERSION=86
RUN snpEff download -v ${GENOME}.${SNPEFF_CACHE_VERSION}

# Dump the details of the installed packages to a file for posterity
RUN conda env export --name nf-core-sarek-snpeff-2.6dev > nf-core-sarek-snpeff-2.6dev.yml
RUN conda env export --name nf-core-sarek-snpeff-3.0dev > nf-core-sarek-snpeff-3.0dev.yml
2 changes: 1 addition & 1 deletion containers/snpeff/environment.yml
@@ -1,6 +1,6 @@
# You can use this file to create a conda environment for this pipeline:
# conda env create -f environment.yml
name: nf-core-sarek-snpeff-2.6dev
name: nf-core-sarek-snpeff-3.0dev
channels:
- conda-forge
- bioconda
4 changes: 2 additions & 2 deletions containers/vep/Dockerfile
@@ -9,7 +9,7 @@ COPY environment.yml /
RUN conda env create -f /environment.yml && conda clean -a

# Add conda installation dir to PATH (instead of doing 'conda activate')
ENV PATH /opt/conda/envs/nf-core-sarek-vep-2.6dev/bin:$PATH
ENV PATH /opt/conda/envs/nf-core-sarek-vep-3.0dev/bin:$PATH

# Setup default ARG variables
ARG GENOME=GRCh38
@@ -27,4 +27,4 @@ RUN vep_install \
--NO_BIOPERL --NO_HTSLIB --NO_TEST --NO_UPDATE

# Dump the details of the installed packages to a file for posterity
RUN conda env export --name nf-core-sarek-vep-2.6dev > nf-core-sarek-vep-2.6dev.yml
RUN conda env export --name nf-core-sarek-vep-3.0dev > nf-core-sarek-vep-3.0dev.yml
2 changes: 1 addition & 1 deletion containers/vep/environment.yml
@@ -1,6 +1,6 @@
# You can use this file to create a conda environment for this pipeline:
# conda env create -f environment.yml
name: nf-core-sarek-vep-2.6dev
name: nf-core-sarek-vep-3.0dev
channels:
- conda-forge
- bioconda
24 changes: 12 additions & 12 deletions docs/annotation.md
@@ -28,12 +28,12 @@ The main Sarek container has also `snpEff` and `VEP` installed, but without the

## Download cache

A Nextflow helper script has been designed to help downloading `snpEff` and `VEP` cache.
A Nextflow helper script has been designed to help download the `snpEff` and `VEP` caches.
Such files are meant to be shared between multiple users, so this script is mainly meant for people administrating servers, clusters and advanced users.

```bash
nextflow run download_cache.nf --snpeff_cache </Path/To/snpEffCache> --snpeff_db <snpEff DB version> --genome <GENOME>
nextflow run download_cache.nf --vep_cache </Path/To/VEPcache> --species <species> --vep_cache_version <VEP cache version> --genome <GENOME>
nextflow run download_cache.nf --snpeff_cache </path/to/snpEff/cache> --snpeff_db <snpEff DB version> --genome <GENOME>
nextflow run download_cache.nf --vep_cache </path/to/VEP/cache> --species <species> --vep_cache_version <VEP cache version> --genome <GENOME>
```

## Using downloaded cache
@@ -46,8 +46,8 @@ The cache will only be used when `--annotation_cache` and cache directories are
Example:

```bash
nextflow run nf-core/sarek --tools snpEff --step annotate --sample file.vcf.gz --snpeff_cache </Path/To/snpEffCache> --annotation_cache
nextflow run nf-core/sarek --tools VEP --step annotate --sample file.vcf.gz --vep_cache </Path/To/vepCache> --annotation_cache
nextflow run nf-core/sarek --tools snpEff --step annotate --sample <file.vcf.gz> --snpeff_cache </path/to/snpEff/cache> --annotation_cache
nextflow run nf-core/sarek --tools VEP --step annotate --sample <file.vcf.gz> --vep_cache </path/to/VEP/cache> --annotation_cache
```

## Using VEP CADD plugin
@@ -61,11 +61,11 @@ To enable the use of the VEP CADD plugin:
Example:

```bash
nextflow run nf-core/sarek --step annotate --tools VEP --sample file.vcf.gz --cadd_cache \
--cadd_InDels </PathToCADD/InDels.tsv.gz> \
--cadd_InDels_tbi </PathToCADD/InDels.tsv.gz.tbi> \
--cadd_WG_SNVs </PathToCADD/whole_genome_SNVs.tsv.gz> \
--cadd_WG_SNVs_tbi </PathToCADD/whole_genome_SNVs.tsv.gz.tbi>
nextflow run nf-core/sarek --step annotate --tools VEP --sample <file.vcf.gz> --cadd_cache \
--cadd_indels </path/to/CADD/cache/InDels.tsv.gz> \
--cadd_indels_tbi </path/to/CADD/cache/InDels.tsv.gz.tbi> \
--cadd_wg_snvs </path/to/CADD/cache/whole_genome_SNVs.tsv.gz> \
--cadd_wg_snvs_tbi </path/to/CADD/cache/whole_genome_SNVs.tsv.gz.tbi>
```

### Downloading CADD files
@@ -74,7 +74,7 @@ An helper script has been designed to help downloading CADD files.
Such files are meant to be shared between multiple users, so this script is mainly meant for people administrating servers, clusters and advanced users.

```bash
nextflow run download_cache.nf --cadd_cache </Path/To/CADDcache> --cadd_version <CADD version> --genome <GENOME>
nextflow run download_cache.nf --cadd_cache </path/to/CADD/cache> --cadd_version <CADD version> --genome <GENOME>
```

## Using VEP GeneSplicer plugin
@@ -86,5 +86,5 @@ To enable the use of the VEP GeneSplicer plugin:
Example:

```bash
nextflow run nf-core/sarek --step annotate --tools VEP --sample file.vcf.gz --genesplicer
nextflow run nf-core/sarek --step annotate --tools VEP --sample <file.vcf.gz> --genesplicer
```
25 changes: 12 additions & 13 deletions docs/ascat.md
@@ -7,7 +7,7 @@ ASCAT is written in R and available here: [github.com/Crick-CancerGenomics/ascat

To run ASCAT on NGS data we need BAM files for the tumor and normal samples, as well as a loci file with SNP positions.
If ASCAT is run on SNP array data, the loci file contains the SNPs on the chip.
When runnig ASCAT on NGS data we can use the same loci file, for exampe the one corresponding to the AffymetrixGenome-Wide Human SNP Array 6.0, but we can also choose a loci file of our choice with i.e. SNPs detected in the 1000 Genomes project.
When running ASCAT on NGS data we can use the same loci file, for example the one corresponding to the Affymetrix Genome-Wide Human SNP Array 6.0, but we can also choose a loci file of our choice with e.g. SNPs detected in the 1000 Genomes project.

### BAF and LogR values

@@ -132,29 +132,28 @@ Names of the chromosomes in chrom.sizes file must be the same as in the genome r

This created GC correction files with the following column headers:

```text
Chr Position 25 50 100 200 500 1000 2000 5000 10000 20000 50000 100000 200000 500000 1M 2M 5M 10M
```
| | | | | | | | | | | | | | | | | | | | |
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
|Chr|Position|25|50|100|200|500|1000|2000|5000|10000|20000|50000|100000|200000|500000|1M|2M|5M|10M|

This file gave an error when running ASCAT, and the error message suggested that it had to do with the column headers.
The Readme.txt in <https://github.com/Crick-CancerGenomics/ascat/tree/master/gcProcessing> suggested that the column headers should be:

```text
Chr Position 25bp 50bp 100bp 200bp 500bp 1000bp 2000bp 5000bp 10000bp 20000bp 50000bp 100000bp 200000bp 500000bp 1M 2M 5M 10M
```
| | | | | | | | | | | | | | | | | | | | |
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
|Chr|Position|25bp|50bp|100bp|200bp|500bp|1000bp|2000bp|5000bp|10000bp|20000bp|50000bp|100000bp|200000bp|500000bp|1M|2M|5M|10M|

The column headers of the generated GC correction files were therefore manually edited.
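
The header change described above (purely numeric window sizes gain a `bp` suffix, while `Chr`, `Position` and the `1M` to `10M` columns stay as they are) could also be scripted rather than done by hand. The sketch below is one possible way to do it, not the procedure the authors used; the function name `add_bp_suffix` is hypothetical, and the header is assumed to be tab-delimited.

```python
def add_bp_suffix(header_line):
    """Append 'bp' to purely numeric column names in a tab-delimited header."""
    fields = header_line.rstrip("\n").split("\t")
    renamed = [name + "bp" if name.isdigit() else name for name in fields]
    return "\t".join(renamed)


old_header = "\t".join([
    "Chr", "Position", "25", "50", "100", "200", "500", "1000", "2000",
    "5000", "10000", "20000", "50000", "100000", "200000", "500000",
    "1M", "2M", "5M", "10M",
])

new_header = add_bp_suffix(old_header)
print(new_header)
```

Only the first line of each GC correction file would need this treatment; the data rows are left untouched.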

#### Format of GC correction file

The final files are tab-delimited with the following columns (and some example data):

```text
Chr Position 25bp 50bp 100bp 200bp 500bp 1000bp 2000bp 5000bp 10000bp 20000bp 50000bp 100000bp 200000bp 500000bp 1M 2M 5M 10M
snp1 1 14930 0.541667 0.58 0.61 0.585 0.614 0.62 0.6 0.5888 0.588 0.4277 0.395041 0.380702 0.383259 0.341592 0.339747 0.386343 0.500537 0.511514
snp2 1 15211 0.625 0.64 0.67 0.63 0.61 0.612 0.6135 0.591 0.5922 0.4358 0.39616 0.380411 0.383167 0.34163 0.339771 0.386417 0.500558 0.511511
snp3 1 15820 0.541667 0.56 0.62 0.655 0.65 0.612 0.5885 0.5936 0.5797 0.4511 0.397771 0.379945 0.382999 0.341791 0.339832 0.386554 0.500579 0.511504
```
|Chr|Position|25bp|50bp|100bp|200bp|500bp|1000bp|2000bp|5000bp|10000bp|20000bp|50000bp|100000bp|200000bp|500000bp|1M|2M|5M|10M|
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
|snp1|1|14930|0.541667|0.58|0.61|0.585|0.614|0.62|0.6|0.5888|0.588|0.4277|0.395041|0.380702|0.383259|0.341592|0.339747|0.386343|0.500537|0.511514
|snp2|1|15211|0.625|0.64|0.67|0.63|0.61|0.612|0.6135|0.591|0.5922|0.4358|0.39616|0.380411|0.383167|0.34163|0.339771|0.386417|0.500558|0.511511
|snp3|1|15820|0.541667|0.56|0.62|0.655|0.65|0.612|0.5885|0.5936|0.5797|0.4511|0.397771|0.379945|0.382999|0.341791|0.339832|0.386554|0.500579|0.511504

### Output

Binary file modified docs/images/sarek_workflow.png