This repository has been archived by the owner on Jan 27, 2020. It is now read-only.

Merge pull request #574 from MaxUlysse/Abstract
Update Abstracts
Szilveszter Juhos authored Apr 25, 2018
2 parents 635e102 + 1401f96 commit f4a2581
Showing 6 changed files with 129 additions and 16 deletions.
2 changes: 0 additions & 2 deletions .travis.yml
@@ -4,8 +4,6 @@ language: java

jdk: openjdk8

node_js: "node"

services:
- docker

@@ -1,11 +1,20 @@
# Cancer Analysis Workflow Of Tumor/Normal Pairs At The National Genomics Infrastructure Of SciLifeLab
# The XVth KICancer Retreat 2016

## Cancer Analysis Workflow Of Tumor/Normal Pairs At The National Genomics Infrastructure Of SciLifeLab

Maxime Garcia,
Pelin Akan,
Teresita Díaz de Ståhl,
Jesper Eisfeldt,
Szilveszter Juhos,
Malin Larsson,
Björn Nystedt,
Pall Olason,
Monica Nistér,
Max Käller

BarnTumörBanken, Department of Oncology Pathology, Karolinska Institutet, Science for Life Laboratory

(Pelin Akan, Teresita Diaz de Ståhl, Jesper Eisfeldt, Szilveszter Juhos, Malin Larsson, Björn Nystedt, Pall Olason, Monica Nistér, Max Käller)

One of the most prominent uses of NGS is whole genome sequencing (WGS). The
National Genomics Infrastructure (NGI) at Science for Life Laboratory currently
provides WGS and germline variant analysis. However, building a robust and
29 changes: 20 additions & 9 deletions doc/Abstracts/ESHG_2017.md → doc/Abstracts/2017-05-ESHG.md
@@ -1,14 +1,17 @@
# CAW - Cancer Analysis Workflow to process normal/tumor WGS data
# European Human Genetics Conference 2017

Maxime Garcia 1, Szilveszter Juhos 2, Malin Larsson 3, Teresita Diaz de Ståhl 4, Jesper Eisfeldt 5, Sebastian DiLorenzo 6, Pall Olason 7, Björn Nystedt 7, Monica Nistér 4, Max Käller 8
## CAW - Cancer Analysis Workflow to process normal/tumor WGS data

As whole genome sequencing is getting cheaper, it is viable to compare NGS data from normal and tumor samples of numerous patients. There are still many challenges, mostly regarding bioinformatics: datasets are huge, workflows are complex, and there are multiple tools to choose from for somatic and structural variants and quality control.

We are presenting CAW (Cancer Analysis Workflow), a complete open source pipeline to resolve somatic variants from WGS data: it is written in Nextflow, a domain-specific language for workflow building. We are utilizing GATK best practices to align, realign and recalibrate short-read data in parallel for both tumor and normal samples. After these preprocessing steps, several somatic variant callers scan the resulting BAM files: MuTect1, MuTect2 and Strelka are used to find somatic SNVs and small indels. For structural variants we use Manta. Furthermore, we are applying ASCAT to estimate sample heterogeneity, ploidy and CNVs.

The software can start the analysis from raw FASTQ files, from the realignment step, or directly with any subset of variant callers. At the end of the analysis the resulting VCF files are merged to facilitate further downstream processing, though the individual results are also retained. The workflow can accommodate further variant calling software or CNV callers. It is also prepared to process normal, tumor, and several relapse samples.

Besides variant calls, the workflow provides quality controls presented by MultiQC. A Docker image is also available, and the open source software can be downloaded from https://github.com/SciLifeLab/CAW.
Maxime Garcia 1,
Szilveszter Juhos 2,
Malin Larsson 3,
Teresita Díaz de Ståhl 4,
Jesper Eisfeldt 5,
Sebastian DiLorenzo 6,
Pall Olason 7,
Björn Nystedt 7,
Monica Nistér 4,
Max Käller 8

1. BarnTumörBanken, Department of Oncology Pathology, Science for Life Laboratory, Karolinska Institutet
2. Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University
@@ -18,3 +21,11 @@ Besides variant calls, the workflow provides quality controls presented by Multi
6. Department of Medical Sciences, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Uppsala University
7. Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University
8. Science for Life Laboratory, School of Biotechnology, Division of Gene Technology, Royal Institute of Technology

As whole genome sequencing is getting cheaper, it is viable to compare NGS data from normal and tumor samples of numerous patients. There are still many challenges, mostly regarding bioinformatics: datasets are huge, workflows are complex, and there are multiple tools to choose from for somatic and structural variants and quality control.

We are presenting CAW (Cancer Analysis Workflow), a complete open source pipeline to resolve somatic variants from WGS data: it is written in Nextflow, a domain-specific language for workflow building. We are utilizing GATK best practices to align, realign and recalibrate short-read data in parallel for both tumor and normal samples. After these preprocessing steps, several somatic variant callers scan the resulting BAM files: MuTect1, MuTect2 and Strelka are used to find somatic SNVs and small indels. For structural variants we use Manta. Furthermore, we are applying ASCAT to estimate sample heterogeneity, ploidy and CNVs.

The software can start the analysis from raw FASTQ files, from the realignment step, or directly with any subset of variant callers. At the end of the analysis the resulting VCF files are merged to facilitate further downstream processing, though the individual results are also retained. The workflow can accommodate further variant calling software or CNV callers. It is also prepared to process normal, tumor, and several relapse samples.
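The abstract does not say how the per-caller VCF files are merged. As a hypothetical illustration only (not CAW's actual implementation), the sketch below keys every call by chromosome, position, reference and alternate allele, and records which callers reported it, while only reading each caller's input so the individual results stay intact. The names `variant_keys` and `merge_calls` are illustrative assumptions, and the parser reads just the first five VCF columns.

```python
from collections import defaultdict


def variant_keys(vcf_lines):
    """Yield (chrom, pos, ref, alt) for each data line of a VCF.

    Header lines start with '#'. Comma-separated ALT fields are split so
    each alternate allele gets its own key. Only the first five columns
    (CHROM, POS, ID, REF, ALT) are read.
    """
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):
            yield chrom, int(pos), ref, allele


def merge_calls(calls_by_caller):
    """Map each variant key to the sorted list of callers that reported it.

    The per-caller inputs are only read, never modified, so the individual
    results remain available alongside the merged view.
    """
    merged = defaultdict(set)
    for caller, lines in calls_by_caller.items():
        for key in variant_keys(lines):
            merged[key].add(caller)
    # Sort keys and caller names for deterministic output.
    return {key: sorted(callers) for key, callers in sorted(merged.items())}


mutect2 = ["#CHROM\tPOS\tID\tREF\tALT", "1\t100\t.\tA\tT", "1\t200\t.\tG\tC,GA"]
strelka = ["1\t100\t.\tA\tT"]
print(merge_calls({"MuTect2": mutect2, "Strelka": strelka}))
```

Keying on the full allele rather than on position alone keeps distinct alleles at the same site apart; a production merge would also need to normalize indel representation, which this sketch does not attempt.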

Besides variant calls, the workflow provides quality controls presented by MultiQC. A Docker image is also available, and the open source software can be downloaded from https://github.com/SciLifeLab/CAW.
6 changes: 4 additions & 2 deletions doc/Abstracts/PMC_2018.md → doc/Abstracts/2018-05-PMC.md
@@ -1,9 +1,11 @@
Sarek, a workflow for WGS analysis of germline and somatic mutations
# Keystone Symposia - Precision Medicine in Cancer

## Sarek, a workflow for WGS analysis of germline and somatic mutations

Maxime Garcia 123*,
Szilveszter Juhos 123*,
Malin Larsson 456,
Teresita Diaz de Ståh l13,
Teresita Díaz de Ståhl 13,
Johanna Sandgren 13,
Jesper Eisfeldt 73,
Sebastian DiLorenzo 85A,
50 changes: 50 additions & 0 deletions doc/Abstracts/2018-06-NPMI.md
@@ -0,0 +1,50 @@
# The Nordic Precision Medicine Initiative - Meeting No 5

## Sarek, a portable workflow for WGS analysis of germline and somatic mutations

Maxime Garcia 123*,
Szilveszter Juhos 123*,
Malin Larsson 456,
Teresita Díaz de Ståhl 13,
Johanna Sandgren 13,
Jesper Eisfeldt 73,
Sebastian DiLorenzo 85A,
Marcel Martin B5C,
Pall Olason 95A,
Phil Ewels B2C,
Björn Nystedt 95A*,
Monica Nistér 13,
Max Käller 2D,
*Corresponding Author

1. Barntumörbanken, Dept. of Oncology Pathology;
2. Science for Life Laboratory;
3. Karolinska Institutet;
4. Dept. of Physics, Chemistry and Biology;
5. National Bioinformatics Infrastructure Sweden, Science for Life Laboratory;
6. Linköping University;
7. Clinical Genetics, Dept. of Molecular Medicine and Surgery;
8. Dept. of Medical Sciences;
9. Dept. of Cell and Molecular Biology;
A. Uppsala University;
B. Dept. of Biochemistry and Biophysics;
C. Stockholm University;
D. School of Biotechnology, Division of Gene Technology, Royal Institute of Technology

We present Sarek, a portable Open Source pipeline to resolve germline and somatic variants from WGS data: it is written in Nextflow, a domain-specific language for workflow building.
It processes normal samples or normal/tumor pairs (with the option to include matched relapses).

Sarek is based on GATK best practices to prepare short-read data, which is done in parallel for the tumor and normal samples of a pair.
After these preprocessing steps several variant callers scan the resulting BAM files: Manta for structural variants; Strelka and GATK HaplotypeCaller for germline variants; Freebayes, MuTect1, MuTect2 and Strelka for somatic variants; ASCAT to estimate sample heterogeneity, ploidy and CNVs.
At the end of the analysis the resulting VCF files can be annotated by SNPEff and/or VEP to facilitate further downstream processing.
Our ongoing effort focuses on filtering and prioritizing the annotated variants.
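The assignment of callers to variant classes above can be read as a lookup table. The following is a minimal, hypothetical Python sketch, not Sarek's Nextflow code: the tool lists are copied from the abstract, but the selection rule (normal-only runs get structural and germline callers, tumor/normal pairs additionally get somatic and copy-number tools) and the names `CALLERS` and `callers_for` are illustrative assumptions.

```python
# Tool lists copied from the abstract; the selection logic below is an
# illustrative assumption, not Sarek's actual implementation.
CALLERS = {
    "structural": ["Manta"],
    "germline": ["Strelka", "GATK HaplotypeCaller"],
    "somatic": ["Freebayes", "MuTect1", "MuTect2", "Strelka"],
    "copy_number": ["ASCAT"],
}


def callers_for(has_tumor: bool) -> list[str]:
    """Return the tools to run, in a stable order and without duplicates.

    Assumption: a normal-only run uses the structural and germline callers;
    a tumor/normal pair additionally uses the somatic and copy-number tools.
    """
    categories = ["structural", "germline"]
    if has_tumor:
        categories += ["somatic", "copy_number"]
    selected: list[str] = []
    for category in categories:
        for tool in CALLERS[category]:
            if tool not in selected:  # Strelka is listed in two categories
                selected.append(tool)
    return selected


print(callers_for(has_tumor=False))  # ['Manta', 'Strelka', 'GATK HaplotypeCaller']
print(callers_for(has_tumor=True))
```

Keeping the mapping as data rather than branching code mirrors the abstract's point that further variant callers can be accommodated: adding a tool means adding one list entry.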

Sarek is based on Docker and Singularity containers, enabling version tracking, reproducibility and the handling of sensitive data.
It is designed with flexible environments in mind, such as a local fat node, an HTC cluster or a cloud environment like AWS.
The workflow is capable of accommodating further variant callers.
Besides variant calls, the workflow provides quality controls presented by MultiQC.
Checkpoints allow the software to be started from FASTQ, BAM or VCF files.
Besides WGS data, it can process inputs from WES or gene panels.
The pipeline currently uses GRCh37 or GRCh38 as the reference genome; custom genomes can also be added.
It has been successfully used to analyze more than two hundred WGS samples sent to the National Genomics Infrastructure (Science for Life Laboratory) by different users.
The MIT licensed Open Source code can be downloaded from GitHub.
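The abstract describes checkpoints only in terms of input formats (FASTQ, BAM or VCF). A minimal sketch, assuming hypothetical step names and a suffix-based mapping (neither is taken from Sarek), of how an input file could select the step at which the workflow resumes, following the order given above: preprocessing, then variant calling, then annotation.

```python
from pathlib import Path

# Hypothetical step names; the order follows the abstract:
# preprocessing (GATK best practices) -> variant calling -> annotation.
STEPS = ["preprocessing", "variant_calling", "annotation"]

# Assumed mapping from input format to the checkpoint it resumes at.
ENTRY_POINT = {
    ".fastq": "preprocessing",
    ".fq": "preprocessing",
    ".bam": "variant_calling",
    ".vcf": "annotation",
}


def steps_from(input_file: str) -> list[str]:
    """Return the steps still to run for the given input file."""
    suffixes = Path(input_file).suffixes
    # Ignore a trailing compression suffix such as ".gz".
    if suffixes and suffixes[-1] == ".gz":
        suffixes = suffixes[:-1]
    if not suffixes or suffixes[-1].lower() not in ENTRY_POINT:
        raise ValueError(f"unsupported input: {input_file}")
    start = ENTRY_POINT[suffixes[-1].lower()]
    return STEPS[STEPS.index(start):]


print(steps_from("tumor.recal.bam"))  # ['variant_calling', 'annotation']
```

Rejecting unknown suffixes with an error, rather than defaulting to the first step, avoids silently reprocessing data from scratch when a file is misnamed.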
43 changes: 43 additions & 0 deletions doc/Abstracts/2018-07-JOBIM.md
@@ -0,0 +1,43 @@
# Journées Ouvertes en Biologie, Informatique et Mathématiques 2018

## Sarek, a portable workflow for WGS analysis of germline and somatic mutations

Maxime Garcia 123,
Szilveszter Juhos 123,
Malin Larsson 456,
Teresita Díaz de Ståhl 13,
Johanna Sandgren 13,
Jesper Eisfeldt 73,
Sebastian DiLorenzo 85A,
Marcel Martin B5C,
Pall Olason 95A,
Phil Ewels B2C,
Björn Nystedt 95A,
Monica Nistér 13,
Max Käller 2D

Max Käller <max.kaller@scilifelab.se>

1. Barntumörbanken, Dept. of Oncology Pathology;
2. Science for Life Laboratory;
3. Karolinska Institutet;
4. Dept. of Physics, Chemistry and Biology;
5. National Bioinformatics Infrastructure Sweden, Science for Life Laboratory;
6. Linköping University;
7. Clinical Genetics, Dept. of Molecular Medicine and Surgery;
8. Dept. of Medical Sciences;
9. Dept. of Cell and Molecular Biology;
A. Uppsala University;
B. Dept. of Biochemistry and Biophysics;
C. Stockholm University;
D. School of Biotechnology, Division of Gene Technology, Royal Institute of Technology

We present Sarek, a portable Open Source pipeline to resolve germline and somatic variants from WGS data: it is written in Nextflow, a domain-specific language for workflow building. It processes normal samples or normal/tumor pairs (with the option to include matched relapses).

Sarek is based on GATK best practices to prepare short-read data, which is done in parallel for the tumor and normal samples of a pair. After these preprocessing steps several variant callers scan the resulting BAM files: Manta for structural variants; Strelka and GATK HaplotypeCaller for germline variants; Freebayes, MuTect2 and Strelka for somatic variants; ASCAT and Control-FREEC to estimate sample heterogeneity, ploidy and CNVs. At the end of the analysis the resulting VCF files can be annotated by SNPEff and/or VEP to facilitate further downstream processing. Our ongoing effort focuses on filtering and prioritizing the annotated variants.

Sarek is based on Docker and Singularity containers, enabling version tracking, reproducibility and the handling of sensitive data. It is designed with flexible environments in mind, such as a local fat node, an HTC cluster or a cloud environment like AWS. The workflow is modular and capable of accommodating further variant callers. Besides variant calls, the workflow provides quality controls presented by MultiQC. Checkpoints allow the software to be started from FASTQ, BAM or VCF files. Besides WGS data, it can process inputs from WES or gene panels.

The pipeline currently uses GRCh37 or GRCh38 as the reference genome; custom genomes can also be added. It has been successfully used to analyze more than two hundred WGS samples sent to the National Genomics Infrastructure (Science for Life Laboratory) by different users. The MIT licensed Open Source code can be downloaded from GitHub.

The authors thank the Swedish Childhood Cancer Foundation for funding Barntumörbanken. We would like to acknowledge support from Science for Life Laboratory, the National Genomics Infrastructure (NGI), and UPPMAX for providing assistance with massively parallel sequencing and computational infrastructure.
