From 36b06873ee14bf3117d4fc69c962593039e21a47 Mon Sep 17 00:00:00 2001
From: MaxUlysse
Date: Thu, 13 Feb 2020 10:41:20 +0100
Subject: [PATCH 1/4] add @szilvajuhos s abstract for ESHG2020

---
 docs/abstracts/2020-06-ESHG.md | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)
 create mode 100644 docs/abstracts/2020-06-ESHG.md

diff --git a/docs/abstracts/2020-06-ESHG.md b/docs/abstracts/2020-06-ESHG.md
new file mode 100644
index 0000000000..bc595c4695
--- /dev/null
+++ b/docs/abstracts/2020-06-ESHG.md
@@ -0,0 +1,31 @@
+# European Society of Human Genetics - European Human Genetics Conference - Berlin, Germany, 2020-06
+
+## Reproduce easily: analysis of matching tumor-normal NS data with the Sarek workflow
+
+Szilveszter Juhos,
+Maxime Garcia,
+Teresita Díaz de Ståhl,
+Markus Mayrhofer,
+Johanna Sandgren,
+Monica Nistér
+
+High throughput sequencing for cancer analysis is almost a routine research tool by now: precision medicine initiatives are relying a lot on NGS technology and it is expected to sift through hundreds or thousands of WGS samples to publish new results.
+To achieve this, a validated and stable bioinformatics pipeline is needed that can be used at diverse and secure computing environments.
+Sarek is an open-source, container based workflow written in nextflow, that includes all the steps to get from raw FASTQ processing to annotated VCFs.
+Its latest development is a filtering and ranking module to pick the most relevant somatic and germline variants, helping researchers to move towards clinical use.
+
+The pipeline gives results for germline SNVs and SVs using HaplotypeCaller, Strelka, Manta and TIDDIT, somatic variants are reported by Strelka, Mutect2 and Manta.
+CNVs, sample purity and ploidy is estimated by ASCAT and Control-FREEC.
+Furthermore, the software provides a broad set of QC metrics visualized in MultiQC.
+The pipeline is modular, other somatic tools and software can be added as new building blocks.
+Besides human GRCh37 and GRCh38, other model organism references (i.e. mouse or dog) are available in the base setup.
+Starting from raw FASTQs to get annotated VCFs it takes about three days for a 90X/90X sample on a single 48 cores node.
+This processing can be speed-up when using Sentieon for alignment.
+Although Sarek was developed primarily for whole-genome sequencing, it can take whole-exome or gene-panel samples as well.
+
+Sarek is a member of nf-core, a collection of peer-reviewed set of workflows based on Nextflow.
+The supported environments are conda, docker and singularity.
+It is HPC-agnostic: can be used either on a single compute node or on a HPC cluser, or on cloud computing such as AWS, with little difference between workflow setups.
+Sarek has been used in production at the National Genomics Infrastructure in Sweden to process thousands of germline and hundreds of cancer samples for the Swedish Pediatric Tumor BioBank and other research groups.
+
+Sarek and its documentation is available at https://nf-co.re/sarek under MIT license.
From ad71e63689e6988dfda650e90ca3b7f162329bd8 Mon Sep 17 00:00:00 2001
From: MaxUlysse
Date: Thu, 13 Feb 2020 10:45:32 +0100
Subject: [PATCH 2/4] update CHANGELOG

---
 CHANGELOG.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 487fefdce3..f3a1c54276 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -10,6 +10,7 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) a
 
 - [#76](https://github.com/nf-core/sarek/pull/76) - Add `GATK Spark` possibilities to Sarek
 - [#87](https://github.com/nf-core/sarek/pull/87) - Add `GATK BaseRecalibrator` plot to `MultiQC` report
+- [#115](https://github.com/nf-core/sarek/pull/115) - Add [@szilvajuhos](https://github.com/szilvajuhos) abstract for ESHG2020
 
 ### `Changed`
 
From 3e89339f2703bf1016c98a89242ee838d156ef18 Mon Sep 17 00:00:00 2001
From: MaxUlysse
Date: Thu, 13 Feb 2020 10:55:53 +0100
Subject: [PATCH 3/4] Include Johanna's suggestions

---
 docs/abstracts/2020-06-ESHG.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/abstracts/2020-06-ESHG.md b/docs/abstracts/2020-06-ESHG.md
index bc595c4695..380bd6f701 100644
--- a/docs/abstracts/2020-06-ESHG.md
+++ b/docs/abstracts/2020-06-ESHG.md
@@ -1,6 +1,6 @@
 # European Society of Human Genetics - European Human Genetics Conference - Berlin, Germany, 2020-06
 
-## Reproduce easily: analysis of matching tumor-normal NS data with the Sarek workflow
+## Reproduce easily: analysis of matching tumor-normal NGS data with the Sarek workflow
 
 Szilveszter Juhos,
 Maxime Garcia,
@@ -26,6 +26,6 @@ Although Sarek was developed primarily for whole-genome sequencing, it can take
 Sarek is a member of nf-core, a collection of peer-reviewed set of workflows based on Nextflow.
 The supported environments are conda, docker and singularity.
 It is HPC-agnostic: can be used either on a single compute node or on a HPC cluser, or on cloud computing such as AWS, with little difference between workflow setups.
-Sarek has been used in production at the National Genomics Infrastructure in Sweden to process thousands of germline and hundreds of cancer samples for the Swedish Pediatric Tumor BioBank and other research groups.
+Sarek has been used in production at the National Genomics Infrastructure in Sweden to process thousands of germline and hundreds of cancer samples for the Swedish Childhood Tumor Biobank and other research groups.
 
 Sarek and its documentation is available at https://nf-co.re/sarek under MIT license.
From d76128a842738ce98f1f148dc26bf89acddbf5ee Mon Sep 17 00:00:00 2001
From: MaxUlysse
Date: Mon, 17 Feb 2020 13:17:11 +0100
Subject: [PATCH 4/4] update abstract

---
 docs/abstracts/2020-06-ESHG.md | 48 ++++++++++++++++++++--------------
 1 file changed, 28 insertions(+), 20 deletions(-)

diff --git a/docs/abstracts/2020-06-ESHG.md b/docs/abstracts/2020-06-ESHG.md
index 380bd6f701..b754dfeb6b 100644
--- a/docs/abstracts/2020-06-ESHG.md
+++ b/docs/abstracts/2020-06-ESHG.md
@@ -9,23 +9,31 @@ Markus Mayrhofer,
 Johanna Sandgren,
 Monica Nistér
 
-High throughput sequencing for cancer analysis is almost a routine research tool by now: precision medicine initiatives are relying a lot on NGS technology and it is expected to sift through hundreds or thousands of WGS samples to publish new results.
-To achieve this, a validated and stable bioinformatics pipeline is needed that can be used at diverse and secure computing environments.
-Sarek is an open-source, container based workflow written in nextflow, that includes all the steps to get from raw FASTQ processing to annotated VCFs.
-Its latest development is a filtering and ranking module to pick the most relevant somatic and germline variants, helping researchers to move towards clinical use.
-
-The pipeline gives results for germline SNVs and SVs using HaplotypeCaller, Strelka, Manta and TIDDIT, somatic variants are reported by Strelka, Mutect2 and Manta.
-CNVs, sample purity and ploidy is estimated by ASCAT and Control-FREEC.
-Furthermore, the software provides a broad set of QC metrics visualized in MultiQC.
-The pipeline is modular, other somatic tools and software can be added as new building blocks.
-Besides human GRCh37 and GRCh38, other model organism references (i.e. mouse or dog) are available in the base setup.
-Starting from raw FASTQs to get annotated VCFs it takes about three days for a 90X/90X sample on a single 48 cores node.
-This processing can be speed-up when using Sentieon for alignment.
-Although Sarek was developed primarily for whole-genome sequencing, it can take whole-exome or gene-panel samples as well.
-
-Sarek is a member of nf-core, a collection of peer-reviewed set of workflows based on Nextflow.
-The supported environments are conda, docker and singularity.
-It is HPC-agnostic: can be used either on a single compute node or on a HPC cluser, or on cloud computing such as AWS, with little difference between workflow setups.
-Sarek has been used in production at the National Genomics Infrastructure in Sweden to process thousands of germline and hundreds of cancer samples for the Swedish Childhood Tumor Biobank and other research groups.
-
-Sarek and its documentation is available at https://nf-co.re/sarek under MIT license.
+### Introduction
+
+High-throughput sequencing for precision medicine is now a routine method.
+Numerous tools have to be used, and analysis is time-consuming.
+We propose Sarek, an open-source, container-based bioinformatics workflow for germline or matching tumor-normal pairs, written in Nextflow, to process WGS, whole-exome or gene-panel samples.
+
+#### Materials and methods
+
+Sarek is part of nf-core, a collection of peer-reviewed workflows; supported environments are Conda, Docker and Singularity.
+It is system-agnostic: it can be used on single machines, clusters (HPC) or in a cloud such as AWS, with little difference between setups.
+Additional software can be included as new modules.
+Several model organism references are available (including human GRCh37 and GRCh38).
+The pipeline reports germline and somatic SNVs and SVs (by HaplotypeCaller, Strelka, Mutect2, Manta and TIDDIT).
+CNVs, purity and ploidy are estimated by ASCAT and Control-FREEC.
+Furthermore, a broad set of QC metrics is reported at the end of the workflow with MultiQC.
+
+#### Results
+
+From FASTQs to annotated VCFs it takes three days for a 90X/90X sample on a 48-core node.
+Sarek is used in production at the National Genomics Infrastructure Sweden for germline and cancer samples for the Swedish Childhood Tumor Biobank and other research groups.
+
+#### Conclusions
+
+Sarek is an easy-to-use tool for germline or cancer NGS samples, available from [nf-co.re/sarek](https://nf-co.re/sarek) under the MIT license.
+
+#### Supporting grants
+
+Swedish Research Council (2017-00630, 2017-00656), the Swedish Childhood Cancer Fund (BTB: BB2017-0001; BB2018-0001; BB2019-0001), and the Knut and Alice Wallenberg Foundation (KAW 2014.0278).