Support for read alignment and variant calling in Adam? (e.g. BWA + Freebayes) #1311

NeillGibson · 2016-12-12T14:53:27Z

Hi,

Are you planning to support read alignment and variant calling in Adam? For example with BWA and Freebayes?

As far as I know most development work in Adam is focused on:

porting genomics data formats (FASTQ, BAM, VCF, BED) to HDFS + MapReduce friendly formats
developing BAM post-processing tools like MarkDuplicates, RealignIndels and BQSR.

And that the focus is not on developing new software for read alignment or variant calling.
I did see that work was done on adding pipes for streaming FASTQ, BAM and VCF to legacy tools.
#1112

Are you planning to support / test / develop read alignment + variant calling pipelines on Spark + Adam that make use of external read aligners / variant callers + your own data formats + bam post processing tools?

For Spark + Adam to be a real alternative to a normal HPC cluster for genomics data analysis, read alignment + variant calling support is essential.

Thank you.

waltermblair · 2017-03-17T15:22:37Z

Check out CS-BWAMEM; it needs some updating but is an implementation of BWA via Spark/ADAM.

fnothaft · 2017-03-17T15:23:32Z

+1 @waltermblair. I've got a WIP update PR at ytchen0323/cloud-scale-bwamem#9

ghost · 2017-03-22T13:46:12Z

Will this be integrated with the ADAM project itself? Alignment with BWA is the critical missing link in ADAM.

heuermh · 2017-03-22T14:19:47Z

> Will this be integrated with the ADAM project itself?

Long term, the cloud-scale-bwamem repository may migrate under the bigdatagenomics organization, to facilitate support and tighter integration with ADAM release cycles. It is not likely that the code will be migrated into the adam repository though, as most applications are developed as separate repositories.

Note there are a few other options for integrating BWA and ADAM:

BWA with ADAM on Apache Spark using workflow engine

BWA and ADAM can be run as part of the same pipeline, as is demonstrated here, with Toil as the workflow engine and Docker as the container technology:

https://github.com/BD2KGenomics/toil-scripts/blob/master/src/toil_scripts/adam_gatk_pipeline/align_and_call.py

Docker images for this pipeline are developed in the cgl-docker-lib repository and hosted on quay.io.

ADAM on Apache Spark with BWA using ADAM Pipe API

An alternative execution model is being developed in the cannoli repository, where the data are partitioned using Apache Spark and ADAM and then streamed over pipes to an external BWA process on each compute node.

This takes advantage of the ADAM Pipe API, which in turn builds on Apache Spark's RDD.pipe API.
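As a rough illustration of that execution model (this is not the ADAM Pipe API or Spark itself, just a plain-Python sketch), the idea is to split the records into partitions, stream each partition to an external process over stdin, and collect what it writes to stdout. The helper names `partition`, `pipe_partition` and `pipe_all` are hypothetical, and `EXTERNAL_COMMAND` is a stand-in for a real tool such as an aligner; here it only upper-cases each line.

```python
import subprocess
import sys

# Stand-in for an external tool (hypothetical; a real pipeline would launch
# an aligner here). It reads lines on stdin and writes one line per input.
EXTERNAL_COMMAND = [
    sys.executable,
    "-c",
    "import sys\nfor line in sys.stdin:\n    sys.stdout.write(line.upper())",
]


def partition(records, num_partitions):
    """Split records into at most num_partitions contiguous chunks."""
    size = max(1, -(-len(records) // num_partitions))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def pipe_partition(records, command):
    """Stream one partition through an external process, return output lines."""
    text = "".join(record + "\n" for record in records)
    result = subprocess.run(
        command, input=text, capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


def pipe_all(records, num_partitions, command):
    """Pipe every partition through the command and concatenate the outputs."""
    output = []
    for part in partition(records, num_partitions):
        output.extend(pipe_partition(part, command))
    return output


reads = ["acgt", "ttga", "ccat", "gatc", "aacc"]
piped = pipe_all(reads, 2, EXTERNAL_COMMAND)
```

In this sketch the partitions are processed one after another in a single process. In the model described above, Spark would schedule the partitions across compute nodes, so each node runs its own copy of the external tool on its share of the data.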

Reimplement BWA algorithm on ADAM on Apache Spark

Another option would be to reimplement the BWA algorithm in Scala on ADAM on Apache Spark. We currently have no plans to do this. If someone is interested and willing however, ... :)

fnothaft · 2017-03-22T15:17:49Z

In addition to calling the native BWA code, CS-bwamem has a Scala implementation of several of the core BWA algos.

NeillGibson · 2017-03-22T16:26:09Z

Thank you @fnothaft and @heuermh for this information on how to run BWA and Adam together on a Spark cluster.

I look forward to trying one or more of these options later this year to run a read alignment (bwa) and variant calling (freebayes/gatk) pipeline on a Spark cluster. I see that GATK is supported downstream and that a Freebayes wrapper is also being developed in the cannoli repository.

heuermh · 2017-03-22T16:38:55Z

Thank you @NeillGibson for asking good questions! Ping us when you're ready to give things a go, maybe the story will be clearer by then.

Meanwhile, if you might be interested, we host a weekly video call for our team and collaborators. Email my username at berkeley.edu for details.

fnothaft · 2017-05-12T18:53:04Z

Closing as the alignment steps are downstream in Cannoli (e.g., bwa) and variant calling is in Avocado.

rajputakhil · 2017-05-19T00:41:17Z

How can I implement Cannoli in ADAM? Please help.

fnothaft added the discussion label May 12, 2017

fnothaft closed this as completed May 12, 2017

heuermh modified the milestone: 0.23.0 Jul 22, 2017
