
Predicate to filter conversion #234

Merged: 1 commit into bigdatagenomics:master on May 5, 2014

Conversation

@arahuja (Contributor) commented Apr 29, 2014

This PR is for issue #62

ADAMPredicate derives from UnboundRecordFilter and can be used with ParquetInputFormat.setUnboundRecordFilter. It also has an apply method to filter an existing RDD. This lets us use predicates both for predicate pushdown on Parquet files and on an RDD that is already loaded, for example when we load from a BAM/SAM file, or when we reapply the same filters after some processing (e.g. removing duplicates with mark_duplicates before proceeding to the other read-prep stages).
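As a rough sketch of that dual-use design (illustrative names only, not the actual ADAM classes; the real ADAMPredicate also binds the Parquet UnboundRecordFilter so the same test can be pushed down at load time), a predicate that filters a loaded RDD and composes with AND/OR might look like:

```scala
import org.apache.spark.rdd.RDD

// Illustrative sketch only -- not the actual ADAM classes.
trait RecordPredicate[T] extends Serializable {

  // Per-record test; the real ADAMPredicate also exposes this logic to
  // Parquet as an UnboundRecordFilter for pushdown at load time.
  def accepts(record: T): Boolean

  // Filter an RDD that has already been loaded (e.g. from a BAM/SAM file).
  def apply(rdd: RDD[T]): RDD[T] = rdd.filter(record => accepts(record))

  // AND/OR combinators for building new predicates from existing ones.
  def and(other: RecordPredicate[T]): RecordPredicate[T] = {
    val self = this
    new RecordPredicate[T] {
      def accepts(record: T): Boolean = self.accepts(record) && other.accepts(record)
    }
  }

  def or(other: RecordPredicate[T]): RecordPredicate[T] = {
    val self = this
    new RecordPredicate[T] {
      def accepts(record: T): Boolean = self.accepts(record) || other.accepts(record)
    }
  }
}
```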

I added a few examples - HighQualityReadPredicate, UniqueMappedRead and GenotypeRecordPASSPredicate.

ADAMRecordConditions and ADAMGenotypeConditions also contain utility predicates that can be composed with AND and OR to create new predicates, and non-equality predicates are easy to specify as well.
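For example, building on the RecordPredicate sketch above (the record type and field names here are placeholders, not the actual ADAMRecordConditions), two conditions could be ANDed into a new predicate and applied to an already-loaded RDD:

```scala
// Placeholder record type; the real conditions operate on ADAMRecord / ADAMGenotype.
case class Read(readMapped: Boolean, mapq: Int)

// Two small conditions, analogous to the utility predicates mentioned above.
val isMapped = new RecordPredicate[Read] {
  def accepts(r: Read): Boolean = r.readMapped
}
val isHighQuality = new RecordPredicate[Read] {
  def accepts(r: Read): Boolean = r.mapq >= 30 // a non-equality condition
}

// Compose with AND, then filter an RDD that was loaded earlier.
val highQualityMapped = isMapped.and(isHighQuality)
// val filtered = highQualityMapped(reads) // reads: RDD[Read]
```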

@AmplabJenkins: Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/310/

@@ -48,7 +48,7 @@ class PileupAggregator(protected val args: PileupAggregatorArgs)
val companion = PileupAggregator

def run(sc: SparkContext, job: Job) {
val pileups: RDD[ADAMPileup] = sc.adamLoad(args.readInput, predicate = Some(classOf[LocusPredicate]))
@arahuja (Contributor, author): Not sure why this was using LocusPredicate before?

Member: That predicate is necessary. Pileups are created from mapped reads only.

Member: Maybe we should add a comment to keep others from tripping up on this too?

@arahuja (Contributor, author): Sure, but this is after pileup creation, right? Also, the fields that LocusPredicate checks against are not defined for an ADAMPileup. I was going to substitute MappedReadPredicate, but couldn't because of the typing: it is only applicable to ADAMRecord.

Member: You're right, Arun. This is after pileup creation, so the predicate isn't needed.

@AmplabJenkins: Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/311/

@massie (Member) commented Apr 30, 2014

Please run scalariform, e.g. mvn org.scalariform:scalariform-maven-plugin:format

@AmplabJenkins: All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/312/

massie added a commit that referenced this pull request on May 5, 2014: Predicate to filter conversion
@massie merged commit 78bc6c1 into bigdatagenomics:master on May 5, 2014
@massie (Member) commented May 5, 2014

Thanks, Arun! I really liked seeing all the tests you put in this pull request. 👍
