Change locuspredicate to unique read mapped predicate #47

Merged 1 commit on May 6, 2014

arahuja (Contributor) commented May 6, 2014

This was changed in: bigdatagenomics/adam#234

timodonnell (Member) commented:

Thanks, @arahuja. Is it possible to make this respect the mapped and nonDuplicate arguments to the function (i.e. only filter out unmapped reads if mapped=True, and only filter out duplicate reads if nonDuplicate=True)? If it's not trivial, we can merge this as is to fix the build and address this in another commit.

@@ -89,7 +89,7 @@ object Common extends Logging {
    * @return
    */
   def loadReads(args: Arguments.Reads, sc: SparkContext, mapped: Boolean = true, nonDuplicate: Boolean = true): RDD[ADAMRecord] = {
-    var reads: RDD[ADAMRecord] = sc.adamLoad(args.reads, Some(classOf[LocusPredicate]))
+    var reads: RDD[ADAMRecord] = sc.adamLoad(args.reads, Some(classOf[UniqueMappedReadPredicate]))
     progress("Loaded %d reads.".format(reads.count))
     if (mapped) reads = reads.filter(read => read.readMapped && read.contig.contigName != null && read.contig.contigLength > 0)

Review comment from timodonnell (Member):

we should be able to get rid of lines 94 and 95, right?

Reply from arahuja (Contributor Author):

Yes, those should be redundant with the predicate filter. Alternatively, we could just remove the predicate for now if you want the function's arguments to be respected.

arahuja (Contributor Author) commented May 6, 2014

@timodonnell Yeah, we still can't pass runtime parameters to the Parquet predicates; they need to be predefined classes. I was experimenting with a new ParquetInputFormat, but I'm not sure it will work. Hopefully coming soon.

The workaround for now would be to have predefined classes for the 4 combinations of mapped and nonDuplicate, and then load with the class that matches the function's arguments.
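
A minimal sketch of that workaround, under stated assumptions: `Read`, `ReadPredicate`, the four predicate classes, and `PredicateSelector` are hypothetical stand-ins written for illustration, not ADAM or Parquet classes. The idea is that a loader which only receives a `Class` has to instantiate the predicate with a zero-argument constructor, so each combination of the two flags gets its own predefined class, and the flags pick which class to hand over.

```scala
// Hypothetical stand-in for a read record; field names are assumptions, not ADAM's schema.
final case class Read(readMapped: Boolean, duplicateRead: Boolean)

// Hypothetical stand-in for a predicate interface that is instantiated by class.
trait ReadPredicate {
  def apply(read: Read): Boolean
}

// One predefined, zero-argument class per (mapped, nonDuplicate) combination.
class AllReadsPredicate extends ReadPredicate {
  def apply(read: Read): Boolean = true
}

class MappedReadPredicate extends ReadPredicate {
  def apply(read: Read): Boolean = read.readMapped
}

class NonDuplicateReadPredicate extends ReadPredicate {
  def apply(read: Read): Boolean = !read.duplicateRead
}

class MappedNonDuplicateReadPredicate extends ReadPredicate {
  def apply(read: Read): Boolean = read.readMapped && !read.duplicateRead
}

object PredicateSelector {
  // Map the two boolean arguments onto the matching predefined class.
  def select(mapped: Boolean, nonDuplicate: Boolean): Class[_ <: ReadPredicate] =
    (mapped, nonDuplicate) match {
      case (true, true)   => classOf[MappedNonDuplicateReadPredicate]
      case (true, false)  => classOf[MappedReadPredicate]
      case (false, true)  => classOf[NonDuplicateReadPredicate]
      case (false, false) => classOf[AllReadsPredicate]
    }

  // Instantiate the chosen class reflectively (no constructor arguments),
  // mimicking a loader that is only given a Class, then apply it as a filter.
  def filterReads(reads: Seq[Read], mapped: Boolean, nonDuplicate: Boolean): Seq[Read] = {
    val predicate: ReadPredicate =
      select(mapped, nonDuplicate).getDeclaredConstructor().newInstance()
    reads.filter(read => predicate(read))
  }
}
```

In `loadReads`, the result of something like `PredicateSelector.select(mapped, nonDuplicate)` would presumably take the place of the hard-coded `classOf[...]` argument, at which point the follow-up `filter` calls on the flags would become redundant. Whether the real predicate types fit this shape is an assumption.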

timodonnell added a commit that referenced this pull request May 6, 2014
Change locuspredicate to unique read mapped predicate
timodonnell merged commit 164e3b5 into master May 6, 2014
timodonnell deleted the change-predicate branch May 6, 2014 16:12