added predicate option to loadCoverage #1156

Closed


akmorrow13
Contributor

No description provided.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1474/

@fnothaft
Member

fnothaft commented Sep 9, 2016

Ah, so this is a philosophical decision articulated many moons ago by @laserson: we don't put predicates on the "automagical" functions (i.e., loadX, where X is not a specific file format). IIRC, the reasoning is that how the predicate is executed is hard to reason about without knowing the underlying file format, so it isn't good practice to "move" this underneath the user. We used to have another function that would apply a predicate to raw Avro, but we don't have it now. That also means the records loaded from this function would differ by input format, since the predicate would be applied to a Parquet file but not to coverage loaded from (e.g.) BED.

What's the use case here? I know you had one in mind, so I'm wondering what's the best way to move this forward.
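
To make the concern above concrete, here is a minimal sketch in plain Scala collections, with no Spark or ADAM dependency. Every name in it (`Cov`, `Source`, `loadAny`) is hypothetical and invented for illustration; it is not ADAM's API. It only models a format-agnostic loader that honors a predicate on one input path and silently drops it on the other.

```scala
object PredicateSketch {
  // Hypothetical stand-in for a coverage record; not ADAM's actual type.
  final case class Cov(contig: String, start: Long, end: Long, count: Double)

  // Hypothetical tag for where the records came from.
  sealed trait Source
  case object ParquetLike extends Source
  case object BedLike extends Source

  // A format-agnostic loader that accepts a predicate but can only
  // apply it on the Parquet-like path.
  def loadAny(
      records: Seq[Cov],
      source: Source,
      predicate: Option[Cov => Boolean]): Seq[Cov] =
    source match {
      // Predicate is applied when one is supplied.
      case ParquetLike => predicate.fold(records)(p => records.filter(p))
      // Predicate is ignored, so the caller gets unfiltered records.
      case BedLike => records
    }
}
```

Under these assumptions, the same call returns a different set of records depending only on the input format, which is the behavior that is hard for a caller to reason about.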

@Georgehe4
Contributor

The use case here is to enable loading in custom coverage files in Mango, since users might only be interested in viewing coverage information.

@fnothaft
Member

> The use case here is to enable loading in custom coverage files in Mango, since users might only be interested in viewing coverage information.

Those are just stored as Parquet Feature files, right? If so, then I'd call loadParquetFeatures with the predicate. I'm guessing that the goal is to have a region-based predicate? If so, perhaps we should refactor the signature of the changed method to take a region to filter on, and then add a filterByOverlappingReferenceRegion call in the non-Parquet fork of the code (https://github.com/bigdatagenomics/adam/pull/1156/files#diff-d36ea7d0742decd0b040a73a96af06e9R972). How does that sound?
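
As a rough sketch of the region-based alternative suggested above, the snippet below filters an in-memory collection by overlap with an optional region. The types `Region` and `Cov` and the helpers `overlaps` and `filterByRegion` are hypothetical stand-ins, not ADAM's `ReferenceRegion` or its `filterByOverlappingReferenceRegion`, and the 0-based, half-open coordinate convention is an assumption of this sketch.

```scala
object RegionFilterSketch {
  // Hypothetical region; assumed 0-based with an exclusive end.
  final case class Region(contig: String, start: Long, end: Long)

  // Hypothetical coverage record over the same coordinate convention.
  final case class Cov(contig: String, start: Long, end: Long, count: Double)

  // Two half-open intervals on the same contig overlap when each
  // one starts before the other ends.
  def overlaps(c: Cov, r: Region): Boolean =
    c.contig == r.contig && c.start < r.end && r.start < c.end

  // Keep only records overlapping the region; with no region,
  // return everything unchanged.
  def filterByRegion(records: Seq[Cov], region: Option[Region]): Seq[Cov] =
    region.fold(records)(r => records.filter(c => overlaps(c, r)))
}
```

Because a region filter like this is defined on the records themselves rather than on the storage layer, it can be applied after loading from any format, which is what would keep the Parquet and non-Parquet paths consistent.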

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1482/

@akmorrow13
Contributor Author

I added loadParquetCoverage and loadCoverage; only the first accepts a predicate. This mirrors how loadFeatures is implemented.
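
The shape of that split might look roughly like the following. This is a sketch only: the method names carry a `Sketch` suffix to mark them as hypothetical, the in-memory `Seq` stands in for data on disk, and the extension check is an invented placeholder for format detection. The actual signatures are in the merged commit, not here.

```scala
object LoaderSplitSketch {
  // Hypothetical stand-in for a coverage record; not ADAM's actual type.
  final case class Cov(contig: String, start: Long, end: Long, count: Double)

  // Format-specific entry point: the caller knows the data is Parquet,
  // so offering an optional predicate here is unambiguous.
  def loadParquetCoverageSketch(
      stored: Seq[Cov],
      predicate: Option[Cov => Boolean] = None): Seq[Cov] =
    predicate.fold(stored)(p => stored.filter(p))

  // Format-agnostic entry point: deliberately has no predicate parameter,
  // so results do not depend on which branch is taken.
  def loadCoverageSketch(path: String, stored: Seq[Cov]): Seq[Cov] =
    if (path.endsWith(".bed")) stored     // placeholder for a text-format path
    else loadParquetCoverageSketch(stored) // Parquet path, no predicate
}
```

With this layout, a caller who needs filtering at load time has to pick the Parquet-specific method explicitly, while the generic method returns the same records for every format.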

@fnothaft
Member

Thanks for updating this, @akmorrow13! This LGTM. I will leave this open for a day for any other review comments and will merge if there are no objections.

@fnothaft
Member

Thanks @akmorrow13! I've merged this as 5e2853e.

@fnothaft closed this Sep 15, 2016