Expose sorting by reference index #952
Comments
Can this be pushed off to 0.20.0? |
This does interest me given my general interest in sorting/index type stuff. I'm happy to take it unless someone else wants to. |
Thanks @jpdna! I've assigned it to you. |
As I interpret it, the goal of this issue is to add a CLI option flag. The result would be a VCF or SAM/BAM file with records sorted first by the referenceIndex of the contig, and then by genomic position. The header sequence dictionary rows will also match the referenceIndex ordering. Is this correct? Question: I am not clear on whether we do want this command to also sort by genomic position within the reference/contig groups. It is possible that the original SAM/BAM or VCF was not sorted by position; however, we have no way to recover the original order, and there is no guarantee that the order ADAM outputs would match that of the original VCF or BAM. Also, I need to look at the code to understand more about the cases where As long as a SequenceDictionary exists with contigs that have a referenceIndex, I guess the a, b, c above are upstream of the concern in this issue, but I just wanted to check. |
I have this implemented, just working on unit tests, will PR tomorrow. |
Resolves bigdatagenomics#952. Adds function `sortByReferenceIndexAndPosition` on RDDs of `AlignmentRecord`. This sorts reads by their position on a contig, where contigs are ordered by contig index. This conforms to the SAM/BAM sort order.
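To make the ordering described in this commit message concrete, here is a minimal sketch in plain Python. It is not ADAM's implementation and does not use Spark; the `Read` type, the `sort_by_reference_index_and_position` function, and the sample contig indices are all hypothetical names invented for illustration. It also assumes that unmapped reads are placed after all mapped reads, which is how coordinate-sorted SAM/BAM files are usually laid out.

```python
from typing import Dict, List, NamedTuple, Optional


class Read(NamedTuple):
    """Hypothetical stand-in for an alignment record (not ADAM's AlignmentRecord)."""
    name: str
    contig: Optional[str]  # None means the read is unmapped
    start: Optional[int]   # None means the read is unmapped


def sort_by_reference_index_and_position(
    reads: List[Read], contig_index: Dict[str, int]
) -> List[Read]:
    """Sort by the contig's index in the sequence dictionary, then by start position.

    Assumption: unmapped reads go last, keeping their relative order.
    """
    def key(read: Read):
        if read.contig is None:
            return (1, 0, 0)  # sorts after every mapped read
        return (0, contig_index[read.contig], read.start)

    # sorted() is stable, so reads with equal keys keep their input order.
    return sorted(reads, key=key)


# Hypothetical sequence dictionary: the index order is deliberately
# different from the lexicographic order of the contig names.
contig_index = {"chr2": 0, "chr10": 1, "chr1": 2}

reads = [
    Read("r1", "chr1", 100),
    Read("r2", "chr10", 50),
    Read("r3", "chr2", 300),
    Read("r4", "chr2", 20),
    Read("r5", None, None),
]

sorted_names = [r.name for r in sort_by_reference_index_and_position(reads, contig_index)]
print(sorted_names)
```

With these sample values, sorting by contig name would put `chr1` first, while sorting by index puts `chr2` first because it has index 0 in the dictionary. That difference is the reason to expose an index-based sort separately from a name-based one.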
See #823; we don't currently expose sorting by index on the CLI, though.