Adding support for a Sequence Dictionary from BAM files #9

(1) Adding referenceLength and referenceUrl to the ADAMRecord schema. ADAMRecord now contains a (flattened) version of the sequence dictionary element corresponding to each read. An entire sequence dictionary can be read out of the records associated with a single BAM or SAM file.

(2) Added SequenceDictionary class. Aggregates a consistent set of SequenceRecord objects into a single dictionary. SequenceRecords are derived from the reference{Id, Name, Length, Url} fields of the ADAMRecord. Most of the work here goes into providing methods for combining and mapping between sequence dictionaries, which will be used for cross-BAM or cross-ADAM comparisons.

(3) Extended Bam2Adam to create a SequenceDictionary .dict parquet file. When bam2adam is run on a BAM, we extract the sequence dictionary from the BAM's header and insert its values into each ADAMRecord. Also updated the adamBamLoad method in ADAMContext to do something similar, even without the explicit parquet file.

(4) Adding the ListDict command. ListDict reads a SequenceDictionary out of a set of ADAMRecords and prints out all the component SequenceRecord objects contained in it.

(5) Adding the CompareAdam command. CompareAdam takes two ADAM files as arguments. It assumes that the same read names are present in *both* input ADAM files -- for example, if the two files were produced by different alignment tools run on the same set of reads. It calculates the number of reads that are unique to each ADAM file, as well as the number of shared reads that have the same alignment position across the two files. This position is computed irrespective of the ordering of the sequences in the sequence dictionaries of the two ADAM files, so it is a useful end-to-end test of the SequenceDictionary code, as well as the basis for more useful pairwise comparisons of ADAM files in the future.

(6) Extending ADAMContext with methods to automate the loading of SequenceDictionary objects out of any conformant Avro schema.
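
To make the ideas in (2) and (5) concrete, here is a minimal, self-contained sketch of how mapping between two sequence dictionaries and an order-independent position comparison might look. It is not the code in this pull request: the object name SequenceDictionarySketch, the case classes AlignedRead and Comparison, and the methods nameOf, isCompatibleWith, mapTo and compare are all hypothetical, and the SequenceRecord and SequenceDictionary shown here are simplified stand-ins for the real classes. It also assumes read names are unique within each file.

```scala
object SequenceDictionarySketch {

  // Simplified stand-in for one reference sequence (hypothetical fields).
  case class SequenceRecord(id: Int, name: String, length: Long, url: Option[String] = None)

  // Simplified stand-in for a dictionary of reference sequences.
  case class SequenceDictionary(records: Seq[SequenceRecord]) {
    private val byId: Map[Int, SequenceRecord] = records.map(r => (r.id, r)).toMap
    private val byName: Map[String, SequenceRecord] = records.map(r => (r.name, r)).toMap

    // Resolve a numeric reference id to its sequence name, if known.
    def nameOf(id: Int): Option[String] = byId.get(id).map(_.name)

    // Compatible here means: every sequence name present in both
    // dictionaries has the same length in both.
    def isCompatibleWith(other: SequenceDictionary): Boolean =
      records.forall(r => other.byName.get(r.name).forall(_.length == r.length))

    // Map this dictionary's ids onto the other dictionary's ids,
    // matching sequences by name. Names missing from `other` are skipped.
    def mapTo(other: SequenceDictionary): Map[Int, Int] =
      records.flatMap(r => other.byName.get(r.name).map(o => (r.id, o.id))).toMap
  }

  // Hypothetical minimal view of an aligned read.
  case class AlignedRead(readName: String, referenceId: Int, start: Long)

  case class Comparison(uniqueToFirst: Int, uniqueToSecond: Int, samePosition: Int)

  // Count reads unique to each input and shared reads aligned to the same
  // position. Positions are keyed by sequence *name*, so the ordering of
  // sequences (and hence the ids) in each dictionary does not matter.
  def compare(reads1: Seq[AlignedRead], dict1: SequenceDictionary,
              reads2: Seq[AlignedRead], dict2: SequenceDictionary): Comparison = {
    def positions(reads: Seq[AlignedRead],
                  dict: SequenceDictionary): Map[String, (Option[String], Long)] =
      reads.map(r => (r.readName, (dict.nameOf(r.referenceId), r.start))).toMap

    val p1 = positions(reads1, dict1)
    val p2 = positions(reads2, dict2)
    val shared = p1.keySet.intersect(p2.keySet)

    Comparison(
      uniqueToFirst = (p1.keySet -- p2.keySet).size,
      uniqueToSecond = (p2.keySet -- p1.keySet).size,
      samePosition = shared.count(name => p1(name) == p2(name))
    )
  }
}
```

Resolving ids to names before comparing is what makes the result independent of dictionary ordering: two files that list the same sequences in a different order assign different ids to them, so comparing raw referenceId values would report mismatches for reads that in fact align to the same place.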

…quenceRecord in comments.

Commits on Dec 2, 2013

Commits on Dec 3, 2013