Fix CRAM embed_ref=2 with seqs overlapping ref end.
If the sequences align off the end of the reference and we are creating the consensus on the fly, then the generated consensus also extends beyond the reference length. Although this longer reference is embedded, the CRAM decoder trims it back because it validates against the reference length declared in the @SQ LN header field. This leads to Ns appearing in the decoded output.

Therefore we now validate in the encoder too. This also needed refs_from_header updates to parse the LN tag so that the encoder can trim.

Note that we already overload r->length==0 as an indication that we have not yet parsed the fa/fai file, so we can't naively fill it in from the SQ LN header. We could hold this information elsewhere via a proper flag and modify all the places that use that knowledge, but the simplest (and safest) fix is to use a separate variable for this one specific case.

An example of the failure can be seen with:

    ./test/test_view -C -o embed_ref=2 test/realn01.sam | \
        ./test/test_view - | grep ST-E00128:308:HHVVLALXX:8:1217:16001:6565