Issue with 16 bit N CIGAR OP field with Long Reads #708

Closed
jonn-smith opened this issue Feb 22, 2023 · 3 comments

@jonn-smith

I am aligning long-read data and getting CIGAR strings with more than 65535 operations. Because the BAM spec stores the number of CIGAR operations (n_cigar_op) in a 16-bit field, all the tools I'm using fall over when trying to convert and work with my data in BAM format (SAM files work fine because they're plain text).

These are genuine data - it's not an artifact / mistake.

I know there has been talk about expanding this field in the past to at least 32 bits. Can we revisit this and update the spec to handle longer alignments?

@jkbonfield
Contributor

jkbonfield commented Feb 22, 2023

See #227 which added an additional CG tag to work around this.

Internally in htslib we spot this tag and convert it back to the CIGAR field, so the API works as if it was always there.
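For anyone else hitting this, here is a minimal Python sketch of the workaround as I understand it from the SAM spec: when a CIGAR has more than 65535 operations, the BAM CIGAR field holds a two-operation placeholder `kSmN` (k = query length, m = reference span) and the real CIGAR goes into the `CG:B,I` tag as packed `length << 4 | op` integers. The function names are hypothetical and this is not htslib's code; the reader side is only a simplified version of the check htslib does.

```python
import re

# BAM operation codes, in the order defined by the SAM spec ("MIDNSHP=X" -> 0..8).
CIGAR_OPS = "MIDNSHP=X"
MAX_N_CIGAR_OP = 0xFFFF  # n_cigar_op is a 16-bit field in BAM


def parse_cigar(cigar):
    """Split a SAM CIGAR string into (length, op) pairs."""
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]


def pack_op(length, op):
    """Pack one operation into BAM's uint32 form: length << 4 | op code."""
    assert length < (1 << 28), "BAM op lengths are limited to 28 bits"
    return (length << 4) | CIGAR_OPS.index(op)


def query_len(ops):
    # Operations that consume the query sequence: M, I, S, =, X
    return sum(n for n, op in ops if op in "MIS=X")


def ref_len(ops):
    # Operations that consume the reference: M, D, N, =, X
    return sum(n for n, op in ops if op in "MDN=X")


def to_bam_cigar(ops):
    """Writer side: return (cigar_field, cg_tag) for a list of (length, op) pairs."""
    packed = [pack_op(n, op) for n, op in ops]
    if len(packed) <= MAX_N_CIGAR_OP:
        return packed, None  # fits in n_cigar_op, no CG tag needed
    # Too many ops: store a kSmN placeholder and move the real CIGAR into CG
    placeholder = [pack_op(query_len(ops), "S"), pack_op(ref_len(ops), "N")]
    return placeholder, packed


def from_bam_cigar(cigar_field, cg_tag):
    """Reader side: restore the real CIGAR if the placeholder + CG pattern is present."""
    is_placeholder = (
        len(cigar_field) == 2
        and cigar_field[0] & 0xF == CIGAR_OPS.index("S")
        and cigar_field[1] & 0xF == CIGAR_OPS.index("N")
    )
    if cg_tag is not None and is_placeholder:
        return cg_tag
    return cigar_field
```

Keeping the placeholder a valid two-operation CIGAR means older tools still see a well-formed alignment covering the right reference interval, even if they can't see the real operations.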

Actually changing the BAM format is something we'd like to be able to do, but realistically it's just too legacy to consider it worthwhile, and it'd cause even more confusion with many tools failing to upgrade. (You wouldn't believe how popular samtools 0.1.18 still is! People just refuse to update their tools!)

Edit: also, which tools are failing to work? This spec change was merged in 2017. Six years ought to be enough, so if tools are still choking, it's justified to file bug reports against them.

@jkbonfield jkbonfield added the sam label Feb 22, 2023
@jonn-smith
Author

Ah! I apologize - I was searching for previous PRs and Issues and I missed it. @cmnbroad just pointed me to that as well.

This is the first time I've run into the 16-bit limit, so I missed the -L flag in minimap2, which was added as a solution (it writes CIGARs with more than 65535 operations to the CG tag). I can re-run my alignments with that flag.

I'm a proponent of updating the BAM format / SAM spec and pushing the change - most people won't be affected by this and since it's versioned I think people should expect that at some point there will be updates. :) That said, I have also seen groups that are VERY hesitant to update things, but they shouldn't hold everyone else back.

@jkbonfield
Contributor

Closing as it's a question and has been answered.

The topic of whether to update BAM with a format-breaking change or a backwards compatible aux tag was a hot topic with valid views from both sides, but ultimately the decision was made to not do a major version number upgrade.
