hash of the whole fasta headers instead of just the ids #161

Closed
danpal96 opened this issue Apr 14, 2023 · 1 comment · Fixed by #162

@danpal96

vamb=4.0.1

When I run vamb on my contigs without renaming them, I get the following error:

ValueError: At least one BAM file reference name hash to b8126ea6ff9bedc4ebb495695cd25129, expected c110292443b13cdc4d101e2114c17656. Make sure all BAM and FASTA headers are identical and in the same order.

This happens because, in the Composition.from_file method in parsecontigs.py, the line contignames.append(entry.header) uses the whole header as the contig name (e.g. "contig1 description") instead of just the ID (e.g. "contig1"). I resolved this by replacing that line with contignames.append(entry.header.split()[0]).
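
For readers who want to see the workaround in isolation, here is a minimal sketch of the identifier extraction. The helper name fasta_identifier and the sample headers are hypothetical and not part of vamb; only the header.split()[0] idea comes from the report above.

```python
def fasta_identifier(header: str) -> str:
    """Return the identifier: the header text up to the first whitespace."""
    fields = header.split()
    if not fields:
        # An empty header has no identifier; fail loudly instead of raising IndexError.
        raise ValueError("FASTA header is empty")
    return fields[0]


# Hypothetical headers, with the leading '>' already stripped.
headers = ["contig1 description", "contig2 length=1500 cov=12.3", "contig3"]
contignames = [fasta_identifier(h) for h in headers]
print(contignames)  # ['contig1', 'contig2', 'contig3']
```

Headers without a description pass through unchanged, so the change should only affect FASTA files whose headers carry extra text after the ID.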

jakobnissen added a commit that referenced this issue Apr 16, 2023
By using the identifier, we make sure the BAM identifiers match the FASTA
identifiers, because BAM files do not contain the whole header, only the
identifier.

Fix issue #161
@jakobnissen
Member

Dear @danpal96

Thanks for the bug report. The fix is indeed to hash just the identifier when constructing the Composition.
Why not hash the entire header? Because SAM/BAM files only contain the identifier, not the rest of the header. Hence, if we want to check that the BAM files match the FASTA file, we cannot use the description.
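
To illustrate the reasoning, here is a rough sketch of why the two hashes disagree. The hash_names function and its MD5-over-names scheme are assumptions made for this example, not vamb's actual hashing code, and the contig names are invented.

```python
import hashlib


def hash_names(names) -> str:
    """Hash an ordered sequence of names into one hex digest (illustrative scheme only)."""
    digest = hashlib.md5()
    for name in names:
        # A newline separator keeps ["ab", "c"] distinct from ["a", "bc"].
        digest.update(name.encode("utf-8") + b"\n")
    return digest.hexdigest()


fasta_headers = ["contig1 description", "contig2 other text"]
# Reference names as they would appear in a BAM file: identifiers only.
bam_references = ["contig1", "contig2"]

full_header_hash = hash_names(fasta_headers)
identifier_hash = hash_names(h.split()[0] for h in fasta_headers)
bam_hash = hash_names(bam_references)

print(full_header_hash == bam_hash)  # False
print(identifier_hash == bam_hash)   # True
```

Because the names are fed into the digest in sequence, reordering them also changes the result, which fits the error message's requirement that the headers be identical and in the same order.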

Fixed by #162
