Some reference entries show up as duplicates #135

Open
DeadlineWasYesterday opened this issue Jul 13, 2022 · 1 comment

Comments

DeadlineWasYesterday commented Jul 13, 2022

Hi Ben,

First off, really appreciate your work and the available teaching material.

I keep hitting a strange problem when using SAM files in downstream steps: samtools view errors out, saying there are duplicate entries in the SAM header.

Here are the diagnostics I ran:

  1. I double-checked my reference to make sure all the headers were unique.
  2. I built the index and ran the alignment three times.
  3. I checked for reference entries with different headers but identical sequences. I found a few, but they were not the headers showing up as duplicates.
  4. In the SAM file, all my reads appear to be mapped twice to these duplicated headers.
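
For anyone wanting to repeat checks 1, 3, and 4, here is a rough sketch of how they could be scripted. The function names are made up for illustration and are not part of bowtie or samtools. It assumes reference names are compared up to the first whitespace (SAM reference names cannot contain whitespace), so two FASTA headers that differ only in their description would count as the same name.

```shell
# Hypothetical helpers, not part of bowtie or samtools.

# Check 1: print reference names (text after '>' up to the first
# whitespace) that occur more than once in a FASTA file.
dup_fasta_names() {
    awk '/^>/ { print substr($1, 2) }' "$1" | LC_ALL=C sort | uniq -d
}

# Check 3: print comma-separated groups of names whose sequences are
# identical (case-insensitive, line wrapping ignored).
dup_sequences() {
    awk '
        /^>/ { if (name != "") print seq "\t" name
               name = substr($1, 2); seq = ""; next }
        { seq = seq toupper($0) }
        END  { if (name != "") print seq "\t" name }
    ' "$1" | LC_ALL=C sort | awk -F'\t' '
        NR > 1 && $1 == prev { group = group "," $2; n++; next }
        { if (n > 1) print group; prev = $1; group = $2; n = 1 }
        END { if (n > 1) print group }
    '
}

# Check 4: print SN: values that appear on more than one @SQ line of a
# SAM header. Stops reading at the first alignment record.
dup_sam_sq_names() {
    awk -F'\t' '
        /^@SQ/ { for (i = 2; i <= NF; i++)
                     if ($i ~ /^SN:/) print substr($i, 4) }
        !/^@/  { exit }
    ' "$1" | LC_ALL=C sort | uniq -d
}
```

With the files from this issue, that would be `dup_fasta_names m7b.fa`, `dup_sequences m7b.fa`, and `dup_sam_sq_names u36ob.sam`. Empty output from all three would mean no duplicates were found under the assumptions above.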

[Three screenshots attached; images not preserved]

These are the commands I ran to build the index and do the alignment:

bowtie-build --threads 40 -o 2 m7b.fa m7b
~/bin/bowtie -v 0 -y --norc -a -p 250 -f u36.fa -x m7b -S u36ob.sam

Surely, this is not normal?

DeadlineWasYesterday changed the title from "Reference entries having separate headers show up as duplicates" to "Some reference entries show up as duplicates" on Jul 13, 2022

DeadlineWasYesterday (Author) commented

Hmm. I think I have traced the source of the problem. I ran multiple (too many) instances of bowtie with different combinations of flags to see whether any of the flags was causing it. Then I wondered whether the memory buffer on the HPC was producing the same results even though I had changed the files (the m7b.fa file did previously have duplicate headers). So I switched to a different compute node and ran bowtie again, but the problem was still there.

Then I tried switching compute nodes and building the index again, and that apparently solved the problem.

TL;DR: I believe it was the memory buffer. The program is fine.
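
Since m7b.fa previously had duplicate headers, one thing that could be checked in a case like this is whether the index on disk still reflects the older file. Below is a minimal sketch under these assumptions: `bowtie-inspect -n` is used to list the reference names stored in the index, names are compared up to the first whitespace, and `stale_index_check` is a hypothetical helper name. It does not confirm or rule out the memory-buffer explanation; it only compares the two name lists.

```shell
# Usage (assumes bowtie-inspect is on PATH and the index prefix is m7b):
#   bowtie-inspect -n m7b > index_names.txt
#   stale_index_check m7b.fa index_names.txt

# Hypothetical helper: compare the names in a FASTA file with a list of
# names taken from an index. Returns 0 if they match, 1 otherwise.
stale_index_check() {
    fasta=$1
    index_names=$2
    fa_sorted=$(mktemp)
    ix_sorted=$(mktemp)

    # Keep duplicates on purpose: a name that appears twice in the index
    # but once in the FASTA should count as a mismatch.
    awk '/^>/ { print substr($1, 2) }' "$fasta" | LC_ALL=C sort > "$fa_sorted"
    awk '{ print $1 }' "$index_names" | LC_ALL=C sort > "$ix_sorted"

    if cmp -s "$fa_sorted" "$ix_sorted"; then
        echo "index names match FASTA"
        status=0
    else
        echo "index names differ from FASTA; consider rebuilding the index"
        # Column 1: only in FASTA. Column 2 (indented): only in index.
        LC_ALL=C comm -3 "$fa_sorted" "$ix_sorted"
        status=1
    fi

    rm -f "$fa_sorted" "$ix_sorted"
    return $status
}
```

If the names differ, rebuilding the index under a fresh prefix or in an empty directory would avoid reusing any older index files.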
