Cannot add .gz; skipping #97

Open
u2058152 opened this issue Oct 14, 2021 · 1 comment

@u2058152

u2058152 commented Oct 14, 2021

I am working with data from wheat, where the chromosomes are very large, and am using ASCIIGenome to look at peak output from MACS3. I want to add two files, the narrowPeak and summits files. When I attempt this using the following command:
ASCIIGenome -fa Wheat.v21.fa Control.bam Wheat.v21.bam narrowPeak narrowPeak.gz
I get the following message:
Warning: 37457. Skipping:

Cannot add narrowPeak.gz; skipping

I think this error has to do with the size of the files. I compressed and indexed the files with tabix, as described in the ASCIIGenome instructions, but I still get the same error.
[Screenshot 2021-10-14 at 15:43:17]

@dariober
Owner

Hi, thanks for reporting the issue. I'm pretty sure this is because tabix (TBI) indexes cannot handle chromosomes longer than 2^29 bp (about 512 Mbp), which many wheat chromosomes exceed.
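To check which sequences are affected, here is a minimal sketch, assuming you have a samtools faidx index (`.fai`) for the FASTA, where column 1 is the sequence name and column 2 its length. The function name `list_oversized_contigs` is hypothetical; it just prints the sequences longer than the 2^29 bp (536,870,912) limit of TBI/BAI indexes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print sequences too long for TBI/BAI indexes.
# Assumes a samtools faidx index: column 1 = name, column 2 = length (bp).
# TBI and BAI can only address positions up to 2^29 bp = 536870912.
list_oversized_contigs() {
    local fai=$1
    awk -v OFS='\t' -v max=536870912 '$2 > max {print $1, $2}' "$fai"
}

# Usage (placeholder file name):
# list_oversized_contigs genome.fasta.fai
```

If this prints anything, interval files on those sequences cannot be indexed with plain tabix.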

One should use CSI indexes instead, for both BAM and interval files (I guess this is what you already have for your BAM files?). Unfortunately, I'm not sure htsjdk supports CSI indexes for BED files, but at least it does for BAM files (and I think IGV added CSI support for interval files only recently, if that makes me feel any better...).
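A back-of-the-envelope sketch of why CSI helps. TBI and BAI indexes use a fixed smallest bin of 2^14 bp and five binning levels, each level 8 times larger than the one below, so they top out at 2^(14 + 3*5) = 2^29 bp. A CSI index keeps the same smallest bin by default (`samtools index -c` uses `-m 14`) but can add levels as needed. The function `csi_levels` is hypothetical and only loosely follows my understanding of how htslib sizes CSI indexes; treat it as an illustration, not htslib code.

```shell
#!/usr/bin/env bash
# Sketch: count the binning levels needed to cover a sequence of max_len bp.
# Smallest bin is 2^min_shift bp (default 14); each extra level multiplies
# the addressable span by 8 (a shift of 3 bits).
csi_levels() {
    local max_len=$1 min_shift=${2:-14}
    local levels=0 span=$(( 1 << min_shift ))
    while (( max_len > span )); do
        levels=$(( levels + 1 ))
        span=$(( span << 3 ))
    done
    echo "$levels"
}

# csi_levels 536870912   -> 5 (the fixed depth of TBI/BAI)
# csi_levels 800000000   -> 6 (needs one more level than TBI/BAI have)
```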

As a hack, you could convert your narrowPeak files to BAM and load those instead. You could do this using, e.g., bedtools:

bedToBam -i test.narrowPeak -g genome.fasta.fai | samtools sort > test.narrowPeak.bam
samtools index -c test.narrowPeak.bam
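If you need to run this conversion on more than one file, here is a minimal sketch of the same two steps wrapped in a function, assuming bedtools (`bedToBam`) and samtools are on your PATH. The name `bed_to_csi_bam` and the file names in the usage loop are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: BED-like file -> coordinate-sorted BAM + CSI index.
# The body runs in a subshell ( ... ) so `set -o pipefail` does not leak into
# the caller; with pipefail, a bedToBam failure makes the function fail too.
bed_to_csi_bam() (
    set -o pipefail
    bed=$1 fai=$2 out=$3
    bedToBam -i "$bed" -g "$fai" | samtools sort -o "$out" - &&
        samtools index -c "$out"
)

# Usage, e.g. for both the peaks and the summits:
# for f in peaks.narrowPeak summits.bed; do
#     bed_to_csi_bam "$f" genome.fasta.fai "$f.bam"
# done
```

Running the body in a subshell keeps the function safe to source into an interactive shell.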

If you want to keep the additional narrowPeak columns as SAM tags, you could use something like this (check it's OK!):

# Prepare header
bedToBam -i test.narrowPeak -g genome.fasta.fai | samtools view -H > test.narrowPeak.hdr

# Output reads/peaks
bedToBam -i test.narrowPeak -g genome.fasta.fai | samtools view > test.narrowPeak.txt

# Prepare tags (ep: end position; sc: score; fc: fold-change; pv: pValue; qv: qValue; sm: peak summit offset from start)
awk -v OFS='\t' '{print "ep:i:"$3, "sc:f:"$5, "fc:f:"$7, "pv:f:"$8, "qv:f:"$9, "sm:i:"$10}' test.narrowPeak > test.narrowPeak.tags
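To see what the tag columns look like, here is a sketch that applies the same awk program to a single record. The function name `narrowpeak_to_tags` is hypothetical, and the record is made up purely for illustration. In narrowPeak, column 3 is the end, 5 the score, 7 the signal value (fold-change), 8 and 9 the -log10 p- and q-values, and 10 the summit offset from the start.

```shell
#!/usr/bin/env bash
# Same awk program as in the tag-preparation step, reading narrowPeak on stdin.
narrowpeak_to_tags() {
    awk -v OFS='\t' '{print "ep:i:"$3, "sc:f:"$5, "fc:f:"$7, "pv:f:"$8, "qv:f:"$9, "sm:i:"$10}'
}

# Made-up record for illustration only:
printf 'chr1A\t1000\t1500\tpeak_1\t120\t.\t5.2\t14.1\t11.8\t230\n' | narrowpeak_to_tags
# prints, tab-separated: ep:i:1500 sc:f:120 fc:f:5.2 pv:f:14.1 qv:f:11.8 sm:i:230
```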

# Combine header, reads, and tags. Then index
cat test.narrowPeak.hdr <(paste test.narrowPeak.txt test.narrowPeak.tags) | samtools sort > test.narrowPeak.bam
samtools index -c test.narrowPeak.bam
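`paste` joins the reads and the tags line by line, so each tag lands on the right peak only if both files have one line per peak, in the same order. Here is a cheap sanity check to run before the `cat ... | samtools sort` step, as a minimal sketch (the function name `same_line_count` is hypothetical).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if two files have the same number of lines.
same_line_count() {
    local a b
    a=$(wc -l < "$1") || return 1
    b=$(wc -l < "$2") || return 1
    # Arithmetic comparison tolerates the leading spaces some wc versions print.
    if (( a != b )); then
        echo "line count mismatch: $1 has $a, $2 has $b" >&2
        return 1
    fi
}

# Usage:
# same_line_count test.narrowPeak.txt test.narrowPeak.tags
```

This only catches missing or extra lines, not reordering, but it is enough to spot a truncated intermediate file.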

You would have to do the same for the annotation file. If you go this route, I would recommend using the latest versions of samtools and bedtools.

Hope this helps - I'm sorry it's only a temporary solution!
