alignmentSieve returns a truncated bam #1180

Open
sebastian-gregoricchio opened this issue Dec 23, 2022 · 18 comments

@sebastian-gregoricchio

sebastian-gregoricchio commented Dec 23, 2022

Python 3.9.12
deeptools 3.5.1
alignmentSieve 3.5.1

Dear all,
when I run alignmentSieve with the --ATACshift option, I get a truncated BAM file as output for some (but not all) of my samples.

For instance this is the flagstat of my original bam:

31138979 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
31138979 + 0 mapped (100.00% : N/A)
31138979 + 0 paired in sequencing
15587662 + 0 read1
15551317 + 0 read2
31138979 + 0 properly paired (100.00% : N/A)
31138979 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

Then I run the following shifting:

alignmentSieve \
        --bam input.bam \
        --outFile shifted.bam \
        --minMappingQuality 20 \
        --minFragmentLength 0 \
        --maxFragmentLength 0 \
        --ATACshift \
        -p max
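For readers unfamiliar with the option: the deepTools documentation describes --ATACshift as equivalent to --shift 4 -5 5 -4, where the first two values apply to fragments whose read 1 is the left-most mate and the last two to fragments whose read 1 is on the right (positive values move an end to the right, negative values to the left). The minimal Python sketch below only illustrates that coordinate arithmetic on a single fragment, as I understand the documented semantics; it is not deepTools' implementation, and `shift_fragment` and its parameters are hypothetical.

```python
def shift_fragment(start, end, read1_on_left, shift=(4, -5, 5, -4)):
    """Apply a 4-value, deepTools-style shift to a fragment [start, end).

    Hypothetical helper for illustration only. The first two values are used
    when read 1 is the left-most mate, the last two when read 1 is on the right.
    """
    left_shift, right_shift = shift[:2] if read1_on_left else shift[2:]
    new_start, new_end = start + left_shift, end + right_shift
    if new_end <= new_start:
        # This sketch simply rejects fragments that would vanish after shifting.
        raise ValueError("fragment too short to shift")
    return new_start, new_end


print(shift_fragment(1000, 1200, read1_on_left=True))   # (1004, 1195)
print(shift_fragment(1000, 1200, read1_on_left=False))  # (1005, 1196)
```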

First, the output file is only 2.4 GB, compared with the 5.4 GB I get without the shift.
(alignmentSieve does not report any error.)

Second, when I try to sort the file, I get the following error:

user$   samtools sort -@ 30 -o shifted_sorted.bam shifted.bam

[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
samtools sort: truncated file. Aborting

Furthermore, if I run flagstat on the resulting file, it confirms that the file is truncated and contains only about 1.7 million of the original 31.1 million reads:

[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[bam_flagstat_core] Truncated file? Continue anyway.
1707936 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
1707936 + 0 mapped (100.00% : N/A)
1707936 + 0 paired in sequencing
854824 + 0 read1
853112 + 0 read2
1707936 + 0 properly paired (100.00% : N/A)
1707936 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
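To put the two flagstat reports side by side, the small Python sketch below (hypothetical helper `flagstat_total`) reads the "in total" line and computes the fraction of reads kept. Keep in mind that --minMappingQuality 20 can legitimately remove reads, so a lower count by itself does not prove truncation; the bgzf_read error is the stronger signal.

```python
import re


def flagstat_total(text):
    """Return QC-passed + QC-failed reads from the 'in total' line of samtools flagstat output."""
    match = re.search(r"^(\d+) \+ (\d+) in total", text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'in total' line found")
    return int(match.group(1)) + int(match.group(2))


before = flagstat_total("31138979 + 0 in total (QC-passed reads + QC-failed reads)")
after = flagstat_total("1707936 + 0 in total (QC-passed reads + QC-failed reads)")
print(f"{after} of {before} reads kept ({after / before:.1%})")
# 1707936 of 31138979 reads kept (5.5%)
```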

Thank you in advance for your help!

@buzhizhang121

Hello,
I have encountered exactly the same problem recently. Have you solved it?
Does anyone know how to solve this problem?
Thank you for your help!

(Python 3.9.13,
deeptools 3.5.1,
alignmentSieve 3.5.1)

@sjm0240111

I ran into a similar problem.
After alignmentSieve finished, I tried to sort the shifted BAM file with samtools, which reported:
[E::bgzf_read_block] Invalid BGZF header at offset 716786409
[E::bgzf_read] Read block operation failed with error 6 after 0 of 4 bytes

Thank you for your help!

(deeptools 3.5.1
samtools 1.16.1
pysam 0.19.1)
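A BAM file is a series of BGZF blocks and normally ends with a fixed 28-byte empty block, the EOF marker defined in the SAM/BAM specification, so a missing marker is a quick hint that a file was cut short, as these "Invalid BGZF header" and "truncated file" errors suggest. `samtools quickcheck -v` already performs this kind of check; the stdlib-only Python sketch below (hypothetical helper `has_bgzf_eof`) just shows the idea. A present marker does not prove the file is intact, since damage can also occur mid-file.

```python
import os

# Empty 28-byte BGZF block that terminates a complete BAM file (SAM/BAM spec EOF marker).
BGZF_EOF = bytes([
    0x1F, 0x8B, 0x08, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0xFF, 0x06, 0x00, 0x42, 0x43,
    0x02, 0x00, 0x1B, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
])


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF EOF marker."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False  # too small to even hold the marker
    with open(path, "rb") as handle:
        handle.seek(-len(BGZF_EOF), os.SEEK_END)  # jump to the last 28 bytes
        return handle.read() == BGZF_EOF


if __name__ == "__main__":
    import sys

    for bam in sys.argv[1:]:
        ok = has_bgzf_eof(bam)
        print(f"{bam}: {'EOF marker present' if ok else 'EOF marker MISSING (likely truncated)'}")
```

Run it as, for example, `python check_eof.py shifted.bam` (the script name is arbitrary).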

@fgualdr

fgualdr commented Feb 18, 2023

Same problem here.
Is there a way to solve this?

@baishengjun

Same problem here.
Is there a way to solve this?

@WardDeb
Member

WardDeb commented May 12, 2023

Hi,

I was able to reproduce this, though I'm a bit puzzled about the actual cause.
The default chunk size for alignmentSieve has been increased, and this seems to fix the problem.
Could you try to reproduce the issue with the develop branch and see if the problem persists?

Kind regards,

wardDeb

@WardDeb WardDeb mentioned this issue Jun 14, 2023
@WardDeb
Member

WardDeb commented Aug 29, 2023

This no longer occurred after the chunk size increase (release 3.5.2).
I'll close this for now, but feel free to re-open if it pops back up.

@WardDeb WardDeb closed this as completed Aug 29, 2023
@Leo-ccc

Leo-ccc commented Apr 12, 2024

Hi, I have the same problem even after updating deeptools to 3.5.5.

alignmentSieve --numberOfProcessors 40 --ATACshift --paired --bam POOL-2.marked.rmDup.rmMulti.bam -o POOL-2.tmp.bam
The input 'POOL-2.marked.rmDup.rmMulti.bam' is 2.6G, but the output 'POOL-2.tmp.bam' is only 512M.

After running 'samtools sort', I got these errors:
[E::bgzf_read_block] Invalid BGZF header at offset 380755101
[E::bgzf_read] Read block operation failed with error 6 after 0 of 4 bytes
samtools sort: truncated file. Aborting

Do you have any suggestions?

@WardDeb WardDeb reopened this Apr 12, 2024
@WardDeb
Member

WardDeb commented Apr 12, 2024

Could this be an issue with your tmp directory? 512MB is an oddly specific size. Otherwise, would you mind sharing (somehow) the BAM file that causes this behavior?
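One way to follow up on the tmp-directory question is to check how much space is free where temporary files would go. The Python sketch below (hypothetical helper `temp_dir_report`) reports the directory chosen by `tempfile.gettempdir()`, which honours the TMPDIR, TEMP and TMP environment variables, together with its free space from `shutil.disk_usage`. Whether alignmentSieve writes its intermediate files to that location is an assumption here, and quotas or other per-user limits will not show up in this number.

```python
import shutil
import tempfile


def temp_dir_report(path=None):
    """Return (directory, free_bytes) for `path`, defaulting to Python's temp directory."""
    directory = path or tempfile.gettempdir()  # honours TMPDIR, TEMP and TMP
    return directory, shutil.disk_usage(directory).free


if __name__ == "__main__":
    directory, free = temp_dir_report()
    print(f"{directory}: {free / 2**30:.1f} GiB free")
```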

@Leo-ccc

Leo-ccc commented Apr 12, 2024

I found that 'tmp.bam' contains fewer reads than the input.

The tail of 'tmp.bam' also looks strange. Is that a problem?

samtools view POOL-2.tmp.bam | tail -n 20
[E::bgzf_read_block] Invalid BGZF header at offset 380755098
[E::bgzf_read] Read block operation failed with error 2 after 0 of 4 bytes
[main_samview] truncated file.
samtools view: error closing "POOL-2.tmp.bam": -1
E200017923L1C035R0140471589 163 chr12 106999536 40 57M = 106999532 -52 * *
E200017923L1C035R0140471589 83 chr12 106999532 40 56M = 106999536 -52 * *
E200017923L1C024R0224092311 99 chr12 106999546 40 38M = 106999542 -33 * *
E200017923L1C024R0224092311 147 chr12 106999542 40 37M = 106999546 -33 * *
E200017923L1C023R0131256471 163 chr12 106999554 39 55M = 106999550 -50 * *
E200017923L1C023R0131256471 83 chr12 106999550 39 54M = 106999554 -50 * *
E200017923L1C032R0413648952 99 chr12 106999604 44 62M = 106999600 -57 * *
E200017923L1C032R0413648952 147 chr12 106999600 44 61M = 106999604 -57 * *
E200017923L1C024R0032428308 99 chr12 106999661 44 70M = 106999657 -65 * *
E200017923L1C024R0032428308 147 chr12 106999657 44 69M = 106999661 -65 * *
E200017923L1C037R0260351190 163 chr12 106999720 44 35M = 106999716 -30 * *
E200017923L1C037R0260351190 83 chr12 106999716 44 34M = 106999720 -30 * *
E200017923L1C013R0123664916 163 chr12 106999827 44 146M = 106999856 174 * *
E200017923L1C013R0123664916 83 chr12 106999856 44 145M = 106999827 -174 * *
E200017923L1C003R0151229789 163 chr12 106999888 44 72M = 106999884 -67 * *
E200017923L1C003R0151229789 83 chr12 106999884 44 71M = 106999888 -67 * *
E200017923L1C012R0261817120 163 chr12 107000003 44 50M = 106999999 -45 * *
E200017923L1C039R0054857891 163 chr12 107000003 44 70M = 106999999 -108 * *
E200017923L1C012R0261817120 83 chr12 106999999 44 49M = 107000003 -45 * *
E200017923L1C039R0054857891 83 chr12 106999999 44 112M = 107000003 -108 * *
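The records at the end of this output are not in coordinate order (position 107000003 is followed by 106999999 on chr12). On its own this is not a sign of corruption, which is presumably why the shifted output is sorted with samtools sort afterwards; the bgzf errors printed above are what point to truncation. The hypothetical Python helper below finds the first record whose position drops within the same reference, assuming SAM text as printed by samtools view.

```python
def first_unsorted(sam_lines):
    """Return the 0-based index of the first record whose POS is lower than the
    previous record's POS on the same reference, or None if there is no such drop.

    Only same-reference drops are reported, because the correct order between
    different references depends on the BAM header, which is not parsed here.
    """
    prev_rname, prev_pos = None, None
    for index, line in enumerate(sam_lines):
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.split()  # SAM is tab-separated; split() also accepts spaces
        rname, pos = fields[2], int(fields[3])
        if rname == prev_rname and pos < prev_pos:
            return index
        prev_rname, prev_pos = rname, pos
    return None


tail = [
    "E200017923L1C012R0261817120 163 chr12 107000003 44 50M = 106999999 -45 * *",
    "E200017923L1C039R0054857891 163 chr12 107000003 44 70M = 106999999 -108 * *",
    "E200017923L1C012R0261817120 83 chr12 106999999 44 49M = 107000003 -45 * *",
    "E200017923L1C039R0054857891 83 chr12 106999999 44 112M = 107000003 -108 * *",
]
print(first_unsorted(tail))  # 2
```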

@ultimatex5

ultimatex5 commented Apr 13, 2024

Hi,

Same problem here. I've tried both the 3.5.2 and 3.5.5 versions of deeptools.
If I remove --ATACshift, everything is fine; otherwise I get truncated BAMs (all of them are 30.251Kb in size and empty).

I also just tried 3.5.3, with the same problem.

I also tried 3.5.1: same problem (empty BAM files), but this time the size is different, 213.746Kb.

@WardDeb WardDeb added the bug label Apr 13, 2024
@Leo-ccc

Leo-ccc commented Apr 14, 2024

Hi. Another piece of information: 5 of my 6 sequencing BAM files ran successfully; only 1 file failed. I think this is not a general problem but one specific to that file. I'd be happy to provide my BAM if you need one for testing.
Thank you!

@WardDeb
Member

WardDeb commented Apr 15, 2024

Thanks for the additional info. It'd be great if you could share the problematic BAM file and a working one somehow. Do you have a way of making these available?

@WardDeb
Member

WardDeb commented Apr 15, 2024

Just as an update: I've received the files, and this is now a work in progress.

@Zeyu618

Zeyu618 commented Jun 16, 2024

With the latest alignmentSieve (3.5.5), I got a non-truncated BAM on my first try for a sample that always came out truncated with the old version, no matter how many times I re-ran it.
Update your alignmentSieve to 3.5.5 now!

@WardDeb
Member

WardDeb commented Jun 16, 2024

It seems the chance of a truncated BAM is reduced in 3.5.5, but not eliminated. I'm aiming to have a proper fix for this in the coming weeks.

@li1311139481

> Hi. Another piece of information is that 5 of my 6 sequencing BAM files ran successfully. Only 1 file failed. I think it is not a general problem but a specific one. I'd like to provide my BAM to you if you need one to test. Thank you!

Same problem. I also tried reprocessing the failed files individually, and they still failed, so I think this is probably not random but a problem specific to those files.

@li1311139481

I installed deeptools 3.5.5 with conda install deeptools=3.5.5, but it still didn't solve the problem. However, I did solve it by creating a new conda environment and installing deeptools with pip inside it. You can try installing with pip for now.

@sunta3iouxos

Hi there, I just wanted to add some information on the issue.
I split the file by chromosome with:

while read -r p; do
  samtools view -o "/scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_${p}.bam" \
    /scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.filtered.bam "chr${p}"
done < chr.txt

Then I indexed those files:

for bam in /scratch/Theo/AP06/bam/filtered_bam/*.test_*; do samtools index -@ 16 "$bam"; done

Then the ATAC shift:

for bam in /scratch/Theo/AP06/bam/filtered_bam/*.test_*bam; do alignmentSieve --bam $bam --outFile ${bam%.bam}"_Stmp.bam" --ATACshift --numberOfProcessors 16; done

And here is the sorting:

 for bam in /scratch/Theo/AP06/bam/filtered_bam/*_Stmp.*bam; do samtools sort -@ 16 -O Bam -o ${bam%_Stmp.bam}"_shift.bam" $bam; done
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
[E::bgzf_read] Read block operation failed with error 4 after 0 of 4 bytes
samtools sort: truncated file. Aborting
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
[bam_sort_core] merging from 0 files and 16 in-memory blocks...
[E::bgzf_read] Read block operation failed with error 4 after 0 of 4 bytes
samtools sort: truncated file. Aborting

The affected BAMs are chr11 and chrX (their _shift.bam files are missing from the listing below). I do not know if this is a random event:

ls /scratch/Theo/AP06/bam/filtered_bam/*_shift.bam
/scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_10_shift.bam  /scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_2_shift.bam
/scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_12_shift.bam  /scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_3_shift.bam
/scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_13_shift.bam  /scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_4_shift.bam
/scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_14_shift.bam  /scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_5_shift.bam
/scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_15_shift.bam  /scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_6_shift.bam
/scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_16_shift.bam  /scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_7_shift.bam
/scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_17_shift.bam  /scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_8_shift.bam
/scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_18_shift.bam  /scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_9_shift.bam
/scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_19_shift.bam  /scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_M_shift.bam
/scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_1_shift.bam   /scratch/Theo/AP06/bam/filtered_bam/A006200409_226916_S85.test_Y_shift.bam
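Comparing the listed outputs with the chromosome list shows which ones failed. The Python sketch below does that by filename; the expected chromosome set (1 to 19, X, Y and M) is an assumption inferred from the files listed, since chr.txt itself is not shown, and `missing_outputs` is a hypothetical helper.

```python
from pathlib import Path


def missing_outputs(paths, expected, marker=".test_", suffix="_shift.bam"):
    """Return expected chromosome names that have no '<prefix>.test_<chrom>_shift.bam' file."""
    found = set()
    for path in paths:
        name = Path(path).name
        if marker in name and name.endswith(suffix):
            start = name.index(marker) + len(marker)
            found.add(name[start:-len(suffix)])  # text between marker and suffix
    return sorted(set(expected) - found)


# Assumed chromosome set (1-19, X, Y, M); adjust to match chr.txt.
expected = [str(i) for i in range(1, 20)] + ["X", "Y", "M"]
# Basenames of the _shift.bam files in the `ls` output above.
listed = [
    f"A006200409_226916_S85.test_{c}_shift.bam"
    for c in ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10",
              "12", "13", "14", "15", "16", "17", "18", "19", "M", "Y"]
]
print(missing_outputs(listed, expected))  # ['11', 'X']
```

On the real directory, the same helper can take `Path("/scratch/Theo/AP06/bam/filtered_bam").glob("*_shift.bam")` as its first argument.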
