Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Paired-end bowtie2-align 2.5.2 crash when R2 fastq reads have been trimmed to zero length #451

Open
cnluzon opened this issue Nov 7, 2023 · 4 comments

Comments

@cnluzon
Copy link

cnluzon commented Nov 7, 2023

Hi! I have an issue running bowtie2 version 2.5.2 that I did not have with previous 2.5.1 version.

bowtie2 --version output:

/home/user/miniconda3/envs/bowtie2_test/bin/bowtie2-align-s version 2.5.2
64-bit
Built on fv-az632-196
Sat Oct 14 05:42:50 UTC 2023
Compiler: gcc version 12.3.0 (conda-forge gcc 12.3.0-2) 
Options: -O3 -msse2 -funroll-loops -g3 -fvisibility-inlines-hidden -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/user/miniconda3/envs/bowtie2_test/include -fdebug-prefix-map=/opt/conda/conda-bld/bowtie2_1697261192232/work=/usr/local/src/conda/bowtie2-2.5.2 -fdebug-prefix-map=/home/user/miniconda3/envs/bowtie2_test=/usr/local/src/conda-prefix -std=c++11 -DPOPCNT_CAPABILITY -DNO_SPINLOCK -DWITH_QUEUELOCK=1 -DWITH_ZSTD
Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}

bowtie2 call that returns the error:

bowtie2 --verbose -p 2 -x ref/ref1 -1 test_R1.fastq -2 test_R2.fastq --fast

The error:

Error, fewer reads in file specified with -2 than in file specified with -1
terminate called after throwing an instance of 'int'
Aborted (core dumped)
(ERR): bowtie2-align exited with value 134

I narrowed down the issue to the smallest FASTQ that would give me the error and I realise that what is triggering the event is an empty sequence read on R2:

@NB501365:168:HWFNVBGX3:1:11103:6189:3099_ACCGGG
CGACGCTCGGAAGAGCGTCGTGTACCCTATAGT
+
AAAAAEEEEEAEEEEEEEEEEEEEEEEAEEEEE
@NB501365:168:HWFNVBGX3:1:11103:9300:3121_CCCGCA

+

@NB501365:168:HWFNVBGX3:1:11103:5564:3152_AGGCCC
GTCTGGACCTGGAAGAGATCGGAAGAGCGTCGT
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEE

I understand this comes from the previous trimming step, but before (bowtie2 2.5.1), the corresponding read on R1 would be mapped. However now it is probably understood as an empty line and bowtie2 thinks that I have different number of reads. I cannot just remove those empty reads on the R2 file because then I would need to remove heir mate in R1 too, so I think this case should somehow be handled? I have not been able to find information on whether FASTQ files allow for such structure, but on the previous 2.5.1 version this would be correctly handled and the read would be outputted as:

NB501365:168:HWFNVBGX3:1:11103:9300:3121_CCCGCA 4       *       0       0       *       *       0       0       TGTGCAGTTCTAAGGAGATGGTGACTGAACAGACGT    EEEEEAEEEEEEEEEEEEEEEEE/AEEAA6EEEEEA    YT:Z:UU

in the alignment, with a correct YT:Z:UU for a read that was not paired.

The verbose output:

(INFO): After arg handling:
(INFO):   Binary args:
[ -p 2 -x ref/ref1 --fast --verbose -1 test_R1.fastq -2 test_R2.fastq ]
(INFO): Using the small index (ref/ref1.1.bt2).
(INFO): "/home/user/miniconda3/envs/bowtie2_test/bin/bowtie2-align-s" --wrapper basic-0 -p 2 -x "ref/ref1" --fast --verbose -1 "test_R1.fastq" -2 "test_R2.fastq"
Applying preset: 'fast' using preset menu 'V0'
Final policy string: 'SEED=0;SEEDLEN=22;DPS=10;ROUNDS=2;IVAL=S,0,2.50'
Entered driver(): 12:04:55
Creating PatternSource: 12:04:55
Opening hit output file: 12:04:55
About to initialize fw Ebwt: 12:04:55
  About to open input files: 12:04:55
Opening "ref/ref1.1.bt2"
Opening "ref/ref1.2.bt2"
  Finished opening input files: 12:04:55
  Reading header: 12:04:55
Headers:
    len: 700000
    bwtLen: 700001
    sz: 175000
    bwtSz: 175001
    lineRate: 6
    offRate: 4
    offMask: 0xfffffff0
    ftabChars: 10
    eftabLen: 20
    eftabSz: 80
    ftabLen: 1048577
    ftabSz: 4194308
    offsLen: 43751
    offsSz: 175004
    lineSz: 64
    sideSz: 64
    sideBwtSz: 48
    sideBwtLen: 192
    numSides: 3646
    numLines: 3646
    ebwtTotLen: 233344
    ebwtTotSz: 233344
    color: 0
    reverse: 0
Reading plen (1): 12:04:55
Dispatching to search driver: 12:04:55
About to initialize rev Ebwt: 12:04:55
  About to open input files: 12:04:55
Opening "ref/ref1.rev.1.bt2"
Opening "ref/ref1.rev.2.bt2"
  Finished opening input files: 12:04:55
  Reading header: 12:04:55
Headers:
    len: 700000
    bwtLen: 700001
    sz: 175000
    bwtSz: 175001
    lineRate: 6
    offRate: 4
    offMask: 0xfffffff0
    ftabChars: 10
    eftabLen: 20
    eftabSz: 80
    ftabLen: 1048577
    ftabSz: 4194308
    offsLen: 43751
    offsSz: 175004
    lineSz: 64
    sideSz: 64
    sideBwtSz: 48
    sideBwtLen: 192
    numSides: 3646
    numLines: 3646
    ebwtTotLen: 233344
    ebwtTotSz: 233344
    color: 0
    reverse: 1
Reading plen (1): 12:04:55
Read 1 reference strings from 1 records: 12:04:55
  About to open input files: 12:04:55
Opening "ref/ref1.1.bt2"
Opening "ref/ref1.2.bt2"
  Finished opening input files: 12:04:55
  Reading header: Error, fewer reads in file specified with -2 than in file specified with -1
12:04:55
Reading plen (1): 12:04:55
Reading rstarts (3): 12:04:55
Reading ebwt (233344): 12:04:55
terminate called after throwing an instance of 'int'
Aborted (core dumped)
(ERR): bowtie2-align exited with value 134

The test files:

test_reads.zip

@ch4rr0
Copy link
Collaborator

ch4rr0 commented Nov 8, 2023

Hello,

Thank you for your detailed reporting on this issue and providing sample reads that I could use to recreate it. I have pushed a change to bug_fixes branch that should resolve this issue. Let me know if your thoughts.

@ch4rr0
Copy link
Collaborator

ch4rr0 commented Nov 9, 2023

Update -- I found an issue with the code that I am looking into. I apologise for the inconvenience.

@cnluzon
Copy link
Author

cnluzon commented Nov 9, 2023

No problem! right now I have it pinned to the previous version. Thanks for your rapid response!

@ch4rr0
Copy link
Collaborator

ch4rr0 commented Nov 9, 2023

This latest iteration should work. Here is the output I get when trying to align the test data against some generic index. Thank you for your patience.

4 reads; of these:
  3 (75.00%) were paired; of these:
    3 (100.00%) aligned concordantly 0 times
    0 (0.00%) aligned concordantly exactly 1 time
    0 (0.00%) aligned concordantly >1 times
    ----
    3 pairs aligned concordantly 0 times; of these:
      0 (0.00%) aligned discordantly 1 time
    ----
    3 pairs aligned 0 times concordantly or discordantly; of these:
      6 mates make up the pairs; of these:
        6 (100.00%) aligned 0 times
        0 (0.00%) aligned exactly 1 time
        0 (0.00%) aligned >1 times
  1 (25.00%) were unpaired; of these:
    1 (100.00%) aligned 0 times
    0 (0.00%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
0.00% overall alignment rate

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants