FIX: metaphlan4 bowtie2-build --threads flag bug #229

Open: 1 commit to be merged into base branch master

@heloint heloint commented Aug 16, 2024

metaphlan4 bowtie2-build "--threads" flag bug

Software details

metaphlan4 version: MetaPhlAn version 4.1.1 (11 Mar 2024)
bowtie2-build version: bowtie2-build version 2.2.3 64-bit, compiled with gcc version 4.1.2 20080704 (Red Hat 4.1.2-54)


Issue

Traceback output

bowtie2-build: unrecognized option '--threads'
Bowtie 2 version 2.2.3 by Ben Langmead (langmea@cs.jhu.edu, www.cs.jhu.edu/~langmea)
Usage: bowtie2-build [options]* <reference_in> <bt2_index_base>
    reference_in            comma-separated list of files with ref sequences
    bt2_index_base          write bt2 data to files with this dir/basename
*** Bowtie 2 indexes work only with v2 (not v1).  Likewise for v1 indexes. ***
Options:
    -f                      reference files are Fasta (default)
    -c                      reference sequences given on cmd line (as
                            <reference_in>)
    --large-index           force generated index to be 'large', even if ref
                            has fewer than 4 billion nucleotides
    -a/--noauto             disable automatic -p/--bmax/--dcv memory-fitting
    -p/--packed             use packed strings internally; slower, less memory
    --bmax <int>            max bucket sz for blockwise suffix-array builder
    --bmaxdivn <int>        max bucket sz as divisor of ref len (default: 4)
    --dcv <int>             diff-cover period for blockwise (default: 1024)
    --nodc                  disable diff-cover (algorithm becomes quadratic)
    -r/--noref              don't build .3/.4 index files
    -3/--justref            just build .3/.4 index files
    -o/--offrate <int>      SA is sampled every 2^<int> BWT chars (default: 5)
    -t/--ftabchars <int>    # of chars consumed in initial lookup (default: 10)
    --seed <int>            seed for random number generator
    -q/--quiet              verbose output (for debugging)
    -h/--help               print detailed description of tool and its options
    --usage                 print this usage message
    --version               print version information and quit
Error: Encountered internal Bowtie 2 exception (#1)
Command: bowtie2-build --wrapper basic-0 -q /tmp/tmpm62vnjcr/v_mks.fa /tmp/tmpm62vnjcr/v_mks --threads 4
Traceback (most recent call last):
  File "/usr/local/bin/metaphlan", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/site-packages/metaphlan/metaphlan.py", line 1529, in main
    VSC_report = vsc_bowtie2(viralTempFolder, pars['nproc'], file_format=pars['input_type'],
  File "/usr/local/lib/python3.10/site-packages/metaphlan/metaphlan.py", line 450, in vsc_bowtie2
    subp.check_call( [bt2build_call, markerfile, dbpath, '-q','--threads', str(nproc)] )
  File "/usr/local/lib/python3.10/subprocess.py", line 369, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['bowtie2-build', '/tmp/tmpm62vnjcr/v_mks.fa', '/tmp/tmpm62vnjcr/v_mks', '-q', '--threads', '4']' returned non-zero exit status 1.

In "./metaphlan/metaphlan.py", on line 445, the subprocess call runs "bowtie2-build" with the "--threads" flag to enable multithreading. This version of bowtie2-build does not recognize "--threads"; the correct switch for multiple cores seems to be "-p".
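
To illustrate how the traceback arises, here is a minimal sketch of the failure mode that does not need bowtie2 installed. It assumes the Python interpreter itself (sys.executable) as a stand-in for bowtie2-build, and run_stand_in is a hypothetical helper name, not part of MetaPhlAn. The point is only that check_call raises CalledProcessError whenever the child process returns a non-zero exit status, which is what happens when bowtie2-build rejects an option.

```python
import subprocess
import sys


def run_stand_in(exit_code):
    """Run a child process that exits with `exit_code`, using check_call.

    Assumption: sys.executable stands in for bowtie2-build here, so the
    sketch runs anywhere Python does.
    """
    cmd = [sys.executable, "-c", f"import sys; sys.exit({exit_code})"]
    try:
        subprocess.check_call(cmd)
    except subprocess.CalledProcessError as err:
        # check_call raises once the child returns a non-zero status;
        # the status is available on the exception object.
        return err.returncode
    return 0


if __name__ == "__main__":
    print(run_stand_in(0))  # child succeeded, no exception
    print(run_stand_in(1))  # child failed, exception caught
```

Because the call in metaphlan.py is not wrapped in a try/except, the same exception propagates up through vsc_bowtie2 and main, which matches the traceback above.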

It's probably a case of poor documentation, as the "--help" flag currently gives the following output:

Bowtie 2 version 2.2.3 by Ben Langmead (langmea@cs.jhu.edu, www.cs.jhu.edu/~langmea)
Usage: bowtie2-build [options]* <reference_in> <bt2_index_base>
    reference_in            comma-separated list of files with ref sequences
    bt2_index_base          write bt2 data to files with this dir/basename
*** Bowtie 2 indexes work only with v2 (not v1).  Likewise for v1 indexes. ***
Options:
    -f                      reference files are Fasta (default)
    -c                      reference sequences given on cmd line (as
                            <reference_in>)
    --large-index           force generated index to be 'large', even if ref
                            has fewer than 4 billion nucleotides
    -a/--noauto             disable automatic -p/--bmax/--dcv memory-fitting
    -p/--packed             use packed strings internally; slower, less memory
    --bmax <int>            max bucket sz for blockwise suffix-array builder
    --bmaxdivn <int>        max bucket sz as divisor of ref len (default: 4)
    --dcv <int>             diff-cover period for blockwise (default: 1024)
    --nodc                  disable diff-cover (algorithm becomes quadratic)
    -r/--noref              don't build .3/.4 index files
    -3/--justref            just build .3/.4 index files
    -o/--offrate <int>      SA is sampled every 2^<int> BWT chars (default: 5)
    -t/--ftabchars <int>    # of chars consumed in initial lookup (default: 10)
    --seed <int>            seed for random number generator
    -q/--quiet              verbose output (for debugging)
    -h/--help               print detailed description of tool and its options
    --usage                 print this usage message
    --version               print version information and quit
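
One way to read the help output above programmatically is to look only at the option column of each line. The sketch below is an illustration under assumptions: advertises_option is a hypothetical helper, and the sample text is a short excerpt copied from the output above. It shows that this build lists "-p" only as the short form of "--packed" and does not list "--threads" at all.

```python
import re

# Excerpt copied from the bowtie2-build 2.2.3 help output quoted above.
HELP_EXCERPT = """\
Usage: bowtie2-build [options]* <reference_in> <bt2_index_base>
    -a/--noauto             disable automatic -p/--bmax/--dcv memory-fitting
    -p/--packed             use packed strings internally; slower, less memory
    --seed <int>            seed for random number generator
"""


def advertises_option(help_text, option):
    """Return True if `option` appears in the option column of the help text.

    The option column is taken to be everything before the first run of
    two or more spaces on each line (an assumption about the layout).
    """
    # Match the option only as a whole flag, not as part of a longer one.
    pattern = re.compile(r"(?<![\w-])" + re.escape(option) + r"(?![\w-])")
    for line in help_text.splitlines():
        flags = re.split(r"\s{2,}", line.strip(), maxsplit=1)[0]
        if pattern.search(flags):
            return True
    return False


if __name__ == "__main__":
    for opt in ("--threads", "-p", "--packed"):
        print(opt, advertises_option(HELP_EXCERPT, opt))
```

Restricting the search to the option column avoids a false match on the "-a/--noauto" line, whose description mentions "-p" in passing.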

Command to reproduce

metaphlan \
        --bowtie2db /gpfs/projects/bsc40/current/okhannous/Metaphlan4/db \
        --index mpa_vJun23_CHOCOPhlAnSGB_202307 /gpfs/projects/bsc40/current/dmajer/metaline-testy-output/BAM/corachan.unmapped.fastq.gz \
        --input_type fastq \
        --bowtie2out /gpfs/projects/bsc40/current/dmajer/metaline-testy-output/METAPHLAN4/corachan.bz2 -s /gpfs/projects/bsc40/current/dmajer/metaline-testy-output/METAPHLAN4/corachansam.bz2 \
        --profile_vsc -o /gpfs/projects/bsc40/current/dmajer/metaline-testy-output/METAPHLAN4/corachan_profiled.txt \
        --nproc 4 \
        --vsc_out /gpfs/projects/bsc40/current/dmajer/metaline-testy-output/METAPHLAN4/corachan.vsc.txt

Submitted fix

Substitute line 445 in ./metaphlan/metaphlan.py FROM:

    subp.check_call( [bt2build_call, markerfile, dbpath, '-q','--threads', str(nproc)] )

TO:

    try:
        subp.check_call([bt2build_call, markerfile, dbpath, '-q','--threads', str(nproc)])
    except subp.CalledProcessError as e:
        print(e, file=sys.stderr)
        errored_cmd = " ".join([bt2build_call, markerfile, dbpath, '-q','--threads', str(nproc)])
        corrected_cmd = " ".join([bt2build_call, markerfile, dbpath, '-q','-p', str(nproc)])
        print(
            f"==> WARNING: '{errored_cmd}' command is incompatible with the "
            "current version of bowtie2-build. "
            f"Re-trying the process with '{corrected_cmd}'",
            file=sys.stderr
        )
        subp.check_call([bt2build_call, markerfile, dbpath, '-q','-p', str(nproc)])

Since the "--threads" flag was used, it presumably works correctly with other versions of Bowtie2. Trying "--threads" first and falling back only on failure keeps compatibility with those versions.
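
As a possible alternative to retrying after a failure, the command could be built according to what the installed bowtie2-build reports about itself. This is only a sketch under assumptions: bt2build_command and probe_threads_support are hypothetical names, and probing "--help" for the string "--threads" is a heuristic rather than a documented interface. Since the help output quoted earlier lists "-p" as "--packed", the sketch leaves the threading option out entirely when "--threads" is not advertised, instead of substituting "-p".

```python
import subprocess


def bt2build_command(bt2build_call, markerfile, dbpath, nproc, threads_supported):
    """Build the bowtie2-build argument list, adding --threads only if supported."""
    cmd = [bt2build_call, markerfile, dbpath, "-q"]
    if threads_supported:
        cmd += ["--threads", str(nproc)]
    return cmd


def probe_threads_support(bt2build_call):
    """Heuristic: report whether the tool's --help text mentions --threads.

    Returns False when the executable cannot be started at all.
    """
    try:
        result = subprocess.run(
            [bt2build_call, "--help"],
            capture_output=True,
            text=True,
            check=False,  # only the printed text matters here
        )
    except OSError:
        return False
    return "--threads" in (result.stdout + result.stderr)


if __name__ == "__main__":
    supported = probe_threads_support("bowtie2-build")
    print(bt2build_command("bowtie2-build", "v_mks.fa", "v_mks", 4, supported))
```

With this approach a build that lacks "--threads" would run single-threaded rather than fail, at the cost of one extra process launch for the probe.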
