SortMeRNA v4.3.2: Stuck on small input files #290

Closed
HosseinAsghari opened this issue May 7, 2021 · 7 comments

@HosseinAsghari

Hi,

I'm using sortmerna v4.3.2 and on the following paired-end input it gets stuck:

sample_R1.fastq:

@read32651 0:N:  00
AGCCCTCCCCACACACCCCTTCCCAACCCTCCCC
+
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

sample_R2.fastq:

@read32651 1:N:  00
GGGGAGGGCAGGGACGGGGGGTGTTGGGAGGGCT
+
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

The command I'm running:

sortmerna -ref silva-bac-23s-id98.fasta -ref silva-arc-16s-id95.fasta -ref silva-bac-16s-id90.fasta -ref silva-euk-18s-id95.fasta -ref silva-euk-28s-id98.fasta -ref silva-arc-23s-id98.fasta -reads sample_R1.fastq -reads sample_R2.fastq -workdir test_dir -idx-dir index_dir -L 24 -interval 5 -max_pos 500 -blast '1 cigar qcov' -aligned -otu_map -de_novo_otu -threads 4 -v

And here's the log:

[process:1372] === Options processing starts ... ===
Found value: sortmerna
Found flag: -ref
Found value: silva-bac-23s-id98.fasta of previous flag: -ref
Found flag: -ref
Found value: silva-arc-16s-id95.fasta of previous flag: -ref
Found flag: -ref
Found value: silva-bac-16s-id90.fasta of previous flag: -ref
Found flag: -ref
Found value: silva-euk-18s-id95.fasta of previous flag: -ref
Found flag: -ref
Found value: silva-euk-28s-id98.fasta of previous flag: -ref
Found flag: -ref
Found value: silva-arc-23s-id98.fasta of previous flag: -ref
Found flag: -reads
Found value: sample_R1.fastq of previous flag: -reads
Found flag: -reads
Found value: sample_R2.fastq of previous flag: -reads
Found flag: -workdir
Found value: test_dir of previous flag: -workdir
Found flag: -idx-dir
Found value: index_dir of previous flag: -idx-dir
Found flag: -L
Found value: 24 of previous flag: -L
Found flag: -interval
Found value: 5 of previous flag: -interval
Found flag: -max_pos
Found value: 500 of previous flag: -max_pos
Found flag: -blast
Found value: 1 cigar qcov of previous flag: -blast
Found flag: -aligned
Previous flag: -aligned is Boolean. Setting to True
Found flag: -otu_map
Previous flag: -otu_map is Boolean. Setting to True
Found flag: -de_novo_otu
Previous flag: -de_novo_otu is Boolean. Setting to True
Found flag: -threads
Found value: 4 of previous flag: -threads
Found flag: -v
[opt_workdir:990] Using WORKDIR: "test_dir" as specified
[process:1456] Processing option: L with value: 24
[process:1456] Processing option: aligned with value:
[opt_aligned:238] Directory and Prefix for the aligned output was not provided. Using default dir/pfx: 'WORKDIR/out/aligned'
[process:1456] Processing option: blast with value: 1 cigar qcov
[process:1456] Processing option: de_novo_otu with value:
[process:1456] Processing option: idx-dir with value: index_dir
[opt_idxdir:1014] Using IDX dir: ["index_dir" as specified
[process:1456] Processing option: interval with value: 5
[process:1456] Processing option: max_pos with value: 500
[process:1456] Processing option: otu_map with value:
[process:1456] Processing option: reads with value: sample_R1.fastq
[opt_reads:97] Processing reads file [1] out of total [2] files
[process:1456] Processing option: reads with value: sample_R2.fastq
[opt_reads:97] Processing reads file [2] out of total [2] files
[process:1456] Processing option: ref with value: silva-bac-23s-id98.fasta
[opt_ref:157] Processing reference [1] out of total [6] references
[opt_ref:205] File "silva-bac-23s-id98.fasta" exists and is readable
[process:1456] Processing option: ref with value: silva-arc-16s-id95.fasta
[opt_ref:157] Processing reference [2] out of total [6] references
[opt_ref:205] File "silva-arc-16s-id95.fasta" exists and is readable
[process:1456] Processing option: ref with value: silva-bac-16s-id90.fasta
[opt_ref:157] Processing reference [3] out of total [6] references
[opt_ref:205] File "silva-bac-16s-id90.fasta" exists and is readable
[process:1456] Processing option: ref with value: silva-euk-18s-id95.fasta
[opt_ref:157] Processing reference [4] out of total [6] references
[opt_ref:205] File "silva-euk-18s-id95.fasta" exists and is readable
[process:1456] Processing option: ref with value: silva-euk-28s-id98.fasta
[opt_ref:157] Processing reference [5] out of total [6] references
[opt_ref:205] File "silva-euk-28s-id98.fasta" exists and is readable
[process:1456] Processing option: ref with value: silva-arc-23s-id98.fasta
[opt_ref:157] Processing reference [6] out of total [6] references
[opt_ref:205] File "silva-arc-23s-id98.fasta" exists and is readable
[process:1456] Processing option: threads with value: 4
[process:1456] Processing option: v with value:
[process:1476] === Options processing done ===
[process:1477] Alignment type: [best:1 num_alignments:1 min_lis:2 seeds:2]
[validate_kvdbdir:1223] Key-value DB location "test_dir/kvdb"
[validate_kvdbdir:1259] Creating KVDB directory: "test_dir/kvdb"
[validate_idxdir:1189] Using index directory: "index_dir"
[validate_idxdir:1205] IDX directory: "index_dir" exists and is not empty
[validate_readb_dir:1281] Using split reads directory : "test_dir/readb"
[validate_readb_dir:1289] Created split reads directory - OK
[validate_aligned_pfx:1310] Checking output directory: "test_dir/out"
[main:62] Running command:
sortmerna -ref silva-bac-23s-id98.fasta -ref silva-arc-16s-id95.fasta -ref silva-bac-16s-id90.fasta -ref silva-euk-18s-id95.fasta -ref silva-euk-28s-id98.fasta -ref silva-arc-23s-id98.fasta -reads sample_R1.fastq -reads sample_R2.fastq -workdir test_dir -idx-dir index_dir -L 24 -interval 5 -max_pos 500 -blast '1 cigar qcov' -aligned -otu_map -de_novo_otu -threads 4 -v
[Index:102] Found 24 non-empty index files. Skipping indexing.
[init:108] Readfeed init started

The process gets stuck at this stage, and the readb and out sub-directories remain empty. The same thing happens if the input read files are empty.

Any thoughts?

Thanks,
Hossein

@biocodz
Collaborator

biocodz commented May 10, 2021

The program has trouble reading your files. How big are they, i.e. what is the output of:

ls -l sample_R1.fastq sample_R2.fastq
file sample_R1.fastq
file sample_R2.fastq
wc -l sample_R1.fastq
wc -l sample_R2.fastq

@HosseinAsghari
Author

The files basically only contain the single read I posted in the issue.

ls -l sample_R1.fastq sample_R2.fastq

-rw-r----- 1 hossein hossein 92 May 10 09:46 sample_R1.fastq
-rw-r----- 1 hossein hossein 92 May 10 09:46 sample_R2.fastq

file sample_R1.fastq

sample_R1.fastq: ASCII text

file sample_R2.fastq

sample_R2.fastq: ASCII text

wc -l sample_R1.fastq

4 sample_R1.fastq

wc -l sample_R2.fastq

4 sample_R2.fastq
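
The same checks can be scripted. The following is a minimal sketch, assuming uncompressed FASTQ input; the helper name `fastq_summary` is hypothetical and is not part of SortMeRNA.

```python
import os


def fastq_summary(path):
    """Return (size_bytes, line_count, record_count) for an uncompressed FASTQ file.

    Hypothetical helper for illustration only.
    """
    size_bytes = os.path.getsize(path)
    with open(path, "rb") as fh:
        # Counts lines, including a final line that lacks a trailing newline
        # (wc -l would not count that one).
        line_count = sum(1 for _ in fh)
    if line_count % 4 != 0:
        raise ValueError(f"{path}: {line_count} lines is not a multiple of 4")
    return size_bytes, line_count, line_count // 4


if __name__ == "__main__":
    for name in ("sample_R1.fastq", "sample_R2.fastq"):
        if os.path.exists(name):
            print(name, fastq_summary(name))
```

For the files above this should report 4 lines and 1 record each, alongside the byte size that `ls -l` shows.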

@biocodz
Collaborator

biocodz commented May 10, 2021

Could you try running with -threads 1 and -dbg-level 2?
Also, I would recommend switching to the new pre-release 4.3.3-pre.
Meanwhile, I'll try to reproduce this.

@HosseinAsghari
Author

Running with -threads 1 and -dbg-level 2 caused the same problem and produced the following log.
Version 4.3.3-pre also got stuck at the same point with the same log message.

[process:1372] === Options processing starts ... ===
Found value: sortmerna
Found flag: -ref
Found value: silva-bac-23s-id98.fasta of previous flag: -ref
Found flag: -ref
Found value: silva-arc-16s-id95.fasta of previous flag: -ref
Found flag: -ref
Found value: silva-bac-16s-id90.fasta of previous flag: -ref
Found flag: -ref
Found value: silva-euk-18s-id95.fasta of previous flag: -ref
Found flag: -ref
Found value: silva-euk-28s-id98.fasta of previous flag: -ref
Found flag: -ref
Found value: silva-arc-23s-id98.fasta of previous flag: -ref
Found flag: -reads
Found value: sample_R1.fastq of previous flag: -reads
Found flag: -reads
Found value: sample_R2.fastq of previous flag: -reads
Found flag: -workdir
Found value: test_dir of previous flag: -workdir
Found flag: -idx-dir
Found value: index_dir of previous flag: -idx-dir
Found flag: -L
Found value: 24 of previous flag: -L
Found flag: -interval
Found value: 5 of previous flag: -interval
Found flag: -max_pos
Found value: 500 of previous flag: -max_pos
Found flag: -blast
Found value: 1 cigar qcov of previous flag: -blast
Found flag: -aligned
Previous flag: -aligned is Boolean. Setting to True
Found flag: -otu_map
Previous flag: -otu_map is Boolean. Setting to True
Found flag: -de_novo_otu
Previous flag: -de_novo_otu is Boolean. Setting to True
Found flag: -threads
Found value: 1 of previous flag: -threads
Found flag: -v
Previous flag: -v is Boolean. Setting to True
Found flag: -dbg-level
Found value: 2 of previous flag: -dbg-level
[opt_workdir:990] Using WORKDIR: "test_dir" as specified
[process:1456] Processing option: L with value: 24
[process:1456] Processing option: aligned with value:
[opt_aligned:238] Directory and Prefix for the aligned output was not provided. Using default dir/pfx: 'WORKDIR/out/aligned'
[process:1456] Processing option: blast with value: 1 cigar qcov
[process:1456] Processing option: dbg-level with value: 2
[process:1456] Processing option: de_novo_otu with value:
[process:1456] Processing option: idx-dir with value: index_dir/
[opt_idxdir:1014] Using IDX dir: ["index_dir/" as specified
[process:1456] Processing option: interval with value: 5
[process:1456] Processing option: max_pos with value: 500
[process:1456] Processing option: otu_map with value:
[process:1456] Processing option: reads with value: sample_R1.fastq
[opt_reads:97] Processing reads file [1] out of total [2] files
[process:1456] Processing option: reads with value: sample_R2.fastq
[opt_reads:97] Processing reads file [2] out of total [2] files
[process:1456] Processing option: ref with value: silva-bac-23s-id98.fasta
[opt_ref:157] Processing reference [1] out of total [6] references
[opt_ref:205] File "silva-bac-23s-id98.fasta" exists and is readable
[process:1456] Processing option: ref with value: silva-arc-16s-id95.fasta
[opt_ref:157] Processing reference [2] out of total [6] references
[opt_ref:205] File "silva-arc-16s-id95.fasta" exists and is readable
[process:1456] Processing option: ref with value: silva-bac-16s-id90.fasta
[opt_ref:157] Processing reference [3] out of total [6] references
[opt_ref:205] File "silva-bac-16s-id90.fasta" exists and is readable
[process:1456] Processing option: ref with value: silva-euk-18s-id95.fasta
[opt_ref:157] Processing reference [4] out of total [6] references
[opt_ref:205] File "silva-euk-18s-id95.fasta" exists and is readable
[process:1456] Processing option: ref with value: silva-euk-28s-id98.fasta
[opt_ref:157] Processing reference [5] out of total [6] references
[opt_ref:205] File "silva-euk-28s-id98.fasta" exists and is readable
[process:1456] Processing option: ref with value: silva-arc-23s-id98.fasta
[opt_ref:157] Processing reference [6] out of total [6] references
[opt_ref:205] File "silva-arc-23s-id98.fasta" exists and is readable
[process:1456] Processing option: threads with value: 1
[process:1456] Processing option: v with value:
[process:1476] === Options processing done ===
[process:1477] Alignment type: [best:1 num_alignments:1 min_lis:2 seeds:2]
[validate_kvdbdir:1223] Key-value DB location "test_dir/kvdb"
[validate_kvdbdir:1259] Creating KVDB directory: "test_dir/kvdb"
[validate_idxdir:1189] Using index directory: "index_dir/"
[validate_idxdir:1205] IDX directory: "index_dir/" exists and is not empty
[validate_readb_dir:1281] Using split reads directory : "test_dir/readb"
[validate_readb_dir:1289] Created split reads directory - OK
[validate_aligned_pfx:1310] Checking output directory: "test_dir/out"
[main:62] Running command:
sortmerna -ref silva-bac-23s-id98.fasta -ref silva-arc-16s-id95.fasta -ref silva-bac-16s-id90.fasta -ref silva-euk-18s-id95.fasta -ref silva-euk-28s-id98.fasta -ref silva-arc-23s-id98.fasta -reads sample_R1.fastq -reads sample_R2.fastq -workdir test_dir -idx-dir index_dir/ -L 24 -interval 5 -max_pos 500 -blast 1 cigar qcov -aligned -otu_map -de_novo_otu -threads 1 -v -dbg-level 2
[Index:93] Index file ["index_dir/17299952793705614139.bursttrie_0.dat"] already exists and is not empty.
[Index:93] Index file ["index_dir/17299952793705614139.pos_0.dat"] already exists and is not empty.
[Index:93] Index file ["index_dir/17299952793705614139.kmer_0.dat"] already exists and is not empty.
[Index:93] Index file ["index_dir/17299952793705614139.stats"] already exists and is not empty.
[Index:93] Index file ["index_dir/3436099190853847617.bursttrie_0.dat"] already exists and is not empty.
[Index:93] Index file ["index_dir/3436099190853847617.pos_0.dat"] already exists and is not empty.
[Index:93] Index file ["index_dir/3436099190853847617.kmer_0.dat"] already exists and is not empty.
[Index:93] Index file ["index_dir/3436099190853847617.stats"] already exists and is not empty.
[Index:93] Index file ["index_dir/15734375058464002811.bursttrie_0.dat"] already exists and is not empty.
[Index:93] Index file ["index_dir/15734375058464002811.pos_0.dat"] already exists and is not empty.
[Index:93] Index file ["index_dir/15734375058464002811.kmer_0.dat"] already exists and is not empty.
[Index:93] Index file ["index_dir/15734375058464002811.stats"] already exists and is not empty.
[Index:93] Index file ["index_dir/2700646386527218729.bursttrie_0.dat"] already exists and is not empty.
[Index:93] Index file ["index_dir/2700646386527218729.pos_0.dat"] already exists and is not empty.
[Index:93] Index file ["index_dir/2700646386527218729.kmer_0.dat"] already exists and is not empty.
[Index:93] Index file ["index_dir/2700646386527218729.stats"] already exists and is not empty.
[Index:93] Index file ["index_dir/1845323523482939374.bursttrie_0.dat"] already exists and is not empty.
[Index:93] Index file ["index_dir/1845323523482939374.pos_0.dat"] already exists and is not empty.
[Index:93] Index file ["index_dir/1845323523482939374.kmer_0.dat"] already exists and is not empty.
[Index:93] Index file ["index_dir/1845323523482939374.stats"] already exists and is not empty.
[Index:93] Index file ["index_dir/3400685301612210653.bursttrie_0.dat"] already exists and is not empty.
[Index:93] Index file ["index_dir/3400685301612210653.pos_0.dat"] already exists and is not empty.
[Index:93] Index file ["index_dir/3400685301612210653.kmer_0.dat"] already exists and is not empty.
[Index:93] Index file ["index_dir/3400685301612210653.stats"] already exists and is not empty.
[Index:102] Found 24 non-empty index files. Skipping indexing.
[Index:104] TODO: a better validation using an index descriptor to decide on indexing
[init:108] Readfeed init started

@biocodz
Collaborator

biocodz commented May 11, 2021

I'm able to reproduce the bug. Working on it; it should not take long.

@biocodz
Collaborator

biocodz commented May 11, 2021

The bug was indeed caused by the size of the files. The logic that determines the input file format reads 100 bytes to detect the bio format and the compression. Yours was 92 bytes, which left the read in an undefined state. I modified the logic to check the file size prior to reading.
Please try the updated binaries in the 4.3.3 pre-release.
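
For readers following along, here is a rough sketch of the kind of size check described above. It is not SortMeRNA's code (which is C++); the function name `sniff_reads_file`, the returned labels, and the detection rules (gzip magic bytes, leading `@` or `>`) are assumptions for illustration. Only the 100-byte probe size comes from the comment above.

```python
import os

PROBE_SIZE = 100           # bytes inspected to guess the format (figure from the comment above)
GZIP_MAGIC = b"\x1f\x8b"   # first two bytes of any gzip stream


def sniff_reads_file(path):
    """Guess (compression, bio_format) from the first bytes of a reads file.

    Hypothetical helper: checks the file size first so the probe never
    assumes a full PROBE_SIZE bytes are available.
    """
    size = os.path.getsize(path)
    if size == 0:
        raise ValueError(f"{path} is empty")
    with open(path, "rb") as fh:
        # Ask for no more than the file actually holds.
        head = fh.read(min(PROBE_SIZE, size))
    if head.startswith(GZIP_MAGIC):
        # The bio format cannot be judged without decompressing.
        return ("gzip", "unknown")
    if head.startswith(b"@"):
        return ("flat", "fastq")
    if head.startswith(b">"):
        return ("flat", "fasta")
    raise ValueError(f"{path}: unrecognised format")
```

In Python, `read` already returns a short buffer at end of file, so the explicit size check here mainly mirrors the described fix: an empty file fails with a clear error, and a file shorter than the probe size is handled like any other.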

BTW, I'm curious who might be using the code in Vancouver. University?

biocodz closed this as completed May 26, 2021
@biocodz
Collaborator

biocodz commented May 26, 2021

Closing for inactivity.
