ragtag correct stop with minimap2 error message #17

Closed
rderelle opened this issue Aug 11, 2020 · 10 comments
Labels
bug Something isn't working

Comments

@rderelle

Hello,

many thanks for developing this tool (it's incredibly easy to install).

I encountered an issue when trying to correct an Illumina assembly called 'C1_k105_scaffolds.fasta' using forward Illumina reads (from a paired-end library; file 'C1_1_trim.fq.gz') and a reference assembly of a closely related species (file '1-Genome_assembly.fa'), with ragtag exiting with an error message.

The ragtag log is:

Tue Aug 11 12:18:36 2020 --- RagTag v1.0.0
Tue Aug 11 12:18:36 2020 --- CMD: /xxx/anaconda3/bin/ragtag_correct.py 1-Genome_assembly.fa C1_k105_scaffolds.fasta -t 2 -T sr -R /xxx/sp/01_cleaned/C1_1_trim.fq.gz -u -o C1_correct --gff 2-Genome_GFF3.gff3.txt
Tue Aug 11 12:18:36 2020 --- Mapping the query genome to the reference genome
Tue Aug 11 12:18:36 2020 --- Running: minimap2 -x asm5 -t 2 /xxx/sp/03_RagTag/1-Genome_assembly.fa /xxx/sp/03_RagTag/C1_k105_scaffolds.fasta > /xxx/sp/03_RagTag/C1_correct/c_query_against_ref.paf 2> /xxx/sp/03_RagTag/C1_correct/c_query_against_ref.paf.log
Tue Aug 11 12:18:45 2020 --- Finished running : minimap2 -x asm5 -t 2 /xxx/sp/03_RagTag/1-Genome_assembly.fa /xxx/sp/03_RagTag/C1_k105_scaffolds.fasta > /xxx/sp/03_RagTag/C1_correct/c_query_against_ref.paf 2> /xxx/sp/03_RagTag/C1_correct/c_query_against_ref.paf.log
Tue Aug 11 12:18:45 2020 --- Reading whole genome alignments
Tue Aug 11 12:18:46 2020 --- Filtering and merging alignments
Tue Aug 11 12:18:47 2020 --- Validating putative query breakpoints via read alignment.
Tue Aug 11 12:18:47 2020 --- Aligning reads to query sequences.
Tue Aug 11 12:18:47 2020 --- Running: minimap2 -ax sr -t 2 /xxx/sp/03_RagTag/C1_k105_scaffolds.fasta /xxx/sp/01_cleaned/C1_1_trim.fq.gz > /xxx/sp/03_RagTag/C1_correct/c_reads_against_query.sam 2> /xxx/sp/03_RagTag/C1_correct/c_reads_against_query.sam.log
Traceback (most recent call last):
  File "/xxx/anaconda3/bin/ragtag_correct.py", line 645, in <module>
    main()
  File "/xxx/anaconda3/bin/ragtag_correct.py", line 591, in main
    al.run_aligner()
  File "/xxx/anaconda3/lib/python3.6/site-packages/ragtag_utilities/Aligner.py", line 128, in run_aligner
    run_oe(self.compile_command(), self.out_file, self.out_log)
  File "/xxx/anaconda3/lib/python3.6/site-packages/ragtag_utilities/utilities.py", line 73, in run_oe
    raise RuntimeError('Failed : %s > %s 2> %s' % (" ".join(cmd), out, err))
RuntimeError: Failed : minimap2 -ax sr -t 2 /xxx/sp/03_RagTag/C1_k105_scaffolds.fasta /xxx/sp/01_cleaned/C1_1_trim.fq.gz > /xxx/sp/03_RagTag/C1_correct/c_reads_against_query.sam 2> /xxx/sp/03_RagTag/C1_correct/c_reads_against_query.sam.log

and the sam log (file 'c_reads_against_query.sam.log') is:

[M::mm_idx_gen::3.152*1.30] collected minimizers
[M::mm_idx_gen::3.831*1.42] sorted minimizers
[M::main::3.838*1.42] loaded/built the index for 27604 target sequence(s)
[M::mm_mapopt_update::3.838*1.42] mid_occ = 1000
[M::mm_idx_stat] kmer size: 21; skip: 11; is_hpc: 0; #seq: 27604
[M::mm_idx_stat::4.034*1.40] distinct minimizers: 16518461 (95.55% are singletons); average occurrences: 1.091; average spacing: 6.034

Would you have any idea where the error comes from?
My apologies if this issue is trivial; I do not have much experience with ragtag or minimap2.

many thanks
Romain

@malonge
Owner

malonge commented Aug 11, 2020

Hi there,

Thanks for bringing this up. I am not sure what went wrong since the log doesn't indicate any error. I suggest you run minimap2 outside of ragtag and maybe that will give some indication of what is going on. Just run the following command:

minimap2 -ax sr -t 2 /xxx/sp/03_RagTag/C1_k105_scaffolds.fasta /xxx/sp/01_cleaned/C1_1_trim.fq.gz > /xxx/sp/03_RagTag/C1_correct/c_reads_against_query.sam 2> /xxx/sp/03_RagTag/C1_correct/c_reads_against_query.sam.log

Let me know if it fails again or if it runs to completion.

Thanks

@rderelle
Author

Hi,

thanks!
You were right: I tried with another version of minimap2 installed on our cluster and it works (v2.13 vs v2.15 in my first attempt).
nb: not saying here that v2.15 wouldn't work with ragtag.

I have then tried to run 'ragtag correct' without the .gff file and it worked again.

Then I tried to run it with the 2 read files I have (forward and reverse reads, as opposed to only the forward reads in my previous attempts) using the -F option pointing to a txt file containing the names of my 2 fastq files, but it stopped halfway:

Tue Aug 11 16:00:28 2020 --- RagTag v1.0.0
Tue Aug 11 16:00:28 2020 --- CMD: /xxx/anaconda3/bin/ragtag_correct.py 1-Genome_assembly.fa C1_k105_scaffolds.fasta -t 2 -T sr -F list_files.txt -u -o C1_correct_no_gff_and_list
Tue Aug 11 16:00:28 2020 --- Mapping the query genome to the reference genome
Tue Aug 11 16:00:28 2020 --- Running: minimap2 -x asm5 -t 2 /xxx/sp/03_RagTag/1-Genome_assembly.fa /xxx/sp/03_RagTag/C1_k105_scaffolds.fasta > /xxx/sp/03_RagTag/C1_correct_no_gff_and_list/c_query_against_ref.paf 2> /xxx/sp/03_RagTag/C1_correct_no_gff_and_list/c_query_against_ref.paf.log
Tue Aug 11 16:00:42 2020 --- Finished running : minimap2 -x asm5 -t 2 /xxx/sp/03_RagTag/1-Genome_assembly.fa /xxx/sp/03_RagTag/C1_k105_scaffolds.fasta > /xxx/sp/03_RagTag/C1_correct_no_gff_and_list/c_query_against_ref.paf 2> /xxx/sp/03_RagTag/C1_correct_no_gff_and_list/c_query_against_ref.paf.log
Tue Aug 11 16:00:42 2020 --- Reading whole genome alignments
Tue Aug 11 16:00:43 2020 --- Filtering and merging alignments
Tue Aug 11 16:00:43 2020 --- Validating putative query breakpoints via read alignment.
Tue Aug 11 16:00:43 2020 --- Aligning reads to query sequences.
Tue Aug 11 16:00:43 2020 --- Running: minimap2 -ax sr -t 2 /xxx/sp/03_RagTag/C1_k105_scaffolds.fasta /xxx/sp/01_cleaned/C1_1_trim.fq.gz /xxx/sp/01_cleaned/C1_2_trim.fq.gz > /xxx/sp/03_RagTag/C1_correct_no_gff_and_list/c_reads_against_query.sam 2> /xxx/sp/03_RagTag/C1_correct_no_gff_and_list/c_reads_against_query.sam.log
Tue Aug 11 16:00:48 2020 --- Finished running : minimap2 -ax sr -t 2 /xxx/sp/03_RagTag/C1_k105_scaffolds.fasta /xxx/sp/01_cleaned/C1_1_trim.fq.gz /xxx/sp/01_cleaned/C1_2_trim.fq.gz > /xxx/sp/03_RagTag/C1_correct_no_gff_and_list/c_reads_against_query.sam 2> /xxx/sp/03_RagTag/C1_correct_no_gff_and_list/c_reads_against_query.sam.log
Tue Aug 11 16:00:48 2020 --- Compressing, sorting, and indexing read alignments
Tue Aug 11 16:00:48 2020 --- Indexing read alignments
Tue Aug 11 16:00:48 2020 --- Validating putative query breakpoints
Tue Aug 11 16:00:48 2020 --- Calculating global read coverage
Traceback (most recent call last):
  File "/xxx/anaconda3/bin/ragtag_correct.py", line 645, in <module>
    main()
  File "/xxx/anaconda3/bin/ragtag_correct.py", line 610, in main
    ctg_breaks = validate_breaks(ctg_breaks, output_path, num_threads, overwrite_files, val_min_break_end_dist, max_cov, min_cov, window_size=val_window_size, clean_dist=min_break_dist, debug=debug_mode)
  File "/xxx/anaconda3/bin/ragtag_correct.py", line 168, in validate_breaks
    glob_med = get_median_read_coverage(output_path, num_threads, overwrite_files)
  File "/xxx/anaconda3/bin/ragtag_correct.py", line 124, in get_median_read_coverage
    raise ValueError()
ValueError

Indeed, in this case the output file 'c_reads_against_query.s.bam.stats' does not have any line starting with 'COV', hence the error message, I believe.

Perhaps I am not using the -F option correctly?

best,
Romain

@rderelle
Author

looking at the output file 'c_reads_against_query.sam.log' (almost empty), I can see that no read has been mapped.
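
The missing 'COV' lines can be checked for directly. What follows is only a minimal sketch, not RagTag's actual get_median_read_coverage code: it assumes the .stats file is `samtools stats` output, in which coverage rows are tab-separated as `COV`, a range such as `[1-1]`, the coverage value, and the number of bases at that coverage. The function name median_coverage and its error message are hypothetical.

```python
def median_coverage(stats_path):
    """Return the median per-base coverage from the COV rows of a stats file.

    Assumes `samtools stats`-style rows: COV <TAB> [a-b] <TAB> coverage <TAB> bases.
    Raises a descriptive ValueError when there are no COV rows at all.
    """
    bins = []  # (coverage, number_of_bases)
    with open(stats_path) as fh:
        for line in fh:
            if line.startswith("COV\t"):
                _, _, cov, count = line.rstrip("\n").split("\t")
                bins.append((int(cov), int(count)))

    if not bins:
        raise ValueError(
            "no COV lines found in %s; were any reads aligned?" % stats_path
        )

    # Weighted median: walk the bins in coverage order until half of all
    # counted bases have been seen.
    bins.sort()
    total = sum(count for _, count in bins)
    running = 0
    for cov, count in bins:
        running += count
        if 2 * running >= total:
            return cov
```

With a check like this, a stats file with no coverage rows would produce a message pointing at the read alignment step instead of a bare ValueError.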

@malonge
Owner

malonge commented Aug 11, 2020

Hi there,

Thanks for these details. Unfortunately, this doesn't appear to be a problem with RagTag, but rather with minimap2. As with the first example, the best way to debug is to run the aligner and see why it is not producing alignments. So you could rerun the following:

minimap2 -ax sr -t 2 /xxx/sp/03_RagTag/C1_k105_scaffolds.fasta /xxx/sp/01_cleaned/C1_1_trim.fq.gz /xxx/sp/01_cleaned/C1_2_trim.fq.gz > /xxx/sp/03_RagTag/C1_correct_no_gff_and_list/c_reads_against_query.sam 2> /xxx/sp/03_RagTag/C1_correct_no_gff_and_list/c_reads_against_query.sam.log

As far as RagTag is concerned, that is a valid minimap2 command.

Let me put it another way: RagTag reports all of the alignment commands used (like the one above). If they fail for some reason, the best way to debug is to run the same command outside of ragtag and reproduce the error. At the end of the day, RagTag will not work if minimap2 isn't working. For example, if minimap2 runs out of memory, then one must focus on resolving that issue with minimap2.

That said, I think RagTag can do a much better job of reporting errors. The ValueError raised here needs to carry more information. And perhaps RagTag can check whether alignment files are empty in order to provide a more useful error message.

Anyways let me know if you can reproduce the error by running minimap2 outside of ragtag.

EDIT
I think RagTag does a good job of reporting when aligner jobs have just failed. But if they fail silently (like producing empty alignment files), RagTag isn't good at reporting those errors.
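
To illustrate the kind of check meant here, the following is a rough sketch and not code from RagTag. It assumes the alignments are in plain-text SAM, where header lines start with '@' and every other non-blank line is an alignment record. The names count_sam_records and require_alignments are hypothetical.

```python
def count_sam_records(sam_path):
    """Count alignment records in a SAM file (non-blank lines that are not '@' headers)."""
    n = 0
    with open(sam_path) as fh:
        for line in fh:
            if line.strip() and not line.startswith("@"):
                n += 1
    return n


def require_alignments(sam_path, log_path):
    """Raise an informative error if the aligner wrote no records at all."""
    if count_sam_records(sam_path) == 0:
        raise RuntimeError(
            "%s contains no alignment records; the aligner may have failed "
            "silently, see %s" % (sam_path, log_path)
        )
```

Note that this only detects a file with no records. Unmapped reads also appear as SAM records, so a file in which nothing mapped would need a separate check.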

@rderelle
Author

Thanks for the reply.

However the minimap2 command line works perfectly -> reads from both files are mapped to the genome.

I tried again to run ragtag after modifying the txt file containing the 2 file names (giving relative paths instead of absolute paths), but again it didn't work.
I'll try to go deeper into this issue but it seems 'somehow' to be a ragtag issue rather than a minimap2 issue.

@rderelle
Author

From what I understand, the problem seems to come from the way ragtag calls minimap2 with 2 read files, probably in the class Minimap2SAMAligner (Aligner.py file) but I have not detected any mistake.

to recapitulate my observations:

_ when running ragtag with the -F option pointing to a text file containing the paths of the forward and reverse reads, the minimap2 command reported in the log is correct (it works when I run it myself), but strangely it doesn't work properly inside ragtag and no read is mapped.

_ from the file 'c_reads_against_query.sam.log':

[M::mm_idx_gen::3.056*1.36] collected minimizers
[M::mm_idx_gen::3.741*1.47] sorted minimizers
[M::main::3.745*1.47] loaded/built the index for 27604 target sequence(s)
[M::mm_mapopt_update::3.745*1.47] mid_occ = 1000
[M::mm_idx_stat] kmer size: 21; skip: 11; is_hpc: 0; #seq: 27604
[M::mm_idx_stat::3.938*1.45] distinct minimizers: 16518461 (95.55% are singletons); average occurrences: 1.091; average spacing: 6.034
ERROR: failed to open file '/xxx/sp/03_RagTag/C1_1_trim.fq.gz /xxx/sp/03_RagTag/C2_1_trim.fq.gz'
[M::main] Version: 2.13-r850
[M::main] CMD: minimap2 -ax sr -t 2 /xxx/sp/03_RagTag/C1_k105_scaffolds.fasta /xxx/sp/03_RagTag/C1_1_trim.fq.gz /xxx/sp/03_RagTag/C2_1_trim.fq.gz
[M::main] Real time: 3.959 sec; CPU: 5.723 sec; Peak RSS: 0.774 GB

It looks as if ragtag is giving minimap2 a single filename consisting of the concatenation of the 2 filenames. Not sure I'm interpreting this correctly (??).

@malonge
Owner

malonge commented Aug 11, 2020

Hi there,

Great - this log does indicate a RagTag bug. I think your interpretation is correct. I will look into it and fix this bug in the next patch.

This also reinforces the need for better error reporting. That may be a little more nuanced but I will look into it.

As for the first issue, I think I will still assume that there is no bug there.

In the meantime, if you are eager for results, you can run minimap2 outside of ragtag (like you have done) and just name the alignments with the expected ragtag name and put them in the output directory. RagTag won't try to overwrite preexisting alignment files, so it should work fine. Let me know if you need more details.
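
As a rough sketch of that workaround (not part of RagTag), the alignment could be scripted as below. The minimap2 options and the output file name c_reads_against_query.sam are taken from the logs in this thread. The helper names expected_sam_path, build_command and run_alignment are hypothetical, and the sketch assumes minimap2 is on the PATH.

```python
import os
import subprocess


def expected_sam_path(out_dir):
    """Path where the logs above show ragtag writing the read alignments."""
    return os.path.join(out_dir, "c_reads_against_query.sam")


def build_command(query, read_files, threads=2):
    """Build the minimap2 call with every read file as its own argument."""
    return ["minimap2", "-ax", "sr", "-t", str(threads), query] + list(read_files)


def run_alignment(query, read_files, out_dir, threads=2):
    """Run minimap2 and write the SAM and log into the ragtag output directory."""
    os.makedirs(out_dir, exist_ok=True)
    sam = expected_sam_path(out_dir)
    with open(sam, "w") as out, open(sam + ".log", "w") as err:
        subprocess.run(
            build_command(query, read_files, threads),
            stdout=out, stderr=err, check=True,
        )
    return sam
```

After something like run_alignment("C1_k105_scaffolds.fasta", ["C1_1_trim.fq.gz", "C1_2_trim.fq.gz"], "C1_correct_no_gff_and_list") has finished, ragtag can be started with the same output directory so that it picks up the existing alignment file.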

@malonge added the bug label Aug 11, 2020

@malonge
Owner

malonge commented Aug 11, 2020

I think the problem is here:

RagTag/ragtag_correct.py

Lines 584 to 585 in 5df41f1

al = Minimap2SAMAligner(query_file, " ".join(read_files), read_aligner_path, "-ax sr -t " + str(num_threads),
                        output_path + "c_reads_against_query", in_overwrite=overwrite_files)

I try to join the two file names with a space when really they should be separate elements in a list.
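
A small standalone illustration of why this was hard to spot, assuming (as the run_oe traceback earlier in the thread suggests) that the command is held as a list of arguments and logged with " ".join(cmd). The variable names and file names below are placeholders, not RagTag's own.

```python
read_files = ["C1_1_trim.fq.gz", "C1_2_trim.fq.gz"]
base_cmd = ["minimap2", "-ax", "sr", "-t", "2", "query.fasta"]

# Joined first: both names end up in ONE argument that contains a space,
# so the aligner is asked to open a single file that does not exist.
joined_cmd = base_cmd + [" ".join(read_files)]

# Kept as a list: each read file is passed as its own argument.
list_cmd = base_cmd + read_files

# Printed with " ".join(...), the two commands look exactly the same,
# which is why the logged command works when pasted into a shell.
logged_joined = " ".join(joined_cmd)
logged_list = " ".join(list_cmd)

print(logged_joined == logged_list)
print(len(joined_cmd), len(list_cmd))
```

The final argument of joined_cmd is the same kind of space-separated string that minimap2 reported in its "failed to open file" message above.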

@rderelle
Author

I agree, the problem seems to come from this class object.

Thanks for the tip, it works when running minimap2 before ragtag.

@malonge
Owner

malonge commented Aug 17, 2020

Fixed in v1.0.1
