IndexError: list index out of range #130

meerveld96 · 2023-05-01T08:56:10Z

Hi,

I ran ShortStack (4.0.1) with this command:
./ShortStack/ShortStack --genomefile ../G_pallida_Rookmaker_RH89-039-16_potato_genomes_combined.fasta --readfile Seresta_Rook_1_adapter_removed.fastq --knownRNAs ../forward_1/predicted_smallrnas.fasta --threads 100 --dn_mirna --dicermax 26 --outdir Seresta_Rook > Seresta_Rook.log

But I got this error (see the attached file):

error.txt

Thanks for helping me out.

Best regards,
Stefan


MikeAxtell · 2023-05-01T12:12:21Z

Can you:

Run the test / example run detailed in the README, exactly as described. Does it work on your system?
Send the full stdout/stderr from your failed run (what you posted was just a snippet). You can redirect stdout and stderr to a file, or run on a non-TTY, to not have to print the progress bar characters.
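
For reference, the command in the original post used `>` alone, which captures stdout only, so a Python traceback written to stderr would not end up in the log. Below is a minimal sketch of capturing both streams in one file. The helper name `run_and_log` and the stand-in command are hypothetical; the real ShortStack argument list would replace `demo_cmd`.

```python
import os
import subprocess
import sys
import tempfile


def run_and_log(cmd, log_path):
    """Run cmd, sending stdout and stderr to the same log file.

    Returns the exit code of the command.
    """
    with open(log_path, "w") as log:
        # stderr=subprocess.STDOUT merges stderr into the stdout stream,
        # so tracebacks land in the same file as the normal output.
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


# Stand-in command (an assumption for illustration): writes one line to
# each stream. Replace with the real ShortStack argument list.
demo_cmd = [
    sys.executable,
    "-c",
    "import sys; print('to stdout'); print('to stderr', file=sys.stderr)",
]

log_path = os.path.join(tempfile.mkdtemp(), "run.log")
exit_code = run_and_log(demo_cmd, log_path)

with open(log_path) as handle:
    log_text = handle.read()
```

Because the output goes to a file rather than a terminal, this also counts as running on a non-TTY.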

meerveld96 · 2023-05-01T14:07:25Z

I was able to run the test / example run without any errors; please correct me if I'm wrong: alignment_details.txt
Sorry, I forgot the log file from ShortStack itself: ShortStack.log

I was also able to run ShortStack on another sample: Complete_ShortStack.log

Please let me know if you need additional information.

MikeAxtell · 2023-05-01T20:27:10Z

Thanks, it's a bit of a puzzle. Your problematic run aborted at a step where it is parsing predicted RNA secondary structures. The specific failure is that it received an empty line from an RNAfold call where it should have retrieved a structure.
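
To illustrate the failure mode (this is a hedged sketch, not ShortStack's actual code): indexing the fields of an empty line raises exactly this `IndexError`. The function `parse_rnafold_record` is hypothetical and assumes RNAfold's usual two-line output, a sequence line followed by a structure line with the free energy in parentheses.

```python
def parse_rnafold_record(lines):
    """Parse one RNAfold-style record into (sequence, structure, energy).

    Raises ValueError with a descriptive message instead of letting an
    empty structure line surface as a bare IndexError.
    """
    if len(lines) < 2:
        raise ValueError("incomplete RNAfold record")
    seq = lines[0].strip()
    # Split into at most two fields: the structure and the energy part.
    fields = lines[1].strip().split(None, 1)
    if not fields:
        raise ValueError(f"empty structure line for sequence {seq!r}")
    structure = fields[0]
    energy = None
    if len(fields) > 1:
        # The energy is printed like "( -1.20)"; strip parentheses and spaces.
        energy = float(fields[1].strip("() \t"))
    return seq, structure, energy


# The unguarded version of the same lookup fails on an empty line:
try:
    "".split()[0]
except IndexError as err:
    naive_error = str(err)

record = parse_rnafold_record(["GGGAAACCC", "(((...))) ( -1.20)"])
```

A guard like this does not explain why RNAfold returned nothing, but it would report which sequence triggered the problem.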

I noticed that you are using an extreme number of --threads. It's just a guess, but there might be some issues with communication across nodes (assuming you are using more than one node on a cluster if you are grabbing 100 threads!).

Can you try to restrict to a single node, and a more reasonable number of threads (say 10 or so?).

Another workaround is to not perform MIRNA identification (omit the --knownRNAs option and do not set the --dn_mirna option), although that is not great if you actually want ShortStack to annotate MIRNA loci for you.

If that fails, I will ask you to share your genome and fastq so I can try to reproduce the error on my end.

meerveld96 · 2023-05-02T06:49:21Z

Thanks for your suggestions. I used 100 threads because of the computation time; even then, the run took a full working day to finish. I can lower the number of threads first (to 5). Omitting the --knownRNAs option is not ideal in our situation.

MikeAxtell · 2023-05-02T11:12:56Z

ShortStack's read alignment phase is the most time / cpu-intensive, because of the treatment of multi-mapping reads. In your case, your initial run should have completed read alignment successfully. You can retrieve the .bam file (and its index) that was made in the failed run, and use it as input to the --bamfile option when testing out. That will save some time.

meerveld96 · 2023-05-02T11:33:02Z

Ah, thanks for the suggestion. For me, the "Candidates examined" step takes the longest.

# reads processed: 9468
# reads with at least one alignment: 3649 (38.54%)
# reads that failed to align: 5819 (61.46%)
Reported 371793 alignments
[bam_sort_core] merging from 0 files and 5 in-memory blocks...
Candidates examined:  12%|██████▎                                                | 42775/371793 [4:00:33<138:31:45,  1.52s/it]

MikeAxtell · 2023-05-02T12:25:25Z

Is your reference genome highly fragmented -- in 100s or 1000s of contigs/scaffolds? Slowness at this step could be due, in part, to a poorly assembled genome. Also, what is the source of your "knownRNAs"? Some of them must be highly repetitive .. I see you have 3649 of them aligned, but in total there are ~372 thousand hits. The "knownRNAs" are meant to be known microRNA sequences only. Part of the slowness is that you are searching many highly repetitive hits for microRNA-like characteristics. Consider trimming your "knownRNAs" file to include just mature miRNA sequences known from your species or a closely related species.
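
The repetitiveness can be read straight off the alignment summary quoted in this thread. A small sketch, assuming the log text has exactly the format shown in the previous comment (the function name is hypothetical):

```python
import re

# Alignment summary lines as posted earlier in this thread.
LOG = """\
# reads processed: 9468
# reads with at least one alignment: 3649 (38.54%)
# reads that failed to align: 5819 (61.46%)
Reported 371793 alignments
"""


def alignments_per_aligned_read(log_text):
    """Return reported alignments divided by sequences with >= 1 alignment."""
    aligned = int(
        re.search(r"reads with at least one alignment:\s*(\d+)", log_text).group(1)
    )
    reported = int(re.search(r"Reported\s+(\d+)\s+alignments", log_text).group(1))
    return reported / aligned


ratio = alignments_per_aligned_read(LOG)
print(f"{ratio:.1f} reported alignments per aligned sequence")
```

With these numbers, each aligned sequence has on average roughly 100 genomic hits, and every one of those hits becomes a candidate to examine.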

meerveld96 · 2023-05-02T13:05:34Z

The reference genome is distributed over 3078 scaffolds.
These consist of two genomes, one is a relatively good genome (54 scaffolds) and the other is a fragmented genome (the rest).

I first pooled all samples and did an initial ShortStack run to find small RNAs in general (they are not well known for these specific genomes), and gave the result (all sRNAs, because we are not only interested in miRNAs) as input for the --knownRNAs parameter per sample to the second ShortStack run which then failed with this IndexError.

It is of course possible that these sequences are very repetitive indeed, how to deal with this, do you have any advice? I understand that I can only give miRNAs to the knownRNAs parameter, but what about the rest of the smallRNAs we are interested in?
I also want to predict them as accurately as possible.

MikeAxtell · 2023-05-02T13:17:29Z

I suggest using only known microRNAs from other closely related species as input to the 'knownRNAs' option. You can also enable the 'dn_mirna' switch to turn on de novo microRNA searches. For "small RNAs in general", ShortStack finds all clusters in the genome where the sRNA abundance exceeds the mincov threshold. These will all be reported. Most expressed small RNAs, especially in plants, are siRNAs, not microRNAs. ShortStack will report the most abundant single RNA from each of these loci (in the Results.txt file). The 'knownRNAs' option specifies known mature microRNA matches in the reference genome where ShortStack will look "hard" to check for the MIRNA criteria. I wish I had given the 'knownRNAs' option a different name, like 'known_micrornas', to make this more clear.

meerveld96 · 2023-05-02T13:57:09Z

Thanks, now I understand it better. I will try feeding ShortStack known microRNAs from other closely related species, or the ones ShortStack generated in the first run on the pooled samples; these are located in mir.fasta.

I do indeed work with plant material that has been infected with a pathogen.

Yes, I agree that renaming the --knownRNAs parameter would make it clearer.

MikeAxtell · 2023-05-12T15:49:17Z

As of release 4.0.2 the option has been renamed to --known_miRNAs and the documentation improved.

MikeAxtell self-assigned this May 9, 2023

MikeAxtell added documentation enhancement labels May 9, 2023

MikeAxtell closed this as completed May 12, 2023
