IndexError: list index out of range #130
Comments
Can you:
|
I was able to run ShortStack on another sample: Complete_ShortStack.log Please let me know if you need additional information. |
Thanks, it's a bit of a puzzle. Your problematic run aborted at a step where it parses predicted RNA secondary structures. The specific failure is that it received an empty line from an RNAfold call where it should have retrieved a structure. I noticed that you are using an extreme number of --threads. It's just a guess, but there might be some issues with communication across nodes (assuming you are using more than one node on a cluster if you are grabbing 100 threads!). Can you try restricting the run to a single node and a more reasonable number of threads (say 10 or so)? Another workaround is to not perform MIRNA identification (omit the If that fails, I will ask you to share your genome and FASTQ so I can try to reproduce the error on my end. |
Thanks for your suggestions. The reason I used 100 threads is computational time: even then, the run took a full working day to finish. But I can first lower the number of threads (to 5). Omitting the --knownRNAs option is not ideal in our situation. |
ShortStack's read alignment phase is the most time- and CPU-intensive, because of the treatment of multi-mapping reads. In your case, your initial run should have completed read alignment successfully. You can retrieve the .bam file (and its index) that was made in the failed run, and use it as input to the --bamfile option when testing. That will save some time.
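The re-run suggested above can be sketched as follows. This is only an illustration: it reuses the flags from the original command in this issue, swaps --readfile for --bamfile, and lowers --threads. The BAM path and the new output directory name are hypothetical and need to be adjusted to wherever the failed run actually wrote its files.

```python
import shlex


def build_rerun_command(genome, bamfile, known_rnas, outdir, threads=10):
    """Assemble a ShortStack call that reuses an existing BAM instead of re-aligning."""
    return [
        "./ShortStack/ShortStack",
        "--genomefile", genome,
        "--bamfile", bamfile,        # takes the place of --readfile from the original run
        "--knownRNAs", known_rnas,
        "--threads", str(threads),   # modest thread count, as suggested above
        "--dn_mirna",
        "--dicermax", "26",
        "--outdir", outdir,
    ]


cmd = build_rerun_command(
    genome="../G_pallida_Rookmaker_RH89-039-16_potato_genomes_combined.fasta",
    bamfile="Seresta_Rook/Seresta_Rook_1_adapter_removed.bam",  # hypothetical path
    known_rnas="../forward_1/predicted_smallrnas.fasta",
    outdir="Seresta_Rook_rerun",                                # hypothetical name
)

# Print a copy-pasteable shell command; nothing is executed here.
print(shlex.join(cmd))
```

Building the command as a list keeps the arguments easy to inspect before anything is launched; the printed line can then be pasted into a job script.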
On Tuesday, May 2, 2023 at 2:49 AM, meerveld96 wrote:
Thanks for your suggestions. The reason I used 100 threads is computational time: even then, the run took a full working day to finish. But I can first lower the number of threads. Omitting the --knownRNAs option is not ideal in our situation.
|
Ah, thanks for the suggestion. For me, the "Candidates examined" step takes the longest.
|
Is your reference genome highly fragmented -- in 100s or 1000s of contigs/scaffolds? Slowness at this step could be due, in part, to a poorly assembled genome.
Also, what is the source of your "knownRNAs"? Some of them must be highly repetitive: I see you have 3649 of them aligned, but in total there are ~372 thousand hits. The "knownRNAs" are meant to be known microRNA sequences only. Part of the slowness is that you are searching many highly repetitive hits for microRNA-like characteristics. Consider trimming your "knownRNAs" file to include just mature miRNA sequences known from your species or a closely related species.
On Tuesday, May 2, 2023 at 7:33 AM, meerveld96 wrote:
Ah, thanks for the suggestion. For me, the "Candidates examined" step takes the longest.
# reads processed: 9468
# reads with at least one alignment: 3649 (38.54%)
# reads that failed to align: 5819 (61.46%)
Reported 371793 alignments
[bam_sort_core] merging from 0 files and 5 in-memory blocks...
Candidates examined: 12%|██████▎ | 42775/371793 [4:00:33<138:31:45, 1.52s/it]
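As a rough check on how repetitive the supplied sequences are, the alignment summary quoted above can be reduced to a single ratio: reported alignments per aligned sequence. The sketch below assumes the log has exactly the line formats shown in this thread; the function name is made up for illustration.

```python
import re


def alignments_per_aligned_read(log_text):
    """Return reported alignments divided by the number of reads that aligned."""
    aligned = re.search(r"reads with at least one alignment:\s*(\d+)", log_text)
    reported = re.search(r"Reported\s+(\d+)\s+alignments", log_text)
    if aligned is None or reported is None:
        raise ValueError("alignment summary lines not found in log text")
    n_aligned = int(aligned.group(1))
    if n_aligned == 0:
        raise ValueError("no reads aligned, ratio is undefined")
    return int(reported.group(1)) / n_aligned


# Summary lines copied from the log in this thread.
LOG = """\
# reads processed: 9468
# reads with at least one alignment: 3649 (38.54%)
# reads that failed to align: 5819 (61.46%)
Reported 371793 alignments
"""

ratio = alignments_per_aligned_read(LOG)
print(f"{ratio:.1f} genomic hits per aligned sequence")
```

For the numbers in this run the ratio is roughly 100 hits per aligned sequence, which is why each supplied sequence produces so many candidates to examine.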
|
The reference genome is distributed over 3078 scaffolds. I first pooled all samples and did an initial ShortStack run to find small RNAs in general (they are not well known for these specific genomes), and gave the result (all sRNAs, because we are not only interested in miRNAs) as input for the --knownRNAs parameter per sample to the second ShortStack run, which then failed with this IndexError. It is of course possible that these sequences are indeed very repetitive. Do you have any advice on how to deal with this? I understand that I can only give miRNAs to the knownRNAs parameter, but what about the rest of the small RNAs we are interested in? |
I suggest using only known microRNAs from other closely related species as input to the 'knownRNAs' option. You can also enable the 'dn_mirna' switch to turn on de novo microRNA searches.
For "small RNAs in general", ShortStack finds all clusters in the genome where the sRNA abundance exceeds the mincov threshold. These will all be reported. Most expressed small RNAs, especially in plants, are siRNAs, not microRNAs. ShortStack will report the most abundant single RNA from each of these loci (in the Results.txt file).
The 'knownRNAs' option specifies known mature microRNA matches in the reference genome where ShortStack will look "hard" to check for the MIRNA criteria.
I wish I had given the 'knownRNAs' option a different name, like 'known_micrornas', to make this more clear.
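One way to prepare such a file is to filter a larger FASTA of mature miRNAs down to a few related species and drop duplicate sequences. The sketch below assumes miRBase-style headers that begin with a species code (for example `abc-miR1`); the species codes and sequences in the example are made up, and both function names are hypothetical helpers rather than part of ShortStack.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_mirnas(records, prefixes):
    """Keep records whose header starts with one of the prefixes, one per unique sequence."""
    wanted = tuple(prefixes)
    seen = set()
    kept = []
    for header, seq in records:
        if not header.startswith(wanted):
            continue
        key = seq.upper()
        if key in seen:  # identical mature sequence already kept
            continue
        seen.add(key)
        kept.append((header, seq))
    return kept


# Made-up records; "abc-" and "xyz-" stand in for real species codes.
EXAMPLE = """\
>abc-miR1 example entry
UGACAGAAGAGAGUGAGCAC
>abc-miR2 example entry
UGACAGAAGAGAGUGAGCAC
>xyz-miR1 example entry
UCGGACCAGGCUUCAUUCCCC
""".splitlines()

selected = select_mirnas(read_fasta(EXAMPLE), prefixes=["abc-"])
for header, seq in selected:
    print(f">{header}\n{seq}")
```

Removing duplicate sequences keeps the same mature miRNA from being searched more than once under different names.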
On Tuesday, May 2, 2023 at 9:05 AM, meerveld96 wrote:
The reference genome is distributed over 3078 scaffolds.
These consist of two genomes, one is a relatively good genome (54 scaffolds) and the other is a fragmented genome (the rest).
I first pooled all samples and did an initial ShortStack run to find small RNAs in general (they are not well known for these specific genomes), and gave the result (all sRNAs, because we are not only interested in miRNAs) as input for the --knownRNAs parameter per sample to the second ShortStack run which then failed with this IndexError.
It is of course possible that these sequences are very repetitive indeed, how to deal with this, do you have any advice? I understand that I can only give miRNAs to the knownRNAs parameter, but what about the rest of the smallRNAs we are interested in?
I also want to predict them as accurately as possible.
|
Thanks, now I understand it better. I will try feeding ShortStack known microRNAs from other closely related species, or the ones generated by ShortStack the first time, when I pooled all the samples together; these are located in I do indeed work with plant material that has been infected with a pathogen. Yes, I agree that changing the |
As of release 4.0.2 the option has been renamed to |
Hi,
I ran ShortStack (4.0.1) with this command:
./ShortStack/ShortStack --genomefile ../G_pallida_Rookmaker_RH89-039-16_potato_genomes_combined.fasta --readfile Seresta_Rook_1_adapter_removed.fastq --knownRNAs ../forward_1/predicted_smallrnas.fasta --threads 100 --dn_mirna --dicermax 26 --outdir Seresta_Rook > Seresta_Rook.log
But I got this error; see the attached file:
error.txt
Thanks for helping me out.
Best regards,
Stefan