IndexError: list index out of range #130

Closed
meerveld96 opened this issue May 1, 2023 · 11 comments

@meerveld96

Hi,

I ran ShortStack (4.0.1) with this command:
./ShortStack/ShortStack --genomefile ../G_pallida_Rookmaker_RH89-039-16_potato_genomes_combined.fasta --readfile Seresta_Rook_1_adapter_removed.fastq --knownRNAs ../forward_1/predicted_smallrnas.fasta --threads 100 --dn_mirna --dicermax 26 --outdir Seresta_Rook > Seresta_Rook.log

But I got this error; see the attached file:

error.txt

Thanks for helping me out.

Best regards,
Stefan

@MikeAxtell
Owner

Can you:

  1. Run the test / example run detailed in the README, exactly as described. Does it work on your system?
  2. Send the full stdout/stderr from your failed run (what you posted was just a snippet). You can redirect stdout and stderr to a file, or run on a non-TTY, to not have to print the progress bar characters.
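
The redirection mentioned in step 2 could look like the sketch below. `run_job` is a hypothetical stand-in for the real ShortStack command, so the snippet runs anywhere, and `full_run.log` is an assumed file name.

```shell
#!/bin/sh
# Hypothetical stand-in for the real command; it writes to both streams.
run_job() {
    echo "progress and results"      # goes to stdout
    echo "warnings and errors" >&2   # goes to stderr
}

# '>' sends stdout to the file; '2>&1' then points stderr at the same file.
# The order matters: '2>&1' must come after the '>' redirection.
run_job > full_run.log 2>&1

# Both lines are now in a single file.
cat full_run.log
```

Note that the original command used only `> Seresta_Rook.log`, which captures stdout but not stderr, and a Python traceback is written to stderr.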

@meerveld96
Author

  1. I was able to run the test / example run without any errors; please correct me if I'm wrong: alignment_details.txt
  2. Sorry, I forgot the log file from ShortStack itself: ShortStack.log

I was able to run ShortStack on another sample: Complete_ShortStack.log

Please let me know if you need additional information.

@MikeAxtell
Owner

Thanks, it's a bit of a puzzle. Your problematic run aborted at a step where it is parsing predicted RNA secondary structures. The specific failure is that it received an empty line from an RNAfold call where it should have retrieved a structure.

I noticed that you are using an extreme number of --threads. It's just a guess, but there might be some issues with communication across nodes (assuming you are using more than one node on a cluster if you are grabbing 100 threads!).

Can you try to restrict to a single node, and a more reasonable number of threads (say 10 or so?).
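
A re-run along these lines might be set up as in the sketch below. It reuses the options from the original command and changes only `--threads`. Capping at the local core count via `nproc` is an added assumption, not something ShortStack requires, and the command is printed rather than executed so it can be reviewed first.

```shell
#!/bin/sh
# Cap the thread count at 10, or at the number of local cores if fewer.
# 'nproc' is from GNU coreutils; fall back to 1 if it is unavailable.
cores=$(nproc 2>/dev/null || echo 1)
threads=10
if [ "$cores" -lt "$threads" ]; then
    threads=$cores
fi

# Same options as the original run, with only --threads changed.
# 'echo' makes this a dry run: remove it to actually launch the job.
echo ./ShortStack/ShortStack \
    --genomefile ../G_pallida_Rookmaker_RH89-039-16_potato_genomes_combined.fasta \
    --readfile Seresta_Rook_1_adapter_removed.fastq \
    --knownRNAs ../forward_1/predicted_smallrnas.fasta \
    --threads "$threads" --dn_mirna --dicermax 26 \
    --outdir Seresta_Rook
```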

Another workaround is to skip MIRNA identification (omit the --knownRNAs option and do not set the --dn_mirna option), although that is not great if you actually want ShortStack to annotate MIRNA loci for you.

If that fails, I will ask you to share your genome and fastq so I can try to reproduce the error on my end.

@meerveld96
Author

meerveld96 commented May 2, 2023

Thanks for your suggestions. I used 100 threads because of the run time: even then, the run took a full working day to finish. I can first lower the number of threads (to 5). Omitting the --knownRNAs option is not ideal in our situation.

@MikeAxtell
Owner

MikeAxtell commented May 2, 2023 via email

@meerveld96
Author

Ah, thanks for the suggestion. For me, the "Candidates examined" step takes the longest.

# reads processed: 9468
# reads with at least one alignment: 3649 (38.54%)
# reads that failed to align: 5819 (61.46%)
Reported 371793 alignments
[bam_sort_core] merging from 0 files and 5 in-memory blocks...
Candidates examined:  12%|██████▎                                                | 42775/371793 [4:00:33<138:31:45,  1.52s/it]

@MikeAxtell
Owner

MikeAxtell commented May 2, 2023 via email

@meerveld96
Author

The reference genome is distributed over 3078 scaffolds.
These consist of two genomes, one is a relatively good genome (54 scaffolds) and the other is a fragmented genome (the rest).
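
For reference, a scaffold count like the one above can be checked by counting FASTA header lines. This is a minimal sketch; `example_genome.fasta` and its contents are made up so that the snippet runs as-is.

```shell
#!/bin/sh
# Build a tiny example FASTA so the snippet is runnable as-is.
# The file name and sequences are invented for illustration.
cat > example_genome.fasta <<'EOF'
>scaffold_1
ACGTACGTACGT
>scaffold_2
GGGCCCAT
>scaffold_3
TTAACC
EOF

# Every FASTA record starts with a '>' header line, so counting those
# lines gives the number of sequences (scaffolds) in the file.
n_scaffolds=$(grep -c '^>' example_genome.fasta)
echo "scaffolds: $n_scaffolds"
```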

I first pooled all samples and did an initial ShortStack run to find small RNAs in general (they are not well known for these specific genomes). I then gave the result (all sRNAs, because we are not only interested in miRNAs) as input to the --knownRNAs parameter for the second, per-sample ShortStack run, which failed with this IndexError.

It is certainly possible that these sequences are very repetitive. Do you have any advice on how to deal with this? I understand that I can only give miRNAs to the --knownRNAs parameter, but what about the other small RNAs we are interested in?
I also want to predict them as accurately as possible.

@MikeAxtell
Owner

MikeAxtell commented May 2, 2023 via email

@meerveld96
Author

Thanks, now I understand it better. I will try feeding ShortStack known microRNAs from closely related species, or the ones generated by ShortStack in the first run when I pooled all the samples together; these are located in mir.fasta.

I do indeed work with plant material that has been infected with a pathogen.

Yes, I agree that renaming the --knownRNAs parameter would make it clearer.

@MikeAxtell
Owner

As of release 4.0.2 the option has been renamed to --known_miRNAs and the documentation improved.
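
For anyone updating an existing command line after the rename, a sketch of the substitution is shown below. The file names `genome.fasta`, `reads.fastq`, and `out` are placeholders, and only the two option names come from this thread.

```shell
#!/bin/sh
# A command written for 4.0.1 (placeholder file names).
old_cmd='./ShortStack/ShortStack --genomefile genome.fasta --readfile reads.fastq --knownRNAs mir.fasta --dn_mirna --outdir out'

# Swap the old option name for the one used from 4.0.2 onward.
new_cmd=$(printf '%s\n' "$old_cmd" | sed 's/--knownRNAs/--known_miRNAs/')

# Print the updated command for review.
echo "$new_cmd"
```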
