Hangs with no feedback #6
Comments
Tried it again with a slightly different (larger) test file. Again it seems to hang, but at a different point. I say 'hang' because although it hasn't thrown an error, there appears to be nothing happening (no tasks running, no memory being allocated).
Been sitting at this point for 2 hours, with no tasks being executed (as far as I can tell from htop).
Hi Darren, Thanks for reaching out, and I'm sorry that it's hanging on you. I'm working to figure out what's happening. Just to be sure that it's not an issue with your input fasta files (weird headers?), can you run the test contigs that are provided with the repo (e.g. testcontigs_DNA_ct2.fasta)? In the meantime I'll try to replicate this error. Mike
Hi! Thanks for getting back to me so fast! I'm hoping that cenote-taker2 will revolutionize my workflow (or perhaps just replace a post-doc). My input is Trinity output from a few years ago ... [my understanding is that fasta makes no stipulation except that names start with a ">" followed by any characters at all, then a newline before the sequence, and the sequence continues until the next ">"]. The test file turns up a new error, suggesting a library problem. I'm using the supplied conda environment on a pretty clean new Linux install (Scientific Linux, a Red Hat derivative). I recently ran into this in another context, when I was trying to set up a conda environment for the newest Trinity and Samtools, and it took an age to resolve, possibly because of a version conflict?
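The reading of the fasta format described above (a ">" line, then sequence until the next ">") can be sketched in a few lines of Python. This is only an illustration, not how Cenote-Taker 2 parses its input; the function names, the "safe" character set, and the example record are all hypothetical.

```python
import re


def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new ">" line closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def unusual_header_chars(header):
    """Return characters outside an assumed conservative set (letters, digits, '_', '.', '-')."""
    return sorted(set(re.findall(r"[^A-Za-z0-9_.\-]", header)))


# Hypothetical input, made up for this sketch.
example = ">contig1|len=12 [demo]\nACGT\nACGT\n>contig_2\nTTTT\n"
records = list(read_fasta(example))
for name, seq in records:
    print(name, len(seq), unusual_header_chars(name))
```

Listing the unusual characters per header makes it easier to see which names a downstream tool might trip over, without changing the file.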
Hmmm. OK, based on the Anvio issue you referenced, maybe something is bugging out with circlator, and you can try to reinstall it like this?
I sometimes regret having so many packages installed with Cenote-Taker 2 because if one of them breaks, the whole thing breaks. But I also didn't want to reinvent the wheel... On the other hand, it seems like the error regarding "line 547: s/#/ /g" is no longer occurring with the provided test contigs, making me believe that Cenote-Taker 2 was mishandling the fasta header from your original runs. Could you do me a big favor and send some of the fasta headers from these files:
Without fixing the libcrypto.so.1.0.0 problem, I have cleaned up my sequence titles (no funny characters at all!) and it hangs in the same place as before. It seems to die during
And this seems to be the last sequence it was looking at when it stops:
after successfully writing a blank file called "CleanWebster15.all_called_hmmscans.txt". I'm trying one on this sequence alone ....
Part I - sequence names
The old-style Trinity headers had a nasty '|', but also '=' and '[' and ']' and ' '
but I've cleaned this to
Run on its own, the sequence above is OK, so maybe that wasn't the cause ...
Part II, Circlator
looks promising:
but no,
OK, I believe I've figured out at least one issue. Thank you for bearing with me here. The circlator issue may actually be a pysam issue, per this issue. Can you check your pysam version (should be 0.15.3) and update if necessary?
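One way to check the installed pysam version from inside the active environment is sketched below. It uses only the Python standard library (`importlib.metadata`, Python 3.8 or later); the helper name `installed_version` is made up for this example, and 0.15.3 is simply the version named above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


EXPECTED_PYSAM = "0.15.3"  # version named in this thread

found = installed_version("pysam")
if found is None:
    print("pysam is not installed in this environment")
elif found != EXPECTED_PYSAM:
    print(f"pysam {found} found; {EXPECTED_PYSAM} expected")
else:
    print(f"pysam {found} matches")
```

Run it with the Python interpreter of the conda environment that Cenote-Taker 2 uses, otherwise it reports on a different installation.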
The other issue may have to do with a problem on my end that I've possibly fixed. The trinity headers were not the issue. You've got RNA virus contig(s) where the whole contig is covered by an ORF that may not have a start and stop codon. I had incorrectly coded prodigal to use Let me know if this helps.
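To see whether a given contig is the kind of case described above, a rough check on a single forward-strand, in-frame sequence could look like this. It is a sketch only, and not how prodigal or Cenote-Taker 2 calls genes: it assumes the standard genetic code's stop codons and ATG as the only start codon, and the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def orf_ends(seq, start_codons=("ATG",)):
    """Return (has_start, has_stop) for an in-frame, forward-strand coding sequence."""
    seq = seq.upper().replace("U", "T")  # accept RNA notation as well
    has_start = seq[:3] in start_codons
    has_stop = len(seq) >= 3 and seq[-3:] in STOP_CODONS
    return has_start, has_stop


print(orf_ends("ATGAAATAA"))  # complete ORF
print(orf_ends("GCTAAAGCT"))  # open at both ends
```

A contig where both values are false is open at both ends, which is the situation a gene caller has to be told to allow.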
Hi! Fantastic, thank you. The pysam was indeed the issue, and the test file now runs happily! My own trial dataset (with the long ORF that lacks a start or stop, and the nasty headers) now runs to completion! But there are still some things that worry me. This still happens:
And when running blastn, what do lines like this imply?
Is it just a virus / phage not in nt? Then I get some hits that report like this:
What's the cause of this? Then at the end I get a lot of this:
What does this indicate? Thanks! Darren
Darren, I again thank you for raising these issues, and I apologize that my testing wasn't as thorough as I thought. please do I fixed the error with this As you thought, Regarding the BLAST reports, you have the phylogeny of the top hit on the first line, then the description of the top 3 hits. The description of the top hit is also in the note in the ".gbf" and ".fsa" files in the sequin_and_genome_maps directory. I don't really know exactly what users want to do with BLASTN info. What are your thoughts? Should it inform taxonomy in the output? I also fixed the error with My other question is, I know your lab has found some interesting segmented RNA viruses. You could of course use Cenote Taker 2 with
Hi! Thank you for the pipeline! I have played around with several virus finders, and I have never previously found one that I thought worked well enough to use. I'm thinking we might start to use this routinely - so you're going to have to keep maintaining it!
So, as you might imagine, I have some opinions to share! I think this blastn screen (I'm using nt at the moment) is really useful, but I think you should make more use of it for the taxonomy. It looks like your taxonomy might be based on RefSeq? For viruses, RefSeq is always so out of date as to be of relatively little use for spotting 'known' viruses. I think that, where the blastn is currently reported, it could be done more cleanly, purely as taxonomic information. So, leave out the gene/segment etc. and just report the top hit with "Sequence identity 98% to ". This would be a really clear sign that the user might consider it a previously reported virus, or not (they can choose the threshold). I think this should be in all the outputs it can be, including the overall summary table. In fact, if you have a 90%-plus blastn hit over the whole length, I would replace any proposed taxonomy based on more sophisticated approaches. Even better than the HSP identity would be a quick pairwise alignment between the new contig and its top blastn hit, reporting the overall sequence identity for the shared length.
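The "identity over the shared length" idea can be sketched as follows, assuming the pairwise alignment itself has already been produced by some other tool and is available as two equal-length gapped strings. The function name and the choice to ignore any column containing a gap are assumptions for illustration, not something Cenote-Taker 2 does.

```python
def identity_over_shared_length(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where both aligned sequences have a base."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    shared = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # column is not shared by both sequences
        shared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / shared if shared else 0.0


# Made-up alignment: 8 shared columns, 7 of them identical.
print(identity_over_shared_length("ACGT-ACGT", "ACGTTACGA"))  # 87.5
```

Counting only shared columns means a short contig matching part of a long reference is not penalised for the unaligned remainder, which seems to be the intent of the suggestion.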
I think this would be great! I think the GenBank files could literally just be concatenated, as could gtf files to go with fsa files. I don't know if it's too ugly, but folders could be created to hold the un-concatenated versions, and then the concatenated file names could match the folders. I have a number of other questions / suggestions. Would you like them here, or by email?
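The concatenation suggested above is simple because each GenBank flat-file record ends with its own "//" terminator line, so joining files end to end gives a valid multi-record file. A minimal sketch, with a hypothetical helper name and made-up file paths:

```python
from pathlib import Path


def concatenate(paths, out_path):
    """Write the input files to out_path in sorted order; return how many were written."""
    count = 0
    with open(out_path, "w") as out:
        for p in sorted(str(p) for p in paths):
            text = Path(p).read_text()
            if not text:
                continue  # skip empty files
            # Make sure each file ends with a newline so records do not run together.
            out.write(text if text.endswith("\n") else text + "\n")
            count += 1
    return count


# Example call (hypothetical paths):
# concatenate(Path("per_contig_gbf").glob("*.gbf"), "all_contigs.gbf")
```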
Thanks for the feedback. Let's discuss further by email, and I'll make sure to include any changes that get made into the change log for the next update.
Hi,
I'm playing with Cenote-Taker2 for the first time, and (as far as I can tell) it keeps hanging: i.e. simply stopping execution with no feedback or continued output or execution. There are a couple of errors thrown, but no indication as to what might cause them or what the solution might be.
The command looks like this
python ~/apps/CenoteTaker2/run_cenote-taker2.py -c LongWebster.fasta --known_strains blast_knowns --blastn_db /data/BLAST_databases/nt -r WebsterMelRebuild -m 150 -t 40 -p False
/data/home/dobbard/apps/CenoteTaker2
#and things start well
######################################################################
###################################################################################
But the failed awk and the failed cat suggest something is going wrong. At this point it appears nothing is running, so I am suspicious that cat is attempting to read from stdin because there was no file?
also, the missing file requested in line 547
doesn't bode well.
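The suspicion above rests on a real behaviour: `cat` given no file arguments reads standard input and waits. A guarded version of that step might look like the sketch below. This is not the pipeline's code; the function name and the glob pattern in the usage line are hypothetical, and it assumes a POSIX system with `cat` on the PATH.

```python
import glob
import subprocess


def cat_files(pattern):
    """Concatenate files matching pattern; return None instead of blocking if none match."""
    paths = sorted(glob.glob(pattern))
    if not paths:
        # With no file arguments, cat would read standard input and wait.
        return None
    result = subprocess.run(
        ["cat", *paths],
        stdin=subprocess.DEVNULL,  # never wait on the terminal
        capture_output=True,
        text=True,
        check=True,  # raise if cat itself reports an error
    )
    return result.stdout


# Example call (hypothetical pattern):
# combined = cat_files("*.hmmscan.out")
```

Checking for an empty file list before calling `cat`, and closing standard input, turns a silent hang into a visible "nothing to do" result.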