ssu_erroneous issue #47
Comments
Do you have a genome with the name MG-megahit-maxbin2-2000.291_sub in the mix, or did this get filtered out at some point? RefineM isn't very robust in the sense that it expects the same set of bins to be used at every step, their filenames to be unchanged, and the names of all contigs/scaffolds to be unchanged.
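One quick way to test the "same set of bins at every step" requirement is to compare the bin IDs found in two step directories. The following is a minimal sketch, not part of RefineM: the helper names `bin_ids` and `compare_bin_sets` are hypothetical, and it assumes a bin ID is simply the file name with the final extension (here `fa`, matching the `-x fa` option) removed.

```python
from pathlib import Path


def bin_ids(bin_dir, extension="fa"):
    """Return the set of bin IDs in bin_dir.

    Assumption: a bin ID is the file name minus the trailing '.<extension>'.
    """
    suffix = "." + extension
    return {
        p.name[: -len(suffix)]
        for p in Path(bin_dir).iterdir()
        if p.is_file() and p.name.endswith(suffix)
    }


def compare_bin_sets(dir_a, dir_b, extension="fa"):
    """Return (IDs only in dir_a, IDs only in dir_b), each sorted."""
    a = bin_ids(dir_a, extension)
    b = bin_ids(dir_b, extension)
    return sorted(a - b), sorted(b - a)
```

If both returned lists are empty, the two directories hold the same bin IDs under this assumption. Steps that append a suffix to the file name will show up as differences, so such suffixes would need to be stripped before comparing.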
MG-megahit-maxbin2-2000.291_sub is a genome being worked on. While the base name remains the same, the TNF/coverage and taxon/gene steps have added the filtered suffix. The genome also shows removal of a number of scaffolds in each step. The set of bins, however, remains the same at each step. Also, what are the blastn warnings about?
The BLASTN warning is due to a recent change in BLAST that addresses an issue that came to light recently. I imagine this is explained in the BLAST release notes. It is expected, and I don't expect it to have much, if any, impact on its usage in RefineM. I'm not sure what to recommend regarding the RefineM error. Are you following the workflow in the RefineM README file?
If you are just exploring a single bin, you can extract the SSU sequences using |
Thanks Donovan. Unfortunately, I am currently dealing with upwards of 500 genomes, so it would have been really useful to be able to use this script. I followed the README in a stepwise manner: TNF/coverage-based refining -> taxon-profile/gene-based refining -> SSU-based refining. I have used the pipeline before without issues on other bin sets. This time, however, my output bins come from an aggregation step performed with DAS Tool. I don't know if that is creating a problem with the last ssu_erroneous step. Is there any way to make this work?
Hi. Assuming you started RefineM after running DAS Tool, I think this should be fine. My best guess is that RefineM isn't too sophisticated in terms of parsing the filenames and may not like the period or underscore in the name.
Your suggestion worked! It seems that the ssu_erroneous command has problems parsing periods, underscores, etc. in bin names. Maybe add a line to the README regarding this? Thanks for your prompt help. You can close the issue.
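For a large bin set, removing periods and underscores from the names by hand is impractical. The sketch below shows one way to do it in bulk; it is an illustration under stated assumptions, not the exact fix used in this thread. The helper names `sanitized_id` and `copy_with_sanitized_names` are hypothetical, the hyphen as replacement character is an arbitrary choice, and files are copied rather than renamed so the originals stay intact. Because RefineM expects filenames to be unchanged between steps, the copies would have to be made before the first RefineM step.

```python
import re
import shutil
from pathlib import Path


def sanitized_id(bin_id):
    """Replace every period or underscore in a bin ID with a hyphen."""
    return re.sub(r"[._]", "-", bin_id)


def copy_with_sanitized_names(src_dir, dst_dir, extension="fa"):
    """Copy bins from src_dir to dst_dir under sanitized names.

    Returns a dict mapping each original bin ID to its new ID.
    Raises ValueError if two bins would end up with the same name.
    """
    suffix = "." + extension
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    mapping = {}
    for path in sorted(src.iterdir()):
        if not (path.is_file() and path.name.endswith(suffix)):
            continue  # skip anything that is not a bin file
        old_id = path.name[: -len(suffix)]
        new_id = sanitized_id(old_id)
        if new_id in mapping.values():
            raise ValueError(f"name collision after sanitizing: {new_id}")
        shutil.copyfile(path, dst / (new_id + suffix))
        mapping[old_id] = new_id
    return mapping
```

With this scheme, `MG-megahit-maxbin2-2000.291_sub` becomes `MG-megahit-maxbin2-2000-291-sub`. Keeping the returned mapping makes it possible to translate results back to the original bin names afterwards.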
Hi Donovan,
I am having an issue with the ssu_erroneous command. The output is given below. What seems to be the issue here?
(/data/Food/analysis/R0220_NIHAM/refinem) [arghya.mukherjee@compute10 bins-for-refinem]$ refinem ssu_erroneous -x fa -c 16 MG-taxon-filter-bins/ MG-taxon-profile/ gtdb_r80_ssu_db.2018-01-18.fna gtdb_r80_taxonomy.2017-12-15.tsv MG-ssu-erroneous
[2020-06-13 13:21:26] INFO: RefineM v0.1.1
[2020-06-13 13:21:26] INFO: refinem ssu_erroneous -x fa -c 16 MG-taxon-filter-bins/ MG-taxon-profile/ gtdb_r80_ssu_db.2018-01-18.fna gtdb_r80_taxonomy.2017-12-15.tsv MG-ssu-erroneous
[2020-06-13 13:21:27] INFO: Identifying SSU rRNA genes.
[2020-06-13 13:22:26] INFO: Extracting SSU rRNA genes.
[2020-06-13 13:22:26] INFO: Classifying SSU rRNA genes.
Warning: [blastn] Examining 5 or more matches is recommended
Warning: [blastn] Examining 5 or more matches is recommended
Warning: [blastn] Examining 5 or more matches is recommended
Warning: [blastn] Examining 5 or more matches is recommended
Warning: [blastn] Examining 5 or more matches is recommended
Warning: [blastn] Examining 5 or more matches is recommended
Warning: [blastn] Examining 5 or more matches is recommended
Warning: [blastn] Examining 5 or more matches is recommended
Warning: [blastn] Examining 5 or more matches is recommended
Warning: [blastn] Examining 5 or more matches is recommended
Warning: [blastn] Examining 5 or more matches is recommended
Warning: [blastn] Examining 5 or more matches is recommended
Warning: [blastn] Examining 5 or more matches is recommended
Warning: [blastn] Examining 5 or more matches is recommended
Warning: [blastn] Examining 5 or more matches is recommended
Warning: [blastn] Examining 5 or more matches is recommended
Warning: [blastn] Examining 5 or more matches is recommended
Warning: [blastn] Examining 5 or more matches is recommended
Warning: [blastn] Examining 5 or more matches is recommended
[2020-06-13 13:22:37] INFO: Identifying scaffolds with 16S rRNA genes with divergent taxonomic classification.
Unexpected error: <class 'KeyError'>
Traceback (most recent call last):
  File "/data/Food/analysis/R0220_NIHAM/refinem/bin/refinem", line 398, in <module>
    parser.parse_options(args)
  File "/data/Food/analysis/R0220_NIHAM/refinem/lib/python3.7/site-packages/refinem/main.py", line 687, in parse_options
    self.ssu_erroneous(options)
  File "/data/Food/analysis/R0220_NIHAM/refinem/lib/python3.7/site-packages/refinem/main.py", line 334, in ssu_erroneous
    options.output_dir)
  File "/data/Food/analysis/R0220_NIHAM/refinem/lib/python3.7/site-packages/refinem/ssu.py", line 537, in erroneous
    if r not in common_taxa[gid]:
KeyError: 'MG-megahit-maxbin2-2000.291_sub'
Thanks!