
ssu_erroneous issue #47

Closed
arghya1611 opened this issue Jun 13, 2020 · 7 comments

Comments

@arghya1611

Hi Donovan,

I'm having an issue with the ssu_erroneous command. The output is given below. What seems to be the issue here?

(/data/Food/analysis/R0220_NIHAM/refinem) [arghya.mukherjee@compute10 bins-for-refinem]$ refinem ssu_erroneous -x fa -c 16 MG-taxon-filter-bins/ MG-taxon-profile/ gtdb_r80_ssu_db.2018-01-18.fna gtdb_r80_taxonomy.2017-12-15.tsv MG-ssu-erroneous
[2020-06-13 13:21:26] INFO: RefineM v0.1.1
[2020-06-13 13:21:26] INFO: refinem ssu_erroneous -x fa -c 16 MG-taxon-filter-bins/ MG-taxon-profile/ gtdb_r80_ssu_db.2018-01-18.fna gtdb_r80_taxonomy.2017-12-15.tsv MG-ssu-erroneous
[2020-06-13 13:21:27] INFO: Identifying SSU rRNA genes.
[2020-06-13 13:22:26] INFO: Extracting SSU rRNA genes.
[2020-06-13 13:22:26] INFO: Classifying SSU rRNA genes.
Warning: [blastn] Examining 5 or more matches is recommended
(the same warning is printed 19 times in total)
[2020-06-13 13:22:37] INFO: Identifying scaffolds with 16S rRNA genes with divergent taxonomic classification.

Unexpected error: <class 'KeyError'>
Traceback (most recent call last):
  File "/data/Food/analysis/R0220_NIHAM/refinem/bin/refinem", line 398, in <module>
    parser.parse_options(args)
  File "/data/Food/analysis/R0220_NIHAM/refinem/lib/python3.7/site-packages/refinem/main.py", line 687, in parse_options
    self.ssu_erroneous(options)
  File "/data/Food/analysis/R0220_NIHAM/refinem/lib/python3.7/site-packages/refinem/main.py", line 334, in ssu_erroneous
    options.output_dir)
  File "/data/Food/analysis/R0220_NIHAM/refinem/lib/python3.7/site-packages/refinem/ssu.py", line 537, in erroneous
    if r not in common_taxa[gid]:
KeyError: 'MG-megahit-maxbin2-2000.291_sub'

Thanks!

@donovan-h-parks
Owner

Do you have a genome with the name MG-megahit-maxbin2-2000.291_sub in the mix, or did this get filtered out at some point? RefineM isn't very robust in the sense that it expects the same set of bins to be used at every step, their filenames to be unchanged, and the name of all contigs/scaffolds to be unchanged.
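Since RefineM expects the same bin set and unchanged names at every step, a quick pre-flight check can compare bin IDs between two stages before running `ssu_erroneous`. The sketch below is hypothetical and is not part of RefineM: it assumes a bin's ID is its filename minus the extension passed with `-x`, and the `strip_suffix` parameter (for a tag an earlier step may append) is an assumption as well.

```python
from pathlib import Path


def bin_id_from_filename(filename, extension, strip_suffix=""):
    """Return the bin ID for a file name, or None if the extension differs.

    The ID is the name with '.<extension>' removed; strip_suffix (hypothetical,
    e.g. a tag appended by an earlier step) is also removed if present.
    """
    ext = "." + extension.lstrip(".")
    if not filename.endswith(ext):
        return None
    bin_id = filename[: -len(ext)]
    if strip_suffix and bin_id.endswith(strip_suffix):
        bin_id = bin_id[: -len(strip_suffix)]
    return bin_id


def bin_ids(bin_dir, extension, strip_suffix=""):
    """Collect the bin IDs of all matching files in bin_dir."""
    ids = set()
    for path in Path(bin_dir).iterdir():
        if path.is_file():
            bin_id = bin_id_from_filename(path.name, extension, strip_suffix)
            if bin_id is not None:
                ids.add(bin_id)
    return ids


def compare_bin_sets(ids_a, ids_b):
    """Return (IDs only in a, IDs only in b), each sorted."""
    a, b = set(ids_a), set(ids_b)
    return sorted(a - b), sorted(b - a)
```

For example, `compare_bin_sets(bin_ids("original-bins", "fa"), bin_ids("MG-taxon-filter-bins", "fa"))` (directory names are placeholders) should return two empty lists if both stages hold the same bins; any ID listed there is a candidate for the kind of `KeyError` shown above.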

@arghya1611
Author

MG-megahit-maxbin2-2000.291_sub is one of the genomes being worked on. While the base name remains the same, the TNF/coverage and taxon/gene steps have added the "filtered" suffix. A number of scaffolds were also removed from this genome at each step. The set of bins, however, remains the same at each step.

Also, what are the blastn warnings about?

@donovan-h-parks
Owner

The BLASTN warning comes from a recent change in BLAST that addresses a recently reported issue. I imagine this is explained in the BLAST release notes. It is expected, and I don't think it has much, if any, impact on its use in RefineM.

I'm not sure what to recommend regarding the RefineM error. Are you following the workflow in the RefineM README file?

@donovan-h-parks
Owner

If you are just exploring a single bin, you can extract the SSU sequences using CheckM ssu_finder and then BLAST them at NCBI or SILVA to manually check how they compare to the expected taxonomy of your bin.

@arghya1611
Author

arghya1611 commented Jun 14, 2020

Thanks, Donovan. Unfortunately, I am currently dealing with upwards of 500 genomes, so it would have been really useful to be able to use this script. I have followed the README step by step: TNF/coverage-based refining -> taxon-profile/gene-based refining -> SSU-based refining. I have used the pipeline before on other bin sets without issues. This time, however, my bins come from an aggregation step performed with DAS Tool. I don't know if that is causing a problem in the final ssu_erroneous step. Is there any way to make this work?

@donovan-h-parks
Owner

Hi. Assuming you started RefineM after running DAS Tool, I think this should be fine. My best guess is that RefineM isn't very sophisticated in how it parses filenames and may not like the period or underscore in the name MG-megahit-maxbin2-2000.291_sub. This is just a guess, though, and I haven't had others raise this issue. Can you try a quick test using a genome without periods and underscores, and one with periods and underscores?
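As an illustration of the kind of parsing problem guessed at above (this is not a diagnosis of where RefineM actually fails), a bin ID that has already had its extension removed can lose part of its name if some later code treats the last period as an extension separator. A dictionary keyed by the truncated name and looked up with the full name would then raise a `KeyError`. The sketch below shows that effect with `os.path.splitext` and also sorts bin IDs into "contains `.` or `_`" versus "plain", to pick test genomes for the suggested comparison; the choice of suspect characters is an assumption taken from the guess above.

```python
import os

# Characters suspected (not confirmed) to confuse filename parsing.
RISKY_CHARS = (".", "_")


def naive_strip_extension(name):
    """Drop everything from the last period onward, as os.path.splitext does."""
    return os.path.splitext(name)[0]


def split_by_risk(ids):
    """Partition bin IDs into (risky, plain); risky IDs contain '.' or '_'."""
    risky = sorted(b for b in ids if any(c in b for c in RISKY_CHARS))
    plain = sorted(b for b in ids if b not in risky)
    return risky, plain
```

Applied to a file name, `naive_strip_extension("MG-megahit-maxbin2-2000.291_sub.fa")` correctly yields the bin ID, but applied a second time to that ID it returns `"MG-megahit-maxbin2-2000"`, silently dropping `.291_sub`.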

@arghya1611
Author

Your suggestion worked! It seems that the ssu_erroneous command has problems parsing periods/underscores in bin names. Maybe add a line to the README about this? Thanks for your prompt help. You can close the issue.
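With hundreds of DAS Tool bins, one workaround consistent with this outcome is to rename every bin before starting the RefineM workflow (renaming partway through would break RefineM's expectation of unchanged filenames) and keep a mapping back to the original names. Below is a minimal sketch, assuming periods and underscores are the culprits and that replacing them with hyphens is acceptable. The function names and the TSV mapping format are hypothetical. Only the filenames change; the contig/scaffold names inside the files are left untouched.

```python
import csv
from pathlib import Path


def sanitize_bin_id(bin_id, replacement="-"):
    """Replace periods and underscores in a bin ID (hypothetical scheme)."""
    return bin_id.replace(".", replacement).replace("_", replacement)


def plan_renames(filenames, extension):
    """Map old file names to sanitized ones; raise if two names collide."""
    ext = "." + extension.lstrip(".")
    plan, seen = {}, {}
    for name in filenames:
        if not name.endswith(ext):
            continue  # ignore files without the bin extension
        new_name = sanitize_bin_id(name[: -len(ext)]) + ext
        if new_name in seen:
            raise ValueError(f"{name} and {seen[new_name]} both map to {new_name}")
        seen[new_name] = name
        plan[name] = new_name
    return plan


def apply_renames(bin_dir, extension, mapping_file):
    """Rename bins in place and write an original -> new TSV mapping."""
    bin_dir = Path(bin_dir)
    names = sorted(p.name for p in bin_dir.iterdir() if p.is_file())
    plan = plan_renames(names, extension)  # validated before touching any file
    with open(mapping_file, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["original_name", "new_name"])
        for old, new in plan.items():
            writer.writerow([old, new])
            if old != new:
                (bin_dir / old).rename(bin_dir / new)
    return plan
```

Running `plan_renames` on its own first gives a dry run; the mapping file is best written outside the bin directory so RefineM does not see it.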
