Low-quality genomes end up as separate species. #22

Open
SilasK opened this issue Oct 7, 2022 · 0 comments

SilasK commented Oct 7, 2022

Low-quality genomes cluster apart. I used galah on large datasets and noticed that low-completeness genomes are often selected as separate species, even though GTDB-Tk then annotates them as the same species as other, high-quality genomes.

My impression is that a genome with low completeness does not pass the minimum coverage of FastANI and so yields no FastANI report. I don't know how you handle this internally, but I imagine such genomes perturb the clustering.

Would the solution simply be to use a very low --min-aligned-fraction?
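
One way to check this hypothesis would be to run FastANI directly on a suspicious pair and look at the aligned fraction it reports. The following is a minimal sketch, assuming the standard tab-separated FastANI output (query, reference, ANI, matched fragments, total query fragments). The function names and the example values are hypothetical, and the threshold is a plain fraction between 0 and 1, which may not match how galah applies `--min-aligned-fraction` internally.

```python
import csv
import io


def aligned_fractions(fastani_tsv):
    """Yield (query, reference, ani, aligned_fraction) for each FastANI output line.

    Assumes five tab-separated columns: query, reference, ANI,
    matched fragments, total query fragments.
    """
    for row in csv.reader(io.StringIO(fastani_tsv), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        query, ref, ani, matched, total = row[:5]
        yield query, ref, float(ani), int(matched) / int(total)


def pairs_below(fastani_tsv, min_aligned_fraction):
    """Return the (query, reference) pairs whose aligned fraction is below the threshold."""
    return [
        (query, ref)
        for query, ref, _ani, fraction in aligned_fractions(fastani_tsv)
        if fraction < min_aligned_fraction
    ]


# Hypothetical values, not taken from a real run.
example = (
    "low_quality.fna\thigh_quality.fna\t98.7\t120\t1000\n"
    "high_a.fna\thigh_b.fna\t98.9\t900\t1000\n"
)

print(pairs_below(example, 0.15))
```

Pairs for which FastANI writes no line at all cannot be checked this way; they would have to be found by comparing the list of submitted pairs with the list of reported ones.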
