COMEBin comparison #286

Open
jsgounot opened this issue Jan 19, 2024 · 6 comments

Comments

@jsgounot

COMEBin, a new binning method, has recently been published in Nature Communications. In the paper, Vamb performs relatively poorly compared to other binners in multiple situations, even compared to older ones such as MetaBAT2. As Vamb is currently my default binner, I was curious to know your opinion on this: do you still recommend Vamb or Avamb, or do you see newer software as a better approach?

@jakobnissen
Member

jakobnissen commented Jan 19, 2024

Dear @jsgounot

We're looking into COMEBin now and trying to reproduce its results (but it might take some time as we are currently busy with other things).

Without speaking to COMEBin in particular, our general impression has been:

  • Vamb was pretty good when it came out, but the original Vamb is now worse than SemiBin2 (and MetaDecoder, IIRC)
  • Vamb has improved since it came out, and Vamb's master branch is better than the released v4.1.3
  • SemiBin2 is better than Vamb because it uses single-copy gene (SCG)-based reclustering, which is effective for bacteria (and only bacteria). It even beats Vamb's master branch
  • When SemiBin2's clustering is applied to Vamb, Vamb becomes better than SemiBin2
  • We have an unreleased semi-supervised Vamb in the pipeline which is better still

So, of the currently released versions, SemiBin2 beats Vamb. The upcoming Vamb (version 5) will beat SemiBin2.

I expect COMEBin to at least beat Vamb 4, given that it, too, uses single-copy genes. However, we'll see.

@simonrasmu
Collaborator

Chiming in here regarding single-copy genes. When these are used in the model (MetaDecoder) or in the clustering (SemiBin1/2, COMEBin), and afterwards used to estimate how good your model is (CheckM), the evaluation becomes circular. That is, you are optimising your model using SCGs and then using the same SCGs to check how good it is.

@jsgounot
Author

OK, thank you both for your honest answers. I know it's difficult to say, but do you have an estimate of when version 5 will be released?

@jakobnissen
Member

Not really. Almost surely this year. We're working on a manuscript, but as you know, these things are unpredictable.

@jakobnissen
Member

jakobnissen commented Feb 2, 2024

@jsgounot Okay, we've now done some benchmarking on synthetic datasets and found that:

  • COMEBin is better than Vamb
  • This difference can be attributed purely to its use of single-copy genes. If Vamb uses SCGs in a similar fashion, Vamb is ~1% better than COMEBin
  • SemiBin2 is ~2% better than either, meaning that they all perform quite similarly and are probably within measurement error of each other
  • Our upcoming version of Vamb is ~9% better: still only a modest improvement, but a much larger gap than the one between the others.

I should note that this is on CAMI2 synthetic datasets, using our own benchmarking suite (though to be fair, using AMBER, our upcoming version is 27%, 75% and 76% better than COMEBin, SemiBin2 and Vamb, respectively, so it's not just our own benchmarking method cherry-picking). We haven't yet evaluated COMEBin on real biological datasets, but based on these results, I would guess its performance there would be in line with SemiBin2's.

@jsgounot
Author

jsgounot commented Feb 5, 2024

Thanks for the update @jakobnissen, really appreciate that.
