Discrepancies in multiple cutoff parameters results and networks #258

alpole23 · 2025-02-07T20:13:50Z

Dear developers,
It might just be my misunderstanding of how the cutoffs and mixed networking work in BiG-SCAPE 2, but I am confused by the differences I am seeing for the same dataset at different cutoffs.

My command is as follows:
bigscape cluster -i results/antismash -o results/bigscape -c 6 --pfam-path pfam/Pfam-A.hmm --mix --classify class --include-singletons --gcf-cutoffs 0.5,0.7 --mibig-version 3.1

I have included a screen capture of the run information:

The table below shows the total number of genomes and the total number of BGCs (as predicted by antiSMASH) reported for each of the two cutoffs. Since I am using exactly the same antiSMASH dataset, shouldn't these numbers be the same?

	cutoff 0.5	cutoff 0.7
total # genomes	582	1330
total BGCs	3914	5884

I have also screen captured the "mix" network for each cutoff after selecting visualize all. Given the data in the table above, shouldn't I be expecting a network of 5884 BGCs for cutoff 0.7 and a mixed network of 3914 BGCs for cutoff 0.5?

Mixed Network for cutoff 0.7:

Mixed Network for cutoff 0.5:

Any information, suggestions, or insights into how to reconcile these data would be greatly appreciated. Thanks!


nlouwen · 2025-02-10T12:54:28Z

Hi!

On the first question, the difference in numbers between cutoffs is likely due to the fact that reference- or mibig-only connected components are not included in the output. Since the lower cutoff will produce more mibig-only CCs, there is a lower number of BGCs remaining in the run. However, the reported number of genomes/BGCs is currently too high (roughly duplicated) when using mix and classify together, which will be fixed in our next release.
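To illustrate why a lower cutoff can leave fewer BGCs in the output, here is a minimal sketch. It is not BiG-SCAPE's actual code: the function name, the toy distances, and the choice of "distance <= cutoff" for keeping an edge are all assumptions made for the example. It builds a network at a given cutoff, finds the connected components, and drops every component that consists only of reference (e.g. MIBiG) BGCs.

```python
from collections import defaultdict

def surviving_bgcs(distances, reference, cutoff):
    """Return the BGCs left after dropping reference-only connected components.

    distances: dict mapping (bgc_a, bgc_b) -> pairwise distance (toy values)
    reference: set of BGC names that are reference records (e.g. MIBiG)
    cutoff:    an edge is kept when distance <= cutoff (assumed rule)
    """
    nodes = {name for pair in distances for name in pair}

    # Build the adjacency list of the network at this cutoff.
    adjacency = defaultdict(set)
    for (a, b), dist in distances.items():
        if dist <= cutoff:
            adjacency[a].add(b)
            adjacency[b].add(a)

    kept, seen = set(), set()
    for start in nodes:
        if start in seen:
            continue
        # Depth-first search to collect one connected component.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        # Keep the component only if it contains at least one non-reference BGC.
        if not component <= reference:
            kept |= component
    return kept

# Hypothetical toy data: three query BGCs and three reference BGCs.
distances = {
    ("query_1", "query_2"): 0.30,
    ("query_2", "mibig_A"): 0.60,
    ("mibig_A", "mibig_B"): 0.40,
    ("query_3", "mibig_C"): 0.65,
}
reference = {"mibig_A", "mibig_B", "mibig_C"}

print(len(surviving_bgcs(distances, reference, 0.5)))  # 3
print(len(surviving_bgcs(distances, reference, 0.7)))  # 6
```

In this toy case, at cutoff 0.5 the reference BGCs are only linked to each other (or to nothing), so they are dropped and 3 BGCs remain. At cutoff 0.7 every reference BGC is linked to a query BGC, so all 6 remain.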

The screenshots of the mix networks indeed do not look as expected. I have not been able to reproduce this kind of result using the same command you used, so I am not sure what could have caused it. To figure that out, I'd ideally have to take a look at the output folder, if you could share it via e.g. Google Drive (assuming the data is not private). Otherwise, you could try a pure mix run (--mix --classify none) with a fresh output directory to see if the same discrepancy occurs.

alpole23 · 2025-02-14T21:24:55Z

Thanks for the feedback. Unfortunately, I cannot share the data publicly, but I will play around with the pure mix run with some datasets that I can publicize and see if I get the same unexpected network results.
