
"Can't find a matching hmm library in the database!" when running --query mode from antiSMASH v6 results #67

Open
jolespin opened this issue Mar 24, 2023 · 10 comments

Comments

@jolespin

I'm running version v1.1.1. This issue is an extension of #66.

(VEBA-biosynthetic_env) [jespinoz@exp-15-15 Test]$ bigslice --query  test_output/biosynthetic/intermediate/1__antismash/SRR17458614__CONCOCT__P.2__9/ --query_name SRR17458614__CONCOCT__P.2__9 --program_db_folder /expanse/projects/jcl110/db/veba/VDB_v4/Annotate/BiG-SLiCE/ bigslice_output
pid 1058573's current affinity list: 110
pid 1058573's new affinity list: 110
pid 1058570's current affinity list: 110
pid 1058570's new affinity list: 110
Fetching run details...
Can't find a matching hmm library in the database!
BiG-SLiCE run failed.

Attaching my antiSMASH v6 directory.

SRR17458614__CONCOCT__P.2__9.zip

@jolespin
Author

Just checking in about this. Do you have any suggestions?

@sunitj

sunitj commented Aug 11, 2023

bump
@jolespin did you ever figure out the issue?

@jolespin
Author

No, I haven't been able to get past it, so I haven't been able to use the package. I've posted my data if you want to try it out.

@PannyYi

PannyYi commented Apr 15, 2024

@jolespin did you ever figure out the issue? I encountered it too.

@brilliant2643

I encountered the same issue; does anybody know how to solve this problem?

I used the latest software (v2.0.0) and the latest database for BiG-SLiCE, and I tried putting all the database files into one directory, but it didn't work.
My command is: bigslice --query test/ --n_rank 1 output/ --program_db_folder databases/bigslice-db/bigslice-models -t 20. I also tried: bigslice --query test/ --n_rank 1 test/output/ --program_db_folder databases/bigslice-db/bigslice-models -t 20, but neither worked.

@jolespin
Author

jolespin commented May 22, 2024

@brilliant2643 @PannyYi I was going to incorporate it into my VEBA package but was never able to fix this issue. I gave up on this software in the interim, but I look forward to using/incorporating it once this issue is resolved.

I developed some de novo clustering for antiSMASH BGCs in the VEBA biosynthetic module. Some of the scripts are standalone too.

You can use it like:

# Syntax 1
biosynthetic.py --from_antismash /path/to/antismash_parent_directory -o veba_output/biosynthetic

# Syntax 2
veba --module biosynthetic --params "--from_antismash /path/to/antismash_parent_directory -o veba_output/biosynthetic"

The /path/to/antismash_parent_directory directory looks like this:

antismash_output/
antismash_output/genome_1/[antismash_results_gbk_files]
antismash_output/genome_2/[antismash_results_gbk_files]
antismash_output/genome_.../[antismash_results_gbk_files]
antismash_output/genome_n/[antismash_results_gbk_files]

You can also run antiSMASH with it instead of providing antiSMASH results (check the docs). I'm not sure if it works with your results, but if it does, you will end up with the following files:

  • bgc_clusters.tsv - BGC to BGC nucleotide cluster
  • bgc_protocluster-types.tsv.gz - Summary of BGCs detected organized by type. Also includes summary of BGCs that are NOT on contig edge.
  • bgcs.representative_sequences.fasta.gz - Full length BGC nucleotide cluster representatives
  • component_clusters.tsv - BGC protein to BGC protein cluster
  • components.representative_sequences.faa.gz - BGC protein cluster representatives
  • fasta/[id_genome].faa/fasta.gz - BGC sequences in protein and nucleotide space
  • genbanks/[id_genome]/*.gbk - Genbank formatted antiSMASH results
  • homology.tsv.gz - Diamond results for MIBiG and VFDB
  • identifier_mapping.bgcs.tsv.gz - All of the BGCs in tabular format organized by genome, contig, region, and gene.
  • identifier_mapping.components.tsv.gz - All of the BGC components (i.e., genes in BGC) in tabular format organized by genome, contig, region, and gene.
  • krona.html - HTML showing Krona plot for number of BGCs per protocluster-type.
  • krona.tsv - Data to produce Krona plot
  • prevalence_tables/bgcs.tsv.gz - Genome vs. BGC nucleotide cluster prevalence table
  • prevalence_tables/components.tsv.gz - Genome vs. BGC protein cluster prevalence table

I typically use the prevalence_tables/components.tsv.gz with Jaccard distance and hierarchical clustering, depending on how many BGCs I have; if there are too many, I'll use another clustering algorithm that supports boolean distance metrics like Jaccard.
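As a rough illustration of that workflow (the toy matrix, genome names, and the 0.6 cut threshold below are made up, not VEBA's actual output or recommended settings), a boolean genome-by-component prevalence table can be clustered with Jaccard distance and average-linkage hierarchical clustering like this:

```python
# Hypothetical sketch: cluster genomes by shared BGC protein clusters
# using Jaccard distance + average-linkage hierarchical clustering.
import pandas as pd
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

# Toy boolean prevalence matrix (rows: genomes, cols: BGC protein clusters),
# standing in for prevalence_tables/components.tsv.gz
df = pd.DataFrame(
    [[1, 0, 1, 1],
     [1, 1, 1, 0],
     [0, 1, 0, 1]],
    index=["genome_1", "genome_2", "genome_3"],
).astype(bool)

# Pairwise Jaccard distances between genomes
dist = pdist(df.values, metric="jaccard")

# Average-linkage tree, cut at an arbitrary distance threshold
Z = linkage(dist, method="average")
labels = fcluster(Z, t=0.6, criterion="distance")
print(dict(zip(df.index, labels)))
```

With a real table you would load it via pd.read_csv(..., sep="\t", index_col=0) first; the threshold t is something to tune per dataset.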

Hope this helps. If you want to read more about the methodology, check out the preprint on bioRxiv. The peer-reviewed paper should be coming out soon; it's in the final stages of review right now.

If you want something similar to BiG-SLiCE, I think you can use the BIRCH algorithm in scikit-learn, but I'm not sure exactly how BiG-SLiCE's backend is implemented.
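For reference, a minimal BIRCH run in scikit-learn looks like the sketch below. The feature matrix here is synthetic random data, purely for illustration; this is not how BiG-SLiCE builds its BGC feature vectors, and scikit-learn's Birch works in Euclidean space, not Jaccard.

```python
# Hypothetical sketch: BIRCH clustering on toy "BGC feature vectors".
import numpy as np
from sklearn.cluster import Birch

rng = np.random.default_rng(0)
# Two well-separated synthetic groups (rows: BGCs, cols: features)
X = np.vstack([
    rng.normal(0.0, 0.1, size=(10, 5)),
    rng.normal(5.0, 0.1, size=(10, 5)),
])

# threshold bounds the radius of BIRCH subclusters;
# n_clusters runs a final global clustering over the subcluster centroids
model = Birch(threshold=0.5, n_clusters=2)
labels = model.fit_predict(X)
print(labels)
```

BIRCH's appeal for this use case is that it builds a compact clustering-feature tree in one pass, so it scales to large BGC collections without holding a full distance matrix.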

@brilliant2643


Thanks a lot! I will try it later!

@shuklagyanesh21

shuklagyanesh21 commented Nov 1, 2024

bigslice --query input_bigslice/dataset_1 --query_name scp_87 --n_ranks 7 --program_db_folder ~/Tools/anaconda3/bin/bigslice-models/ ./output_folder

This still gives the same error:

Can't find a matching hmm library in the database!
BiG-SLiCE run failed.

Has anyone figured out the reason?

@ZhangZF1102

All right, after many attempts, I found that this issue can be solved by using BiG-SLiCE v1.1.

@ZhangZF1102


However, note that BiG-SLiCE v1.1 can only recognize results from antiSMASH v6.
