Incorrect number of genomes detected in overview of output #135
Comments
This issue has not seen activity for 14 days and has been marked as stale. Please comment with additional information if this issue is still relevant.
Hi. The number of genomes for that figure is calculated using the header of the gbk files (from the Organism property, if I remember correctly), so it's possible to get incorrect numbers depending on how these gbk files were produced.
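To see what the input files actually declare, one option is to list the distinct ORGANISM values in the GenBank headers. The sketch below is not BiG-SCAPE code. `list_organisms` is a hypothetical helper, and it assumes the count is derived from the ORGANISM line as described above. It prints each distinct value with the number of files that carry it.

```shell
#!/bin/bash
# Hypothetical helper (not part of BiG-SCAPE): list the distinct ORGANISM
# values found in the .gbk files under a directory, with the number of
# files that declare each value.
list_organisms() {
    local input_dir="$1"
    find "$input_dir" -type f -name "*.gbk" -print0 |
        while IFS= read -r -d '' gbk_file; do
            # Take the first ORGANISM line and strip the keyword and padding.
            organism=$(awk '/^ *ORGANISM/ { sub(/^ *ORGANISM */, ""); print; exit }' "$gbk_file")
            if [ -z "$organism" ]; then
                organism="(no ORGANISM field)"
            fi
            printf '%s\n' "$organism"
        done | sort | uniq -c
}
```

For example, `list_organisms path_to_antiSMASH_output/` prints one line per distinct value. If the number of lines differs from the number of genomes you supplied, the headers are a likely cause of the mismatch.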
Hi, if the genome name is not in the Organism property of the gbk file, the name used will be the name of the gbk file (without "cluster" or "region"). This happens here: Lines 3271 to 3277 in 97d616c
If you are working with 3 clusters from the same genome ( I didn't have time to read the entire code, but I think you could adjust the name of your input before running bigscape, i.e. to include the genome name ( Here is a bash script to include the genome name in the cluster.gbk files and create a symbolic link in the directory where the script is executed (input directory of bigscape):
#!/bin/bash
# Directory where the genome folders are located
genomes_dir="path_to_antiSMASH_output/"

# Loop through all genome folders
for genome_dir in "$genomes_dir"/*; do
    # Extract the genome name from the folder
    genome=$(basename "$genome_dir")

    # Find all gbk files containing "region" in their name inside the genome folder
    find "$genome_dir" -type f -name "*region*.gbk" | while read -r gbk_file; do
        # Extract the file name without extension
        filename=$(basename "$gbk_file" .gbk)
        # Extract the region number from the file
        region_number=$(echo "$filename" | grep -oP 'region\d+')
        # Extract the contig number from the file
        contig_number=$(echo "$filename" | grep -oP 'contig_\d+')
        # Create the new file name
        new_filename="${genome}.${region_number}.${contig_number}.gbk"
        # Create the symbolic link with the new name in the current directory
        ln -s "$gbk_file" "./$new_filename"
    done
done

best,
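Before running bigscape on the renamed links, it may help to check how many distinct genome prefixes they carry. This is a minimal sketch, assuming the `<genome>.<regionNNN>.<contig>.gbk` naming produced by the script above. `count_genome_prefixes` is a hypothetical helper, not part of bigscape.

```shell
#!/bin/bash
# Hypothetical check: count distinct genome prefixes among the renamed links,
# assuming the "<genome>.<regionNNN>.<contig>.gbk" naming used by the script above.
count_genome_prefixes() {
    local link_dir="${1:-.}"
    # List the .gbk entries (files or symlinks) directly inside link_dir,
    # strip everything from ".regionNNN." onwards, and count unique prefixes.
    find "$link_dir" -maxdepth 1 -name "*.gbk" -exec basename {} \; |
        sed -E 's/\.region[0-9]+\..*$//' |
        sort -u |
        wc -l |
        tr -d ' '
}
```

Running `count_genome_prefixes .` in the input directory should print the number of genome folders that were processed. A different number suggests that some file names did not match the expected pattern.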
Thanks Felipe! It's hard to make a one-size-fits-all solution for cases like these, where there is missing data (here, e.g. someone having a selected set of gbk files from different genomes in a custom folder, or metagenomic datasets).
An incorrect number of genomes seems to be captured in the index HTML output, which affects one of the generated pie charts. In one case, when using one assembled genome with 11 BGCs, the overview in the index.html file incorrectly said that 11 genomes were used.
In another case, I used 15 assembled genomes but the overview page in the index.html file said 118. The input is as indicated in the tutorial.