Note that this can actually cause us to download the wrong genome (v1 instead of v2) if that accession is not suppressed. I found this was the case for test accession GCA_000961135.2.
- fixes #27
- allows an `ftp_path` input, as found in an NCBI assembly_summary file
- makes the input csv format more restrictive: requires column names ["accession", "name", "ftp_path"], though `ftp_path` entries are optional. `ftp_path` is the value of the `ftp_path` column in NCBI assembly summary files
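
To make the input format concrete, here is a minimal Python sketch of reading such a csv. It is not the plugin's actual parser: `read_accession_csv` and `genomic_url_from_ftp_path` are hypothetical names, and the sketch assumes the three columns must be present but may appear in any order. It also assumes the usual NCBI layout, where the genomic FASTA lives inside the `ftp_path` folder and is named after that folder.

```python
import csv
import io

REQUIRED_COLUMNS = ["accession", "name", "ftp_path"]


def read_accession_csv(handle):
    """Read rows from a csv with accession, name and ftp_path columns.

    Empty ftp_path entries are allowed and are returned as None.
    """
    reader = csv.DictReader(handle)
    found = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in found]
    if missing:
        raise ValueError(f"input csv is missing required columns: {missing}")

    rows = []
    for row in reader:
        # Short rows give None for absent fields, so normalise before stripping.
        ftp_path = (row["ftp_path"] or "").strip() or None
        rows.append(
            {
                "accession": (row["accession"] or "").strip(),
                "name": (row["name"] or "").strip(),
                "ftp_path": ftp_path,
            }
        )
    return rows


def genomic_url_from_ftp_path(ftp_path):
    """Build the genomic FASTA URL from an assembly_summary ftp_path value."""
    base = ftp_path.rstrip("/")
    folder = base.rsplit("/", 1)[-1]  # e.g. GCA_000193795.2_ASM19379v2
    return f"{base}/{folder}_genomic.fna.gz"
```

When a row carries an `ftp_path`, the download link can be built directly from it, so no folder lookup (and no version guessing) is needed for that row.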
When we try to find the download link for GCA_000193795.2, directsketch finds the following link: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/193/795/GCA_000193795.1_ASM19379v1/GCA_000193795.1_ASM19379v1_genomic.fna.gz. Note that the folder https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/193/795/GCA_000193795.1_ASM19379v1/ does indeed exist, but this .1 assembly is suppressed, so the download fails.

When I looked up the genome via NCBI, I found that the v2 genome is available at https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/193/795/GCA_000193795.2_ASM19379v2/GCA_000193795.2_ASM19379v2_genomic.fna.gz, so the script found the v1 folder instead of the v2 folder, causing the download to fail.
To fix this, look into the link + version check here: https://github.com/sourmash-bio/sourmash_plugin_directsketch/blob/main/src/directsketch.rs#L106-L125
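
As a rough illustration of what a version-aware check could look like, here is a small Python sketch. It is not the code in `directsketch.rs` (which is Rust), and `pick_assembly_dir` and `genomic_url` are hypothetical names. It assumes we already have the list of folder names under the accession's base directory, and it only accepts a folder whose name starts with the full versioned accession.

```python
def pick_assembly_dir(accession, dir_names):
    """Return the folder name matching the full versioned accession, or None.

    Matching on accession + "_" keeps GCA_000193795.2 from matching a
    GCA_000193795.1_... folder, and keeps ".1" from matching ".10".
    """
    prefix = accession + "_"
    for name in dir_names:
        clean = name.strip("/")
        if clean.startswith(prefix):
            return clean
    return None


def genomic_url(accession, dir_names):
    """Build the genomic FASTA URL for an accession, or None if no folder matches."""
    db, digits = accession.split(".")[0].split("_")  # "GCA", "000193795"
    folder = pick_assembly_dir(accession, dir_names)
    if folder is None:
        return None
    base = (
        "https://ftp.ncbi.nlm.nih.gov/genomes/all/"
        f"{db}/{digits[0:3]}/{digits[3:6]}/{digits[6:9]}"
    )
    return f"{base}/{folder}/{folder}_genomic.fna.gz"
```

With the two folders from this issue, `GCA_000193795.1_ASM19379v1/` and `GCA_000193795.2_ASM19379v2/`, a request for GCA_000193795.2 selects only the v2 folder, and a request for a version with no matching folder returns `None` rather than silently falling back to another version.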