VFDB contains few genes that are not part of any cluster #331

PovilasMat · 2023-02-10T15:08:07Z

Hi,

ARIBA ran into a strange error while running against the virulencefinder database:
[E::hts_idx_push] Unsorted positions on sequence # 1: 109 followed by 11
OSError: building of index for /scratch/shadow/tmpr7wt7j_c/ariba_virulencefinder/ariba_virulencefinder/read_store.gz failed

I figured out that this happens because read_store.gz is sorted incorrectly: one of the genes doesn't have cluster information. I changed read_store.py to sort correctly even when cluster information is missing, but then it failed at a later step:
_init_and_run_clusters reference_names=self.cluster_ids[cluster_name],
KeyError: ''

Obviously, because cluster name was missing. :)
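To illustrate the failure mode, here is a toy sketch (not ARIBA's actual code; the function and variable names are hypothetical). It shows how a reference with no cluster ends up with an empty cluster name, and how a guard could set such reads aside instead of letting a later dictionary lookup raise KeyError: ''.

```python
def group_reads_by_cluster(read_to_ref, ref_to_cluster):
    """Group read names by cluster name.

    read_to_ref:    hypothetical mapping of read name -> reference sequence name
    ref_to_cluster: hypothetical mapping of reference name -> cluster name
    Reads whose reference has no cluster are returned separately
    instead of being filed under the empty string.
    """
    by_cluster = {}
    orphans = []
    for read, ref in read_to_ref.items():
        # A reference missing from the cluster table yields an empty name.
        cluster = ref_to_cluster.get(ref, "")
        if not cluster:
            orphans.append((read, ref))
            continue
        by_cluster.setdefault(cluster, []).append(read)
    return by_cluster, orphans
```

Whether skipping such reads is acceptable is a judgment call; it hides the symptom, while the underlying problem is still the reference sequences that have no cluster.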

Then I started digging around and made this small test:

mkdir vftest
cd vftest
ariba getref virulencefinder out.virulencefinder
ariba prepareref -f out.virulencefinder.fa -m out.virulencefinder.tsv ./test
cd test
cat 02.cdhit.clusters.tsv | awk '{$1="";print}' | tr " " "\n" | sort | uniq > cluster_file
grep ">" 02.cdhit.all.fa | sed 's/>//g' | sort > all_file
wc -l all_file
wc -l cluster_file
diff cluster_file all_file

Output of the last three commands:

5558 all_file
5554 cluster_file //cluster file contains one empty line in the beginning
1d0 //this is the empty line
< //this is the empty line
718a718
> csnA_4_KJ922517
973a974
> eltIIAB_c8_1_AASRQF010000005
4943a4945
> stx2_122_CP022279_122
5082a5085
> stx2b_O128_24196_97_95_AJ567995_95
5157a5161
> stx2h_O102_STEC299_122_CP022279_122

So the issue arises because one or more of those 5 genes (in my case stx2h_O102_STEC299_122_CP022279_122) match my sequencing reads but are not part of any cluster. When read_store is built, the reads for those genes carry no cluster name, which makes the script fail.
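The same check can be sketched in Python, which avoids the stray empty line from the awk/tr pipeline. This is a minimal sketch that assumes the cluster file is whitespace-separated with the cluster name in the first column (as the awk command above implies), and that sequence names contain no spaces; the function names are my own.

```python
def clustered_names(cluster_lines):
    """Return every sequence name listed after the cluster-name column."""
    names = set()
    for line in cluster_lines:
        fields = line.split()
        names.update(fields[1:])  # fields[0] is the cluster name
    return names


def fasta_names(fasta_lines):
    """Return the first token of every FASTA header line."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.add(parts[0])
    return names


def unclustered(fasta_lines, cluster_lines):
    """Return sorted names present in the FASTA but absent from all clusters."""
    return sorted(fasta_names(fasta_lines) - clustered_names(cluster_lines))
```

Both arguments can be any iterable of lines, so open file handles for 02.cdhit.all.fa and 02.cdhit.clusters.tsv can be passed directly.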

ariba version
ARIBA version: 2.14.6
External dependencies:
bowtie2 2.2.5 /srv/data/tools/anaconda3/envs/env_cge_update/bin/bowtie2
cdhit 4.8.1 /srv/data/tools/anaconda3/envs/env_cge_update/bin/cd-hit-est
nucmer 3.1 /srv/data/tools/anaconda3/envs/env_cge_update/bin/nucmer
spades 3.15.5 /srv/data/tools/anaconda3/envs/env_cge_update/bin/spades.py
External dependencies OK: True
Python version:
3.9.15 | packaged by conda-forge | (main, Nov 22 2022, 08:45:29)
[GCC 10.4.0]
Python packages:
ariba 2.14.6 /srv/data/tools/anaconda3/envs/env_cge_update/lib/python3.9/site-packages/ariba/__init__.py
bs4 4.11.1 /srv/data/tools/anaconda3/envs/env_cge_update/lib/python3.9/site-packages/bs4/__init__.py
dendropy 4.5.2 /srv/data/tools/anaconda3/envs/env_cge_update/lib/python3.9/site-packages/dendropy/__init__.py
pyfastaq 3.17.0 /srv/data/tools/anaconda3/envs/env_cge_update/lib/python3.9/site-packages/pyfastaq/__init__.py
pymummer 0.11.0 /srv/data/tools/anaconda3/envs/env_cge_update/lib/python3.9/site-packages/pymummer/__init__.py
pysam 0.18.0 /srv/data/tools/anaconda3/envs/env_cge_update/lib/python3.9/site-packages/pysam/__init__.py
Python packages OK: True
Everything looks OK: True

etuduri · 2023-07-13T18:58:11Z

Hi, I have the same issue, please help!!

ARIBA version: 2.14.6

External dependencies:
bowtie2 2.3.4.1 /usr/bin/bowtie2
cdhit 4.7 /usr/bin/cd-hit-est
nucmer 3.1 /usr/bin/nucmer
spades 3.13.0 /home/inei/SPAdes-3.13.0-Linux/bin/spades.py

External dependencies OK: True

Python version:
3.6.9 (default, Mar 10 2023, 16:46:00)
[GCC 8.4.0]

Python packages:
ariba 2.14.6 /usr/local/lib/python3.6/dist-packages/ariba/__init__.py
bs4 4.9.2 /home/inei/.local/lib/python3.6/site-packages/bs4/__init__.py
dendropy 4.4.0 /home/inei/.local/lib/python3.6/site-packages/dendropy/__init__.py
pyfastaq 3.17.0 /home/inei/.local/lib/python3.6/site-packages/pyfastaq/__init__.py
pymummer 0.10.3 /home/inei/.local/lib/python3.6/site-packages/pymummer/__init__.py
pysam 0.16.0.1 /home/inei/.local/lib/python3.6/site-packages/pysam/__init__.py

Python packages OK: True

Everything looks OK: True

Thanks in advance !!!

PovilasMat · 2023-07-14T00:11:31Z

It doesn't seem like ARIBA will receive any further changes. I have asked the database maintainers to fix it on their end, but that is still in progress.
