
Wrong prediction of Shigella Boydii serotype 20 #6

Open
wolthuisr opened this issue Jan 25, 2022 · 8 comments

Comments

@wolthuisr
Collaborator

Hi,

I am using the ShigaTyper tool to analyze multiple Shigella subtypes. One of the subtypes I am interested in is Shigella boydii serotype 20. I noticed that if there is a heparinase hit, the tool is supposed to return Shigella boydii serotype 20 (line 591).
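The rule described here (a heparinase hit turns the call into S. boydii serotype 20) can be pictured as a post-processing step on the serotype call. This is a minimal sketch, assuming the base call and a set of detected gene names are already available. The function name, the gene label `heparinase` and the restriction to serotype 1 calls are hypothetical and are not taken from ShigaTyper's code at line 591.

```python
from __future__ import annotations

# Hypothetical labels: the exact strings ShigaTyper uses may differ.
SB1 = "Shigella boydii serotype 1"
SB20 = "Shigella boydii serotype 20"


def apply_heparinase_rule(base_prediction: str, detected_genes: set[str]) -> str:
    """Refine a serotype call using the heparinase marker.

    Assumption (based on the behaviour described in this thread): a sample
    that would otherwise be called S. boydii serotype 1 is reported as
    serotype 20 when a heparinase hit is present.
    """
    has_heparinase = any(g.lower() == "heparinase" for g in detected_genes)
    if base_prediction == SB1 and has_heparinase:
        return SB20
    # Any other call, or no heparinase hit, is passed through unchanged.
    return base_prediction
```

In a rule of this shape, the refinement can only fire if heparinase can actually be detected; if no heparinase sequence is searched for, `detected_genes` never contains it and the serotype 1 call is returned unchanged.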

We looked at the samples manually and matched the gene sequence to the sample sequence. As expected, the gene is present in the samples, but the ShigaTyper script does not seem to recognize these hits and instead identifies the samples as Shigella boydii serotype 1.

Other users may get the same wrong prediction, so I was wondering whether there is an explanation for this and whether it can be fixed.

Looking forward to a response!

Kind regards,
Roxanne

@florathecat
Collaborator

florathecat commented Jan 25, 2022 via email

@wolthuisr
Collaborator Author

Hi Yun,

We used public samples with accession numbers SRR3020611 and SRR5330512 (ENA). For these samples, we don't get any results for the heparinase gene.

Hope this helps explain the issue!

Roxanne

@florathecat
Collaborator

florathecat commented Jan 27, 2022 via email

@rpetit3
Contributor

rpetit3 commented Feb 11, 2022

Should a reference gene for heparinase be in https://github.com/CFSAN-Biostatistics/shigatyper/blob/master/shigatyper/resources/ShigellaRef5.fasta?

If so, there isn't one, and that is likely the cause of this issue.
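Checking whether a reference FASTA contains a given gene comes down to scanning its header lines. A minimal sketch, assuming the gene name appears somewhere in the record ID (the first token after `>`); the actual header naming in ShigellaRef5.fasta is not shown in this thread, and a substring match can also catch unrelated names that happen to contain the search term.

```python
from typing import Iterable, List


def fasta_headers(lines: Iterable[str]) -> List[str]:
    """Return the record ID (first token after '>') of every FASTA header line."""
    names: List[str] = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            names.append(fields[0] if fields else "")
    return names


def has_gene(lines: Iterable[str], gene: str) -> bool:
    """True if any record ID contains `gene` as a case-insensitive substring."""
    needle = gene.lower()
    return any(needle in name.lower() for name in fasta_headers(lines))
```

Because both functions accept any iterable of lines, an open file handle works directly, e.g. `has_gene(open("ShigellaRef5.fasta"), "heparinase")` on a local copy of the file.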

@rpetit3
Contributor

rpetit3 commented Feb 11, 2022

Did some testing.

Without heparinase in the reference FASTA:

sample  prediction      ipaB
SRX1486859      Shigella boydii serotype 1      -

I added this sequence for heparinase (https://www.ncbi.nlm.nih.gov/nuccore/CP016036.1?from=2803&to=4428&report=fasta&strand=2) to the ShigellaRef5.fasta file, and the sample is then called serotype 20:

sample  prediction      ipaB
SRX1486859      Shigella boydii serotype 20     -
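Repeating this test locally means appending the downloaded sequence to a copy of ShigellaRef5.fasta. A minimal sketch, assuming the CP016036.1 region from the link above has already been saved as plain sequence text. The record name `heparinase` is a hypothetical choice, and whether ShigaTyper needs a particular header format to pick up the new gene is not covered here.

```python
from pathlib import Path


def append_fasta_record(path: Path, name: str, sequence: str, width: int = 60) -> None:
    """Append one FASTA record to `path`, wrapping the sequence at `width` bases."""
    # Remove whitespace/newlines from a sequence pasted from a web page.
    seq = "".join(sequence.split()).upper()
    # If the file does not end with a newline, add one so the new header
    # is not glued onto the last sequence line.
    prefix = ""
    if path.exists() and path.stat().st_size > 0:
        with path.open("rb") as fh:
            fh.seek(-1, 2)  # move to the last byte of the file
            if fh.read(1) != b"\n":
                prefix = "\n"
    with path.open("a") as fh:
        fh.write(f"{prefix}>{name}\n")
        for start in range(0, len(seq), width):
            fh.write(seq[start:start + width] + "\n")
```

Working on a copy keeps the packaged reference untouched, so the "without heparinase" and "with heparinase" runs can be compared side by side.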

@rpetit3
Contributor

rpetit3 commented Feb 11, 2022

Haha final comment.

Comparing the genes in ShigellaRef5.fasta with Table 2 in the paper (https://journals.asm.org/doi/10.1128/AEM.00165-19), the following sequences are in Table 2 but not in ShigellaRef5.fasta:

Heparinase
Sat_N
ShET1
ShET2
Stx1
Stx2

Of these, I think only heparinase is used by ShigaTyper.
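The comparison above is essentially a case-insensitive set difference between the gene names in Table 2 and the record names in the reference FASTA. A minimal sketch, assuming both lists spell the genes the same way; if the FASTA headers use allele- or subunit-level names, those would need to be mapped by hand first.

```python
from typing import Iterable, List


def missing_from_reference(table_genes: Iterable[str], reference_names: Iterable[str]) -> List[str]:
    """Return genes listed in the table that have no same-named record in the reference."""
    present = {name.lower() for name in reference_names}
    # Keep the table's original order so the result reads like the table.
    return [gene for gene in table_genes if gene.lower() not in present]
```

Keeping the table order rather than sorting makes it easy to check the result against Table 2 line by line.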

@rpetit3
Contributor

rpetit3 commented Mar 25, 2022

Hi @wolthuisr

This should be fixed in v2 of ShigaTyper.

Cheers

@florathecat
Collaborator


I see that the current version of ShigaTyper does not include the Shiga toxin and enterotoxin genes that the later version I included in the paper does. This is mainly because the output I originally envisioned, in an IPython/Jupyter notebook, is different from what most people prefer in a server environment, so the current ShigaTyper reports only a single serotype. (We also debated whether to include heparinase for S. boydii serotype 20 in that paper or save it for another one.) I am not as code-savvy as the CFSAN team or most people on GitHub. Would it be helpful or informative if the script output included another column listing the toxins identified?
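One way the suggested extra column could look is a fourth tab-separated field next to the existing `sample`, `prediction` and `ipaB` columns. A minimal sketch only: the column name `toxins`, the `;` separator and the `-` placeholder for no hits are hypothetical choices modeled on the output shown earlier in the thread, not an existing ShigaTyper format.

```python
from __future__ import annotations

import csv
import io
from typing import Iterable, Sequence


def format_report(rows: Iterable[tuple[str, str, str, Sequence[str]]]) -> str:
    """Render (sample, prediction, ipaB, toxins) rows as tab-separated text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample", "prediction", "ipaB", "toxins"])
    for sample, prediction, ipab, toxins in rows:
        # Hypothetical convention: "-" when no toxin genes were found.
        writer.writerow([sample, prediction, ipab, ";".join(toxins) if toxins else "-"])
    return buf.getvalue()
```

Adding the toxins as a separate column leaves the existing `prediction` column unchanged, so scripts that read only the serotype call would keep working.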
