
c_to_p predicts "p.?" rather than "p.Met1?" for "NM_206926.1:c.1A>G" #651

Closed
holtgrewe opened this issue Apr 6, 2023 · 7 comments

@holtgrewe
Contributor

holtgrewe commented Apr 6, 2023

Environment

```shell
export UTA_DB_URL="postgresql://uta:uta@localhost:5432/uta/uta_20210129"
```

The following code:

```python
import hgvs.assemblymapper
import hgvs.dataproviders.uta
import hgvs.parser

# Hypothetical path to the local response cache used by mode="learn";
# the value used in the original script was not shown.
CACHE = "hgvs-learn-cache.hdp"

hdp = hgvs.dataproviders.uta.connect(mode="learn", cache=CACHE)
am37 = hgvs.assemblymapper.AssemblyMapper(hdp, assembly_name="GRCh37")
hp = hgvs.parser.Parser()

hgvs_g = "NC_000001.10:g.26126722A>G"

var_g = hp.parse_hgvs_variant(hgvs_g)
var_c = am37.g_to_c(var_g, "NM_206926.1")
var_p = am37.c_to_p(var_c)

print("var_g = {}, var_c = {}, var_p = {}".format(var_g, var_c, var_p))
```

will print:

```
var_g = NC_000001.10:g.26126722A>G, var_c = NM_206926.1:c.1A>G, var_p = NP_996809.1:p.?
```

However, VariantValidator correctly gives:

| Reference Sequence Type | Variant Description |
|---|---|
| Transcript | NM_206926.2:c.1A>G |
| Protein single letter code | NP_996809.1:p.(M1?) |
| Protein three letter code | NP_996809.1:p.(Met1?) |

Note that the vvhgvs fork also predicts p.?. I suspect VariantValidator works around the problem somewhere in vvMixinInit.py (openvar/variantValidator).
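
A Met1 change is expected here because c.1 is the first base of the coding sequence. That places it in codon 1, the initiator methionine. The following minimal sketch shows that arithmetic and the three-letter/one-letter forms that VariantValidator reports. The helper names are hypothetical and are not part of hgvs or VariantValidator.

```python
# Hypothetical helpers for illustration; not part of hgvs or VariantValidator.

# Partial three-letter -> one-letter amino acid map (enough for this example).
THREE_TO_ONE = {"Met": "M", "Gly": "G", "Ser": "S", "Trp": "W", "Ter": "*"}


def codon_for_cds_position(c_pos: int) -> int:
    """Return the 1-based codon number containing CDS position c_pos."""
    if c_pos < 1:
        raise ValueError("only positions within the CDS (c.1 onwards) are handled")
    return (c_pos - 1) // 3 + 1


def start_loss_description(protein_ac: str, one_letter: bool = False) -> str:
    """Format the predicted, uncertain consequence of changing the start codon."""
    aa = THREE_TO_ONE["Met"] if one_letter else "Met"
    return f"{protein_ac}:p.({aa}1?)"


print(codon_for_cds_position(1))  # c.1A>G lies in codon 1, the initiator Met
print(start_loss_description("NP_996809.1"))
print(start_loss_description("NP_996809.1", one_letter=True))
```

The last two calls yield `NP_996809.1:p.(Met1?)` and `NP_996809.1:p.(M1?)`, the same strings as the two protein rows in the table.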

CC @Peter-J-Freeman

@holtgrewe
Contributor Author

holtgrewe commented Apr 6, 2023

I think the problem is that the code becomes very conservative when the transcript has multiple stop codons. In that case the transcript is treated as ambiguous here:

```python
is_ambiguous=self._ref_has_multiple_stops,
```
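
The following simplified sketch models the behaviour described above. It assumes, as the snippet suggests, that a flag derived from the reference protein ("does it contain more than one stop?") short-circuits the prediction to p.?. The functions and the toy accession `NP_TOY` are hypothetical. They do not reproduce the actual hgvs code.

```python
# Hypothetical simplification; NOT the actual hgvs implementation.
# Assumption: a translated reference containing more than one stop ("*") is
# flagged as ambiguous, and that flag forces an unknown consequence (p.?).


def ref_has_multiple_stops(ref_protein: str) -> bool:
    """True if the translated reference contains more than one stop codon."""
    return ref_protein.count("*") > 1


def describe_start_change(protein_ac: str, ref_protein: str) -> str:
    """Describe a change to codon 1, mimicking the conservative behaviour."""
    if ref_has_multiple_stops(ref_protein):
        # Ambiguous reference: give up and report the consequence as unknown.
        return f"{protein_ac}:p.?"
    return f"{protein_ac}:p.(Met1?)"


print(describe_start_change("NP_TOY", "MAGS*"))     # one stop: NP_TOY:p.(Met1?)
print(describe_start_change("NP_TOY", "MAGS*LK*"))  # two stops: NP_TOY:p.?
```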

@holtgrewe
Contributor Author

It might make sense to drop this (IMO overly) conservative behaviour. VariantValidator is also affected for NC_000001.10:g.26136244G>A, which corresponds to NP_065184.2:p.Gly315Ser (as confirmed by VEP) and is probably deleterious.


I have ported almost all test cases from the Python library to hgvs-rs, and dropping this behaviour did not change any test results except for the NC_000001.10:g.26136244G>A one.

@holtgrewe
Contributor Author

OK, there is nothing like running the ClinVar variant set through your software to see the corner cases. Case in point is GRCh37:1:26142208:A:G for ENST00000374315. The following is the VEP result.

```
1343697 1:26142208 G ENSG00000162430 ENST00000374315 Transcript stop_lost 1708 1670 557 */W tAg/tGg - IMPACT=HIGH;STRAND=1;SOURCE=Ensembl;GIVEN_REF=A;USED_REF=A;HGVSc=ENST00000374315.1:c.1670A>G;HGVSp=ENSP00000363434.1:p.Ter557TrpextTer?
```
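
The VEP line above follows VEP's default tabular output. Its last column packs additional annotations as semicolon-separated KEY=VALUE pairs. The following minimal sketch pulls HGVSc/HGVSp out of such a line for comparison against hgvs output. It assumes the column layout shown, and the function name is hypothetical. The scraped line has its tabs collapsed to spaces, so the sketch splits on any whitespace. That only works because none of these fields contain spaces.

```python
# Hypothetical helper; assumes VEP's default output layout, where the final
# column holds semicolon-separated KEY=VALUE annotations.


def parse_vep_extra(line: str) -> dict[str, str]:
    """Return the KEY=VALUE annotations from the last column of a VEP line."""
    # Split on any whitespace; fields in this example contain no spaces.
    extra = line.split()[-1]
    pairs = (item.split("=", 1) for item in extra.split(";") if "=" in item)
    return dict(pairs)


vep_line = (
    "1343697 1:26142208 G ENSG00000162430 ENST00000374315 Transcript stop_lost "
    "1708 1670 557 */W tAg/tGg - IMPACT=HIGH;STRAND=1;SOURCE=Ensembl;"
    "GIVEN_REF=A;USED_REF=A;HGVSc=ENST00000374315.1:c.1670A>G;"
    "HGVSp=ENSP00000363434.1:p.Ter557TrpextTer?"
)
extra = parse_vep_extra(vep_line)
print(extra["HGVSp"])  # ENSP00000363434.1:p.Ter557TrpextTer?
```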

I realize that it may not be possible to make correct predictions for transcripts with multiple stop codons, but perhaps the code could be extended so that it uses the conservative "is ambiguous" case only for changes after the first stop codon?
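
One possible reading of this suggestion, as a minimal sketch: treat the reference as ambiguous only when it has multiple stops *and* the variant's codon lies after the first stop. The function name and the one-letter protein string used as the reference are assumptions for illustration. This is not code from hgvs or hgvs-rs.

```python
# Hypothetical illustration of the proposed rule; not hgvs or hgvs-rs code.


def is_ambiguous_for_codon(ref_protein: str, codon: int) -> bool:
    """Return True only if the reference has multiple stops AND the 1-based
    variant codon lies after the first stop.

    ref_protein is the translated reference, one letter per codon, with "*"
    marking stop codons.
    """
    if ref_protein.count("*") <= 1:
        return False  # zero or one stop: nothing ambiguous
    first_stop = ref_protein.index("*") + 1  # 1-based codon of the first stop
    return codon > first_stop


toy_ref = "MAGS*LK*"  # toy sequence with two stops; the first is codon 5
print(is_ambiguous_for_codon(toy_ref, 1))  # False: start codon
print(is_ambiguous_for_codon(toy_ref, 5))  # False: the first stop itself
print(is_ambiguous_for_codon(toy_ref, 6))  # True: downstream of the first stop
```

Under this rule, start-codon changes and changes to the first stop codon would get a regular prediction. Changes further downstream would still fall back to p.?.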

@Peter-J-Freeman

Peter-J-Freeman commented Apr 12, 2023

@holtgrewe

In VariantValidator, I have added a lot of code that deals with fundamentals of the variant nomenclature including the use of the Met1 syntax. Much of the additional code is outside the scope of hgvs, so the VV code is very much complementary.

I am working hard to rework VariantValidator so that it can run on the latest hgvs library versions and our users can benefit from updates and modifications. Thanks very much for cc'ing me in. I am pulling the issue into our issue tracker as well so I can keep track. I am in communication with the biocommons team regarding how to re-introduce the latest hgvs builds into VV.

@larrybabb

The HGVS specification prefers p.? when there is not 100% certainty that a protein change actually occurs (even if one is expected), as is the case for the substitution you provided.

@larrybabb

@reece please help me help you (need rights to admin these tickets).

@katiestahl
Contributor

Closing, as this is working as designed according to @larrybabb and @reece.


4 participants