Missing Genes in GeneSummaries File #1154

Open
nlfortier opened this issue Nov 15, 2024 · 1 comment

Comments

@nlfortier

Over the past few months, many genes have been dropped from the GeneSummaries.tsv file on the CIViC Data Releases page.

The following genes were in the September release, but were missing from the October release:
BEND2, CBFA2T3, CBFB, CREB3L1, CREB3L2, DDIT3, DEK, DGKH, DUX4, FLI1, FUS, GLI1, HMGA2, IL2RB, MAML2, MAP3K8, MNX1, NCOA2, NFATC2, NUP214, NUP98, NUTM1, PDGFD, PRKACA, PTK2B, SH3PXD2A, SSX1, SSX2, SSX4, TLX3, WWTR1, ZFTA, ZNF384

Five additional genes were subsequently dropped in the November release:
KAT6A, RASGRF1, RBM15, VGLL2, YWHAE

These genes can still be looked up on the website, which leads me to suspect that they were removed erroneously.
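For anyone who wants to reproduce the comparison, here is a minimal sketch of how two releases can be diffed. It assumes the gene symbol lives in a column called `name`; that column name, the `gene_id` values, and the helper names are my assumptions, and the inline rows are a tiny made-up sample rather than real release contents.

```python
import csv
import io


def gene_names(tsv_text, column="name"):
    """Return the set of gene symbols found in one GeneSummaries-style TSV."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row[column] for row in reader}


def dropped_genes(older_tsv, newer_tsv, column="name"):
    """Genes present in the older release but absent from the newer one."""
    return sorted(gene_names(older_tsv, column) - gene_names(newer_tsv, column))


# Illustrative sample only: the ids are invented and the real files have more columns.
september = "gene_id\tname\n1\tRASGRF1\n2\tFLI1\n3\tKAT6A\n"
october = "gene_id\tname\n1\tRASGRF1\n3\tKAT6A\n"

print(dropped_genes(september, october))
```

With real data, the two strings would be replaced by the contents of the downloaded release files.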

@acoffman
Member

Thanks for the report! At first glance, this appears to be related to the rollout of our new Fusion data model. Spot-checking several of the Genes you listed, I see that they no longer have any variants associated with them.

For instance, the Gene RASGRF1 no longer has any direct variants under it, but there are new Fusion Features (OCLN::RASGRF1, IQGAP1::RASGRF1, SLC4A4::RASGRF1) that have associated variants and evidence items.

The default behavior of the TSV exports is to export only Genes that have at least one variant, molecular profile, and evidence item associated with them. In these cases, it appears that all of the associated variants were in fact fusion variants that have since been moved.
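To make the effect of that rule concrete, here is a rough sketch of the filter as described above. It is not the actual export code: the `Gene` class, its count fields, and the numbers are hypothetical stand-ins for whatever the real data model tracks.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    # Hypothetical stand-in for a gene record; field names are assumptions.
    name: str
    variant_count: int = 0
    molecular_profile_count: int = 0
    evidence_item_count: int = 0


def is_exported(gene):
    """Export only genes with at least one variant, molecular profile, and evidence item."""
    return (
        gene.variant_count > 0
        and gene.molecular_profile_count > 0
        and gene.evidence_item_count > 0
    )


# Made-up counts: a gene whose variants all moved to Fusion Features ends up with none.
before_move = Gene("RASGRF1", variant_count=3, molecular_profile_count=3, evidence_item_count=3)
after_move = Gene("RASGRF1")

print(is_exported(before_move), is_exported(after_move))
```

Under this rule, a gene silently drops out of the export as soon as its last direct variant is reassigned, even though the gene page itself still exists.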

We probably need to introduce a new FeatureSummaries.tsv file that includes all feature types (Genes, Fusions, and Factors) so that it can be comprehensive. We may also be able to introduce a heuristic to include Genes in the GeneSummaries.tsv that have curated summaries, sources, etc., or that are included in Fusion Features. We will get a fix out for this in the next release and I'll follow up here!
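As a rough illustration of what such a heuristic could look like (a sketch only, not the planned implementation), the export set could be the union of genes with direct variants, genes with curated summaries, and genes that appear as a partner in a Fusion Feature. The function names are hypothetical, `GENE_A` is a placeholder, and the sketch assumes fusion names always follow the `PARTNER::PARTNER` pattern shown above.

```python
def fusion_partners(fusion_names):
    """Collect the gene symbols on either side of '::' in each fusion name."""
    partners = set()
    for name in fusion_names:
        partners.update(part for part in name.split("::") if part)
    return partners


def genes_to_export(genes_with_variants, genes_with_summaries, fusion_names):
    """Union of the three inclusion criteria, sorted for stable output."""
    return sorted(
        set(genes_with_variants)
        | set(genes_with_summaries)
        | fusion_partners(fusion_names)
    )


fusions = ["OCLN::RASGRF1", "IQGAP1::RASGRF1", "SLC4A4::RASGRF1"]

# GENE_A is a placeholder for any gene that still has direct variants.
print(genes_to_export(["GENE_A"], [], fusions))
```

With this rule RASGRF1 would be listed again because it is a partner in three fusions, at the cost of also listing partner genes that never had direct variants of their own.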
