Mutated Genes table does not calculate frequencies correctly #5613

Closed
schultzn opened this issue Jan 16, 2019 · 25 comments · Fixed by #6093

Comments

@schultzn
Contributor

I see this issue in the AACR GENIE study when selecting samples that have mutations in multiple different genes. I see frequencies greater than 100% sometimes.

I think this happens when selecting genes that are only covered in some samples.

image

@zhx828
Member

zhx828 commented Jan 17, 2019

@ersinciftci @onursumer could one of you take a look at this ticket? If you go to the GENIE portal, select NUTM1 and ZFHX4 in the Mutated Genes table, then click Select Sample, you will see that the endpoint returns frequency: 121.57 for NUTM1.

@jjgao jjgao added the API label Jan 17, 2019
@zhx828
Member

zhx828 commented Jan 17, 2019

@schultzn @jjgao
We found the issue. Taking NUTM1 as an example: NUTM1 shows up in three panels, DFCI-ONCOPANEL-3, UCSF-NIMV4 and WAKE-CA-NGSQ3 (http://www.cbioportal.org/genie/study?id=genie_public&filters={%22mutatedGenes%22:[{%22entrezGeneIds%22:[256646]}],%22studyIds%22:[%22genie_public%22]})

However, NUTM1 is not actually listed under panel DFCI-ONCOPANEL-3. It shows up in the study view because it has fusion data, and the way we currently import fusions duplicates each fusion record for its fusion partners.

Study view uses the number of samples in the panel as the denominator for the frequency, but the numerator counts all mutations, including the duplicated fusions.

You can see the same thing in oncoprint http://www.cbioportal.org/genie/results/oncoprint?session_id=5c410d51e4b05228701fbd2e
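
The mismatch described above can be reproduced with a few lines of arithmetic. This is a minimal sketch with invented toy data (the sample IDs, row layout and the `frequency` helper are hypothetical, not GENIE values or cBioPortal code): the denominator only contains samples profiled for the gene, while the numerator also counts fusion rows from samples that were never profiled for it.

```python
# Toy data, invented for illustration only (not GENIE values).
# Samples profiled on a hypothetical panel that actually covers the gene.
profiled_samples = {"S1", "S2", "S3", "S4"}

# Rows for one gene in the mutation table. Fusions are stored alongside
# mutations and duplicated onto the partner gene, so samples from panels
# that never covered the gene also appear here.
mutation_rows = [
    {"sample": "S1", "type": "Missense_Mutation"},
    {"sample": "S2", "type": "Missense_Mutation"},
    {"sample": "S3", "type": "Nonsense_Mutation"},
    {"sample": "S5", "type": "Fusion"},  # not profiled for this gene
    {"sample": "S6", "type": "Fusion"},  # not profiled for this gene
    {"sample": "S7", "type": "Fusion"},  # not profiled for this gene
]


def frequency(rows, denominator_samples):
    """Percentage of altered samples relative to the given denominator."""
    altered = {row["sample"] for row in rows}
    return 100.0 * len(altered) / len(denominator_samples)


# Numerator includes fusions, denominator does not: frequency exceeds 100%.
with_fusions = frequency(mutation_rows, profiled_samples)

# Dropping fusion rows brings numerator and denominator back in line.
mutations_only = [r for r in mutation_rows if r["type"] != "Fusion"]
without_fusions = frequency(mutations_only, profiled_samples)

print(with_fusions, without_fusions)
```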

@schultzn
Contributor Author

schultzn commented Jan 17, 2019 via email

@jjgao
Member

jjgao commented Jan 18, 2019

Separating fusions from mutations would fix the issue. It will take some time before we can separate them completely.

For the moment, maybe we can create a Fusion Genes table in study view using all the fusions in the mutations table, and not pull fusion data for the Mutated Genes table? @schultzn @zhx828

@schultzn
Contributor Author

schultzn commented Jan 20, 2019 via email

@jjgao
Member

jjgao commented Jan 21, 2019

@zhx828 @onursumer

  • do not return FUSIONS in the mutated genes API
  • add an API to return FUSION gene count
  • add Fusion Genes table in study view
    • in this table, not sure if we can have the Freq column since we don't have proper gene panels for fusion.

@jjgao jjgao removed the API label Jan 29, 2019
@zhx828 zhx828 assigned khzhu and unassigned ersinciftci, onursumer and zhx828 Feb 15, 2019
@jjgao
Member

jjgao commented Apr 4, 2019

@khzhu do you think you'll have the bandwidth to handle this one?

@khzhu
Contributor

khzhu commented Apr 5, 2019

yes, will start to work on it next week.

@n1zea144
Contributor

n1zea144 commented May 7, 2019

Hi @khzhu Any progress on this? It was reported by another GENIE user recently.

@khzhu
Contributor

khzhu commented May 7, 2019

Hi @n1zea144, I was told to hold off until JJ came back. Hongxin and I discussed how to implement web APIs to retrieve the fusion data from the database. Fusions are currently stored in the mutation/mutation_event tables, so they do not have their own business models, and web services would still have to use the mutation repositories.
I will ask @jjgao if we could schedule a meeting to discuss the issue.

@zhx828
Member

zhx828 commented May 8, 2019

  • Make sure https://www.cbioportal.org/api/swagger-ui.html#/Study_View/fetchMutatedGenesUsingPOST will only return mutations (it's under internal set of endpoints)
  • Add a new endpoint /fusions/fetch under the internal study view endpoints that returns the fusion data
  • The SQL query should support filtering and, by default, return everything (this is important for other pages)
  • Follow the data flow of endpoint /mutated-genes/fetch to find out how the mutation frequency is calculated and do the same for fusions. How the fusion frequency should be calculated may need some discussion, but you can leave it as a TODO until a later phase of the implementation
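
The split proposed in the list above could look roughly like the following. This is only a sketch in Python, not the portal's Java/MyBatis code: the row layout, the `sample_counts_by_gene` helper and the assumption that fusion rows carry the mutation type label "Fusion" are all illustrative.

```python
from collections import defaultdict

# Assumed label for fusion rows stored in the mutation table.
FUSION_TYPE = "Fusion"


def sample_counts_by_gene(rows, fusions_only=False):
    """Count distinct samples per gene, for fusions only or mutations only."""
    samples = defaultdict(set)
    for row in rows:
        is_fusion = row["mutation_type"] == FUSION_TYPE
        # Keep a row only if it belongs to the requested group.
        if is_fusion == fusions_only:
            samples[row["gene"]].add(row["sample_id"])
    return {gene: len(ids) for gene, ids in samples.items()}


# Toy rows, invented for illustration.
rows = [
    {"gene": "NUTM1", "sample_id": "S1", "mutation_type": "Fusion"},
    {"gene": "NUTM1", "sample_id": "S2", "mutation_type": "Fusion"},
    {"gene": "NUTM1", "sample_id": "S3", "mutation_type": "Missense_Mutation"},
    {"gene": "ZFHX4", "sample_id": "S1", "mutation_type": "Missense_Mutation"},
    {"gene": "ZFHX4", "sample_id": "S1", "mutation_type": "Nonsense_Mutation"},
]

# What a mutated-genes style response would count (no fusions).
mutated_gene_counts = sample_counts_by_gene(rows)

# What a separate fusion-genes style response would count.
fusion_gene_counts = sample_counts_by_gene(rows, fusions_only=True)
```

Counting distinct samples rather than rows keeps a sample with two mutations in the same gene from being counted twice.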

@zhx828
Member

zhx828 commented May 8, 2019

@khzhu let me know if you have more questions

@khzhu
Contributor

khzhu commented May 8, 2019

thanks, @zhx828! will take a look and get back to you.

@jjgao
Member

jjgao commented May 8, 2019

A tricky one is the gene panel. We don't have proper gene panels for fusions because the gene panels for mutations are not the same as those for fusions. Fusion partners are also everywhere.

One option is to ignore gene panels. The problem is that the frequencies will then be lower than the actual values.

Moreover, in GENIE, only a subset has fusion data. Maybe we can create a fusion case list and use that as the denominator? @n1zea144

@schultzn thoughts?

@khzhu
Contributor

khzhu commented May 8, 2019

thanks, @jjgao ! agree. it might be a good idea to set gene panel aside and see if the new fusion api will resolve the issue Niki reported first.

@n1zea144
Contributor

n1zea144 commented May 8, 2019

It would be relatively easy for us to create a fusion case list for IMPACT samples. That seems like a good way to restrict the mutation frequency to the pool of samples in which fusion calling was attempted.

@khzhu
Contributor

khzhu commented May 9, 2019

Hi @jjgao @zhx828 , here is my plan:

  1. backend:
  • keep existing myBatis query getSampleCountInMultipleMolecularProfiles as it is, since it has a where clause used by many other myBatis queries
  • add a new myBatis query getSampleCountInMultipleMolecularProfilesForMutationOrFusion to pull Mutations Or Fusions out from the mutation table.
  • add a new web service getSampleCountInMultipleMolecularProfilesForFusion
  • add a new API endpoint /fusion-genes/fetch
  2. frontend:
  • add a new Fusion table without Frequency column in the study view

What do you think? Thanks!

@jjgao
Member

jjgao commented May 9, 2019

@n1zea144 it would be great if you could prioritize that. The case list ID can probably be [study_id]_fusion.

@khzhu if the fusion case list exists, please use it (that is, its overlap with the selected samples) as the denominator; otherwise, use all selected samples.
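
The denominator rule above can be sketched as follows. The function name, argument names and toy sample IDs are hypothetical, and restricting the numerator to the same eligible pool is an extra assumption made here so the result cannot exceed 100%; it was not stated in the thread.

```python
def fusion_frequency(samples_with_fusion, selected_samples, fusion_case_list=None):
    """Fusion frequency (%) for one gene under the denominator rule above.

    fusion_case_list is None when the study has no fusion case list.
    Returns None when no sample is eligible.
    """
    selected = set(selected_samples)
    if fusion_case_list is None:
        # No fusion case list: fall back to all selected samples.
        eligible = selected
    else:
        # Case list exists: use its overlap with the selected samples.
        eligible = selected & set(fusion_case_list)
    if not eligible:
        return None
    # Assumption: only count altered samples inside the eligible pool.
    altered = set(samples_with_fusion) & eligible
    return 100.0 * len(altered) / len(eligible)


# Toy data, invented for illustration.
selected = ["S1", "S2", "S3", "S4"]
case_list = ["S2", "S3", "S4", "S9"]
with_fusion = ["S1", "S2"]

freq_with_case_list = fusion_frequency(with_fusion, selected, case_list)
freq_without_case_list = fusion_frequency(with_fusion, selected)
```

With the case list, only S2, S3 and S4 are eligible, so one of three samples is altered; without it, two of four selected samples are.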

@khzhu
Contributor

khzhu commented May 9, 2019

@JJ, will work on fusion case list with @n1zea144 . thanks!

@jjgao
Member

jjgao commented May 9, 2019

@khzhu I think you can assume _fusion case list may exist. Once @n1zea144 creates the case list, it'll work.

@khzhu
Contributor

khzhu commented May 9, 2019

@jjgao , okay, thanks!

@khzhu
Contributor

khzhu commented May 10, 2019

@jjgao @zhx828, a quick update on this. I have the backend part done and am now working on the frontend, adding the new fusion table.
One quick question: since we filter fusions out of mutated_genes, those fusions will not show up in the oncoprint. Is that okay?

@zhx828
Member

zhx828 commented May 10, 2019 via email

@khzhu
Contributor

khzhu commented May 10, 2019

seems that query is used by oncoprint as well.

khzhu pushed a commit to pughlab/cbioportal that referenced this issue May 11, 2019
@khzhu
Contributor

khzhu commented May 11, 2019

@jjgao @zhx828, I've completed all tasks as planned. The Mutated Genes table now excludes all fusions (please see attached; the one on the left is the one excluding all fusions, and I used the breast_msk_2018 dataset for testing), while the new Fusion Genes table lists all fusions (please see attached). No other pages, such as oncoprint, are affected.
I had problems importing the GENIE dataset into my local test database (I downloaded the data from the GENIE site, but somehow the clinical data was not up to date). So, if the fusions were the root cause, this fix should resolve the issue @schultzn reported.

Screen Shot 2019-05-11 at 1 14 11 PM

Screen Shot 2019-05-11 at 12 59 51 PM

khzhu pushed a commit to pughlab/cbioportal that referenced this issue May 14, 2019
khzhu pushed a commit to pughlab/cbioportal that referenced this issue May 14, 2019
khzhu pushed a commit to pughlab/cbioportal that referenced this issue Jul 5, 2019
khzhu pushed a commit to pughlab/cbioportal that referenced this issue Jul 19, 2019
khzhu pushed a commit to pughlab/cbioportal that referenced this issue Jul 22, 2019
khzhu pushed a commit to pughlab/cbioportal that referenced this issue Jul 24, 2019
khzhu pushed a commit to pughlab/cbioportal that referenced this issue Aug 13, 2019
khzhu pushed a commit to pughlab/cbioportal that referenced this issue Aug 19, 2019
khzhu pushed a commit to pughlab/cbioportal that referenced this issue Aug 27, 2019
khzhu pushed a commit to pughlab/cbioportal that referenced this issue Aug 28, 2019
khzhu pushed a commit to pughlab/cbioportal that referenced this issue Sep 5, 2019
khzhu pushed a commit to pughlab/cbioportal that referenced this issue Sep 5, 2019
khzhu pushed a commit to pughlab/cbioportal that referenced this issue Sep 30, 2019
zhx828 pushed a commit to pughlab/cbioportal that referenced this issue Oct 2, 2019
khzhu pushed a commit to pughlab/cbioportal that referenced this issue Oct 4, 2019