Add NA for genomic-data-bin-counts #11006
Force-pushed from 5d2874c to d744f1d.
lgtm
src/main/resources/org/cbioportal/persistence/mybatisclickhouse/StudyViewFilterMapper.xml
<if test="dataFilterValue.start != null or dataFilterValue.end != null">
<choose>
<when test="dataFilterValue.start == dataFilterValue.end">
AND abs(
Let's think about whether we can cast to decimal and replace this.
Force-pushed from adc4179 to dea9647.
Force-pushed from dea9647 to 437cd1f.
* Fix intersection with parens
* add genomic data filter tests testing for missing NAs
* add NA genomic data filter tests
---------
Co-authored-by: Bryan Lai <laib1@mskcc.org>
Force-pushed from e114db0 to 3a8536d.
Quality Gate passed
This comment documents the special approach used to get NA counts for study view endpoints during the development of ClickHouse RFC80. The first implementation, for the genomic-data-counts endpoint, is in #10807. These endpoints also take this approach: generic-assay-bin-counts (#11039) and clinical-data(-bin)-counts (PR tbd).
Some thoughts that might be helpful before we get to the recipe:
NA count = total selected samples count - non-NA count
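The identity above can be sketched as a single query. This is only an illustration, not the mapper code from the PR: it uses Python's built-in sqlite3 as a stand-in for ClickHouse, and the tables `selected_samples` and `alterations`, their columns, and the helper names are all hypothetical. It also assumes at most one alteration row per sample for a given gene and profile.

```python
import sqlite3


def demo_connection():
    """Build a tiny in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE selected_samples (sample_id TEXT PRIMARY KEY);
        CREATE TABLE alterations (
            sample_id TEXT, hugo_gene_symbol TEXT,
            profile_type TEXT, alteration_value TEXT);
        INSERT INTO selected_samples VALUES ('S1'), ('S2'), ('S3'), ('S4');
        INSERT INTO alterations VALUES
            ('S1', 'TP53', 'cna', '2'),
            ('S2', 'TP53', 'cna', '2'),
            ('S3', 'TP53', 'cna', 'NA');  -- S4 has no row at all
    """)
    return conn


# Non-NA counts per value, plus one 'NA' row computed as
# total selected samples - non-NA count. coalesce() turns the NULL sum
# of an empty non-NA result into 0, so the 'NA' row is still produced.
COUNT_SQL = """
WITH non_na AS (
    SELECT a.alteration_value AS value, count(*) AS cnt
    FROM selected_samples s
    JOIN alterations a ON a.sample_id = s.sample_id
    WHERE a.hugo_gene_symbol = :gene
      AND a.profile_type = :profile
      AND a.alteration_value != 'NA'
    GROUP BY a.alteration_value
)
SELECT value, cnt FROM non_na
UNION ALL
SELECT 'NA' AS value,
       (SELECT count(*) FROM selected_samples)
       - coalesce((SELECT sum(cnt) FROM non_na), 0) AS cnt
"""


def value_counts_with_na(conn, gene, profile_type):
    """Return {value: count} for one gene/profile, including an 'NA' bucket."""
    rows = conn.execute(COUNT_SQL, {"gene": gene, "profile": profile_type})
    return dict(rows.fetchall())


if __name__ == "__main__":
    conn = demo_connection()
    print(value_counts_with_na(conn, "TP53", "cna"))
    print(value_counts_with_na(conn, "KRAS", "cna"))
```

Note that the 'NA' bucket covers both samples whose stored value is the literal 'NA' and samples with no row at all, because it is derived by subtraction rather than counted directly.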
Now, to the recipe:
Ingredients:
* The way to filter out NA values: WHERE alteration_value != 'NA'
* For genomic-data-counts, the filter for counting and the filter for filtering are both GenomicDataFilter, which contains {hugo gene symbol, profile type}.
* For genomic-data-bin-counts, the filter for counting is GenomicDataBinCountFilter (which also contains {hugo gene symbol, profile type}) and the filter for filtering is GenomicDataFilter. In this case we only need to pass GenomicDataBinCountFilter from the controller to the mapper for counting. The study view filter will carry GenomicDataFilter for filtering.
* For generic-assay-data-bin-counts, the filter for counting is GenericAssayDataBinCountFilter (which contains {stable id, profile type}) and the filter for filtering is GenericAssayDataFilter. We only need to pass GenericAssayDataBinCountFilter from the controller to the mapper for counting. The study view filter will carry GenericAssayDataFilter for filtering.
Cook:
Create the code path with the above-mentioned filter: controller -> service -> repository -> mapper -> counting SQL.
For counting: recall that we need NA count = total selected samples count - non-NA count. The first step is to get the "non-NA count":
* … {hugo gene symbol, profile type} from GenomicDataFilter.
* <bind name="profileType" value="genomicDataBinFilters[0].profileType" />
* … attributeId nor value, but in this case we still need the count, which is supposed to be 0. We will handle it later.
* … coalesce() function. Then, whenever the non-NA query returns empty results, we still have all the properties we need to construct the 'NA'-only object with its count = total selected sample count - 0. In the end we UNION the first non-NA query with this NA query.
For filtering: recall that we need a way to select the samples with NA values and add them to the samples with non-NA values, since all non-NA values are certainly available in the database. This requires considering all user selection cases: 1) the user selects 'NA' only, 2) the user selects non-NA values only, 3) the user selects both 'NA' and non-NA values. We can combine the results directly in the WHERE clause or with UNION, depending on the scenario:
* … value IS NULL to the WHERE clause, whereas numerical values have to be non-NULL because we need to perform binning on them, so we need to combine them with the NULL values by UNION.
* … <include refid="selectAllNumericalGeneticAlterations"/>. Then we can get clean NA values by specifying WHERE alteration_value IS null (or by using "the way to filter out NA values") with this LEFT JOIN.
* <when test="dataFilterValue.value == 'NA'">alteration_value IS null</when>
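The three selection cases above can be sketched for the categorical case, where the NA condition goes straight into the WHERE clause. This is a hedged illustration rather than the PR's mapper: it runs on Python's sqlite3 instead of ClickHouse, and the tables `selected_samples` and `alterations`, their columns, and the function names are hypothetical. The LEFT JOIN keeps every selected sample and leaves `alteration_value` NULL for samples without a non-NA value, which is what `IS NULL` then picks up. The numerical case, which the text says needs UNION, is not covered here.

```python
import sqlite3


def demo_connection():
    """Build a tiny in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE selected_samples (sample_id TEXT PRIMARY KEY);
        CREATE TABLE alterations (
            sample_id TEXT, hugo_gene_symbol TEXT,
            profile_type TEXT, alteration_value TEXT);
        INSERT INTO selected_samples VALUES ('S1'), ('S2'), ('S3'), ('S4');
        INSERT INTO alterations VALUES
            ('S1', 'TP53', 'cna', '2'),
            ('S2', 'TP53', 'cna', '-2'),
            ('S3', 'TP53', 'cna', 'NA');  -- S4 has no row at all
    """)
    return conn


def filter_samples(conn, gene, profile_type, selected_values):
    """Return sample ids matching the user's selection, 'NA' included."""
    wants_na = "NA" in selected_values
    non_na = [v for v in selected_values if v != "NA"]

    conditions = []
    params = [gene, profile_type]  # bound in the ON clause, in this order
    if non_na:
        placeholders = ", ".join("?" for _ in non_na)
        conditions.append(f"a.alteration_value IN ({placeholders})")
        params.extend(non_na)
    if wants_na:
        # After the LEFT JOIN, NULL means "no non-NA value for this sample".
        conditions.append("a.alteration_value IS NULL")
    if not conditions:
        return []

    sql = f"""
        SELECT s.sample_id
        FROM selected_samples s
        LEFT JOIN alterations a
          ON a.sample_id = s.sample_id
         AND a.hugo_gene_symbol = ?
         AND a.profile_type = ?
         AND a.alteration_value != 'NA'
        WHERE {' OR '.join(conditions)}
        ORDER BY s.sample_id
    """
    return [row[0] for row in conn.execute(sql, params)]


if __name__ == "__main__":
    conn = demo_connection()
    print(filter_samples(conn, "TP53", "cna", ["NA"]))
    print(filter_samples(conn, "TP53", "cna", ["2"]))
    print(filter_samples(conn, "TP53", "cna", ["2", "NA"]))
```

Putting the gene, profile, and non-NA conditions in the ON clause rather than the WHERE clause is what keeps the unmatched samples in the result; moving them to WHERE would turn the LEFT JOIN back into an inner join.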
Thank you for reading this far. Hope this is helpful. ;-)