-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow for incremental addition of data to alt_allele [VS-52] #7993
Conversation
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## ah_var_store #7993 +/- ##
================================================
Coverage ? 77.030%
Complexity ? 21708
================================================
Files ? 1375
Lines ? 82251
Branches ? 13121
================================================
Hits ? 63358
Misses ? 13764
Partials ? 5129 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some draft nitpicks, looks promising!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -39,6 +39,7 @@ def populate_alt_allele_table(call_set_identifier, query_project, vet_table_name | |||
parser.add_argument('--query_project',type=str, help='Google project where query should be executed', required=True) | |||
parser.add_argument('--vet_table_name',type=str, help='vet table name to ingest', required=True) | |||
parser.add_argument('--fq_dataset',type=str, help='project and dataset for data', required=True) | |||
parser.add_argument('--max_sample_id',type=str, help='Maximum value of sample_id already loaded', required=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Annoying nit:
parser.add_argument('--max_sample_id',type=str, help='Maximum value of sample_id already loaded', required=True) | |
parser.add_argument('--max_sample_id', type=str, help='Maximum value of sample_id already loaded', required=True) |
@@ -103,7 +157,7 @@ task CreateAltAlleleTable { | |||
|
|||
echo "project_id = ~{project_id}" > ~/.bigqueryrc | |||
bq query --location=US --project_id=~{project_id} --format=csv --use_legacy_sql=false ~{bq_labels} \ | |||
'CREATE OR REPLACE TABLE `~{project_id}.~{dataset_name}.alt_allele` ( | |||
'CREATE TABLE IF NOT EXISTS `~{project_id}.~{dataset_name}.alt_allele` ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ooh, I wasn't aware of this command… nice!
position1 as (select * from `{fq_vet_table}` WHERE call_GT IN ('0/1', '1/0', '1/1', '0|1', '1|0', '1|1', '0/2', '0|2','2/0', '2|0')), | ||
position2 as (select * from `{fq_vet_table}` WHERE call_GT IN ('1/2', '1|2', '2/1', '2|1'))""" | ||
WITH | ||
position1 as (select * from `{fq_vet_table}` WHERE call_GT IN ('0/1', '1/0', '1/1', '0|1', '1|0', '1|1', '0/2', '0|2','2/0', '2|0') AND sample_id > {max_sample_id}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
smart, I like the use of >
rather than some IN
kind of clause
alt_allele
alt_allele
to figure out whichvet_
table to start with and what sample IDs data to grabgvs-internal.rsa_gvs_quickstart_dev.alt_allele
(done with existing version) vs.gvs-internal.rsa_gvs_quickstart_dev.alt_allele_incremental
(done in two batches with code from this branch)Closed https://broadworkbench.atlassian.net/browse/VS-52