fix the check for duplicates in import genomes #7470

ahaessly · 2021-09-14T19:58:29Z

No description provided.

kcibul

What didn't work in the previous implementation? (Just so I can review for that case specifically.)

scripts/variantstore/wdl/GvsImportGenomes.wdl

kcibul · 2021-09-14T20:20:42Z

+    NAMES_FILE=~{write_lines(sample_names)}
+    bq load --project_id=~{project_id} ${TEMP_TABLE} $NAMES_FILE "sample_name:STRING"
+
+    bq --location=US --project_id=~{project_id} query --format=csv -n ~{num_samples} --use_legacy_sql=false \


Maybe a comment here would be good saying how this works? My take (to see if I got it right) is "Check to see if data has been loaded for any of the provided sample names"?

gatk-bot · 2021-09-14T20:38:09Z

Travis reported job failures from build 36030
Failures in the following jobs:

Test Type	JDK	Job ID	Logs
unit	openjdk11	36030.13	logs
unit	openjdk8	36030.3	logs

ahaessly · 2021-09-16T15:52:44Z

The main issue with this task was that the query results were limited to 100 rows by default, so the query now passes the -n parameter. Another issue was that we were running bq show on a table variable, $TABLE, that was never defined.
I also changed the approach, because returning the names of all samples that have already been loaded didn't seem scalable. I wanted the query to return at most the number of samples we are trying to ingest.
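
To illustrate the row-limit point, here is a minimal shell sketch of how the CSV returned by the duplicate query might be checked. It is not the code from this PR: the function names, the file name dupes.csv, and the variables PROJECT, NUM_SAMPLES and QUERY are hypothetical, and it assumes the CSV has one header line followed by one sample name per line.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and variables are hypothetical, not from the PR.

# count_duplicates FILE
# FILE is assumed to be CSV with one header line and then one sample
# name per line. Prints the number of data rows.
count_duplicates() {
  # tail -n +2 skips the header; grep -c . counts non-empty lines.
  # grep exits non-zero when the count is 0, so mask that status.
  tail -n +2 "$1" | grep -c . || true
}

# fail_on_duplicates FILE
# Returns 1 and lists the names on stderr if any row was returned.
fail_on_duplicates() {
  local n
  n=$(count_duplicates "$1")
  if [ "$n" -gt 0 ]; then
    echo "ERROR: $n sample(s) already have data loaded:" >&2
    tail -n +2 "$1" >&2
    return 1
  fi
  return 0
}

# Hypothetical usage. Without -n, bq query prints at most 100 rows, so a
# larger batch of duplicates would be under-reported:
#   bq --project_id="$PROJECT" query --format=csv -n "$NUM_SAMPLES" \
#      --use_legacy_sql=false "$QUERY" > dupes.csv
#   fail_on_duplicates dupes.csv
```

Capping the result at the number of samples being ingested keeps the output bounded no matter how many samples are already loaded, which is the scalability concern raised above.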

ahaessly added 4 commits September 14, 2021 15:54

new query to detect duplicates

d8311c3

update yml

7d832ee

missing param in sql

e786c57

allow more than 100 samples to be returned

0328ef1

kcibul reviewed Sep 14, 2021

document and use vet instead of pet for query

415feaa

kcibul approved these changes Sep 17, 2021

remove branch from yml

0232b5e

ahaessly merged commit 185b5f4 into ah_var_store Sep 17, 2021

ahaessly deleted the ah_fix_dupes branch September 17, 2021 20:22

This was referenced Mar 17, 2023

lb merge gvs branch #8248

Closed

testing something, please ignore #8251

Closed
