Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some changes to the ngs cohort extract files #7113

Merged
merged 5 commits into from
Mar 2, 2021
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions scripts/variantstore/wdl/extract/Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@ RUN pip install -r /app/requirements.txt

# Add the application source code.
ADD raw_array_cohort_extract.py /app
ADD ngs_cohort_extract.py /app

# install google SDK
RUN curl -sSL https://sdk.cloud.google.com | bash
Expand Down
31 changes: 14 additions & 17 deletions scripts/variantstore/wdl/extract/ngs_cohort_extract.py
Original file line number Diff line number Diff line change
Expand Up @@ -229,27 +229,24 @@ def do_extract(fq_pet_vet_dataset,
min_variant_samples,
fq_sample_mapping_table
):
try:

global client
client = bigquery.Client(project=query_project,
default_query_job_config=QueryJobConfig(labels={ "id" : f"test_cohort_export_{output_table_prefix}"}, priority="INTERACTIVE", use_query_cache=False ))
global client
client = bigquery.Client(project=query_project,
default_query_job_config=QueryJobConfig(labels={ "id" : f"test_cohort_export_{output_table_prefix}"}, priority="INTERACTIVE", use_query_cache=False ))

## TODO -- provide a cmdline arg to override this (so we can simulat smaller datasets)
global PET_VET_TABLE_COUNT
PET_VET_TABLE_COUNT = max_tables
print(f"Using {PET_VET_TABLE_COUNT} PET tables in {fq_pet_vet_dataset}...")
## TODO -- provide a cmdline arg to override this (so we can simulat smaller datasets)
global PET_VET_TABLE_COUNT
PET_VET_TABLE_COUNT = max_tables
print(f"Using {PET_VET_TABLE_COUNT} PET tables in {fq_pet_vet_dataset}...")

cohort = get_all_samples(fq_cohort_sample_names, fq_sample_mapping_table)
print(f"Discovered {len(cohort)} samples in {fq_cohort_sample_names}...")
cohort = get_all_samples(fq_cohort_sample_names, fq_sample_mapping_table)
print(f"Discovered {len(cohort)} samples in {fq_cohort_sample_names}...")

make_new_vet_union_all(fq_pet_vet_dataset, fq_temp_table_dataset, cohort)
make_new_vet_union_all(fq_pet_vet_dataset, fq_temp_table_dataset, cohort)

create_position_table(fq_temp_table_dataset, min_variant_samples)
make_new_pet_union_all(fq_pet_vet_dataset, fq_temp_table_dataset, cohort)
populate_final_extract_table(fq_temp_table_dataset, fq_destination_dataset, destination_table, fq_sample_mapping_table)
except Exception as err:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will just eat the exception, but then continue on to print the "Final cohort extract written to..." error message (which isn't true?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the try/catch is being removed so it will no longer eat the exception. it seems counter intuitive to remove the try/catch but i'm not sure the best way to handle this. maybe add a comment that states exceptions are not being caught so the calling script will error out when an exception occurs. i feel like we could into a cycle otherwise where we try to add exception handling back in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah -- duh, of course. My mistake.

I think the reason the exception handing was in there though was so that we would get the dump_job_stats() output even in the case of failures (so you could see how much $$/bytes were consumed before the failure).

In that case, I'd suggst we have a

try:

finally:
dump_job_stats

without explicitly catching the exception

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that works!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pushed up that change

print(err)
create_position_table(fq_temp_table_dataset, min_variant_samples)
make_new_pet_union_all(fq_pet_vet_dataset, fq_temp_table_dataset, cohort)
populate_final_extract_table(fq_temp_table_dataset,fq_destination_dataset, destination_table, fq_sample_mapping_table)

dump_job_stats()
print(f"\nFinal cohort extract written to {fq_destination_dataset}.{destination_table}\n")
Expand Down Expand Up @@ -280,4 +277,4 @@ def do_extract(fq_pet_vet_dataset,
args.fq_destination_dataset,
args.destination_table,
args.min_variant_samples,
args.fq_sample_mapping_table)
args.fq_sample_mapping_table)