Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cost observability BQ table [VS-441] #7891

Merged
merged 6 commits into from
Jun 15, 2022

Conversation

mcovarr
Copy link
Collaborator

@mcovarr mcovarr commented Jun 13, 2022

No description provided.

@mcovarr mcovarr requested review from rsasch and RoriCremer June 13, 2022 15:03
@codecov
Copy link

codecov bot commented Jun 13, 2022

Codecov Report

❗ No coverage uploaded for pull request base (ah_var_store@727c1da). Click here to learn what that means.
The diff coverage is n/a.

@@               Coverage Diff                @@
##             ah_var_store     #7891   +/-   ##
================================================
  Coverage                ?   86.290%           
  Complexity              ?     35195           
================================================
  Files                   ?      2170           
  Lines                   ?    164888           
  Branches                ?     17785           
================================================
  Hits                    ?    142282           
  Misses                  ?     16281           
  Partials                ?      6325           

Copy link

@rsasch rsasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍🏻

@mcovarr mcovarr force-pushed the vs_441_cost_observability branch from c00ff15 to ba57f1a Compare June 14, 2022 10:47
@@ -19,7 +19,14 @@ workflow GvsAssignIds {
String sample_info_table = "sample_info"
String sample_info_schema_json = '[{"name": "sample_name","type": "STRING","mode": "REQUIRED"},{"name": "sample_id","type": "INTEGER","mode": "NULLABLE"},{"name":"is_loaded","type":"BOOLEAN","mode":"NULLABLE"},{"name":"is_control","type":"BOOLEAN","mode":"REQUIRED"},{"name":"withdrawn","type":"TIMESTAMP","mode":"NULLABLE"}]'
String sample_load_status_json = '[{"name": "sample_id","type": "INTEGER","mode": "REQUIRED"},{"name":"status","type":"STRING","mode":"REQUIRED"}, {"name":"event_timestamp","type":"TIMESTAMP","mode":"REQUIRED"}]'

String cost_observability_json = '[ { "name": "call_set_identifier", "type": "STRING", "mode": "REQUIRED" }, ' +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok not for this pr---but I truly do think some of these would benefit from a description

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure I can do that now

datatype = "cost_observability",
schema_json = cost_observability_json,
max_table_id = 1,
superpartitioned = "false",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is superpartitioned!??!?!

and I think this should be partitioned on job_start_timestamp, but I could be persuaded otherwise

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or maybe on call set identifier?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that is great too

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We used to only be able to partition on #s (which is one of the reasons we have GvsAssignIds) and it was originally developed for querying logs by date, so I think I'm stuck in the past w my recs

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

superpartitioning is the thing we do when we create a ref ranges or vet table for every 4000 samples to get around the BQ partitioning limits.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually the CreateTables code is not flexible on the partition key so I'm going to make a larger revision here...

@mcovarr
Copy link
Collaborator Author

mcovarr commented Jun 14, 2022

asking for a re-review since I ended up changing a lot...

@mcovarr mcovarr requested review from RoriCremer and rsasch June 14, 2022 21:11
Copy link

@rsasch rsasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

too bad you couldn't re-use the existing task to create the table, but LGTM 👍🏻

' { "name": "call_start_timestamp", "type": "TIMESTAMP", "mode": "REQUIRED" }, ' + # When the call logging this event was started
' { "name": "event_timestamp", "type": "TIMESTAMP", "mode": "REQUIRED" }, ' + # When the observability event was logged
' { "name": "event_key", "type": "STRING", "mode": "REQUIRED" }, ' + # The type of observability event being logged
' { "name": "event_bytes", "type": "INTEGER", "mode": "REQUIRED" } ] ' # Number of bytes reported for this observability event
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hahaha I meant that the "description" could go in the BQ table, but this works perfectly too

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh I forgot that was even a thing we could do 🤦

@mcovarr mcovarr merged commit 89a7d5d into ah_var_store Jun 15, 2022
@mcovarr mcovarr deleted the vs_441_cost_observability branch June 15, 2022 19:33
fi
>>>
runtime {
docker: "us.gcr.io/broad-dsde-methods/variantstore:ah_var_store_2022_05_16"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we using our custom docker?

I'm worried about this getting stale (though I suppose it wont matter for this use case)

I feel like we tend to use: docker: "us.gcr.io/broad-gatk/gatk:4.1.7.0"

Comment on lines +182 to +190
String cost_observability_json =
'[ { "name": "call_set_identifier", "type": "STRING", "mode": "REQUIRED" }, ' + # The name by which we refer to the callset
' { "name": "step", "type": "STRING", "mode": "REQUIRED" }, ' + # The name of the core GVS workflow to which this belongs
' { "name": "call", "type": "STRING", "mode": "NULLABLE" }, ' + # The WDL call to which this belongs
' { "name": "shard_identifier", "type": "STRING", "mode": "NULLABLE" }, ' + # A unique identifier for this shard, may or may not be its index
' { "name": "call_start_timestamp", "type": "TIMESTAMP", "mode": "REQUIRED" }, ' + # When the call logging this event was started
' { "name": "event_timestamp", "type": "TIMESTAMP", "mode": "REQUIRED" }, ' + # When the observability event was logged
' { "name": "event_key", "type": "STRING", "mode": "REQUIRED" }, ' + # The type of observability event being logged
' { "name": "event_bytes", "type": "INTEGER", "mode": "REQUIRED" } ] ' # Number of bytes reported for this observability event
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
String cost_observability_json =
'[ { "name": "call_set_identifier", "type": "STRING", "mode": "REQUIRED" }, ' + # The name by which we refer to the callset
' { "name": "step", "type": "STRING", "mode": "REQUIRED" }, ' + # The name of the core GVS workflow to which this belongs
' { "name": "call", "type": "STRING", "mode": "NULLABLE" }, ' + # The WDL call to which this belongs
' { "name": "shard_identifier", "type": "STRING", "mode": "NULLABLE" }, ' + # A unique identifier for this shard, may or may not be its index
' { "name": "call_start_timestamp", "type": "TIMESTAMP", "mode": "REQUIRED" }, ' + # When the call logging this event was started
' { "name": "event_timestamp", "type": "TIMESTAMP", "mode": "REQUIRED" }, ' + # When the observability event was logged
' { "name": "event_key", "type": "STRING", "mode": "REQUIRED" }, ' + # The type of observability event being logged
' { "name": "event_bytes", "type": "INTEGER", "mode": "REQUIRED" } ] ' # Number of bytes reported for this observability event
String cost_observability_json =
'[ { "name": "call_set_identifier", "type": "STRING", "mode": "REQUIRED", "description": "The name by which we refer to the callset" }, ' +
' { "name": "step", "type": "STRING", "mode": "REQUIRED", "description": "The name of the core GVS workflow to which this belongs" }, ' +
' { "name": "call", "type": "STRING", "mode": "NULLABLE", "description": "The WDL call to which this belongs" }, ' +
' { "name": "shard_identifier", "type": "STRING", "mode": "NULLABLE" , "description": "A unique identifier for this shard, may or may not be its index" }, ' +
' { "name": "call_start_timestamp", "type": "TIMESTAMP", "mode": "REQUIRED", "description": "When the call logging this event was started" }, ' +
' { "name": "event_timestamp", "type": "TIMESTAMP", "mode": "REQUIRED", "description": "When the observability event was logged"}, ' +
' { "name": "event_key", "type": "STRING", "mode": "REQUIRED", "description": "The type of observability event being logged" }, ' +
' { "name": "event_bytes", "type": "INTEGER", "mode": "REQUIRED", "description": "Number of bytes reported for this observability event" } ] '

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants