Add cost observability BQ table [VS-441] #7891
Conversation
Codecov Report
@@             Coverage Diff             @@
##           ah_var_store    #7891  +/- ##
================================================
  Coverage        ?       86.290%
  Complexity      ?       35195
================================================
  Files           ?       2170
  Lines           ?       164888
  Branches        ?       17785
================================================
  Hits            ?       142282
  Misses          ?       16281
  Partials        ?       6325
LGTM 👍🏻
Force-pushed from c00ff15 to ba57f1a
@@ -19,7 +19,14 @@ workflow GvsAssignIds {
    String sample_info_table = "sample_info"
    String sample_info_schema_json = '[{"name": "sample_name","type": "STRING","mode": "REQUIRED"},{"name": "sample_id","type": "INTEGER","mode": "NULLABLE"},{"name":"is_loaded","type":"BOOLEAN","mode":"NULLABLE"},{"name":"is_control","type":"BOOLEAN","mode":"REQUIRED"},{"name":"withdrawn","type":"TIMESTAMP","mode":"NULLABLE"}]'
    String sample_load_status_json = '[{"name": "sample_id","type": "INTEGER","mode": "REQUIRED"},{"name":"status","type":"STRING","mode":"REQUIRED"}, {"name":"event_timestamp","type":"TIMESTAMP","mode":"REQUIRED"}]'

    String cost_observability_json = '[ { "name": "call_set_identifier", "type": "STRING", "mode": "REQUIRED" }, ' +
OK, not for this PR, but I truly do think some of these fields would benefit from a description.
Sure, I can do that now.
        datatype = "cost_observability",
        schema_json = cost_observability_json,
        max_table_id = 1,
        superpartitioned = "false",
What is superpartitioned?!
And I think this should be partitioned on job_start_timestamp, but I could be persuaded otherwise.
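For context on the partitioning question, a time-partitioned BigQuery table can be declared with DDL that partitions on a TIMESTAMP column. The sketch below only illustrates the idea; the dataset and table names are invented, and the PR actually creates its tables through the CreateTables machinery rather than raw DDL.

```python
# Hypothetical DDL builder for a BigQuery table partitioned by day on a
# TIMESTAMP column. Names here are illustrative, not taken from this PR.
def partitioned_table_ddl(dataset: str, table: str, partition_field: str) -> str:
    """Build a CREATE TABLE statement partitioned by day on a TIMESTAMP field."""
    return (
        f"CREATE TABLE `{dataset}.{table}` (\n"
        f"  call_set_identifier STRING NOT NULL,\n"
        f"  {partition_field} TIMESTAMP NOT NULL\n"
        f")\n"
        f"PARTITION BY TIMESTAMP_TRUNC({partition_field}, DAY)"
    )

ddl = partitioned_table_ddl("my_dataset", "cost_observability", "call_start_timestamp")
print(ddl)
```

Partitioning on the timestamp (rather than, say, the callset identifier) matches BigQuery's original logs-by-date use case discussed below.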
Or maybe on call_set_identifier?
Yeah, that is great too
We used to only be able to partition on integers (which is one of the reasons we have GvsAssignIds), and BigQuery partitioning was originally developed for querying logs by date, so I think I'm stuck in the past with my recommendations.
Superpartitioning is the thing we do when we create a separate ref_ranges or vet table for every 4,000 samples to get around the BigQuery partitioning limits.
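The 4,000-samples-per-table scheme described above amounts to a simple table-name computation. A minimal sketch, assuming 1-based sample ids and a zero-padded numeric suffix (the exact naming convention is hypothetical, not taken from the GVS code):

```python
SAMPLES_PER_SUPERPARTITION = 4000  # per the comment above: one table per 4,000 samples

def superpartition_table_name(prefix: str, sample_id: int) -> str:
    """Return the superpartitioned table a sample lands in, e.g. vet_001.

    Assumes 1-based sample ids and zero-padded 3-digit suffixes; the naming
    here is illustrative only.
    """
    table_index = (sample_id - 1) // SAMPLES_PER_SUPERPARTITION + 1
    return f"{prefix}_{table_index:03d}"

print(superpartition_table_name("vet", 1))      # vet_001
print(superpartition_table_name("vet", 4000))   # vet_001
print(superpartition_table_name("vet", 4001))   # vet_002
```

This is also why stable integer sample ids (assigned by GvsAssignIds) matter: the table a sample belongs to is a pure function of its id.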
Actually, the CreateTables code is not flexible on the partition key, so I'm going to make a larger revision here...
Asking for a re-review since I ended up changing a lot...
Too bad you couldn't re-use the existing task to create the table, but LGTM 👍🏻
' { "name": "call_start_timestamp", "type": "TIMESTAMP", "mode": "REQUIRED" }, ' + # When the call logging this event was started
' { "name": "event_timestamp", "type": "TIMESTAMP", "mode": "REQUIRED" }, ' + # When the observability event was logged
' { "name": "event_key", "type": "STRING", "mode": "REQUIRED" }, ' + # The type of observability event being logged
' { "name": "event_bytes", "type": "INTEGER", "mode": "REQUIRED" } ] ' # Number of bytes reported for this observability event
hahaha I meant that the "description" could go in the BQ table, but this works perfectly too
oh I forgot that was even a thing we could do 🤦
        fi
    >>>
    runtime {
        docker: "us.gcr.io/broad-dsde-methods/variantstore:ah_var_store_2022_05_16"
Why are we using our custom docker?
I'm worried about this getting stale (though I suppose it won't matter for this use case).
I feel like we tend to use: docker: "us.gcr.io/broad-gatk/gatk:4.1.7.0"
String cost_observability_json =
'[ { "name": "call_set_identifier", "type": "STRING", "mode": "REQUIRED" }, ' + # The name by which we refer to the callset
' { "name": "step", "type": "STRING", "mode": "REQUIRED" }, ' + # The name of the core GVS workflow to which this belongs
' { "name": "call", "type": "STRING", "mode": "NULLABLE" }, ' + # The WDL call to which this belongs
' { "name": "shard_identifier", "type": "STRING", "mode": "NULLABLE" }, ' + # A unique identifier for this shard, may or may not be its index
' { "name": "call_start_timestamp", "type": "TIMESTAMP", "mode": "REQUIRED" }, ' + # When the call logging this event was started
' { "name": "event_timestamp", "type": "TIMESTAMP", "mode": "REQUIRED" }, ' + # When the observability event was logged
' { "name": "event_key", "type": "STRING", "mode": "REQUIRED" }, ' + # The type of observability event being logged
' { "name": "event_bytes", "type": "INTEGER", "mode": "REQUIRED" } ] ' # Number of bytes reported for this observability event
Suggested change (move each comment into a BigQuery field "description" instead):

String cost_observability_json =
'[ { "name": "call_set_identifier", "type": "STRING", "mode": "REQUIRED", "description": "The name by which we refer to the callset" }, ' +
' { "name": "step", "type": "STRING", "mode": "REQUIRED", "description": "The name of the core GVS workflow to which this belongs" }, ' +
' { "name": "call", "type": "STRING", "mode": "NULLABLE", "description": "The WDL call to which this belongs" }, ' +
' { "name": "shard_identifier", "type": "STRING", "mode": "NULLABLE", "description": "A unique identifier for this shard, may or may not be its index" }, ' +
' { "name": "call_start_timestamp", "type": "TIMESTAMP", "mode": "REQUIRED", "description": "When the call logging this event was started" }, ' +
' { "name": "event_timestamp", "type": "TIMESTAMP", "mode": "REQUIRED", "description": "When the observability event was logged" }, ' +
' { "name": "event_key", "type": "STRING", "mode": "REQUIRED", "description": "The type of observability event being logged" }, ' +
' { "name": "event_bytes", "type": "INTEGER", "mode": "REQUIRED", "description": "Number of bytes reported for this observability event" } ] '
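As a sanity check, the concatenated schema literal is plain JSON and can be validated before it is handed to table creation. A minimal sketch (the schema content below is copied from the suggestion above; the validation itself is illustrative, not part of the PR):

```python
import json

# The suggested cost_observability schema, with per-field BigQuery descriptions.
cost_observability_json = """[
  { "name": "call_set_identifier", "type": "STRING", "mode": "REQUIRED", "description": "The name by which we refer to the callset" },
  { "name": "step", "type": "STRING", "mode": "REQUIRED", "description": "The name of the core GVS workflow to which this belongs" },
  { "name": "call", "type": "STRING", "mode": "NULLABLE", "description": "The WDL call to which this belongs" },
  { "name": "shard_identifier", "type": "STRING", "mode": "NULLABLE", "description": "A unique identifier for this shard, may or may not be its index" },
  { "name": "call_start_timestamp", "type": "TIMESTAMP", "mode": "REQUIRED", "description": "When the call logging this event was started" },
  { "name": "event_timestamp", "type": "TIMESTAMP", "mode": "REQUIRED", "description": "When the observability event was logged" },
  { "name": "event_key", "type": "STRING", "mode": "REQUIRED", "description": "The type of observability event being logged" },
  { "name": "event_bytes", "type": "INTEGER", "mode": "REQUIRED", "description": "Number of bytes reported for this observability event" }
]"""

# Parse and confirm every field carries the four keys BigQuery expects here.
schema = json.loads(cost_observability_json)
print(len(schema))  # 8 fields
for field in schema:
    assert {"name", "type", "mode", "description"} <= set(field)
```

Catching a malformed schema string this way fails fast, before any `bq` call is made against the real dataset.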