-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Store LVK alerts in BigQuery #232
Conversation
# create bigquery dataset and table | ||
bq --location="${region}" mk --dataset "${bq_dataset}" | ||
|
||
cd templates || exit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should have an exit code (and ideally, helpful message) to debug if an issue occurs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Improvements aren't really structural, so I'll all hit the approve bit.
Also, because I'm still new here. :)
bq --location="${region}" mk --dataset "${bq_dataset}" | ||
|
||
cd templates || exit | ||
bq mk --table "${PROJECT_ID}:${bq_dataset}.${alerts_table}" "bq_${survey}_${alerts_table}_schema.json" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The better way to do this is to use a subshell. Then you're not changing the implicit global state.
cd templates
bq mk --table ....
-> (cd templates && bq mk --table ....) || exit 5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. We should make this replacement in a lot of places (#240).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to have unit tests on this, but IIRC it's not a capability yinz have atm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hernandezc1 Add a comment at the top explaining where the information in this file came from and include a url. We would like to be able to point at a machine-readable schema definition (like an avro schema file) provided by the survey. But I think we haven't found one for LVK, is that right?
If you had to pull this together from info on their website, you probably had to make more decisions here than we usually do. @tregle pointed out good things to consider when creating a schema (thanks! It encouraged me to give this file a better look as well). Some of them we probably want to implement, but some of them we probably shouldn't because we aren't the data producers. (And a comment at the top can help make this clear.) For example, we don't want to change a field name, but we could add units to a description if it would make things more clear and we can find/point to them in LVK's docs. But you're the one with the most background here so you'll have to decide specifics.
One other question, How did you decide on the mode and type values here? If we have an avro schema, we can use it to create the bigquery table, and then download this json version for future use. That's the easiest way to make this schema match (as much as possible) the alerts' schema, which is really what it needs to do. (And we should have a bit of code in here that will do that. #239) If you haven't exhausted all the avenues yet, look/ask again to see if you can get your hands on a real schema definition from LVK. Avro would be great, but json or anything else they're using on the backend would be helpful (and a schema version?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to hear what the solution is for adding a table description with a link to the original schema. Otherwise, LGTM.
This PR creates the necessary files and subdirectories needed to store LIGO/Virgo/KAGRA (LVK) alert information in BigQuery.
Here is a more in-depth explanation of how things work.
setup_broker dir:
The
setup_broker/lvk
directory contains several bash scripts that are used to deploy an instance of our broker that is specific to LVK. In this PR, I've updated the "main" deployment script (setup_broker.sh
) so that it defines and creates (or deletes) the necessary BigQuery and Pub/Sub resources to store LVK alert information in BigQuery.The idea here is that LVK alerts that our consumer publishes to a Pub/Sub topic (
lvk-alerts
) will trigger our Cloud Function (lvk-storeBigQuery
), and the Cloud Function will insert the alert data into a BigQuery table and publish a message to a new Pub/Sub topic (lvk-BigQuery
). The message published tolvk-BigQuery
contains the original alert information as well as the alert type and the name of the table to which the alert information was stored to as message attributes.In
setup_broker/lvk
you'll also see that I've added atemplate
directory, which contains the schema of the BigQuery table.cloud_functions dir
The
cloud_functions/lvk
directory contains four new files: aREADME
, arequirements.txt
file, the cloud function's deployment script (deploy.sh
) and a Python script (main.py
) used to store the alert information in BigQuery and publish a message to the topiclvk-BigQuery
.main.py
now uses thepittgoogle-client
.