Store LVK alerts in BigQuery #232

Merged
hernandezc1 merged 70 commits into develop from u/ch/store_BigQuery on Oct 25, 2024
Conversation

@hernandezc1 (Collaborator) commented on Jun 12, 2024

This PR adds the files and subdirectories needed to store LIGO/Virgo/KAGRA (LVK) alert information in BigQuery.

Here is a more in-depth explanation of how things work.

setup_broker dir:
The setup_broker/lvk directory contains several bash scripts that are used to deploy an instance of our broker that is specific to LVK. In this PR, I've updated the "main" deployment script (setup_broker.sh) so that it defines and creates (or deletes) the necessary BigQuery and Pub/Sub resources to store LVK alert information in BigQuery.
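
For context, a minimal sketch of what that resource creation presumably looks like (the region, dataset name, and variable names are assumptions; the topic names follow the lvk-* convention described in this PR, and the exact commands in setup_broker.sh may differ):

# Hypothetical sketch of the resource-creation step in setup_broker.sh.
survey="lvk"
bq_dataset="${survey}"      # assumed dataset name
region="us-central1"        # assumed region

# BigQuery dataset that will hold the alerts table
bq --location="${region}" mk --dataset "${bq_dataset}"

# Pub/Sub topics: one for incoming alerts, one announcing BigQuery writes
gcloud pubsub topics create "${survey}-alerts"
gcloud pubsub topics create "${survey}-BigQuery"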

The idea here is that LVK alerts that our consumer publishes to a Pub/Sub topic (lvk-alerts) will trigger our Cloud Function (lvk-storeBigQuery), and the Cloud Function will insert the alert data into a BigQuery table and publish a message to a new Pub/Sub topic (lvk-BigQuery). The message published to lvk-BigQuery contains the original alert information and carries the alert type and the name of the destination table as message attributes.
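
As an illustration of those attributes, one could attach a throwaway subscription to the output topic and inspect a message (the subscription name here is hypothetical, and the exact attribute keys used by main.py may differ):

# Hypothetical: inspect the attributes on messages published to lvk-BigQuery.
gcloud pubsub subscriptions create lvk-BigQuery-test --topic="lvk-BigQuery"

# After an alert flows through the Cloud Function, pull one message; the
# payload is the original alert, and the attributes should include the alert
# type and the destination table name.
gcloud pubsub subscriptions pull lvk-BigQuery-test --auto-ack --limit=1 \
    --format="yaml(message.attributes)"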

In setup_broker/lvk you'll also see that I've added a template directory, which contains the schema of the BigQuery table.

cloud_functions dir:
The cloud_functions/lvk directory contains four new files: a README, a requirements.txt file, the cloud function's deployment script (deploy.sh), and a Python script (main.py) used to store the alert information in BigQuery and publish a message to the topic lvk-BigQuery. main.py now uses the pittgoogle-client.
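
A rough sketch of what the deploy.sh invocation might look like (the runtime, entry point, and flags are assumptions, not copied from the PR):

# Hypothetical: deploy the Cloud Function so it fires on lvk-alerts messages.
gcloud functions deploy "lvk-storeBigQuery" \
    --entry-point="run" \
    --runtime="python312" \
    --trigger-topic="lvk-alerts" \
    --source="." \
    --region="${region}"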

# create bigquery dataset and table
bq --location="${region}" mk --dataset "${bq_dataset}"

cd templates || exit

Should have an exit code (and, ideally, a helpful message) to make debugging easier if an issue occurs.
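
A minimal way to do that, assuming nothing else about the script:

# Fail loudly, with a message on stderr, if the directory is missing.
cd templates || { echo "ERROR: templates/ directory not found" >&2; exit 1; }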

@tregle (Collaborator) left a comment

Improvements aren't really structural, so I'll hit the approve bit.

Also, because I'm still new here. :)

bq --location="${region}" mk --dataset "${bq_dataset}"

cd templates || exit
bq mk --table "${PROJECT_ID}:${bq_dataset}.${alerts_table}" "bq_${survey}_${alerts_table}_schema.json"
The better way to do this is to use a subshell. Then you're not changing the implicit global state.

cd templates
bq mk --table ....

becomes:

(cd templates && bq mk --table ....) || exit 5
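
Applied to the quoted lines above, the suggestion would read something like this (the exit code is arbitrary):

# Subshell keeps the directory change local to this one command.
(cd templates && bq mk --table "${PROJECT_ID}:${bq_dataset}.${alerts_table}" "bq_${survey}_${alerts_table}_schema.json") || exit 5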

@troyraen (Collaborator) commented on Sep 18, 2024

Thank you. We should make this replacement in a lot of places (#240).


It would be good to have unit tests on this, but IIRC it's not a capability yinz have atm.

@troyraen (Collaborator) commented on Sep 18, 2024

@hernandezc1 Add a comment at the top explaining where the information in this file came from and include a URL. We would like to be able to point at a machine-readable schema definition (like an Avro schema file) provided by the survey. But I think we haven't found one for LVK, is that right?

If you had to pull this together from info on their website, you probably had to make more decisions here than we usually do. @tregle pointed out good things to consider when creating a schema (thanks! It encouraged me to give this file a better look as well). Some of them we probably want to implement, but some of them we probably shouldn't because we aren't the data producers. (And a comment at the top can help make this clear.) For example, we don't want to change a field name, but we could add units to a description if it would make things more clear and we can find/point to them in LVK's docs. But you're the one with the most background here so you'll have to decide specifics.

One other question: how did you decide on the mode and type values here? If we have an Avro schema, we can use it to create the BigQuery table and then download this JSON version for future use. That's the easiest way to make this schema match (as much as possible) the alerts' schema, which is really what it needs to do. (And we should have a bit of code in here that will do that. #239) If you haven't exhausted all the avenues yet, look/ask again to see if you can get your hands on a real schema definition from LVK. Avro would be great, but JSON or anything else they're using on the backend would be helpful (and a schema version?).
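
For reference, that round trip can be done with standard bq commands (the dataset, table, and file names below are placeholders):

# Create the table from a schema file (Avro-derived or hand-written JSON)...
bq mk --table "${PROJECT_ID}:${bq_dataset}.${alerts_table}" schema.json

# ...then download the JSON schema BigQuery actually stored, for future use.
bq show --schema --format=prettyjson "${PROJECT_ID}:${bq_dataset}.${alerts_table}" > bq_schema.json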

@hernandezc1 requested a review from @troyraen on October 7, 2024
@troyraen (Collaborator) left a comment

I'd like to hear what the solution is for adding a table description with a link to the original schema. Otherwise, LGTM.

@hernandezc1 merged commit c4fdb27 into develop on Oct 25, 2024
4 checks passed
@hernandezc1 deleted the u/ch/store_BigQuery branch on October 25, 2024