Store LVK alerts in BigQuery #232

Merged
hernandezc1 merged 70 commits into develop from u/ch/store_BigQuery on Oct 25, 2024
Conversation

@hernandezc1 (Collaborator) commented on Jun 12, 2024

This PR adds the files and subdirectories needed to store LIGO/Virgo/KAGRA (LVK) alert information in BigQuery.

Here is a more in-depth explanation of how things work.

setup_broker dir:
The setup_broker/lvk directory contains several bash scripts that are used to deploy an instance of our broker that is specific to LVK. In this PR, I've updated the "main" deployment script (setup_broker.sh) so that it defines and creates (or deletes) the necessary BigQuery and Pub/Sub resources to store LVK alert information in BigQuery.
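
For context, a minimal sketch of what that resource creation presumably looks like (the region, dataset name, and variable names are assumptions; the topic names follow the lvk-* convention described in this PR, and the exact commands in setup_broker.sh may differ):

# Hypothetical sketch of the resource-creation step in setup_broker.sh.
survey="lvk"
bq_dataset="${survey}"      # assumed dataset name
region="us-central1"        # assumed region

# BigQuery dataset that will hold the alerts table
bq --location="${region}" mk --dataset "${bq_dataset}"

# Pub/Sub topics: one for incoming alerts, one announcing BigQuery writes
gcloud pubsub topics create "${survey}-alerts"
gcloud pubsub topics create "${survey}-BigQuery"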

The idea here is that LVK alerts that our consumer publishes to a Pub/Sub topic (lvk-alerts) will trigger our Cloud Function (lvk-storeBigQuery), and the Cloud Function will insert the alert data into a BigQuery table and publish a message to a new Pub/Sub topic (lvk-BigQuery). The message published to lvk-BigQuery contains the original alert information and carries the alert type and the name of the destination table as message attributes.
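
As an illustration of those attributes, one could attach a throwaway subscription to the output topic and inspect a message (the subscription name here is hypothetical, and the exact attribute keys used by main.py may differ):

# Hypothetical: inspect the attributes on messages published to lvk-BigQuery.
gcloud pubsub subscriptions create lvk-BigQuery-test --topic="lvk-BigQuery"

# After an alert flows through the Cloud Function, pull one message; the
# payload is the original alert, and the attributes should include the alert
# type and the destination table name.
gcloud pubsub subscriptions pull lvk-BigQuery-test --auto-ack --limit=1 \
    --format="yaml(message.attributes)"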

In setup_broker/lvk you'll also see that I've added a template directory, which contains the schema of the BigQuery table.

cloud_functions dir:
The cloud_functions/lvk directory contains four new files: a README, a requirements.txt file, the cloud function's deployment script (deploy.sh), and a Python script (main.py) used to store the alert information in BigQuery and publish a message to the topic lvk-BigQuery. main.py now uses the pittgoogle-client.
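
A rough sketch of what the deploy.sh invocation might look like (the runtime, entry point, and flags are assumptions, not copied from the PR):

# Hypothetical: deploy the Cloud Function so it fires on lvk-alerts messages.
gcloud functions deploy "lvk-storeBigQuery" \
    --entry-point="run" \
    --runtime="python312" \
    --trigger-topic="lvk-alerts" \
    --source="." \
    --region="${region}"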

# create bigquery dataset and table
bq --location="${region}" mk --dataset "${bq_dataset}"

cd templates || exit

Should have an exit code (and, ideally, a helpful message) to make debugging easier if an issue occurs.
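
A minimal way to do that, assuming nothing else about the script:

# Fail loudly, with a message on stderr, if the directory is missing.
cd templates || { echo "ERROR: templates/ directory not found" >&2; exit 1; }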

@tregle (Collaborator) left a comment

Improvements aren't really structural, so I'll hit the approve bit.

Also, because I'm still new here. :)

bq --location="${region}" mk --dataset "${bq_dataset}"

cd templates || exit
bq mk --table "${PROJECT_ID}:${bq_dataset}.${alerts_table}" "bq_${survey}_${alerts_table}_schema.json"
The better way to do this is to use a subshell. Then you're not changing the implicit global state.

cd templates
bq mk --table ....

becomes:

(cd templates && bq mk --table ....) || exit 5
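
Applied to the quoted lines above, the suggestion would read something like this (the exit code is arbitrary):

# Subshell keeps the directory change local to this one command.
(cd templates && bq mk --table "${PROJECT_ID}:${bq_dataset}.${alerts_table}" "bq_${survey}_${alerts_table}_schema.json") || exit 5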

@troyraen (Collaborator) commented on Sep 18, 2024

Thank you. We should make this replacement in a lot of places (#240).


It would be good to have unit tests on this, but IIRC it's not a capability yinz have atm.

@troyraen (Collaborator) commented on Sep 18, 2024

@hernandezc1 Add a comment at the top explaining where the information in this file came from and include a URL. We would like to be able to point at a machine-readable schema definition (like an Avro schema file) provided by the survey. But I think we haven't found one for LVK, is that right?

If you had to pull this together from info on their website, you probably had to make more decisions here than we usually do. @tregle pointed out good things to consider when creating a schema (thanks! It encouraged me to give this file a better look as well). Some of them we probably want to implement, but some of them we probably shouldn't because we aren't the data producers. (And a comment at the top can help make this clear.) For example, we don't want to change a field name, but we could add units to a description if it would make things more clear and we can find/point to them in LVK's docs. But you're the one with the most background here so you'll have to decide specifics.

One other question: how did you decide on the mode and type values here? If we have an Avro schema, we can use it to create the BigQuery table and then download this JSON version for future use. That's the easiest way to make this schema match (as much as possible) the alerts' schema, which is really what it needs to do. (And we should have a bit of code in here that will do that. #239) If you haven't exhausted all the avenues yet, look/ask again to see if you can get your hands on a real schema definition from LVK. Avro would be great, but JSON or anything else they're using on the backend would be helpful (and a schema version?).
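
For reference, that round trip can be done with standard bq commands (the dataset, table, and file names below are placeholders):

# Create the table from a schema file (Avro-derived or hand-written JSON)...
bq mk --table "${PROJECT_ID}:${bq_dataset}.${alerts_table}" schema.json

# ...then download the JSON schema BigQuery actually stored, for future use.
bq show --schema --format=prettyjson "${PROJECT_ID}:${bq_dataset}.${alerts_table}" > bq_schema.json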

@hernandezc1 requested a review from @troyraen on October 7, 2024
@troyraen (Collaborator) left a comment

I'd like to hear what the solution is for adding a table description with a link to the original schema. Otherwise, LGTM.

@hernandezc1 merged commit c4fdb27 into develop on Oct 25, 2024
4 checks passed
@hernandezc1 deleted the u/ch/store_BigQuery branch on October 25, 2024