One known failure mode for ingestion-beam is rate limiting from the BigQuery API when we list datasets and tables to check whether destination tables exist. See https://mozilla-hub.atlassian.net/browse/DSRE-194 and mozilla/bigquery-backfill#15
Currently, every worker makes these API calls independently, which triggers rate limiting when we scale up the number of machines for backfills. I think it should be possible to express this table listing as a slowly updating global window side input, which would make the listing run on a single machine.
Currently, we look up the tables in a dataset only when we see a record with a destination table in that dataset. For the side input case, we'd need to list all datasets up front and then list all tables within each dataset, so we'd need to provide information about which project to list datasets from.
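To make the shape of that listing concrete, here is a minimal sketch in Python (not the pipeline's actual code). It assumes `client` behaves like `google.cloud.bigquery.Client`, i.e. `list_datasets(project=...)` yields items with a `dataset_id` and `list_tables("project.dataset")` yields items with a `table_id`. The names `list_existing_tables`, `TableSnapshot`, and `ttl_seconds` are hypothetical and only illustrate the "list everything once, refresh slowly" idea; in Beam the refresh would be driven by the side input's trigger rather than by a TTL check.

```python
import time
from typing import Callable, FrozenSet, Optional


def list_existing_tables(client, project: str) -> FrozenSet[str]:
    """Return every "dataset.table" name found in `project`.

    Assumption: `client` mimics google.cloud.bigquery.Client, so
    list_datasets(project=...) yields items with .dataset_id and
    list_tables("project.dataset") yields items with .table_id.
    """
    tables = set()
    # The project must be given explicitly: we enumerate all datasets
    # up front instead of discovering them from incoming records.
    for dataset in client.list_datasets(project=project):
        for table in client.list_tables(f"{project}.{dataset.dataset_id}"):
            tables.add(f"{dataset.dataset_id}.{table.table_id}")
    return frozenset(tables)


class TableSnapshot:
    """Hypothetical helper: one shared listing, refreshed at most once
    every `ttl_seconds`, standing in for a slowly updating side input."""

    def __init__(
        self,
        client,
        project: str,
        ttl_seconds: float = 600.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._client = client
        self._project = project
        self._ttl = ttl_seconds
        self._clock = clock
        self._tables: Optional[FrozenSet[str]] = None
        self._loaded_at = 0.0

    def exists(self, dataset_id: str, table_id: str) -> bool:
        now = self._clock()
        # Re-list only when there is no snapshot yet or it has expired.
        if self._tables is None or now - self._loaded_at >= self._ttl:
            self._tables = list_existing_tables(self._client, self._project)
            self._loaded_at = now
        return f"{dataset_id}.{table_id}" in self._tables
```

With this shape, the number of API calls per refresh is one dataset listing plus one table listing per dataset, independent of how many workers consume the result.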