DENG-7839 - Backfill baseline_clients_last_seen from 2021-12-01 #7009

Open. Wants to merge 2 commits into main.


@gkatre (Contributor) commented Feb 10, 2025

Description

We need to backfill the firefox_desktop_derived.baseline_clients_last_seen_v1 table from 2021-12-01 to the current date.
This is a test run for the first month (2021-12-01 to 2021-12-31) before we initiate the longer run.

The requirement for this run is that the backfill must run sequentially on a single thread, because the table depends on its own output from the previous day (depends_on_past).
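A simplified, hypothetical model of why the run has to be sequential: in clients_last_seen-style tables, each day's row carries a bit pattern of recent activity that is derived from the previous day's row. The sketch below assumes a 28-bit `days_seen_bits` field and a plain shift-and-set update. It is an illustration only, not the actual bigquery-etl query.

```python
from datetime import date, timedelta
from typing import Dict, Set

# Assumption: a 28-day activity window stored as an integer bit pattern,
# where bit 0 is "seen today". Simplified; not the real bigquery-etl SQL.
MASK_28_BITS = (1 << 28) - 1


def next_days_seen_bits(prev_bits: int, seen_today: bool) -> int:
    """Shift yesterday's pattern left by one day, set today's bit, keep 28 bits."""
    return ((prev_bits << 1) | int(seen_today)) & MASK_28_BITS


def backfill_sequentially(
    start: date, end: date, seen_days: Set[date], initial_bits: int = 0
) -> Dict[date, int]:
    """Compute each day's pattern in date order; day N needs day N-1's output."""
    state: Dict[date, int] = {}
    prev_bits = initial_bits  # stands in for the partition just before `start`
    day = start
    while day <= end:
        prev_bits = next_days_seen_bits(prev_bits, day in seen_days)
        state[day] = prev_bits
        day += timedelta(days=1)
    return state


if __name__ == "__main__":
    seen = {date(2021, 12, 1), date(2021, 12, 3)}
    result = backfill_sequentially(date(2021, 12, 1), date(2021, 12, 3), seen)
    print(bin(result[date(2021, 12, 3)]))  # prints 0b101
```

Because `state[day]` is built from `state[day - 1]`, the days cannot be processed in parallel or out of order, and `initial_bits` has to come from the partition immediately before the backfill window.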

I believe this is possible because:

Related Tickets & Documents

Reviewer, please follow this checklist

Review comment on backfill.yaml, hunk @@ -0,0 +1,9 @@:
2025-02-10:
  start_date: 2021-12-01

Collaborator:

Just want to flag that legacy_telemetry_client_id and windows_build_number will not be present in the data for this month, so we won't be able to validate those fields with this test.

Collaborator:

I'm assuming we can't test on a more recent month because of the dependency issue?

Contributor Author:

> I'm assuming we can't test on a more recent month because of the dependency issue?

That is correct. I also wanted to estimate how long a one-month run takes, especially because the runs need to happen sequentially on a single thread.
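A rough way to turn the one-month timing into an estimate for the full range, assuming each daily partition takes about the same time (in practice later days may be slower as data volume grows) and taking the PR date, 2025-02-10, as the end date:

```python
from datetime import date


def days_inclusive(start: date, end: date) -> int:
    """Number of daily partitions between start and end, inclusive."""
    return (end - start).days + 1


# Test run from backfill.yaml: 2021-12-01 through 2021-12-31.
test_days = days_inclusive(date(2021, 12, 1), date(2021, 12, 31))
# Full run: 2021-12-01 through 2025-02-10 (assumed end date = PR date).
full_days = days_inclusive(date(2021, 12, 1), date(2025, 2, 10))


def estimate_full_run_hours(test_run_hours: float) -> float:
    """Linear extrapolation, assuming a roughly constant cost per day."""
    return test_run_hours * full_days / test_days
```

Under that assumption the full range is 1,168 daily partitions versus 31 for the test month, so a sequential run would take roughly 37.7 times as long as the test.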

@BenWu (Contributor) left a comment:

Managed backfills don't seem to support tables with depends_on_past (DENG-3656), which is why the check is failing. I'm not sure why they're not supported, though. @wwyc, do you know?

I think the depends_on_past code you linked is used when you run `bqetl query backfill` outside of a managed backfill. In this case, you could use the bqetl_backfill DAG, but it's more manual and riskier, since you need to make sure the params are correct.
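If the bqetl_backfill DAG route is taken, the key constraint is ordering. A minimal sketch of a single-threaded driver, where `run_day` is a hypothetical callback that processes one partition and raises on failure (it is not a bqetl API):

```python
from datetime import date, timedelta
from typing import Callable, List


def run_sequential_backfill(
    start: date,
    end: date,
    run_day: Callable[[date], None],
) -> List[date]:
    """Run one partition at a time, in date order, on a single thread.

    `run_day` is a hypothetical callback. Because day N reads day N-1's
    output, the loop stops at the first failure instead of skipping ahead.
    """
    completed: List[date] = []
    day = start
    while day <= end:
        try:
            run_day(day)
        except Exception as exc:
            raise RuntimeError(
                f"backfill stopped at {day.isoformat()}; "
                "resume from this date after fixing the failure"
            ) from exc
        completed.append(day)
        day += timedelta(days=1)
    return completed
```

Stopping at the first failure matters here: skipping a failed day would leave every later day computed from a missing or stale predecessor.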

@dataops-ci-bot commented Feb 10, 2025

Integration report for "formatting"

sql.diff

Only in /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/baseline_clients_last_seen_v1: backfill.yaml
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/baseline_clients_last_seen_v1/backfill.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/baseline_clients_last_seen_v1/backfill.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/baseline_clients_last_seen_v1/backfill.yaml	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/firefox_desktop_derived/baseline_clients_last_seen_v1/backfill.yaml	2025-02-10 19:21:45.000000000 +0000
@@ -0,0 +1,9 @@
+2025-02-10:
+  start_date: 2021-12-01
+  end_date: 2021-12-31
+  reason: DAU computations based on baseline ping [DENG-7839]
+  watchers:
+    - gkatre@mozilla.com
+  status: Initiate
+  shredder_mitigation: false
+  override_retention_limit: true
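Since incorrect parameters are the main risk of a manual run, a hypothetical pre-flight check over the entry in this diff could look like the following. The entry is transcribed as a Python dict to avoid a YAML dependency, and the checks are illustrative assumptions, not bqetl's actual validation; the depends_on_past check simply mirrors the reason the CI check is reported to fail.

```python
from datetime import date
from typing import Any, Dict, List

# The backfill.yaml entry from this PR, transcribed as a Python dict.
ENTRY_DATE = date(2025, 2, 10)
ENTRY: Dict[str, Any] = {
    "start_date": date(2021, 12, 1),
    "end_date": date(2021, 12, 31),
    "reason": "DAU computations based on baseline ping [DENG-7839]",
    "watchers": ["gkatre@mozilla.com"],
    "status": "Initiate",
    "shredder_mitigation": False,
    "override_retention_limit": True,
}


def check_entry(entry_date: date, entry: Dict[str, Any], depends_on_past: bool) -> List[str]:
    """Hypothetical pre-flight checks; returns a list of problems (empty if none)."""
    problems: List[str] = []
    if entry["start_date"] > entry["end_date"]:
        problems.append("start_date is after end_date")
    if entry["end_date"] > entry_date:
        problems.append("end_date is after the entry date")
    if not entry["watchers"]:
        problems.append("at least one watcher is required")
    if depends_on_past:
        problems.append(
            "managed backfills do not support depends_on_past tables (DENG-3656)"
        )
    return problems
```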


@wwyc (Contributor) commented Feb 10, 2025

> Managed backfills don't seem to support tables with depends_on_past (DENG-3656) which is why the check is failing. I'm not sure why they're not supported though. @wwyc do you know?

Managed backfills currently do not support tables with depends_on_past. Because a managed backfill writes to a staging table in a different dataset from the original prod table, additional logic needs to be built in to support tables that depend on past.
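One way to picture that missing logic, as a hypothetical sketch: when a depends_on_past table is backfilled into a staging table, the first backfilled day has no predecessor in staging, so it has to read the previous day's state from the prod table, while every later day reads from staging. The staging table name below is invented for illustration; the prod table name comes from the path in the diff.

```python
from datetime import date, timedelta
from typing import Tuple

PROD_TABLE = "moz-fx-data-shared-prod.firefox_desktop_derived.baseline_clients_last_seen_v1"
# Hypothetical staging location; the real managed-backfill naming may differ.
STAGING_TABLE = "hypothetical-project.backfills_staging.baseline_clients_last_seen_v1_2025_02_10"


def previous_day_source(day: date, backfill_start: date) -> Tuple[str, date]:
    """Return (table, partition date) to read day-1's state from when backfilling `day`."""
    if day < backfill_start:
        raise ValueError("day is before the backfill window")
    prev = day - timedelta(days=1)
    if day == backfill_start:
        # Nothing exists in staging yet, so seed from the prod partition.
        return PROD_TABLE, prev
    # Later days must chain off the freshly backfilled staging output.
    return STAGING_TABLE, prev
```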

@gkatre (Contributor Author) commented Feb 10, 2025

> > Managed backfills don't seem to support tables with depends_on_past (DENG-3656) which is why the check is failing. I'm not sure why they're not supported though. @wwyc do you know?
>
> Managed backfills currently does not support tables with depends_on_past. Since managed backfills results in a staging table and a different dataset than the original prod table, there needs to be additional logic built in to support tables that depends on past.

@BenWu / @wwyc, is this additional logic something that could be built fairly quickly for this backfill?

For the firefox_desktop_derived.baseline_clients_last_seen_v1 table, we need to backfill 3+ years of data.
Do you have any other recommendations? A manual run does not seem ideal. I would appreciate any help!

Labels: None yet
Projects: None yet
5 participants