Support symlinks #4881

Merged
sean-rose merged 3 commits into main from support-symlinks
Jan 24, 2024

@sean-rose (Contributor) commented Jan 24, 2024

Checklist for reviewer:

  • Commits should reference a bug or github issue, if relevant (if a bug is referenced, the pull request should include the bug number in the title).
  • If the PR comes from a fork, trigger integration CI tests by running the Push to upstream workflow and provide the <username>:<branch> of the fork as parameter. The parameter will also show up
    in the logs of the manual-trigger-required-for-fork CI task together with more detailed instructions.
  • If adding a new field to a query, ensure that the schema and dependent downstream schemas have been updated.
  • When adding a new derived dataset, ensure that data is not available already (fully or partially) and recommend extending an existing dataset in favor of creating new ones. Data can be available in the bigquery-etl repository, looker-hub or in looker-spoke-default.

For modifications to schemas in restricted namespaces (see CODEOWNERS):

Issue is synchronized with this Jira Task

Comment on lines +64 to +66
backfill_files = map(
Path, glob(f"{search_path}/**/{BACKFILL_FILE}", recursive=True)
)
@sean-rose (Contributor, Author) commented:

Path.glob() and Path.rglob() will be getting a new follow_symlinks argument in Python 3.13, so once we eventually upgrade to that we should be able to revert workarounds like this and specify follow_symlinks=True instead.
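
For reference, the workaround described above can be sketched as a small helper. This is a minimal illustration, not code from the PR: the function name `find_files` and the use of `glob.escape()` are assumptions added here. It relies on `glob.glob(..., recursive=True)` descending into symlinked directories, which `Path.rglob()` does not do in the Python versions this PR targets. Note that in the released Python 3.13 documentation the new `Path.glob()`/`Path.rglob()` parameter appears as `recurse_symlinks`, so the exact name is worth checking before reverting.

```python
from glob import escape, glob
from pathlib import Path


def find_files(search_path, filename):
    """Return sorted Paths of every `filename` under `search_path`.

    Hypothetical helper: unlike Path.rglob(), glob.glob() with
    recursive=True also walks into symlinked directories.
    """
    # escape() keeps glob metacharacters in the directory name literal
    # (an addition made here, not part of the PR's one-liner).
    pattern = f"{escape(str(search_path))}/**/{filename}"
    return sorted(map(Path, glob(pattern, recursive=True)))
```

A call such as `find_files("sql", "query.sql")` would then include matches that live under symlinked dataset directories. One behavioral difference to keep in mind: by default `glob.glob()` skips hidden (dot-prefixed) directories, whereas `Path.rglob()` does not.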

@sean-rose sean-rose enabled auto-merge (squash) January 24, 2024 20:54
@dataops-ci-bot

Integration report for "Merge branch 'main' into support-symlinks"

sql.diff

Only in /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/hubs_derived/active_subscription_ids_live: schema.yaml
Only in /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/relay_derived/active_subscription_ids_live: schema.yaml
Only in /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/telemetry/releases_latest: schema.yaml
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/hubs_derived/active_subscription_ids_live/schema.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/hubs_derived/active_subscription_ids_live/schema.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/hubs_derived/active_subscription_ids_live/schema.yaml	2024-01-24 20:57:49.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/hubs_derived/active_subscription_ids_live/schema.yaml	1970-01-01 00:00:00.000000000 +0000
@@ -1,7 +0,0 @@
-fields:
-- name: active_date
-  type: DATE
-  mode: NULLABLE
-- name: subscription_id
-  type: STRING
-  mode: NULLABLE
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/monitoring_derived/bigquery_etl_scheduled_queries_cost_v1/query.py /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/monitoring_derived/bigquery_etl_scheduled_queries_cost_v1/query.py
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/monitoring_derived/bigquery_etl_scheduled_queries_cost_v1/query.py	2024-01-24 20:57:49.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/monitoring_derived/bigquery_etl_scheduled_queries_cost_v1/query.py	2024-01-24 20:45:33.000000000 +0000
@@ -4,6 +4,7 @@
 
 from argparse import ArgumentParser
 from fnmatch import fnmatchcase
+from glob import glob
 from pathlib import Path
 
 from google.cloud import bigquery
@@ -43,9 +44,13 @@
     args = parser.parse_args()
     client = bigquery.Client(args.project)
 
-    sql_queries = list(Path(args.sql_dir).rglob("query.sql"))
-    python_queries = list(Path(args.sql_dir).rglob("query.py"))
-    multipart_queries = list(Path(args.sql_dir).rglob("part1.sql"))
+    sql_queries = list(map(Path, glob(f"{args.sql_dir}/**/query.sql", recursive=True)))
+    python_queries = list(
+        map(Path, glob(f"{args.sql_dir}/**/query.py", recursive=True))
+    )
+    multipart_queries = list(
+        map(Path, glob(f"{args.sql_dir}/**/part1.sql", recursive=True))
+    )
     query_paths = sql_queries + python_queries + multipart_queries
 
     query = create_query(query_paths, args.date)
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/monitoring_derived/bigquery_etl_scheduled_query_usage_v1/query.py /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/monitoring_derived/bigquery_etl_scheduled_query_usage_v1/query.py
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/monitoring_derived/bigquery_etl_scheduled_query_usage_v1/query.py	2024-01-24 20:57:49.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/monitoring_derived/bigquery_etl_scheduled_query_usage_v1/query.py	2024-01-24 20:45:33.000000000 +0000
@@ -3,6 +3,7 @@
 """Determine cost of previously scheduled bigquery-etl queries."""
 
 from argparse import ArgumentParser
+from glob import glob
 from pathlib import Path
 
 from google.cloud import bigquery
@@ -57,9 +58,13 @@
 def main():
     args = parser.parse_args()
 
-    sql_queries = list(Path(args.sql_dir).rglob("query.sql"))
-    python_queries = list(Path(args.sql_dir).rglob("query.py"))
-    multipart_queries = list(Path(args.sql_dir).rglob("part1.sql"))
+    sql_queries = list(map(Path, glob(f"{args.sql_dir}/**/query.sql", recursive=True)))
+    python_queries = list(
+        map(Path, glob(f"{args.sql_dir}/**/query.py", recursive=True))
+    )
+    multipart_queries = list(
+        map(Path, glob(f"{args.sql_dir}/**/part1.sql", recursive=True))
+    )
     query_paths = sql_queries + python_queries + multipart_queries
     partition = args.date.replace("-", "")
     destination_table = f"{args.project}.{args.destination_dataset}.{args.destination_table}${partition}"
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/relay_derived/active_subscription_ids_live/schema.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/relay_derived/active_subscription_ids_live/schema.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/relay_derived/active_subscription_ids_live/schema.yaml	2024-01-24 20:57:49.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/relay_derived/active_subscription_ids_live/schema.yaml	1970-01-01 00:00:00.000000000 +0000
@@ -1,7 +0,0 @@
-fields:
-- name: active_date
-  type: DATE
-  mode: NULLABLE
-- name: subscription_id
-  type: STRING
-  mode: NULLABLE
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/telemetry/releases_latest/schema.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/telemetry/releases_latest/schema.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/telemetry/releases_latest/schema.yaml	2024-01-24 20:57:49.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/telemetry/releases_latest/schema.yaml	1970-01-01 00:00:00.000000000 +0000
@@ -1,39 +0,0 @@
-fields:
-- name: date
-  type: DATE
-  mode: NULLABLE
-  description: null
-- name: product
-  type: STRING
-  mode: NULLABLE
-  description: null
-- name: category
-  type: STRING
-  mode: NULLABLE
-  description: null
-- name: channel
-  type: STRING
-  mode: NULLABLE
-- name: build_number
-  type: INTEGER
-  mode: NULLABLE
-  description: null
-- name: release_date
-  type: DATE
-  mode: NULLABLE
-- name: version
-  type: STRING
-  mode: NULLABLE
-  description: null
-- name: major_version
-  type: NUMERIC
-  mode: NULLABLE
-- name: minor_version
-  type: NUMERIC
-  mode: NULLABLE
-- name: patch_version
-  type: NUMERIC
-  mode: NULLABLE
-- name: beta_version
-  type: NUMERIC
-  mode: NULLABLE


@sean-rose sean-rose merged commit a70b2aa into main Jan 24, 2024
@sean-rose sean-rose deleted the support-symlinks branch January 24, 2024 21:02