SVCSE-1595 Setup import of tables from staging FxA databases #4578

Merged
merged 9 commits into from
Nov 23, 2023
1 change: 1 addition & 0 deletions bqetl_project.yaml
Original file line number Diff line number Diff line change
@@ -52,6 +52,7 @@ dry_run:
- sql/moz-fx-data-shared-prod/accounts_backend_external/accounts_v1/query.sql
- sql/moz-fx-data-shared-prod/accounts_backend_external/emails_v1/query.sql
- sql/moz-fx-data-shared-prod/accounts_backend/accounts/view.sql
- sql/moz-fx-data-shared-prod/accounts_db_nonprod_external/**/*.sql
- sql/moz-fx-data-shared-prod/firefox_accounts_derived/fxa_content_events_v1/query.sql
- sql/moz-fx-data-shared-prod/firefox_accounts_derived/fxa_auth_bounce_events_v1/query.sql
- sql/moz-fx-data-shared-prod/firefox_accounts_derived/fxa_auth_events_v1/query.sql
@@ -0,0 +1,14 @@
---
friendly_name: Firefox Accounts Databases nonprod (stage) External
description: |-
  Data extracted from the nonprod (stage) FxA backend services databases.
  See https://mozilla.github.io/ecosystem-platform/reference/database-structure#database-fxa for more information.

  Access to this dataset is restricted to accounts-confidential workgroup because some tables here contain sensitive data.
dataset_base_acl: restricted
user_facing: false
labels: {}
workgroup_access:
- role: roles/bigquery.dataViewer
  members:
  - workgroup:accounts-confidential
@@ -0,0 +1,18 @@
---
friendly_name: accountCustomers table from nonprod (stage) fxa database
description: >
  A mirror of the `accountCustomers` table from the nonprod (stage) `fxa` CloudSQL database,
  updated daily to match the current state of the table.

  See https://mozilla.github.io/ecosystem-platform/reference/database-structure#database-fxa
owners:
- akomar@mozilla.com
labels:
  application: accounts_backend
  schedule: daily
scheduling:
  dag_name: bqetl_accounts_backend_external
  # destination is the whole table, not a single partition,
  # so don't use date_partition_parameter
  date_partition_parameter: null
  referenced_tables: []
@@ -0,0 +1,17 @@
SELECT
  TO_HEX(uid) AS uid,
  stripeCustomerId,
  SAFE.TIMESTAMP_MILLIS(SAFE_CAST(createdAt AS INT)) AS createdAt,
  SAFE.TIMESTAMP_MILLIS(SAFE_CAST(updatedAt AS INT)) AS updatedAt,
FROM
  EXTERNAL_QUERY(
    "moz-fx-fxa-nonprod.us.fxa-rds-nonprod-stage-fxa",
    """SELECT
         uid,
         stripeCustomerId,
         createdAt,
         updatedAt
       FROM
         fxa.accountCustomers
    """
  )
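The outer `SELECT` in these queries converts the source columns to BigQuery-friendly types. As a minimal, self-contained sketch of what those expressions do, the query below applies them to made-up literals; none of the values come from the FxA database, and `INT64` is written out where the PR uses its alias `INT`.

```sql
-- Illustrative literals only; none of these values come from the FxA tables.
SELECT
  -- BYTES -> lowercase hexadecimal STRING, as done for `uid`
  TO_HEX(b'\x00\x01\xab') AS uid_hex,  -- "0001ab"
  -- epoch milliseconds -> TIMESTAMP, as done for `createdAt` and `updatedAt`
  SAFE.TIMESTAMP_MILLIS(SAFE_CAST(1700000000000 AS INT64)) AS created_at,  -- 2023-11-14 22:13:20 UTC
  -- SAFE_CAST yields NULL on unparseable input instead of failing the query
  SAFE.TIMESTAMP_MILLIS(SAFE_CAST('not-a-number' AS INT64)) AS bad_input  -- NULL
```

The `SAFE` variants mean that a malformed value in one row turns into `NULL` rather than aborting the whole daily import.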
@@ -0,0 +1,14 @@
fields:
- name: uid
  type: STRING
  mode: NULLABLE
  description: Account ID in hexadecimal format.
- name: stripeCustomerId
Contributor:

I wonder if we should go to the hassle of making the BigQuery columns snake_case. We're essentially already doing so for the table names, and if any of these tables will be made available in Looker that would impact how the column names would be displayed there (though I suppose we could also implement LookML generator logic to create snake_case dimensions for camelCase columns).

Collaborator Author:

I was expecting this data would be exposed in Looker via aggregates and raw tables would rather be used for exploratory analyses. One argument for keeping columns unchanged I can think of is that it makes it a bit easier to find them in FxA codebase. I'm open to discussion though.

Contributor:

I'm fine punting on this for now. No need to go to all that extra work if there isn't a concrete use case/benefit.

  type: STRING
  mode: NULLABLE
- name: createdAt
  type: TIMESTAMP
  mode: NULLABLE
- name: updatedAt
  type: TIMESTAMP
  mode: NULLABLE
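The snake_case renaming raised in the review thread was deferred rather than implemented. Purely as an illustration of what it could look like, a downstream query or view might alias the camelCase columns. The table name `fxa_account_customers_v1` is a hypothetical placeholder, since the page does not show the actual table names; only the dataset name `accounts_db_nonprod_external` appears in the diff.

```sql
-- Hypothetical sketch: the table name is a placeholder, not taken from the PR.
SELECT
  uid,
  stripeCustomerId AS stripe_customer_id,
  createdAt AS created_at,
  updatedAt AS updated_at,
FROM
  `moz-fx-data-shared-prod.accounts_db_nonprod_external.fxa_account_customers_v1`
```

Keeping the rename in a separate layer like this would leave the imported columns searchable by their original names in the FxA codebase, which is the argument made for not renaming them at import time.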
@@ -0,0 +1,18 @@
---
friendly_name: accountResetTokens table from nonprod (stage) fxa database
description: >
  A mirror of the `accountResetTokens` table from the nonprod (stage) `fxa` CloudSQL database,
  excluding columns containing confidential data, updated daily to match the current state of the table.

  See https://mozilla.github.io/ecosystem-platform/reference/database-structure#database-fxa
owners:
- akomar@mozilla.com
labels:
  application: accounts_backend
  schedule: daily
scheduling:
  dag_name: bqetl_accounts_backend_external
  # destination is the whole table, not a single partition,
  # so don't use date_partition_parameter
  date_partition_parameter: null
  referenced_tables: []
@@ -0,0 +1,13 @@
SELECT
  TO_HEX(uid) AS uid,
  SAFE.TIMESTAMP_MILLIS(SAFE_CAST(createdAt AS INT)) AS createdAt,
FROM
  EXTERNAL_QUERY(
    "moz-fx-fxa-nonprod.us.fxa-rds-nonprod-stage-fxa",
    """SELECT
         uid,
         createdAt
       FROM
         fxa.accountResetTokens
    """
  )
@@ -0,0 +1,8 @@
fields:
- name: uid
  type: STRING
  mode: NULLABLE
  description: Account ID in hexadecimal format.
- name: createdAt
  type: TIMESTAMP
  mode: NULLABLE
@@ -0,0 +1,18 @@
---
friendly_name: accounts table from nonprod (stage) fxa database
description: >
  A mirror of the `accounts` table from the nonprod (stage) `fxa` CloudSQL database,
  excluding columns containing confidential data, updated daily to match the current state of the table.

  See https://mozilla.github.io/ecosystem-platform/reference/database-structure#database-fxa
owners:
- akomar@mozilla.com
labels:
  application: accounts_backend
  schedule: daily
scheduling:
  dag_name: bqetl_accounts_backend_external
  # destination is the whole table, not a single partition,
  # so don't use date_partition_parameter
  date_partition_parameter: null
  referenced_tables: []
@@ -0,0 +1,37 @@
SELECT
  TO_HEX(uid) AS uid,
  normalizedEmail,
  email,
  SAFE_CAST(emailVerified AS BOOL) AS emailVerified,
  verifierVersion,
  SAFE.TIMESTAMP_MILLIS(SAFE_CAST(verifierSetAt AS INT)) AS verifierSetAt,
  SAFE.TIMESTAMP_MILLIS(SAFE_CAST(createdAt AS INT)) AS createdAt,
  locale,
  SAFE.TIMESTAMP_MILLIS(SAFE_CAST(lockedAt AS INT)) AS lockedAt,
  SAFE.TIMESTAMP_MILLIS(SAFE_CAST(profileChangedAt AS INT)) AS profileChangedAt,
  SAFE.TIMESTAMP_MILLIS(SAFE_CAST(keysChangedAt AS INT)) AS keysChangedAt,
  ecosystemAnonId,
  SAFE.TIMESTAMP_MILLIS(SAFE_CAST(disabledAt AS INT)) AS disabledAt,
  SAFE.TIMESTAMP_MILLIS(SAFE_CAST(metricsOptOutAt AS INT)) AS metricsOptOutAt,
FROM
  EXTERNAL_QUERY(
    "moz-fx-fxa-nonprod.us.fxa-rds-nonprod-stage-fxa",
    """SELECT
         uid,
         normalizedEmail,
         email,
         emailVerified,
         verifierVersion,
         verifierSetAt,
         createdAt,
         locale,
         lockedAt,
         profileChangedAt,
         keysChangedAt,
         ecosystemAnonId,
         disabledAt,
         metricsOptOutAt
       FROM
         fxa.accounts
    """
  )
@@ -0,0 +1,44 @@
fields:
- name: uid
  type: STRING
  mode: NULLABLE
  description: Account ID in hexadecimal format.
- name: normalizedEmail
  type: STRING
  mode: NULLABLE
- name: email
  type: STRING
  mode: NULLABLE
- name: emailVerified
  type: BOOLEAN
  mode: NULLABLE
- name: verifierVersion
  type: INTEGER
  mode: NULLABLE
- name: verifierSetAt
  type: TIMESTAMP
  mode: NULLABLE
- name: createdAt
  type: TIMESTAMP
  mode: NULLABLE
- name: locale
  type: STRING
  mode: NULLABLE
- name: lockedAt
  type: TIMESTAMP
  mode: NULLABLE
- name: profileChangedAt
  type: TIMESTAMP
  mode: NULLABLE
- name: keysChangedAt
  type: TIMESTAMP
  mode: NULLABLE
- name: ecosystemAnonId
  type: STRING
  mode: NULLABLE
- name: disabledAt
  type: TIMESTAMP
  mode: NULLABLE
- name: metricsOptOutAt
  type: TIMESTAMP
  mode: NULLABLE
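For the kind of exploratory analysis mentioned in the review thread, the converted `TIMESTAMP` and `BOOLEAN` columns can be used without further casting. A minimal sketch, assuming a hypothetical table name `fxa_accounts_v1` (the page does not show the real one) and read access through the accounts-confidential workgroup:

```sql
-- Hypothetical sketch: the table name is a placeholder, not taken from the PR.
-- Counts stage accounts created per day over the last 30 days.
SELECT
  DATE(createdAt) AS created_date,
  COUNTIF(emailVerified) AS verified_accounts,
  COUNT(*) AS total_accounts,
FROM
  `moz-fx-data-shared-prod.accounts_db_nonprod_external.fxa_accounts_v1`
WHERE
  createdAt >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY
  created_date
ORDER BY
  created_date
```

Because the destination table is overwritten daily with the current state of the source, a query like this reflects only accounts that still exist at the time of the latest import.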
@@ -0,0 +1,18 @@
---
friendly_name: dbMetadata table from nonprod (stage) fxa database
description: >
  A mirror of the `dbMetadata` table from the nonprod (stage) `fxa` CloudSQL database,
  updated daily to match the current state of the table.

  See https://mozilla.github.io/ecosystem-platform/reference/database-structure#database-fxa
owners:
- akomar@mozilla.com
labels:
  application: accounts_backend
  schedule: daily
scheduling:
  dag_name: bqetl_accounts_backend_external
  # destination is the whole table, not a single partition,
  # so don't use date_partition_parameter
  date_partition_parameter: null
  referenced_tables: []
@@ -0,0 +1,13 @@
SELECT
  name,
  value,
FROM
  EXTERNAL_QUERY(
    "moz-fx-fxa-nonprod.us.fxa-rds-nonprod-stage-fxa",
    """SELECT
         name,
         value
       FROM
         fxa.dbMetadata
    """
  )
@@ -0,0 +1,7 @@
fields:
- name: name
  type: STRING
  mode: NULLABLE
- name: value
  type: STRING
  mode: NULLABLE
@@ -0,0 +1,18 @@
---
friendly_name: deviceCommandIdentifiers table from nonprod (stage) fxa database
description: >
  A mirror of the `deviceCommandIdentifiers` table from the nonprod (stage) `fxa` CloudSQL database,
  updated daily to match the current state of the table.

  See https://mozilla.github.io/ecosystem-platform/reference/database-structure#database-fxa
owners:
- akomar@mozilla.com
labels:
  application: accounts_backend
  schedule: daily
scheduling:
  dag_name: bqetl_accounts_backend_external
  # destination is the whole table, not a single partition,
  # so don't use date_partition_parameter
  date_partition_parameter: null
  referenced_tables: []
@@ -0,0 +1,13 @@
SELECT
  commandId,
  commandName,
FROM
  EXTERNAL_QUERY(
    "moz-fx-fxa-nonprod.us.fxa-rds-nonprod-stage-fxa",
    """SELECT
         commandId,
         commandName
       FROM
         fxa.deviceCommandIdentifiers
    """
  )
@@ -0,0 +1,7 @@
fields:
- name: commandId
  type: INTEGER
  mode: NULLABLE
- name: commandName
  type: STRING
  mode: NULLABLE
@@ -0,0 +1,18 @@
---
friendly_name: deviceCommands table from nonprod (stage) fxa database
description: >
  A mirror of the `deviceCommands` table from the nonprod (stage) `fxa` CloudSQL database,
  excluding columns containing confidential data, updated daily to match the current state of the table.

  See https://mozilla.github.io/ecosystem-platform/reference/database-structure#database-fxa
owners:
- akomar@mozilla.com
labels:
  application: accounts_backend
  schedule: daily
scheduling:
  dag_name: bqetl_accounts_backend_external
  # destination is the whole table, not a single partition,
  # so don't use date_partition_parameter
  date_partition_parameter: null
  referenced_tables: []
@@ -0,0 +1,15 @@
SELECT
  TO_HEX(uid) AS uid,
  deviceId,
  commandId,
FROM
  EXTERNAL_QUERY(
    "moz-fx-fxa-nonprod.us.fxa-rds-nonprod-stage-fxa",
    """SELECT
         uid,
         deviceId,
         commandId
       FROM
         fxa.deviceCommands
    """
  )
@@ -0,0 +1,11 @@
fields:
- name: uid
  type: STRING
  mode: NULLABLE
  description: Account ID in hexadecimal format.
- name: deviceId
  type: STRING
  mode: NULLABLE
- name: commandId
  type: INTEGER
  mode: NULLABLE
@@ -0,0 +1,18 @@
---
friendly_name: devices table from nonprod (stage) fxa database
description: >
  A mirror of the `devices` table from the nonprod (stage) `fxa` CloudSQL database,
  excluding columns containing confidential data, updated daily to match the current state of the table.

  See https://mozilla.github.io/ecosystem-platform/reference/database-structure#database-fxa
owners:
- akomar@mozilla.com
labels:
  application: accounts_backend
  schedule: daily
scheduling:
  dag_name: bqetl_accounts_backend_external
  # destination is the whole table, not a single partition,
  # so don't use date_partition_parameter
  date_partition_parameter: null
  referenced_tables: []
@@ -0,0 +1,25 @@
SELECT
  TO_HEX(uid) AS uid,
  id,
  name,
  nameUtf8,
  type,
  SAFE.TIMESTAMP_MILLIS(SAFE_CAST(createdAt AS INT)) AS createdAt,
  callbackPublicKey,
  SAFE_CAST(callbackIsExpired AS BOOL) AS callbackIsExpired,
FROM
  EXTERNAL_QUERY(
    "moz-fx-fxa-nonprod.us.fxa-rds-nonprod-stage-fxa",
    """SELECT
         uid,
         id,
         name,
         nameUtf8,
         type,
         createdAt,
         callbackPublicKey,
         callbackIsExpired
       FROM
         fxa.devices
    """
  )