DENG-1705 - Add missing client attribution columns to clients daily/first-seen #4505
Integration report for "Revert changes to clients_first_seen_v2 schema"
Integration report for "DENG-1705 Add missing client attribution columns to clients daily/firstseen"
Integration report for "Typo in clients_first_seen"
Integration report for "Change ping priority"
sql/moz-fx-data-shared-prod/telemetry_derived/clients_daily_v6/query.sql
Integration report for "Update test schema"
@SuYoungHong would it make sense to also add any or all of the following columns (at least to clients_daily; it might not make sense to add them to clients_first_seen, since the other pings don't have these fields)?
I'm wondering whether these would help flag potential client_id regens or forks. For example, if the first field is >> 1 on first_seen_date, it might be convenient to have it in clients_daily. I wasn't sure, though.
Thank you for integrating the missing fields. Shall we also update the schema for telemetry_derived.clients_last_seen_joined_v2 in this PR?
- name: distributor_channel
  type: STRING
  mode: NULLABLE
- name: env_build_platform_version
I notice a different naming convention: environment.partner.partner_id is named partner_id, while environment.build.xpcom_abi is named env_build_xpcom_abi. This might confuse users and also differs from the naming in v2. It makes sense to me to drop the prefix for consistency, e.g. xpcom_abi instead of env_build_xpcom_abi, wdyt?
I matched the convention of the existing environment.build columns in clients_daily, which are all prefixed with env_build_; see https://github.com/mozilla/bigquery-etl/pull/4505/files#diff-0802d82f91d4f1ab2d91e8d0d1ca4062467a1b723cee0b293eab62248966b949R242-R245. Since we're not likely to change those upstream columns
Yeah, some are prefixed and some aren't, e.g. partner_id, is_wow64.
Those are from environment.partner and environment.system, not environment.build, so it doesn't seem like there was a previous convention.
Unfortunately, this leads to the unnecessary complication of having to update downstream queries when cascading the changes, because the naming differs between v1 and v2. It would be much better to align the naming between them in this PR than to update the schema or expand the queries later; see e.g. platform_version
I don't follow. The schemas for clients_first_seen_v2 and clients_first_seen_v1 are already very different. Keeping the convention makes sense for the same reason we wouldn't change the upstream env_build_arch, default_search_engine_data_load_path, or geo_subdivision1 to match the downstream column names.
sql/moz-fx-data-shared-prod/telemetry_derived/clients_daily_v6/query.sql
- name: env_build_platform_version
  type: STRING
  mode: NULLABLE
- name: env_build_xpcom_abi
Same naming-convention concern as for env_build_platform_version.
sql/moz-fx-data-shared-prod/telemetry_derived/clients_first_seen_v2/query.sql
- name: distributor_channel
  type: STRING
  mode: NULLABLE
- name: env_build_platform_version
Same comment about naming, which also differs from v2. It might be confusing for users.
- name: env_build_platform_version
  type: STRING
  mode: NULLABLE
- name: env_build_xpcom_abi
Same comment about naming as above.
@@ -858,6 +879,7 @@ aggregates AS (
submission_timestamp
)
).*,
mozfun.stats.mode_last(ARRAY_AGG(geo_db_version ORDER BY submission_timestamp)) AS geo_db_version,
What's the use case for this field?
I can't say; this is used downstream in clients_first_seen_v2. cc @lucia-vargas-a
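For readers unfamiliar with the aggregation on the geo_db_version line: as I understand mozfun.stats.mode_last, it returns the most frequent element of the array and breaks ties in favour of the element that appears last. The Python model below only illustrates that reading and is not the UDF's implementation; in particular, skipping nulls is an assumption that should be checked against the mozfun documentation.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def mode_last(values: Sequence[Optional[Hashable]]) -> Optional[Hashable]:
    """Hypothetical model: most frequent value, ties go to the latest one.

    Assumption: None (NULL) entries are skipped before counting.
    """
    non_null = [v for v in values if v is not None]
    if not non_null:
        return None
    counts = Counter(non_null)
    best = max(counts.values())
    # Walk backwards so that, among tied values, the one seen latest wins.
    for value in reversed(non_null):
        if counts[value] == best:
            return value
    return None


# Values are assumed to be ordered by submission_timestamp already,
# mirroring ARRAY_AGG(... ORDER BY submission_timestamp).
geo_db_versions = ["2023-10-01", "2023-11-01", "2023-11-01", "2023-10-01"]
latest_mode = mode_last(geo_db_versions)
```

Here both versions occur twice, so under this model the one that appears last in the ordered array is chosen.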
sql/moz-fx-data-shared-prod/telemetry_derived/clients_daily_v6/schema.yaml
hey, one question I had about this: I noticed that in bigquery-etl/sql/moz-fx-data-shared-prod/telemetry_derived/clients_daily_v6/query.sql , when we aggregate the But do these respect nulls (if the first ping had null for these values, do we get null back, or does it return first non-null)? Code for reference:
and
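The two behaviours this question distinguishes can be sketched in Python. This is a hypothetical model, not the logic in query.sql: the function names and the (submission_timestamp, value) ping shape are made up for illustration, and which of the two the actual aggregation matches is exactly the open question.

```python
from typing import List, Optional, Tuple

# Hypothetical ping shape: (submission_timestamp, value); value may be None (NULL).
Ping = Tuple[int, Optional[str]]


def first_respect_nulls(pings: List[Ping]) -> Optional[str]:
    """Value of the earliest ping, even if that value is None."""
    ordered = sorted(pings, key=lambda p: p[0])
    return ordered[0][1] if ordered else None


def first_ignore_nulls(pings: List[Ping]) -> Optional[str]:
    """Earliest value that is not None; None only if every value is None."""
    for _, value in sorted(pings, key=lambda p: p[0]):
        if value is not None:
            return value
    return None


# The earliest ping (timestamp 1) carries a null value.
pings = [(2, "release"), (1, None), (3, "beta")]
respecting = first_respect_nulls(pings)
ignoring = first_ignore_nulls(pings)
```

With this input the first function yields None and the second yields "release", which is the difference the question is asking about.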
@irrationalagent, yes, I think those fields would absolutely be helpful.
@irrationalagent @SuYoungHong there's currently this field:
I think it's true that it would indicate a potential regen, but it could also be caused by the ping ordering issue (where a main-ping-first client's first main ping is from a subsequent, usually second, subsession). If we had the precise number, we could be more confident it's a regen specifically.
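To illustrate how a precise counter could separate the two cases, here is a rough Python sketch. The function name, the labels, and the thresholds are all assumptions made up for this example (the thread only says "usually second" for the ping-ordering case and ">> 1" for a likely regen); it is not a proposed rule.

```python
from typing import Optional


def classify_first_seen(min_subsession_counter: Optional[int]) -> str:
    """Hypothetical labelling of a client on its first_seen_date.

    Assumed thresholds: 1 is the expected first subsession, 2 may just be
    the ping ordering issue, anything larger hints at a client_id regen.
    """
    if min_subsession_counter is None:
        return "unknown"
    if min_subsession_counter <= 1:
        return "expected"
    if min_subsession_counter == 2:
        return "ambiguous"  # could be ping ordering rather than a regen
    return "possible_regen"


labels = [classify_first_seen(c) for c in (1, 2, 40, None)]
```

A boolean "greater than one" flag would merge the last two non-null cases, which is why the exact minimum is more informative.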
c51b3ab to f2b8ad2
Integration report for "Added min/max subsession counter"
Integration report for "Revert priority change"
alright, LGTM
b92fd83 to 244fa32
Integration report for "DENG-1705 Add missing client attribution columns to clients daily/firstseen"
c87b7fe to e424604
Integration report for "Update clients_last_seen_joined"
* android funnel test * fix filter expression * fix string comparison * revise toml * add completed event * simplify by using events_unnested * Funnel fixes * Bump mkdocs from 1.5.2 to 1.5.3 (#4321) Bumps [mkdocs](https://github.com/mkdocs/mkdocs) from 1.5.2 to 1.5.3. - [Release notes](https://github.com/mkdocs/mkdocs/releases) - [Commits](mkdocs/mkdocs@1.5.2...1.5.3) --- updated-dependencies: - dependency-name: mkdocs dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * [RS-826] New job to calculate newtab visits -> activity stream sessions (#4387) * New job to calculate newtab visits -> activity stream sessions * Removing newline chars at end of file * Removing newline chars at end of file * Removing newline chars at end of file * Addressing comment suggestions * Format * Add bqetl_ads DAG * Add ACL to nt_visits_to_sessions_conversion_factors_daily_v1 * Add metadata files * Add view to dry_run skip list * Oops, fix the view --------- Co-authored-by: Curtis Morales <cmorales@mozilla.com> * Allow running multiple checks (#4471) * Allow running multiple checks * Don't yield anything on no matches * Change pocket_available for new Pocket markets (#4472) * FXA-6721 Setup import of accounts table from FxA production CloudSQL (#4423) * Urlbar events: nested (long) instead of wide (#4373) * feat: urlbar events final release * feat: new result types * feat: add interaction and group * fix: date * fix: use BQ builtin for UUIDs * Add the view_v2' * Add new table to the DAG * fix CI error fix ci error * remove teon brooks * Incorporate feedback by Curtis Incorporate feedback from Curtis --------- Co-authored-by: Alekhya Kommasani <akommasani@mozilla.com> Co-authored-by: Alekhya <88394696+alekhyamoz@users.noreply.github.com> * DENG-1705 - Add startup_profile_selection_reason_first to clients_daily_v6 (#4473) * 
Update experiment export query to include feature ids and branch feature config values (#4477) * Update experiment export query to include feature ids and branch feature config value. * Add view skip for broken view * add skip to dry run as well * DENG-476 - Update monitoring ETLs to reference main_v5 (#4431) * DENG-476 - Update sampled main ping tables to reference main_v5 (#4433) * DENG-476 - Update experiment aggregates ETL to reference main_v5 (#4435) * DENG-476 - Update internet outages to reference main_v5 (#4432) * Fix test for mozfun.norm.result_type_to_product_name (#4487) * Bug 1860814 - fix amo_prod__desktop_addons_by_client (#4481) * quick fix * fix spread out groupby * move out sourcetable query --------- Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com> * fix for #4481 (#4489) * DENG-1781- Remove urlbar_events_temp_v2 view and repoint urlbar_events view to v2 (#4486) * Remove urlbar_events_temp_v2 view and repoint urlbar_events view to v2 * Include all sql_gen files in package (#4490) When the bigquery-etl package is installed from pypi (or locally via `pip install .`), the only non-py files included in the package are those in the `package_data` section of setup.py. Previously, with just those files, sql generation would fail due to missing files. Because this directory is small, we should include all files so no one accidentally runs into this problem again. Co-authored-by: Daniel Thorn <dthorn@mozilla.com> * Bump types-requests from 2.31.0.2 to 2.31.0.10 (#4475) Bumps [types-requests](https://github.com/python/typeshed) from 2.31.0.2 to 2.31.0.10. - [Commits](https://github.com/python/typeshed/commits) --- updated-dependencies: - dependency-name: types-requests dependency-type: direct:production update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump mozilla-metric-config-parser from 2023.9.2 to 2023.10.2 (#4476) Bumps [mozilla-metric-config-parser](https://github.com/mozilla/metric-config-parser) from 2023.9.2 to 2023.10.2. - [Release notes](https://github.com/mozilla/metric-config-parser/releases) - [Commits](mozilla/metric-config-parser@2023.9.2...2023.10.2) --- updated-dependencies: - dependency-name: mozilla-metric-config-parser dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Anna Scholtz <anna@scholtzan.net> * Glean server knobs monitoring table (#4491) * Glean server knobs monitoring table * fix code gen and skip dry-run * Remove view creation in query * DENG-1879 Setup import of emails table from FxA stage CloudSQL (#4493) * DENG-1879 Setup import of emails table from FxA prod CloudSQL (#4494) * Bump jsonschema from 4.19.0 to 4.19.2 (#4495) Bumps [jsonschema](https://github.com/python-jsonschema/jsonschema) from 4.19.0 to 4.19.2. - [Release notes](https://github.com/python-jsonschema/jsonschema/releases) - [Changelog](https://github.com/python-jsonschema/jsonschema/blob/main/CHANGELOG.rst) - [Commits](python-jsonschema/jsonschema@v4.19.0...v4.19.2) --- updated-dependencies: - dependency-name: jsonschema dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: akkomar <akkomar@users.noreply.github.com> * Bump pytest from 7.4.2 to 7.4.3 (#4496) Bumps [pytest](https://github.com/pytest-dev/pytest) from 7.4.2 to 7.4.3. 
- [Release notes](https://github.com/pytest-dev/pytest/releases) - [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst) - [Commits](pytest-dev/pytest@7.4.2...7.4.3) --- updated-dependencies: - dependency-name: pytest dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Enforce no date partition parameter in DAG (#4497) * Use mozfun.glean.parse_datetime to parse ping_info fields (#4464) In future versions of Glean that timestamp can be more precise, so we need to ensure we correctly parse it. Co-authored-by: Anna Scholtz <anna@scholtzan.net> * Remove mmccorquodale from DAG owners (#4492) * Fix test for norm.glean_ping_info * Bump black from 23.9.1 to 23.10.1 Bumps [black](https://github.com/psf/black) from 23.9.1 to 23.10.1. - [Release notes](https://github.com/psf/black/releases) - [Changelog](https://github.com/psf/black/blob/main/CHANGES.md) - [Commits](psf/black@23.9.1...23.10.1) --- updated-dependencies: - dependency-name: black dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * Bump sqlglot from 18.11.4 to 19.0.1 (#4500) Bumps [sqlglot](https://github.com/tobymao/sqlglot) from 18.11.4 to 19.0.1. - [Changelog](https://github.com/tobymao/sqlglot/blob/main/CHANGELOG.md) - [Commits](tobymao/sqlglot@v18.11.4...v19.0.1) --- updated-dependencies: - dependency-name: sqlglot dependency-type: direct:production update-type: version-update:semver-major ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Materialized views and aggregated tables for event monitoring (#4478) * WIP event monitoring * Add FxA custom events to view definition (#4483) * Add FxA custom events to view definition * Update sql_generators/event_monitoring/templates/event_monitoring_live.init.sql * Update sql_generators/event_monitoring/templates/event_monitoring_live.init.sql * Update sql_generators/event_monitoring/templates/event_monitoring_live.init.sql * Update sql_generators/event_monitoring/templates/event_monitoring_live.init.sql --------- Co-authored-by: Anna Scholtz <anna@scholtzan.net> * Move event monitoring to glean_usage generator * Add cross-app event monitoring view * Generate cross app monitoring * Simplyfy event monitoring aggregation --------- Co-authored-by: akkomar <akkomar@users.noreply.github.com> * Remove generated DAGs from main (#4507) * Add output_dir to command dag generate. (#4512) * Add output_dir to command dag generate. * output_dir to command dag generate. * output_dir to command dag generate. --------- Co-authored-by: Lucia Vargas <lvargas@mozilla.com> * Bump pyarrow from 13.0.0 to 14.0.0 (#4511) Bumps [pyarrow](https://github.com/apache/arrow) from 13.0.0 to 14.0.0. - [Commits](apache/arrow@go/v13.0.0...go/v14.0.0) --- updated-dependencies: - dependency-name: pyarrow dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump pre-commit from 3.4.0 to 3.5.0 (#4510) Bumps [pre-commit](https://github.com/pre-commit/pre-commit) from 3.4.0 to 3.5.0. 
- [Release notes](https://github.com/pre-commit/pre-commit/releases) - [Changelog](https://github.com/pre-commit/pre-commit/blob/main/CHANGELOG.md) - [Commits](pre-commit/pre-commit@v3.4.0...v3.5.0) --- updated-dependencies: - dependency-name: pre-commit dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Remove distinct_docids query (#4449) * Bump pip from 23.0 to 23.3 (#4516) Bumps [pip](https://github.com/pypa/pip) from 23.0 to 23.3. - [Changelog](https://github.com/pypa/pip/blob/main/NEWS.rst) - [Commits](pypa/pip@23.0...23.3) --- updated-dependencies: - dependency-name: pip dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump mkdocs-material from 9.3.1 to 9.4.7 (#4518) Bumps [mkdocs-material](https://github.com/squidfunk/mkdocs-material) from 9.3.1 to 9.4.7. - [Release notes](https://github.com/squidfunk/mkdocs-material/releases) - [Changelog](https://github.com/squidfunk/mkdocs-material/blob/master/CHANGELOG) - [Commits](squidfunk/mkdocs-material@9.3.1...9.4.7) --- updated-dependencies: - dependency-name: mkdocs-material dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Dont generate dags in bqetl query schedule command (#4517) * Add query to load application information from probe info service (#4508) * prefixing schema error message inside dryrun to "ERROR" to make it easier to find when searching logs for cause of exit code 1 (#4522) * updated schema for telemetry_derived/clients_last_seen_joined_v1 to align it with the query results (#4523) * Update scheduler of aggregates to run after upstreams. 
(#4503) * Update scheduler of aggregates to run after upstreams. * Update dags for new scheduler of analytics_aggregates * Update dag bqetl_search * Remove DAG. --------- Co-authored-by: Lucia Vargas <lvargas@mozilla.com> * Set depend_on_past=False for warn checks (#4526) * Add map.set_key to mozfun (#4527) * Add map.set_key to mozfun * Disallow NULL keys in maps * DS-3281 - Add client adclicks history table (#4528) * Add client adclicks history table * Add alias to ad_click_history col Co-authored-by: Anna Scholtz <anna@scholtzan.net> * Remove partition parameter on table write --------- Co-authored-by: Anna Scholtz <anna@scholtzan.net> * Add experiment information to event monitoring (#4519) * feat(DENG-1774): adding fenix derived firefox android clients v2 (#4424) * added fenix_derirved.firefox_android_clients_v2 * added ETL checks for fenix_derirved.firefox_android_clients_v2 * made changes as suggested by bani in PR#4424 * converting unique check for android clients v2 until duplication is resolved * added install_source field to firefox_android_clients_v2 and formatting applied on checks * added locale field and modified the query to suppot is_init() * removed generated dag due to new generation process * Add submission_date param to adclicks history (#4531) * DS-3054. Support running an initialization query in parallel (#4322) * DS-3054. Create functions to support running an initialization query for all sample_ids in parallel. * DS-3054. Update _run_query function. * DS-3054. Use _run_query and mapped values for initialization in parallel. * DS-3054. Unify initialization to run in parallel and get sample_id range from metadata. * DS-3054. Minimize formatting of query template and remove need to modify existing initialization queries. Validate if a query should use parallelized or regular update. * DS-3054. Adding link to caveats. * DS-3054. Update sample_id range for initialization. * DS-3054. Use current implementation of run_query. * DS-3054. 
Update using a parameter instead of initialization in metadata. * DS-3054. DAG update with new parameter. * Pass parameters before calling _run_query(). * Use --append_tablein favour of INSERT INTO. * DS-3054 Separate parallel and non parallel init, plus some improvements. --------- Co-authored-by: Lucia Vargas <lvargas@mozilla.com> * Add ios baseline_clients_yearly (#4506) * DENG-1935 Change data ordering from pings in clients-first-seen-v2 (#4533) * DENG-1935 Change data ordering from pings in clients-first-seen-v2 * Added main ping for client-3, maintain chosen ping * Fix comments in event monitoring queries (#4535) * DENG-1705 - Add missing client attribution columns to clients daily/first-seen (#4505) * DENG-1705 Add missing client attribution columns to clients daily/firstseen * Update clients_last_seen_joined * Rename main_v4 -> main_v5 in ssl_ratios tests (#4536) * Make base tables configurable in glean_usage generator (#4534) * Make base tables configurable in glean_usage generator * Fix event extras unnesting in event monitoring * Bump sqlglot from 19.0.1 to 19.0.3 (#4521) Bumps [sqlglot](https://github.com/tobymao/sqlglot) from 19.0.1 to 19.0.3. - [Changelog](https://github.com/tobymao/sqlglot/blob/main/CHANGELOG.md) - [Commits](tobymao/sqlglot@v19.0.1...v19.0.3) --- updated-dependencies: - dependency-name: sqlglot dependency-type: direct:production update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Anna Scholtz <anna@scholtzan.net> * DS-3272 - Review checker data model for mobile (#4498) * Add mobile shopping data * Remove the ff desktop from sql_generator * Fix build issue * Incorporate feedback from Bruce * Add clients table for mobile * FIX CI issue * Incorporate Bruce's feedback * Incorporate Curtis' feedback * Fix event_monitoring_aggregates_v1 template (#4537) This will ensure that FxA tables are included in the aggregate. * Fixing query error in fenix_derived/firefox_android_clients_v2/checks.sql (#4539) * Add missing clients view to fenix review checker (#4540) * add other projects to query from for bq usage, add for loop (#4529) * add other projects to query from for bq usage, add for loop * create new function to gather jobs_by_project data into temp table, update create_query function to join jobs_by_org table to jobs_by_project tmp table * take out date from tmp table as it is unnecessary * refactor to take out irrelevant function, rewrite SQL to look at other projects * add date filter to jobs_by_project * add comment for future refactoring * add tmp_table for jobs_by_project table * create function to loop through projects for jobs_by_project, revise query to join jobs_by_org with jobs_by_project tmp table * take out ambiguous DATE filter * take out r_prefix in regex from query string. Take out tmp table function. Add proper date filter * take out r_prefix in regex from query string. Take out tmp table function. Add proper date filter * add back in the r_prefix and add in the extra space in the Query ID regex that was needed * updated two affected fields across task_instance and trigger airflow metadata tables to type JSON (#4545) * Fix event monitoring template (#4546) Nulls need to be casted to string to make the union work. 
This will fix https://workflow.telemetry.mozilla.org/log?execution_date=2023-11-09T02%3A00%3A00%2B00%3A00&task_id=monitoring_derived__event_monitoring_aggregates__v1&dag_id=bqetl_monitoring&map_index=-1 * removed check for firefox_ios_clients_v1 which used different filtering settings causing result mismatch (#4547) * iOS attributable_clients use metrics adclicks (#4543) * iOS attributable_clients use metrics adclicks * Remove project id from table name Co-authored-by: kik-kik <42538694+kik-kik@users.noreply.github.com> --------- Co-authored-by: kik-kik <42538694+kik-kik@users.noreply.github.com> * Use correct submission_* field (#4549) * Use correct app_version field (#4551) * Revert "updated two affected fields across task_instance and trigger airflow metadata tables to type JSON (#4545)" (#4552) This reverts commit 9750d33. * DENG-1705 - Add startup_profile_selection_reason from first ping to clients_daily, clients_first_seen_v2 and downstream (#4482) * DENG-1705 - Add startup_profile_selection_reason to clients_first_seen * Add startup_profile_selection_reason_first_ping_only * Query typo * Update test schema * Update sql/moz-fx-data-shared-prod/telemetry_derived/clients_first_seen_28_days_later_v1/schema.yaml Co-authored-by: Lucia <30448600+lucia-vargas-a@users.noreply.github.com> --------- Co-authored-by: Lucia <30448600+lucia-vargas-a@users.noreply.github.com> * change filter on final query to go back to May 2023 - the min date in the Jobs by Project table as of 11/13/23 (#4559) * change filter on final query to go back in history * take out extraneous WHERE * add DISTINCT to final query * Add rust result types to product mapping (#4544) * missing-mobile-fields-review-checker (#4553) * noting that we are missing some fields * adding is_fx_dau to android and ios clients * add missing columns to schema.yaml add schema.yaml add schema.yaml * Delete sql/moz-fx-data-shared-prod/firefox_desktop/serp_events/view.sql --------- Co-authored-by: Alekhya Kommasani 
<akommasani@mozilla.com> Co-authored-by: Alekhya <88394696+alekhyamoz@users.noreply.github.com> * Add aggregate table to monitor event errors (#4548) * updated fenix_derived.funnel_retention_clients_* to use clients view instead of table directly (#4563) * Bug 1864722 - Fix column name typo (#4567) * add referenced tables to metadata.yaml to make sure jobs_by_org task … (#4568) * add referenced tables to metadata.yaml to make sure jobs_by_org task runs before bigquery_usage_v2 task * Update sql/moz-fx-data-shared-prod/monitoring_derived/bigquery_usage_v2/metadata.yaml Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com> --------- Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com> * Generate normal task dependencies from `depends_on` if the task is in the same DAG (#4569) * Generate normal task dependencies from `depends_on` if the task is in the same DAG. * Update `metadata.yaml` files to use `depends_on` rather than `upstream_dependencies`. * Add a period-over-period check for revenue data (#4566) * Check for period over period changes in column sum * Fix percent change calculation * Fix errors in navigation function logic * Rename period over period check to specify revenue * Remove references to period over period check --------- Co-authored-by: Alekhya <88394696+alekhyamoz@users.noreply.github.com> * feat(): updated fenix_derived.firefox_android_clients_v2 to include reported_baseline_ping field (#4565) * updated fenix_derived.firefox_android_clients_v2 to include reported_baseline_ping field * Update sql/moz-fx-data-shared-prod/fenix_derived/firefox_android_clients_v2/query.sql Co-authored-by: Lucia <30448600+lucia-vargas-a@users.noreply.github.com> --------- Co-authored-by: Lucia <30448600+lucia-vargas-a@users.noreply.github.com> * summing sap and ad clicks (#4571) * remove file that isn't ready yet (#4572) * Add ga.nullify_string UDF (#4556) * Add ga.nullify_string UDF * Add README line * added 
fenix_derived.firefox_android_clients_v2 to shredder config (#4564) * Use client_info.app_channel for event monitoring channels (#4575) * Add ga_sessions_v1 table & view (#4554) * Add ga_sessions_v1 table & view This table aggregates session-level data from GA. * Rename nullify string func * Apply suggestions from code review Co-authored-by: Alexander <anicholson@mozilla.com> * Add upstream backfill deps * Move depends_on to correct section --------- Co-authored-by: Alexander <anicholson@mozilla.com> * Make sure that metadata `friendly_name` and `description` are not None (#4513) * Fill empty description * Assign a friendly name if the table doesn't have one * Update metadata tests * Update bigquery_etl/metadata/parse_metadata.py Co-authored-by: Alexander <anicholson@mozilla.com> * update test again --------- Co-authored-by: Alexander <anicholson@mozilla.com> * Add back normalized_app_id (#4580) * Add session date param; fix checks CLI bug (#4579) * Fix checks to filter on partitions * Don't print "missing checks file" on success Previously, the statement that checks.sql files were missing was printed on any execution of the for statement. ("else" clauses after "for"s execute after completion of the "for" clause). Instead, we want to print only when there are no files. * Add derived stub attribution logs (#4557) * Add derived stub attribution logs This table keeps triplets from the stub attribution logs. The triplet of (dl_token, ga_client_id, stub_session_id) will only ever appear once here. See the associated decision brief: https://docs.google.com/document/d/1L4vOR0nCGawwSRPA9xiR8Hmu_8ozCGUecXAtBWmGGA0/edit * Move stub attribution table to new dataset In order to ensure limited access to the stub attribution service data without significantly decreasing developer velocity, we move these tables to a new dataset. 
That dataset has the defaults we want for all stub attribution log data:
- Defaults to just read access to data-science/DUET workgroup
- No read/write access for DE

We will backfill via the bqetl_backfill DAG.
* Rename view
* Use correct dataset name in view
* Skip dryrun; no access
* Add gclid_conversions table & view (#4558)
* Add gclid_conversions table & view
  This table will support the desktop conversion events. Each valid GCLID will have any associated conversion events. See the decision brief: https://docs.google.com/document/d/1T8ArA9r8HDMTj1ES9NHfJFv2gUWo7w0MjG07iXtuUOI
* Use correct table name
* Use new stub attribution dataset; clarify activity_date
* Use correct date_partition_parameter
Co-authored-by: Alexander <anicholson@mozilla.com>
* Include activity_date as parameter
* Use INNER instead of LEFT joins
* Update doc strings to clarify GCLID vs GA Session
---------
Co-authored-by: Alexander <anicholson@mozilla.com>
* Include GA intraday sessions tables (#4582)
* Include GA intraday sessions tables
* Update doc string on backfilling ga_sessions
* Don't dryrun stub_attribution view
* Update min_row_count error text (#4586)
* Add conversion event; fix gclid conversions query (#4584)
* Add first_run conversion; use correct table names
* Ignore dryrun of query and view
* Remove HAVING clause; fix logical_or
* migrates old pingcentre onboarding artifacts to new firefox_desktop view (#4457)
* migrates old pingcentre onboarding artifacts to new firefox_desktop view
* generate event rollup dag
* generate review checker dag
* update messaging system dag
* incl project in table names
---------
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
* Add ga_clients_v1 table & view (#4560)
* Add ga_clients_v1 table & view
  - Query from ga_sessions
  - Fix tests
* Use correct scheduling parameters
Co-authored-by: Alexander <anicholson@mozilla.com>
* Move HAVING clause to WHERE
Co-authored-by: Alexander <anicholson@mozilla.com>
* Change CTE name
Co-authored-by: Alexander <anicholson@mozilla.com>
---------
Co-authored-by: Alexander <anicholson@mozilla.com>
* Remove duplicate BQ query param (#4587)
* Firefox ios adclicks (#4585)
* Add Firefox iOS client adclicks history
* Add metadata description to view
* DS-3272 - Fix review checker clients to remove dups (#4583)
* Fix review checker clients to remove dups
* Fix CI issues
* Add row_num filter
* add submission_date to partition
* remove submission_date from partition
* Account for NULL handling in joins (#4590)
  Previously, NULL values in the join keys didn't join, resulting in duplicate rows. This change will coalesce those to empty strings and NULLIFY them in the view.
* Bug 1865716 - Include errorGroups in legacy docker_fxa_admin_server_sanitized query (#4589)
  `errorGroups` field was added in `docker_fxa_admin_server_sanitized_v2` and breaks the UNION.
* DS-3361. Update documentation of initialize command. (#4592)
Co-authored-by: Lucia Vargas <lvargas@mozilla.com>
* Link to full diff in git comments (#4593)
* Link to full diff in git comments
* Show full diff of new and deleted files
* Correct DAG description as DAG is currently active. (#4596)
Co-authored-by: Lucia Vargas <lvargas@mozilla.com>
* Login funnel conversions (#4591)
* Mozilla accounts login funnel conversion for overall, with email confirmation, and with two factor authentication
* Update sql_generators/funnels/configs/login_funnels.toml
* Update sql_generators/funnels/configs/login_funnels.toml
---------
Co-authored-by: Kimberly Siegler <kimberlysiegler@Kimberlys-MBP-2.attlocal.net>
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
* Use live tables to determine deletion request ping volume (#4442)
* Increase no_output_timeout for long-running CI jobs (#4602)
* SVCSE-1595 Setup import of tables from staging FxA databases (#4578)
* In generated diffs explicitly list the files being added or deleted. (#4600)
* Glam accounts for sampling when calculating sample_count for windows & release probes (#4581)
* Glam - fix legacy windows & release probes' sample count going fwd
* Glam FOG accounts for sampling when calculating total_sample for windows & release probes
* fog - fix client count and sample count
* Add channel filtering for fog
* SVCSE-1595 Setup import of tables from production FxA databases (#4597)
* Bug 1866469 - Exclude use_counters from GLAM ETL (#4603)
* Bug 1866469 - Exclude use_counters from GLAM ETL
* Attempt to fix tests
---------
Co-authored-by: Eduardo Filho <edugomfilho@gmail.com>
* feat(): updating fxa android funnel to support install_source filtering downstream (#4561)
* Added a filter to only include playstore data
  In keeping the bottom of the funnel consistent with the upper funnel, we have to only include installs from play store in the bottom of the funnel metrics
* for fenix_derived.funnel_retention_clients_week_* tables making sure we only include playstore users
* updating the changes as requested by soGaussian to expose to users the install_source field to enable filtering
---------
Co-authored-by: richard baffour <baffour345@gmail.com>
* Add schema.yaml to urlbar_events (sql_generator) (#4595)
* Add schema.yaml to urlbar_events
* SVCSE-1595 Update accounts_db schemas to match deployed tables. (#4604)
* SVCSE-1595 Update more accounts_db schemas to match deployed tables (#4605)
* Fix num_chars_typed in urlbar_events schema (#4607)
* Add init clause to ga_clients table (#4611)
* Give census access to gclid conversions data (#4613)
* Don't nest SQL generated from `main` branch in extra `sql` directory. (#4614)
* Add desktop_acquisition_funnel view (#4616)
* Add desktop_acquisition_funnel view
* Update reference
* Update view.sql
  Took out some of the TODO comments around naming to stay consistent with the table it is reading as well as reduce effort to make changes to the spoke-default view that is currently setup with test data.
---------
Co-authored-by: gkabbz <gkabbz@gmail.com>
* added ETL checks to fenix_derived.firefox_android_clients_v1 (#4609)
* DENG-2013 - Add explicit dependencies & checks for history (#4620)
* Fix the source table to point to unified view to include all apps (#4622)
* Deng 1662 move google ads to ads google mmc connector (#4525)
* DENG-1662 move from google_ads connector to ads_google_mmc connector
* format queries
* add code for cohort_daily_statistics using clients_first_seen_v2 with… (#4404)
* add code for cohort_daily_statistics using clients_first_seen_v2 with new columns from clients_first_seen_v2
* take out extra sample_id
* Update sql/moz-fx-data-shared-prod/telemetry_derived/cohort_daily_stats_clients_frst_seen_v2/query.sql
  switching column names - original was swapped
Co-authored-by: Alexander <anicholson@mozilla.com>
* update column names- change cohort_date to first_seen_date, make more descriptive; take out client_id and sample_id in the final table; take out extraneous columns that are not used in final table
* fix group by - days_seen_bits not days_interacted_bits
* take out second_seen_date, irrelevant
* change date _activity to submission_date
* replace submission_date_activity with client_activity
* add new line at end of schema.yaml file
* refactor code to use clients_first_seen_v2, originally committed cohorts_daily_statistics_v1 code in the v2 file
* add cohort_daily_statistics_v2 job to DAG
* add cohort_daily_statistics_v2 job to DAG, take out submission_date and add activity_date to query.sql
* delete now needless dags folder
* correct alias of table
* change submission_date to activity_date
* fix column name apple_model to apple_model_id
* add days_seen_dau_bits and other calculations based on this
* add attribution_dlsource to table
* take out underscore from column name, attribution_dlsource
* revise comment - 196 days not 180 days
* add all the other columns from clients_first_seen_v2, update schema.yaml file with new columns
* take out sample_id, fix schema
* take out document_id, dl_token, app_build_id columns, rename activity_date to submission_date, rename cohort_date to first_seen_date to match clients_first_seen_28_days_later
* move files from cohort_daily_statistics_v2 to desktop_cohort_daily_retention_v1 to reflect name change, take out extraneous columns such as xpcom_abi, attribution_dlsource, engine_data columns
---------
Co-authored-by: Alexander <anicholson@mozilla.com>
* add --project_id command, take out extraneous dashes in start and end commands in creating dataset cookbook (#4626)
* change docs (#4629)
* fix typo in project name (#4628)
* fix typo in project name
* remove shared-prod project from sql for google_ads_derived
* Fixes #4624 - Add a view for firefox_desktop.broken_site_report (#4625)
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
* Separate Airflow tasks for glean_usage (#4588)
* Add support for assigning Airflow tasks to task groups
* Generate separate Airflow tasks for glean_usage
* Remove Airflow dependencies from old glean_usage tasks
* Update dataset_metadata.yaml for broken site reports (#4630)
* Add user-facing view to fxa_oauth.clients (#4623)
* Fix jinja templating in glean usage metadata (#4636)
* feat(DENG-1774 / cancelled): deleting fenix_derived/firefox_android_clients_v2, v1 will remain the active model (#4610)
* deleting fenix_derived/firefox_android_clients_v2, v1 will remain the active model
* removed fenix_derived.firefox_android_clients_v2 from shredder config
* firefox_ios source added to shredder config (#4638)
* Skip check for baseline_clients_last_seen for Fire TV (#4640)
* Resolve correct task_id for tasks nested in a group (#4637)
* Android LTV UDFs (#4633)
* Add Android State UDF
* Add Android Markov States UDFs for LTV
* Make docstrings consistent
* Update doc string
Co-authored-by: Leif Oines <leifdoines@gmail.com>
---------
Co-authored-by: Leif Oines <leifdoines@gmail.com>
* Migrated DIM checks over to ETL checks for internet_outages.global_outages_v1 (#4639)
* Speed up glean_usage generation by caching the table getter (#4644)
  `get_tables` is deterministic under the assumption that the tables don't change in between invocations. Which I hope holds here. We therefore can just cache that value so that subsequent runs quickly return without needing a roundtrip to BigQuery again.
* fixing broken test for firefox_ios_derived.baseline_clients_yearly_v1 (#4645)
* Feat/deng 2046/migrating telemetry derived active users aggregates v1 dim checks to etl checks (#4641)
* Migrated DIM checks over to ETL checks for telemetry_derived.active_users_aggregates_v1
* rewrite
* code review suggestions
* add doc
* rename
---------
Co-authored-by: kik-kik <kignasiak@mozilla.com>
* Minimize previous PR diff comments when CI posts a new diff comment (#4635)
* Minimize previous PR diff comments when CI posts a new diff comment.
* Update Node image to latest version available from CircleCI and pin Node packages.
* GLAM avoid scientific notation for big sample counts (#4647)
* GLAM avoid scientific notation for big sample counts
* Cast to bignumeric instead of numeric
* feat(DENG-2083): added firefox_ios_derived.clients_activation_v1 and corresponding view (#4631)
* added firefox_ios_derived.clients_activation_v1 and corresponding view
* fixing a missing separator in firefox_ios_derived.clients_activation_v1 checks
* adding firefox_ios_derived.clients_activation_v1 to shredder configuration
* removed is_suspicious_device_client as it should not be there, thanks bani for pointing this out
* fixed black formatting error inside shredder/config.py
* applied bqetl formatting
* minor styling tweak as suggested by bani in PR#4631
* Remove baseline_clients_daily DAG dependency for FF ios baseline clients yearly (#4651)
* Support offset backfills, require metadata (#4627)
* Skip backfills for queries without metadata.yaml
* Support date_partition_offset
* Fixed exclude, modified exception
* Add test for offset backfill
* Apply suggestions from code review
Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com>
* Formatting
---------
Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com>
* add dau_clients_days_since_seen to CTE and num_clients_dau_on_day column to table in query and schema (#4652)
* Docs: Avoid newline in link
  mkdocs doesn't like that newline and will treat the URL as a relative URL, thus breaking the link
* Docs: Use 3rd level heading for UDFs
  mkdocs' ToC generator will stop when the header level goes up again. Because the UDF name itself is generated as a first level heading, any UDF with a first-level header documentation will thus stop rendering any subsequent headers. Most notably on /mozfun/hist where only the very first UDF got a ToC entry.
* Docs: Link to section on the same page
  The separate chapter was removed in #4293
* Migrated DIM checks over to ETL checks for telemetry_derived.unified_metrics_v1 (#4649)
* feat(DENG-2120): migrated over checks defined in DIM for baseline_clients_last_seen fenix. (#4656)
* migrated over checks defined in DIM for this type of dataset
* Update sql_generators/glean_usage/templates/baseline_clients_last_seen_v1.checks.sql
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
---------
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
* Create tables that have state values per day (#4634)
* Create tables that have state values per day
* Change Airflow DAG
* Move markov states to cols rather than array
* Move bot/bad client filter to materialized table
* Add install_source and consecutive_days_seen features
* Add field to CTE
* Use jinja vars instead of sql variables
* Use correct UDF incantation
* Use live tables for structured error counts (#4598)
* Use live tables for structured error counts
* Prevent from old records being deleted
* Fix structured_error_counts query (#4659)
* Authorize view and add workgroup access for taskcluster (#4661)
* Add metadata.yaml for socorro_crash_v2 (#4664)
* Temporarily add curtis to CODEOWNERS until he can be added to group (#4665)
* Add clients_daily_joined view (#4660)
* add view.sql to telemetry and desktop_cohort_daily_retention view (#4666)
* Skip accounts_db.fxa_oauth_clients in view validation (#4667)
* Public GLAM datasets (#4606)
* Public GLAM datasets
* Remove Fenix GLAM datasets
* DENG-1352 - Migrate contextual services ETL to desktop glean pings (#4474)
* Have `bqetl query` commands fail if they don't find a matching query (#4662)
* Have `bqetl query` commands fail if they don't find a matching query.
* Update `test_run_query_no_query_file` test.
* Skip accounts_db.fxa_oauth_clients dryrun (#4671)
* Remove referenced_table from firefox_android_clients (#4674)
* Define `event_monitoring_live_v1` views in `view.sql` files (#4576)
* Define `event_monitoring_live_v1` views in `view.sql` files.
  So they get automatically deployed by the `bqetl_artifact_deployment.publish_views` Airflow task.
* Support materialized views in view naming validation.
* Handle `IF NOT EXISTS` in view naming validation.
* Use regular expression to extract view ID in view naming validation.
  This simplifies the logic and avoids a sqlparse bug where it doesn't recognize the `MATERIALIZED` keyword.
* Update other view regular expressions to allow for materialized views.
* Add state location for US & Canadian VPN subscriptions (DENG-2099) (#4675)
* add triage/confidential tag to docs (#4678)
* feat(DENG-2156): added value_length check and updated some of the ETL checks to use the macro (#4672)
* added value_length check and updated some of the ETL checks to use the macro
* added the new check macro to the data checks docs
* implemented lelilia feedback from PR#4672
* simplified the sql logic for the value_length check
* Skipping copying checks for baseline tables for apps marked as not receiving the baseline ping (#4670)
Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com>
* Revert "Define `event_monitoring_live_v1` views in `view.sql` files (#4576)" (#4680)
  This reverts commit 2c4cc5e.
* Change directory to generate private DAGs so `sql_file_path` values are relative to the repo root. (#4668)
* `cd` into `private-bigquery-etl` repo when generating DAGs.
  To avoid generated DAGs having incorrect absolute paths for ETLs using SQL scripts.
* Revert "Temporarily add curtis to CODEOWNERS until he can be added to group (#4665)" (#4669)
  This reverts commit 8d94a86.
* ci-fix Ignore dataset.update required permissions when dryrunning authorized views (#4681)
* Refactor, add typehint
* Add datasets.update clause denied for authorized views
* add country dimension
* remove generated and old files
* delete generated files
* regenerate sql and delete more files
* last edits to android funnel before review
* change description fields
* modify config to add retention outcomes
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Anna Scholtz <anna@scholtzan.net>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Sergio E. Betancourt <37666064+sergiosonline@users.noreply.github.com>
Co-authored-by: Curtis Morales <cmorales@mozilla.com>
Co-authored-by: Frank Bertsch <frank.bertsch@gmail.com>
Co-authored-by: m-d-bowerman <107562575+m-d-bowerman@users.noreply.github.com>
Co-authored-by: akkomar <akkomar@users.noreply.github.com>
Co-authored-by: Rebecca BurWei <rebecca.burwei@gmail.com>
Co-authored-by: Alekhya Kommasani <akommasani@mozilla.com>
Co-authored-by: Alekhya <88394696+alekhyamoz@users.noreply.github.com>
Co-authored-by: Alexander <anicholson@mozilla.com>
Co-authored-by: wil stuckey <wstuckey@mozilla.com>
Co-authored-by: Daniel Thorn <dthorn@mozilla.com>
Co-authored-by: Leli <33942105+lelilia@users.noreply.github.com>
Co-authored-by: Jan-Erik Rediger <jrediger@mozilla.com>
Co-authored-by: Lucia <30448600+lucia-vargas-a@users.noreply.github.com>
Co-authored-by: Lucia Vargas <lvargas@mozilla.com>
Co-authored-by: kik-kik <42538694+kik-kik@users.noreply.github.com>
Co-authored-by: Marlene Hirose <92952117+Marlene-M-Hirose@users.noreply.github.com>
Co-authored-by: David Zeber <dzeber@mozilla.com>
Co-authored-by: betling <betling@mozilla.com>
Co-authored-by: Sean Rose <1994030+sean-rose@users.noreply.github.com>
Co-authored-by: Linh Nguyen <linhnguyen@mozilla.com>
Co-authored-by: Mike Williams <102263964+mikewilli@users.noreply.github.com>
Co-authored-by: ksiegler1 <ksiegler@mozilla.com>
Co-authored-by: Kimberly Siegler <kimberlysiegler@Kimberlys-MBP-2.attlocal.net>
Co-authored-by: Eduardo Filho <edugomfilho@gmail.com>
Co-authored-by: richard baffour <baffour345@gmail.com>
Co-authored-by: gkabbz <gkabbz@gmail.com>
Co-authored-by: Ksenia <kberezina@mozilla.com>
Co-authored-by: kik-kik <kignasiak@mozilla.com>
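
For readers skimming the commit list above: the entry for #4590 says NULL join keys failed to match, and that the fix coalesces them to empty strings and applies NULLIF in the view. A minimal sketch of that pattern follows. It is not the query from #4590; it assumes an in-memory SQLite database in place of BigQuery, and the tables `left_side`, `right_side`, `joined` and the view `joined_view` are hypothetical names chosen for illustration.

```python
import sqlite3

# Hypothetical tables; SQLite stands in for BigQuery here.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE left_side (client_id TEXT, country TEXT);
    CREATE TABLE right_side (client_id TEXT, country TEXT, clicks INTEGER);
    INSERT INTO left_side VALUES ('a', 'US'), ('b', NULL);
    INSERT INTO right_side VALUES ('a', 'US', 3), ('b', NULL, 5);
    """
)

# Plain equality: NULL = NULL is not true, so client 'b' finds no match.
naive = conn.execute(
    """
    SELECT l.client_id, r.clicks
    FROM left_side AS l
    LEFT JOIN right_side AS r
      ON l.client_id = r.client_id AND l.country = r.country
    ORDER BY l.client_id
    """
).fetchall()

# Coalesce NULL keys to '' on both sides so the rows line up.
coalesced = conn.execute(
    """
    SELECT l.client_id, COALESCE(l.country, '') AS country, r.clicks
    FROM left_side AS l
    LEFT JOIN right_side AS r
      ON l.client_id = r.client_id
     AND COALESCE(l.country, '') = COALESCE(r.country, '')
    ORDER BY l.client_id
    """
).fetchall()

# Store the coalesced rows, then let a view turn '' back into NULL.
conn.execute("CREATE TABLE joined (client_id TEXT, country TEXT, clicks INTEGER)")
conn.executemany("INSERT INTO joined VALUES (?, ?, ?)", coalesced)
conn.execute(
    "CREATE VIEW joined_view AS "
    "SELECT client_id, NULLIF(country, '') AS country, clicks FROM joined"
)
viewed = conn.execute("SELECT * FROM joined_view ORDER BY client_id").fetchall()

print(naive)
print(coalesced)
print(viewed)
```

The trade-off in this sketch is that a genuine empty-string key becomes indistinguishable from NULL once the view applies NULLIF, so it only fits keys where an empty string carries no meaning of its own.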
Followup to #4482 (review)
Adds the missing client attribution columns to clients daily and first-seen. Sample data from beta is available in these tables:
moz-fx-data-shared-prod.tmp.anich_clients_daily_1705_test
moz-fx-data-shared-prod.tmp.anich_clients_first_seen_v2_test_1705
┆Issue is synchronized with this Jira Task