V0.3.0 - Metric Function & Overhaul #37

Merged
merged 81 commits into from
Jul 15, 2022
64764ca
bringing it in line
callum-mcdata Jun 10, 2022
29d8da9
adding testing
callum-mcdata Jun 10, 2022
9c0911e
adding testing and changing metrics
callum-mcdata Jun 10, 2022
7eabaf4
removing get metric for metric function
callum-mcdata Jun 10, 2022
752756b
first draft
callum-mcdata Jun 12, 2022
93079c0
adding testing files
callum-mcdata Jun 13, 2022
cbea165
getting so close to composite metrics
callum-mcdata Jun 14, 2022
8553fbe
let there be composite metrics
callum-mcdata Jun 15, 2022
44f139e
fixing is valid dim
callum-mcdata Jun 15, 2022
8e2b6d9
adding compilation error on expression metric
callum-mcdata Jun 15, 2022
c2be148
adding mroe functionality
callum-mcdata Jun 16, 2022
dad1a61
secondary calcs
callum-mcdata Jun 16, 2022
0ae870e
reformatting
callum-mcdata Jun 16, 2022
11654cc
adding min/max date filters
callum-mcdata Jun 17, 2022
312b7ed
Auto update table of contents
callum-mcdata Jun 17, 2022
d3dbc07
fixing calendar dimension logic
callum-mcdata Jun 17, 2022
9068822
fixing calendar dims
callum-mcdata Jun 17, 2022
f7ad401
adding integration tests
callum-mcdata Jun 17, 2022
19660fb
adding more integration tests
callum-mcdata Jun 17, 2022
fa24c61
working on recursive pathing
callum-mcdata Jun 28, 2022
065c68f
beginning the metric to metrics change
callum-mcdata Jun 30, 2022
4fe63fd
pushing for review
callum-mcdata Jun 30, 2022
514d529
Auto update table of contents
callum-mcdata Jun 30, 2022
7a3691b
supporting secondary calcs
callum-mcdata Jun 30, 2022
ab70449
fixing tests for multiple metrics
callum-mcdata Jul 5, 2022
5166318
updating testing
callum-mcdata Jul 5, 2022
43f198e
fixed thigns finally
callum-mcdata Jul 5, 2022
14b29d3
updating readme for multiple metrics
callum-mcdata Jul 5, 2022
7fac834
Auto update table of contents
callum-mcdata Jul 5, 2022
1ca0975
changing CI flow for drew branch
callum-mcdata Jul 5, 2022
b597f6b
updating testing
callum-mcdata Jul 6, 2022
18a2854
updating requirements and long name
callum-mcdata Jul 6, 2022
ddd446c
fixes
callum-mcdata Jul 6, 2022
a6ad796
fixing custom calendar
callum-mcdata Jul 6, 2022
3cae77e
adding subquery alias for postgres
callum-mcdata Jul 6, 2022
f12946d
removing to date
callum-mcdata Jul 6, 2022
ec64285
fixing ref and removing ifnull
callum-mcdata Jul 6, 2022
74005b4
fixing some more things hopefully
callum-mcdata Jul 6, 2022
ba02ada
updates for postgres
callum-mcdata Jul 6, 2022
9d14765
testing other dbs
callum-mcdata Jul 6, 2022
bfa6250
updating for main
callum-mcdata Jul 6, 2022
0a8bfad
fixing profile
callum-mcdata Jul 6, 2022
269ed60
updating README
callum-mcdata Jul 11, 2022
ec1fb7a
Auto update table of contents
callum-mcdata Jul 11, 2022
1f864ee
updating for edge case
callum-mcdata Jul 11, 2022
f2eb588
fixes for short term
callum-mcdata Jul 12, 2022
d74bf5a
Merge pull request #40 from dbt-labs/feature_multiple_metrics
callum-mcdata Jul 12, 2022
8033793
Auto update table of contents
callum-mcdata Jul 12, 2022
5ba9fae
renaming to calculate
callum-mcdata Jul 12, 2022
d2a2e64
Merge pull request #46 from dbt-labs/renaming_to_calculate
callum-mcdata Jul 12, 2022
2e3b132
Update README.md
callum-mcdata Jun 13, 2022
961d412
changing to calculate
callum-mcdata Jul 12, 2022
f2a68f0
Auto update table of contents
callum-mcdata Jul 12, 2022
def0c22
rebase hell
callum-mcdata Jul 12, 2022
422c289
Merge branch 'main' into adding_metric_function_support
callum-mcdata Jul 12, 2022
49d5977
updating readme for calculate
callum-mcdata Jul 12, 2022
947e01f
Auto update table of contents
callum-mcdata Jul 12, 2022
b4e684a
updating cte logic
callum-mcdata Jul 14, 2022
c8ef88c
trying with version updated
callum-mcdata Jul 14, 2022
2444056
trying again
callum-mcdata Jul 14, 2022
e145ee8
fixing again please let this work
callum-mcdata Jul 14, 2022
fc63cb4
specifying version
callum-mcdata Jul 14, 2022
363030c
trying hash keys
callum-mcdata Jul 14, 2022
87b29af
full refresh seed
callum-mcdata Jul 14, 2022
fcfc69f
Merge pull request #51 from dbt-labs/updating_metric_tree_logic
callum-mcdata Jul 14, 2022
03b741a
cleaning up logging and cut code
callum-mcdata Jul 14, 2022
a6152a7
small fixes
callum-mcdata Jul 14, 2022
87b4a21
validating grain
callum-mcdata Jul 14, 2022
9a9b2a1
adding logic for common dimension lists
callum-mcdata Jul 14, 2022
2ae95b0
let there be macros
callum-mcdata Jul 14, 2022
f6b839a
Add some spacing
callum-mcdata Jul 15, 2022
8354ae2
Cleaner description
callum-mcdata Jul 15, 2022
bb19b64
joels code review
callum-mcdata Jul 15, 2022
1dae656
Merge pull request #52 from dbt-labs/grain_validation
callum-mcdata Jul 15, 2022
33fcf50
updating for cleanliness
callum-mcdata Jul 15, 2022
7339e45
validating grain
callum-mcdata Jul 14, 2022
686f620
let there be macros
callum-mcdata Jul 14, 2022
958232d
Add some spacing
callum-mcdata Jul 15, 2022
2ca18b5
Cleaner description
callum-mcdata Jul 15, 2022
57a6f44
joels code review
callum-mcdata Jul 15, 2022
55acd1f
Merge pull request #53 from dbt-labs/adding_common_dimension_list
callum-mcdata Jul 15, 2022
5 changes: 3 additions & 2 deletions .github/actions/end-to-end-test/action.yml
@@ -11,6 +11,7 @@ inputs:
database-adapter-package:
description: "Name of database adapter to install"
required: true

runs:
using: "composite"
steps:
@@ -19,7 +20,7 @@ runs:
run: |
pip install --user --upgrade pip
pip --version
pip install --pre ${{ inputs.database-adapter-package }}
pip install -r dev-requirements.txt

- name: Setup dbt profile
shell: bash
@@ -32,5 +33,5 @@ runs:
run: |
cd ${{ inputs.dbt-project }}
dbt deps --target ${{ inputs.dbt-target }}
dbt seed --target ${{ inputs.dbt-target }}
dbt seed --target ${{ inputs.dbt-target }} --full-refresh
dbt build --target ${{ inputs.dbt-target }} --full-refresh
6 changes: 3 additions & 3 deletions .github/workflows/ci.yml
@@ -52,7 +52,7 @@ jobs:
database-adapter-package: dbt-postgres

snowflake:
needs: postgres
# needs: postgres
runs-on: ubuntu-latest
steps:
- name: Check out the repository
@@ -76,7 +76,7 @@ jobs:
database-adapter-package: dbt-snowflake

redshift:
needs: postgres
# needs: postgres
runs-on: ubuntu-latest
steps:
- name: Check out the repository
@@ -99,7 +99,7 @@ jobs:
database-adapter-package: dbt-redshift

bigquery:
needs: postgres
# needs: postgres
runs-on: ubuntu-latest
steps:
- name: Check out the repository
2 changes: 2 additions & 0 deletions .gitignore
@@ -4,3 +4,5 @@ dbt_modules/
dbt_packages/
logs/
.DS_Store
integration_tests/target
model_testing/
92 changes: 44 additions & 48 deletions README.md
@@ -4,44 +4,50 @@
* [dbt_metrics](#dbt_metrics)
* [About](#about)
* [Installation Instructions](#installation-instructions)
* [A note on refs](#️-a-note-on-refs)
* [Usage](#usage)
* [Use cases](#use-cases)
* [Use cases and examples](#use-cases-and-examples)
* [Jaffle Shop Metrics](#jaffle-shop-metrics)
* [Inside of dbt Models](#inside-of-dbt-models)
* [Via the interactive dbt server (coming soon)](#via-the-interactive-dbt-server-coming-soon)
* [Secondary calculations](#secondary-calculations)
* [Period over Period (<a href="/macros/secondary_calculations/secondary_calculation_period_over_period.sql">source</a>)](#period-over-period-source)
* [Period to Date (<a href="/macros/secondary_calculations/secondary_calculation_period_to_date.sql">source</a>)](#period-to-date-source)
* [Rolling (<a href="/macros/secondary_calculations/secondary_calculation_rolling.sql">source</a>)](#rolling-source)
* [Customisation](#customisation)
* [Expression Metrics](#expression-metrics)
* [Multiple Metrics](#multiple-metrics)
* [Where Clauses](#where-clauses)
* [Calendar](#calendar)
* [Dimensions from calendar tables](#dimensions-from-calendar-tables)
* [Time Grains](#time-grains)
* [Custom aggregations](#custom-aggregations)
* [Secondary calculation column aliases](#secondary-calculation-column-aliases)
* [Experimental behaviour](#-experimental-behaviour)
* [Dimensions on calendar tables](#dimensions-on-calendar-tables)

<!-- Created by https://github.com/ekalinin/github-markdown-toc -->
<!-- Added by: runner, at: Tue Jun 7 13:41:32 UTC 2022 -->
<!-- Added by: runner, at: Tue Jul 12 18:14:25 UTC 2022 -->

<!--te-->

# About
This dbt package generates queries based on [metrics](https://docs.getdbt.com/docs/building-a-dbt-project/metrics), introduced to dbt Core in v1.0.
This dbt package generates queries based on [metrics](https://docs.getdbt.com/docs/building-a-dbt-project/metrics), introduced to dbt Core in v1.0. For more information on metrics, such as available types, properties, and other definition parameters, please reference the documentation linked above.

## Installation Instructions
Check [dbt Hub](https://hub.getdbt.com/dbt-labs/metrics/latest/) for the latest installation instructions, or [read the docs](https://docs.getdbt.com/docs/package-management) for more information on installing packages.

## ⚠️ A note on `ref`s
To enable the dynamic referencing of models necessary for macro queries through the dbt Server, queries generated by this package do not participate in the DAG and `ref`'d nodes will not necessarily be built before they are accessed. Refer to the docs on [forcing dependencies](https://docs.getdbt.com/reference/dbt-jinja-functions/ref#forcing-dependencies) for more details.
Include the following in your `packages.yml`:

```yaml
packages:
  - package: dbt-labs/metrics
    version: 0.3.0
```

# Usage
Access metrics [like any other macro](https://docs.getdbt.com/docs/building-a-dbt-project/jinja-macros#using-a-macro-from-a-package):
```sql
select *
from {{ metrics.metric(
metric_name='new_customers',
from {{ metrics.calculate(
metric('new_customers'),
grain='week',
dimensions=['plan', 'country'],
secondary_calculations=[
@@ -62,13 +68,16 @@ from {{ metrics.metric(

`start_date` and `end_date` are optional. When not provided, the spine will span all dates from oldest to newest in the metric's dataset. This default is likely to be correct in most cases, but you can use the arguments to either narrow the resulting table or expand it (e.g. if there were no new customers until 3 January but you want to include the first two days as well). Both values are inclusive.

# Use cases
# Use cases and examples

## Jaffle Shop Metrics
For those curious about how to implement metrics in a dbt project, please reference the [`jaffle_shop_metrics`](https://github.com/dbt-labs/jaffle_shop_metrics) repository.

## Inside of dbt Models
You may want to materialize the results as a fixed table for querying. This is not the way we expect the dbt Metrics layer to add the most value, but is a way to experiment with the project without needing access to the interactive server.

## Via the interactive dbt server (coming soon)
When the [dbt server](https://blog.getdbt.com/licensing-dbt/) is released later in 2022, you will be able to access these macros interactively, without needing to build each variant as a single dbt model. For more information, check out the [keynote presentation from Coalesce 2021](https://www.getdbt.com/coalesce-2021/keynote-the-metrics-system/).
When [dbt server](https://blog.getdbt.com/licensing-dbt/) is released in late 2022, you will be able to access these macros interactively, without needing to build each variant as a single dbt model. For more information, check out the [keynote presentation from Coalesce 2021](https://www.getdbt.com/coalesce-2021/keynote-the-metrics-system/).

# Secondary calculations
Secondary calculations are window functions which act on the primary metric. You can use them to compare a metric's value to an earlier period and calculate year-to-date sums or rolling averages.
@@ -117,6 +126,25 @@ Constructor: `metrics.rolling(aggregate, interval [, alias])`
# Customisation
Most behaviour in the package can be overridden or customised.

## Expression Metrics
Versions `0.3.0` and later of this package support `expression` metrics! More information about this type can be found on the [`metrics` page of the dbt docs](https://docs.getdbt.com/docs/building-a-dbt-project/metrics).

## Multiple Metrics
There may be instances where you want to return multiple metrics within a single macro. This is possible by providing a list of metrics instead of a single metric. See example below:

```sql
select *
from
{{ metrics.calculate(
[metric('base_sum_metric'), metric('base_average_metric')],
grain='day',
dimensions=['had_discount']
) }}
```

**Note**: Every metric in the list must support both the `time_grain` and the `dimensions` selected in the macro. If any metric does not, the macro call will fail. Secondary calculations can also be used with multiple metrics: each secondary calculation is applied to each metric and returned in a field that matches the pattern `metric_name_secondary_calculation_alias`.
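
The field naming pattern in the note above can be illustrated with a short Python sketch. This is not the package's implementation; the function name and the example aliases are hypothetical, and only the `metric_name_secondary_calculation_alias` pattern comes from the note.

```python
def secondary_calc_columns(metric_names, calc_aliases):
    """Return one output column name per (metric, secondary calculation) pair.

    Hypothetical helper: it only mirrors the naming pattern
    `metric_name_secondary_calculation_alias` described in the README.
    """
    return [
        f"{metric}_{alias}"
        for metric in metric_names
        for alias in calc_aliases
    ]


# Two metrics and two secondary calculations yield four extra columns.
columns = secondary_calc_columns(
    ["base_sum_metric", "base_average_metric"],
    ["pop_1wk", "rolling_3wk"],  # hypothetical aliases
)
print(columns)
```

Every secondary calculation is repeated for every metric, so the number of extra columns is the product of the two list lengths.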


## Where Clauses
Sometimes you'll want to see the metric in the context of a particular filter, but this filter isn't necessarily part of the metric definition. In this case, you can use the `where` parameter of the metrics package. It takes a list of `sql` statements and adds them in as filters to the first CTE in the produced SQL. This reduces the load on the query planner to run full table scans and will hopefully improve performance.

@@ -137,11 +165,14 @@ vars:
dbt_metrics_calendar_model: my_custom_calendar
```

### Dimensions from calendar tables
You may want to aggregate metrics by a dimension in your custom calendar table, for example `is_weekend`. You can include this within the list of `dimensions` in the macro call **without** it needing to be defined in the metric definition. The macro will correctly recognize that it is coming from the calendar dimension and treat it accordingly.

## Time Grains
The package protects against nonsensical secondary calculations, such as a month-to-date aggregate of data which has been rolled up to the quarter. If you customise your calendar (for example by adding a [4-5-4 retail calendar](https://calogica.com/sql/dbt/2018/11/15/retail-calendar-in-sql.html) month), you will need to override the [`get_grain_order()`](/macros/secondary_calculations/validate_grain_order.sql) macro. In that case, you might remove `month` and replace it with `month_4_5_4`. All date columns must be prefixed with `date_` in the table. Do not include the prefix when defining your metric; it will be added automatically.
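
The idea behind the grain check can be sketched in Python. This is only an illustration under assumptions: the default ordering below and the function name are hypothetical, and how the package treats a period equal to the grain is not confirmed here (this sketch allows it).

```python
# Assumed default ordering from finest to coarsest grain (hypothetical values).
DEFAULT_GRAIN_ORDER = ["day", "week", "month", "quarter", "year"]


def is_valid_period_to_date(grain, period, grain_order=DEFAULT_GRAIN_ORDER):
    """Return True when a `period`-to-date calculation makes sense at `grain`.

    The period must not be finer than the grain the data was rolled up to,
    e.g. month-to-date over quarterly data is rejected.
    """
    return grain_order.index(period) >= grain_order.index(grain)


# A custom calendar swaps `month` for a 4-5-4 retail month, as in the README.
RETAIL_GRAIN_ORDER = ["day", "week", "month_4_5_4", "quarter", "year"]

print(is_valid_period_to_date("day", "month"))      # daily data, month-to-date
print(is_valid_period_to_date("quarter", "month"))  # quarterly data, month-to-date
print(is_valid_period_to_date("week", "month_4_5_4", RETAIL_GRAIN_ORDER))
```

Passing the ordering as a parameter mirrors the README's advice: a customised calendar only needs a different grain list, not different validation logic.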

## Custom aggregations
To create a custom primary aggregation (as exposed through the `type` config of a metric), create a macro of the form `metric_my_aggregate(expression)`, then override the [`aggregate_primary_metric()`](/macros/aggregate_primary_metric.sql) macro to add it to the dispatch list. The package also protects against nonsensical secondary calculations such as an average of an average; you will need to override the [`get_metric_allowlist()`](/macros/secondary_calculations/validate_aggregate_coherence.sql) macro to both add your new aggregate to the existing aggregations' allowlists, and to make an allowlist for your new aggregation:
To create a custom primary aggregation (as exposed through the `type` config of a metric), create a macro of the form `metric_my_aggregate(expression)`, then override the [`gen_primary_metric_aggregate()`](/macros/gen_primary_metric_aggregate.sql) macro to add it to the dispatch list. The package also protects against nonsensical secondary calculations such as an average of an average; you will need to override the [`get_metric_allowlist()`](/macros/secondary_calculations/validate_aggregate_coherence.sql) macro to both add your new aggregate to the existing aggregations' allowlists, and to make an allowlist for your new aggregation:
```
{% do return ({
"average": ['max', 'min'],
@@ -156,38 +187,3 @@ To create a custom secondary aggregation (as exposed through the `secondary_calc
## Secondary calculation column aliases
Aliases can be set for a secondary calculation. If no alias is provided, one will be automatically generated. To modify the existing alias logic, or add support for a custom secondary calculation, override [`generate_secondary_calculation_alias()`](/macros/secondary_calculations/generate_secondary_calculation_alias.sql).

# 🧪 Experimental behaviour
⚠️ This behaviour is subject to change in future versions of dbt Core and this package.

## Dimensions on calendar tables
You may want to aggregate metrics by a dimension in your custom calendar table, for example `is_weekend`. _In addition to_ the primary `dimensions` list, add the following `meta` properties to your metric:
```yaml
version: 2
metrics:
- name: new_customers
[...]
dimensions:
- plan
- country

meta:
dimensions:
- type: model
columns:
- plan
- country
- type: calendar
columns:
- is_weekend
```

You can then access the additional dimensions as normal:
```sql
select *
from {{ metrics.metric(
metric_name='new_customers',
grain='week',
dimensions=['plan', 'country', 'is_weekend'],
secondary_calcs=[]
) }}
```
2 changes: 1 addition & 1 deletion dbt_project.yml
@@ -19,7 +19,7 @@ seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

require-dbt-version: [">=1.0.0", "<2.0.0"]
require-dbt-version: [">=1.2.0-a1", "<2.0.0"]

target-path: "target" # directory which will store compiled SQL files
clean-targets: # directories to be removed by `dbt clean`
4 changes: 4 additions & 0 deletions dev-requirements.txt
@@ -0,0 +1,4 @@
git+https://github.com/dbt-labs/dbt-redshift.git@c35865b
git+https://github.com/dbt-labs/dbt-snowflake.git@a017382
git+https://github.com/dbt-labs/dbt-bigquery.git@d52222e
git+https://github.com/dbt-labs/dbt-core.git@0db634d#egg=dbt-core&subdirectory=core
25 changes: 25 additions & 0 deletions examples/metric_jsonschema_example.json
@@ -0,0 +1,25 @@
{
"title": "dbt_metric_file",
"type": "object",
"required": ["name","label","timestamp","time_grains","type","sql"],
"additionalProperties": false,
"properties": {
"name": { "type": "string"},
"label": { "type": "string"},
"timestamp": { "type": "string"},
"time_grains": {
"type": "array",
"items":{
"type":"string"
}
},
"type": { "type": "string"},
"sql": { "type": "string"},
"dimensions": {
"type": "array",
"items":{
"type":"string"
}
}
}
}
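
To show how a schema like the one above might be applied, here is a minimal Python sketch that checks a metric definition against it using only the standard library. It is a simplified stand-in rather than a full JSON Schema validator: it covers only `required`, `additionalProperties: false`, and the `string` / array-of-strings types used in this file. The function name `validate_metric` and the sample metric are hypothetical.

```python
SCHEMA = {
    "required": ["name", "label", "timestamp", "time_grains", "type", "sql"],
    "additionalProperties": False,
    "properties": {
        "name": {"type": "string"},
        "label": {"type": "string"},
        "timestamp": {"type": "string"},
        "time_grains": {"type": "array", "items": {"type": "string"}},
        "type": {"type": "string"},
        "sql": {"type": "string"},
        "dimensions": {"type": "array", "items": {"type": "string"}},
    },
}


def validate_metric(metric, schema=SCHEMA):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for key in schema["required"]:
        if key not in metric:
            errors.append(f"missing required property: {key}")
    for key, value in metric.items():
        spec = schema["properties"].get(key)
        if spec is None:
            # Unknown keys are only an error when additionalProperties is false.
            if not schema.get("additionalProperties", True):
                errors.append(f"unexpected property: {key}")
        elif spec["type"] == "string":
            if not isinstance(value, str):
                errors.append(f"{key} must be a string")
        elif spec["type"] == "array":
            if not (isinstance(value, list) and all(isinstance(v, str) for v in value)):
                errors.append(f"{key} must be an array of strings")
    return errors


# Hypothetical metric definition used only for illustration.
sample_metric = {
    "name": "new_customers",
    "label": "New Customers",
    "timestamp": "signup_date",
    "time_grains": ["day", "week", "month"],
    "type": "count",
    "sql": "customer_id",
    "dimensions": ["plan", "country"],
}
print(validate_metric(sample_metric))
```

For real use, a dedicated JSON Schema library would be the more robust choice; the hand-rolled check just makes the schema's three rules explicit.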
2 changes: 2 additions & 0 deletions integration_tests/.gitignore
@@ -2,3 +2,5 @@
target/
dbt_packages/
logs/
examples/
model_testing/
3 changes: 2 additions & 1 deletion integration_tests/dbt_project.yml
@@ -22,4 +22,5 @@ clean-targets:
- "logs"

vars:
dbt_metrics_calendar_model: custom_calendar
dbt_metrics_calendar_model: custom_calendar
testing: fact_orders
8 changes: 8 additions & 0 deletions integration_tests/macros/get_nodes_testing.sql
@@ -0,0 +1,8 @@
{% macro get_nodes_testing()%}

{%- set metric_relation = metric('total_profit') -%}
{{ log("MACRO: Node Unique ID: " ~ metric_relation.unique_id, info=true) }}
{{ log("MACRO: Depends on: " ~ metric_relation.depends_on, info=true) }}
{{ log("MACRO: Depends on nodes: " ~ metric_relation.depends_on.nodes, info=true) }}

{% endmacro %}
53 changes: 53 additions & 0 deletions integration_tests/macros/notes.sql
@@ -0,0 +1,53 @@
{# Now we see if the node already exists in the metric tree and return that if
it does so that we're not creating duplicates #}
{# {%- if metric_tree[node.unique_id] is defined -%}

{% do return(metric_tree[node.unique_id]) -%}

{%- endif -%}

{# {{ log("Inside Macro Depends on: " ~ node.depends_on, info=true) }} #}


{# Here we create two sets, sets being the same as lists but they account for uniqueness.
One is the full set, which contains all of the parent metrics and the other is the leaf
set, which we'll use to determine the leaf, or base metrics. #}
{%- set full_set = [] -%}
{%- set leaf_set = [] -%}

{# We define parent nodes as being the parent nodes that begin with metric, which lets
us filter out model nodes #}
{%- set parent_nodes = node.depends_on.nodes -%}

{%- for parent_node in parent_nodes -%}

{# We set an if condition based on if parent nodes. If there are none, then this metric
is a leaf node and any recursive loop should end #}
{%- if parent_nodes -%}

{# Now we finally recurse through the nodes. We begin by filtering the overall list we
recurse through by limiting it to depending on metric nodes and not ALL nodes #}
{%- for parent_id in parent_nodes -%}

{# Then we add the parent_id of the metric to the full set. If it already existed
then it won't make an impact but we want to make sure it is represented #}
{%- do full_set.update(parent_id) -%}

{# And here we re-run the current macro but fill in the parent_id so that we loop again
with that metric information. You may be wondering, why are you using parent_id? Doesn't
the DAG always go from parent to child? Normally, yes! With this, no! We're reversing the
DAG and going up to parents to find the leaf nodes that are really parent nodes. #}
{%- set new_parent = metrics_list[parent_id] -%}
{%- do full_set.update(get_metric_parents(new_parent,metrics_list,metric_tree)) -%}

{%- endfor -%}

{%- else -%}

{%- do leaf_set.update(node.unique_id) -%}

{%- endif -%}

{%- endfor -%}

{%- do return(full_set) -%} #}
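
The traversal described in the notes above (walking from a metric up through its parent metrics until the base metrics are reached) can be sketched in Python. This is an illustration of the idea, not the macro that shipped: the function name, the `depends_on` mapping shape, and the `metric.` prefix used to separate metric nodes from model nodes are assumptions based on the comments in the notes.

```python
def collect_metric_tree(unique_id, depends_on):
    """Return (full_set, leaf_set) for the metric identified by `unique_id`.

    `depends_on` maps a node's unique id to the ids it depends on.
    full_set holds every parent metric reachable from the starting metric;
    leaf_set holds the base metrics, i.e. those with no metric parents.
    """
    full_set, leaf_set = set(), set()

    def visit(node_id):
        # Keep only metric parents so that model nodes are filtered out.
        parents = [p for p in depends_on.get(node_id, []) if p.startswith("metric.")]
        if not parents:
            # No metric parents: this is a leaf (base) metric, so recursion ends.
            leaf_set.add(node_id)
            return
        for parent_id in parents:
            if parent_id not in full_set:  # sets make duplicates harmless
                full_set.add(parent_id)
                visit(parent_id)

    visit(unique_id)
    return full_set, leaf_set


# Hypothetical project graph used only for illustration.
graph = {
    "metric.proj.profit_margin": ["metric.proj.total_profit", "metric.proj.revenue"],
    "metric.proj.total_profit": ["metric.proj.revenue", "metric.proj.cost"],
    "metric.proj.revenue": ["model.proj.fact_orders"],
    "metric.proj.cost": ["model.proj.fact_orders"],
}
parents, leaves = collect_metric_tree("metric.proj.profit_margin", graph)
print(sorted(parents))
print(sorted(leaves))
```

Using real sets sidesteps the duplicate handling that the notes emulate with lists, and the `parent_id not in full_set` guard stops a shared parent from being walked twice.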