add custom granularities to mf timespine #6145

Merged
merged 27 commits into from
Oct 3, 2024
Changes from 23 commits
Commits
27 commits
d33aa6b
add custom granularities
mirnawong1 Sep 25, 2024
cf6500a
fix space
mirnawong1 Sep 25, 2024
07816ee
fix order
mirnawong1 Sep 25, 2024
15e86c2
add headers
mirnawong1 Sep 25, 2024
f1686d4
Merge branch 'current' into mwong-custom-calendar
mirnawong1 Sep 25, 2024
4b9ebc4
Merge branch 'current' into mwong-custom-calendar
mirnawong1 Sep 26, 2024
e7c5156
fold feedback
mirnawong1 Sep 26, 2024
3c0bd33
fix
mirnawong1 Sep 26, 2024
4a5b15c
Merge branch 'current' into mwong-custom-calendar
mirnawong1 Sep 26, 2024
0d60378
Merge branch 'current' into mwong-custom-calendar
mirnawong1 Sep 30, 2024
09e3a3a
Update metricflow-time-spine.md
mirnawong1 Sep 30, 2024
1f1e177
Update metricflow-time-spine.md
mirnawong1 Sep 30, 2024
89d3d1d
Merge branch 'current' into mwong-custom-calendar
mirnawong1 Oct 2, 2024
42132b9
Merge branch 'current' into mwong-custom-calendar
mirnawong1 Oct 2, 2024
405130f
Update release-notes.md
mirnawong1 Oct 2, 2024
98243aa
Merge branch 'current' into mwong-custom-calendar
mirnawong1 Oct 2, 2024
f5bd3d4
Merge branch 'current' into mwong-custom-calendar
mirnawong1 Oct 2, 2024
2921f0e
Merge branch 'current' into mwong-custom-calendar
mirnawong1 Oct 2, 2024
f626ab4
Update website/docs/docs/build/metricflow-time-spine.md
mirnawong1 Oct 2, 2024
60eef89
Update website/docs/docs/dbt-versions/release-notes.md
mirnawong1 Oct 2, 2024
46aabda
Merge branch 'current' into mwong-custom-calendar
mirnawong1 Oct 3, 2024
e023b96
Update website/docs/docs/dbt-versions/release-notes.md
mirnawong1 Oct 3, 2024
8ebdccc
Update release-notes.md
mirnawong1 Oct 3, 2024
beefe57
Merge branch 'current' into mwong-custom-calendar
mirnawong1 Oct 3, 2024
5e6a7a7
Update release-notes.md
mirnawong1 Oct 3, 2024
dd8547e
Merge branch 'current' into mwong-custom-calendar
mirnawong1 Oct 3, 2024
1ad6362
Merge branch 'current' into mwong-custom-calendar
mirnawong1 Oct 3, 2024
105 changes: 99 additions & 6 deletions website/docs/docs/build/metricflow-time-spine.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,8 @@ sidebar_label: "MetricFlow time spine"
tags: [Metrics, Semantic Layer]
---

It's common in analytics engineering to have a date dimension or "time-spine" table as a base table for different types of time-based joins and aggregations. The structure of this table is typically a base column of daily or hourly dates, with additional columns for other time grains, like fiscal quarters, defined based on the base column. You can join other tables to the time spine on the base column to calculate metrics like revenue at a point in time, or to aggregate to a specific time grain.
It's common in analytics engineering to have a date dimension or "time spine" table as a base table for different types of time-based joins and aggregations. The structure of this table is typically a base column of daily or hourly dates, with additional columns for other time grains, like fiscal quarters, defined based on the base column. You can join other tables to the time spine on the base column to calculate metrics like revenue at a point in time, or to aggregate to a specific time grain.


MetricFlow requires you to define at least one dbt model which provides a time-spine, and then specify (in YAML) the columns to be used for time-based joins. MetricFlow will join against the time-spine model for the following types of metrics and dimensions:

Expand All @@ -16,34 +17,60 @@ MetricFlow requires you to define at least one dbt model which provides a time-s
- [Slowly Changing Dimensions](/docs/build/dimensions#scd-type-ii)
- [Metrics](/docs/build/metrics-overview) with the `join_to_timespine` configuration set to true

To see the generated SQL for the metric and dimension types that use time-spine joins, refer to the respective documentation or add the `compile=True` flag when querying the Semantic Layer to return the compiled SQL.
To see the generated SQL for the metric and dimension types that use time spine joins, refer to the respective documentation or add the `compile=True` flag when querying the Semantic Layer to return the compiled SQL.

## Configuring time-spine in YAML
## Configuring time spine in YAML

- The [`models` key](/reference/model-properties) for the time spine must be in your `models/` directory.
- Each time spine is a normal dbt model with extra configurations that tell dbt and MetricFlow how to use specific columns by defining their properties.
- You likely already have a calendar table in your project which you can use. If you don't, review the [example time-spine tables](#example-time-spine-tables) for sample code.
- You add the configurations under the `time_spine` key for that [model's properties](/reference/model-properties), just as you would add a description or tests.
- You only need to configure time-spine models that the Semantic Layer should recognize.
- At a minimum, define a time-spine table for a daily grain.
- You can optionally define additional time-spine tables for different granularities, like hourly. Review the [granularity considerations](#granularity-considerations) when deciding which tables to create.

- If you're looking to specify the grain of a time dimension so that MetricFlow can transform the underlying column to the required granularity, refer to the [Time granularity documentation](/docs/build/dimensions?dimension=time_gran)

For example, given the following directory structure, you can create two time spine configurations, `time_spine_hourly` and `time_spine_daily`. MetricFlow supports granularities ranging from milliseconds to years. Refer to the [Dimensions page](/docs/build/dimensions?dimension=time_gran#time) (time_granularity tab) to find the full list of supported granularities.

:::tip
Previously, you had to create a model called `metricflow_time_spine` in your dbt project. Now, if your project already includes a date dimension or time spine table, you can simply configure MetricFlow to use that table by updating the `model` setting in the Semantic Layer.

If you don’t have a date dimension table, you can still create one by using the code snippet below to build your time spine model.
If you don’t have a date dimension table, you can still create one by using the following code snippet to build your time spine model.

:::

<Lightbox src="/img/time_spines.png" title="Time spine directory structure" />

<VersionBlock firstVersion="1.9">
<File name="models/_models.yml">

```yaml
models:
  - name: time_spine_hourly
    description: "my favorite time spine"
    time_spine:
      standard_granularity_column: date_hour # column for the standard grain of your table, must be a date time type
      custom_granularities:
        - name: fiscal_year
          column_name: fiscal_year_column
    columns:
      - name: date_hour
        granularity: hour # set granularity at column-level for standard_granularity_column
  - name: time_spine_daily
    time_spine:
      standard_granularity_column: date_day # column for the standard grain of your table
    columns:
      - name: date_day
        granularity: day # set granularity at column-level for standard_granularity_column
```
</File>
</VersionBlock>

<VersionBlock lastVersion="1.8">
<File name="models/_models.yml">

```yaml
models:
  - name: time_spine_hourly
    description: A date spine with one row per hour, ranging from 2020-01-01 to 2039-12-31.
    time_spine:
Expand All @@ -62,9 +89,31 @@ If you don’t have a date dimension table, you can still create one by using th
```

</File>
</VersionBlock>

For an example project, refer to our [Jaffle shop](https://github.com/dbt-labs/jaffle-sl-template/blob/main/models/marts/_models.yml) example.

<Expandable alt_header="Understanding time spine and granularity">

- The previous configuration demonstrates a time spine model called `time_spine_hourly`. It sets the time spine configurations under the `time_spine` key.
- The `standard_granularity_column` is the column that maps to one of our [standard granularities](/docs/build/dimensions?dimension=time_gran). The grain of this column must be equal to or finer than the granularity of every custom granularity column in the same model. In this case, it's hourly.
- It needs to reference a column defined under the `columns` key, in this case, `date_hour`.
- MetricFlow uses the `standard_granularity_column` as the join key when joining the time spine table to other source tables.
- Here, the granularity of the `standard_granularity_column` is set at the column level, in this case, `hour`.

Additionally, [the `custom_granularities` field](#custom-calendar) (available in dbt v1.9 and higher) lets you specify non-standard time periods like `fiscal_year` or `retail_month` that your organization may use.

</Expandable>
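
To make the join-key role of `standard_granularity_column` concrete, here is a rough sketch of the kind of join it enables. This is not the SQL MetricFlow generates. The `orders` table and its `order_id` and `ordered_at` columns are hypothetical, and the query assumes a warehouse that supports `date_trunc`.

```sql
-- Illustrative only: `orders`, `order_id`, and `ordered_at` are hypothetical names.
-- The hourly spine supplies one row per hour, so hours without orders still appear.
select
    spine.date_hour,
    count(orders.order_id) as order_count
from time_spine_hourly as spine
left join orders
    on date_trunc('hour', orders.ordered_at) = spine.date_hour
group by spine.date_hour
order by spine.date_hour
```

Because the spine is on the left side of the join, an hour with no matching orders returns a count of zero instead of being dropped from the result.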

<Expandable alt_header="Creating a time spine table">

If you need to create a time spine table from scratch, you can do so by adding the following code to your dbt project.
The example creates a time spine at a daily grain and an hourly grain. A few things to note when creating time spine models:
* MetricFlow uses the time spine with the largest compatible granularity for a given query to keep the query as efficient as possible. For example, if you have a time spine at a monthly grain and query a dimension at a monthly grain, MetricFlow uses the monthly time spine. If you only have a daily time spine, MetricFlow uses the daily time spine and applies `date_trunc` to get to the monthly grain.
* You can add a time spine for each granularity you intend to use if query efficiency matters more to you than configuration time or storage constraints. For most engines, the query performance difference should be minimal, and transforming your time spine to a coarser grain at query time shouldn't add significant overhead to your queries.
* We recommend having a time spine at the finest grain used in any of your dimensions to avoid unexpected errors. For example, if you have dimensions at an hourly grain, you should have a time spine at an hourly grain.
</Expandable>

Now, break down the configuration above. It points to a model called `time_spine_hourly`, and all the configuration is colocated with the rest of the [model's properties](/reference/model-properties). It sets the time spine configurations under the `time_spine` key. The `standard_granularity_column` is the finest grain of the table; in this case, it's hourly. It needs to reference a column defined under the `columns` key, in this case, `date_hour`. MetricFlow uses the `standard_granularity_column` as the join key for the time spine table when joining tables. Here, the granularity of the `standard_granularity_column` is set at the column level, in this case, `hour`.

### Considerations when choosing which granularities to create{#granularity-considerations}
Expand All @@ -73,7 +122,7 @@ Now, break down the configuration above. It's pointing to a model called `time_s
- You can add a time spine for each granularity you intend to use if query efficiency is more important to you than configuration time, or storage constraints. For most engines, the query performance difference should be minimal and transforming your time spine to a coarser grain at query time shouldn't add significant overhead to your queries.
- We recommend having a time spine at the finest grain used in any of your dimensions to avoid unexpected errors. For example, if you have dimensions at an hourly grain, you should have a time spine at an hourly grain.
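
As an illustration of coarsening a finer time spine at query time, the following sketch derives a monthly grain from a daily time spine. It is not the SQL MetricFlow generates. It assumes the `time_spine_daily` model and `date_day` column from the earlier configuration and a warehouse that supports `date_trunc`.

```sql
-- Illustrative only: collapse the daily spine to one row per month.
select distinct
    date_trunc('month', date_day) as date_month
from time_spine_daily
order by date_month
```

On most engines this transformation is cheap, which is why a dedicated monthly time spine is optional.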

## Example time-spine tables
## Example time spine tables

### Daily

Expand Down Expand Up @@ -247,3 +296,47 @@ and date_hour < dateadd(day, 30, current_timestamp())
```

</File>


## Custom calendar <Lifecycle status="Preview"/>

<VersionBlock lastVersion="1.8">

Custom calendars, such as a fiscal calendar, are available in [dbt Cloud Versionless](/docs/dbt-versions/upgrade-dbt-version-in-cloud#versionless) or dbt Core [v1.9 and above](/docs/dbt-versions/core).

To access this feature, [upgrade to Versionless](/docs/dbt-versions/versionless-cloud) or dbt Core v1.9 and above.
</VersionBlock>

<VersionBlock firstVersion="1.9">

Custom date transformations can be complex, and organizations often have unique needs that can’t be easily generalized. Creating a custom calendar model allows you to define these transformations in SQL, offering more flexibility than native transformations in MetricFlow. This approach lets you map custom columns back to MetricFlow granularities, ensuring consistency while giving you control over the transformations.

For example, if you use a custom calendar in your organization, such as a fiscal calendar, you can configure it in MetricFlow using its date and time operations.

- This is useful for calculating metrics based on a custom calendar, such as fiscal quarters or weeks.
- Use the `custom_granularities` key to define a non-standard time period for querying data, such as a `retail_month` or `fiscal_week`, instead of standard options like `day`, `month`, or `year`.
- Ensure the `standard_granularity_column` is a date time type.
- This feature provides more control over how time-based metrics are calculated.

### Add custom granularities

The Semantic Layer supports custom calendar configurations that let you query data using non-standard time periods like `fiscal_year` or `retail_month`. To add custom granularities, define them (in lowercase) in your model's YAML configuration like this:

<File name="models/_models.yml">

```yaml
models:
  - name: my_time_spine
    description: my favorite time spine
    time_spine:
      standard_granularity_column: date_day
      custom_granularities:
        - name: fiscal_year
          column_name: fiscal_year_column
```
</File>
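
The configuration above refers to a `fiscal_year_column`, which the time spine model itself has to provide. Here is a minimal sketch of one way to build it, under several assumptions that are not part of the original docs: the fiscal year starts on 1 February, an existing daily spine model named `time_spine_daily` exposes a `date_day` column, and the warehouse uses Snowflake-style `dateadd` and `date_trunc`. The sketch stores the first day of the fiscal year as a date. Confirm the column type MetricFlow expects for custom granularity columns before relying on it.

```sql
-- models/my_time_spine.sql (hypothetical file; Snowflake-style date functions)
-- Assumes an existing daily spine model `time_spine_daily` with a `date_day` column.
select
    date_day,
    -- Assumption: the fiscal year starts on 1 February.
    -- Shift back one month, truncate to the calendar year, then shift forward
    -- one month to get the first day of the fiscal year containing date_day.
    dateadd(
        month, 1,
        date_trunc('year', dateadd(month, -1, date_day))
    ) as fiscal_year_column
from {{ ref('time_spine_daily') }}
```

With this logic, `2024-01-15` falls in the fiscal year that began on `2023-02-01`, while `2024-02-01` starts a new fiscal year. Adjust the month offset to match your organization's calendar.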

#### Coming soon
Note that features like calculating offsets and period-over-period will be supported soon.

</VersionBlock>
4 changes: 4 additions & 0 deletions website/docs/docs/dbt-versions/release-notes.md
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,10 @@ Release notes are grouped by month for both multi-tenant and virtual private clo

## October 2024

- **New**: The dbt Semantic Layer supports custom calendar configurations in MetricFlow, available in [Preview](/docs/dbt-versions/product-lifecycles#dbt-cloud). Custom calendar configurations allow you to query data using non-standard time periods like `fiscal_year` or `retail_month`. Refer to [custom calendar](/docs/build/metricflow-time-spine#custom-calendar) to learn how to define these custom granularities in your MetricFlow time spine YAML configuration.
- **New**: In dbt Cloud Versionless, [Snapshots](/docs/build/snapshots) have been updated to use YAML configuration files instead of SQL snapshot blocks. This new feature simplifies snapshot management and improves performance, and will soon be released in dbt Core 1.9.
- Who does this affect? New users on Versionless can define snapshots using the new YAML specification. Users upgrading to Versionless who use snapshots can keep their existing configuration or can choose to migrate their snapshot definitions to YAML.
- Users on dbt 1.8 and earlier: No action is needed; existing snapshots will continue to work as before. However, we recommend upgrading to Versionless to take advantage of the new snapshot features.
- **Enhancement**: In dbt Cloud Versionless, snapshots defined in SQL files can now use `config` defined in `schema.yml` YAML files. This update resolves the previous limitation that required snapshot properties to be defined exclusively in `dbt_project.yml` and/or a `config()` block within the SQL file. This enhancement will be included in the upcoming dbt Core v1.9 release.
- **Enhancement**: In May 2024, dbt Cloud Versionless began inferring a model's `primary_key` based on configured data tests and/or constraints within `manifest.json`. The inferred `primary_key` is visible in dbt Explorer and utilized by the dbt Cloud [compare changes](/docs/deploy/run-visibility#compare-tab) feature. This will also be released in dbt Core 1.9.
Read about the [order in which dbt infers the columns that can be used as a model's primary key](https://github.com/dbt-labs/dbt-core/blob/7940ad5c7858ff11ef100260a372f2f06a86e71f/core/dbt/contracts/graph/nodes.py#L534-L541).