Feat: implement builtin date spine macro #2990

Merged
merged 23 commits into from
Aug 21, 2024

@sungchun12 (Contributor) commented Aug 8, 2024

The goal of this PR is to bring the UX familiarity of dbt-utils into SQLMesh's native macros, similar to the deduplicate macro I recently contributed.

dbt-utils version of this macro: https://github.com/dbt-labs/dbt-utils/blob/main/macros/sql/date_spine.sql

This renders to SQL that generates a date spine table. A date spine is typically joined to other tables/views to supply a fixed, hard-coded date range, so people don't have to constantly adjust date ranges in WHERE clauses everywhere.

  • From @afzaljasani: "When I worked with a lot of e-commerce companies they would have daily aggregate metrics from different sources like Facebook ads, Google ads, Shopify, etc. Because most of these sources can’t be joined to each other, they would join to a date spine table to get a 360 view of the business"
  • Run analysis specific to a given range of promotional dates
  • I talked with someone at Modern Animal who said they don't like how dbt does it natively, so they built their own date_spine macro in-house.
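
To make the e-commerce use case above concrete, here is a minimal Python sketch of joining sources that share no key to a daily spine. Everything in it is hypothetical and only for illustration: the `facebook_ads` and `google_ads` data, the amounts, and the helper names `daily_spine` and `join_to_spine` do not come from this PR or from SQLMesh.

```python
from datetime import date, timedelta

# Hypothetical daily spend per source; the dates and amounts are made up.
facebook_ads = {date(2024, 1, 1): 120.0, date(2024, 1, 3): 80.0}
google_ads = {date(2024, 1, 2): 200.0, date(2024, 1, 3): 50.0}


def daily_spine(start, end):
    """Return every date from start to end, both inclusive."""
    days = (end - start).days
    return [start + timedelta(days=i) for i in range(days + 1)]


def join_to_spine(spine, **sources):
    """Left-join each source onto the spine; missing days become None."""
    return [
        {"date": d, **{name: values.get(d) for name, values in sources.items()}}
        for d in spine
    ]


rows = join_to_spine(
    daily_spine(date(2024, 1, 1), date(2024, 1, 3)),
    facebook_ads=facebook_ads,
    google_ads=google_ads,
)
for row in rows:
    print(row)
```

Each spine date yields exactly one row, so days on which a source reported nothing still appear, with `None` for that source, instead of silently dropping out of the result.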

Design Highlights

  • Large upstream changes in SQLGlot were needed because generate_date_array transpiles and behaves very differently across query engines. Huge thanks to @georgesittas for making this PR way easier for me. He's a real one.
  • Verified so far with manual testing against live query engines: snowflake, postgres, duckdb, databricks, spark, redshift, bigquery. I intentionally tested these because they are the most commonly used.
  • Verified it works with a development version of SQLGlot: 25.11.2.dev1

Differences from dbt-utils' version

  • The spine will include the start_date (if it is aligned to the datepart), AND it will include the end_date
  • You can input macros AND plain string dates instead of casting them: @date_spine('quarter', '2022-01-01', '2024-12-16')
  • Instead of using cross joins to arbitrarily generate date intervals, this relies on native generate_series-like SQL functions, which are faster and cheaper on the majority of query engines. If those functions don't exist for a particular engine (e.g., Redshift), recursion is used.
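
The inclusive-bounds behaviour described in the list above can be sketched in plain Python. This is an illustration of the semantics only, not the macro's implementation (the macro renders SQL): the function name `date_spine_sketch` and the helper `add_months` are hypothetical, and the sketch assumes the spine steps forward from the start date one datepart at a time, so the end date appears only when it lands exactly on a step.

```python
import calendar
from datetime import date, timedelta

# Step sizes per datepart (assumed set of dateparts for this sketch).
DAYS_PER_STEP = {"day": 1, "week": 7}
MONTHS_PER_STEP = {"month": 1, "quarter": 3, "year": 12}


def add_months(d, months):
    """Shift a date by whole months, clamping the day to the month's end."""
    total = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def date_spine_sketch(datepart, start, end):
    """Return dates from start to end (both inclusive), one per datepart."""
    if datepart not in DAYS_PER_STEP and datepart not in MONTHS_PER_STEP:
        raise ValueError(f"unsupported datepart: {datepart!r}")
    start_d = date.fromisoformat(start)
    end_d = date.fromisoformat(end)
    spine = []
    step = 0
    while True:
        # Always offset from the start date so month-end clamping cannot drift.
        if datepart in DAYS_PER_STEP:
            current = start_d + timedelta(days=step * DAYS_PER_STEP[datepart])
        else:
            current = add_months(start_d, step * MONTHS_PER_STEP[datepart])
        if current > end_d:
            break
        spine.append(current)
        step += 1
    return spine


daily = date_spine_sketch("day", "2024-01-01", "2024-01-03")
quarterly = date_spine_sketch("quarter", "2022-01-01", "2024-12-16")
print(daily)
print(len(quarterly), quarterly[0], quarterly[-1])
```

Under these assumptions the daily spine contains both 2024-01-01 and 2024-01-03, while the quarterly spine for the `@date_spine('quarter', '2022-01-01', '2024-12-16')` arguments stops at the last quarter start on or before the end date.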

Other examples of ad hoc queries working across engines: BigQuery, Redshift, Snowflake (one screenshot each, not reproduced here).

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ sungchun12
❌ Sung Won Chung


Sung Won Chung seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.

@sungchun12 sungchun12 marked this pull request as ready for review August 20, 2024 23:20
@sungchun12
Contributor Author

I signed the CLA with the correct GitHub username. The commits under my personal name are likely due to my using my personal laptop. For all intents and purposes, both commit signatures belong to the same person.

@georgesittas changed the title from "Date Spine Macro" to "Feat: implement builtin date spine macro" on Aug 21, 2024

@georgesittas (Contributor) left a comment

Sick how simple the implementation is now 🚀

The PR looks good, just a few comments to consider re: docs and test refactoring.

sungchun12 and others added 4 commits August 21, 2024 11:21
Co-authored-by: Jo <46752250+georgesittas@users.noreply.github.com>
@sungchun12 sungchun12 merged commit c96b56d into main Aug 21, 2024
20 of 21 checks passed
@sungchun12 sungchun12 deleted the date_spine_macro branch August 21, 2024 20:19
3 participants