
Feat: Support SQLMesh project generation from dlt pipeline #3218

Merged (19 commits) on Oct 8, 2024

Conversation

Themiscodes (Contributor) commented Oct 3, 2024

WIP: This update introduces the ability to generate a SQLMesh project from a dlt pipeline. It creates the project directory scaffolding, inspects the pipeline's schema, automatically generates incremental models from its tables, and sets up the connection configuration in config.yaml using the pipeline's credentials.
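The model generation step described above can be sketched roughly as follows. This is an illustrative sketch only: the helper name and exact output format are assumptions for illustration, not the actual SQLMesh implementation. It mirrors the default discussed later in this thread, where each generated model keys on dlt's load id column.

```python
def render_incremental_model(schema: str, table: str) -> str:
    """Render a SQLMesh model definition string for one dlt table.

    Hypothetical helper: names, the `_sqlmesh` suffix, and the use of
    `_dlt_load_id` as the unique key are assumptions based on the
    discussion in this thread, not the real implementation.
    """
    model_name = f"{schema}_sqlmesh.incremental_{table}"
    return (
        f"MODEL (\n"
        f"  name {model_name},\n"
        f"  kind INCREMENTAL_BY_UNIQUE_KEY (\n"
        f"    unique_key _dlt_load_id\n"
        f"  )\n"
        f");\n"
        f"\n"
        f"SELECT * FROM {schema}.{table};\n"
    )

print(render_incremental_model("my_pipeline", "players"))
```

One such definition would be written per table found in the inspected schema.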

Importing a dlt project:

To import a dlt project into SQLMesh, ensure the dlt pipeline has been run or restored locally. Then, in the pipeline directory, run the init command, specifying the pipeline's name:

```bash
sqlmesh init -t dlt --dlt-pipeline "pipeline_name" dialect
```

The resulting SQLMesh project can be executed as usual with sqlmesh plan.

@tobymao (Contributor) commented Oct 3, 2024

can we get an example dlt project and have sqlmesh run on it? we'll also need to update the docs


sqlmesh/integrations/dlt.py (review context):

```python
    return f"""MODEL (
      name {model_name},
      kind INCREMENTAL_BY_UNIQUE_KEY (
```
Member:

Is it always unique by key?

Themiscodes (author):

I thought this was the simpler way, since each dlt table has a unique load id column that can be used as its key. If someone wanted to implement some other kind of table, they could adjust these later. But I can change it if you feel another type would serve better as the default generated table.

Contributor:

Shouldn't you filter by dates here? Otherwise you're always doing a full scan on from_table.

Themiscodes (author):

Altered the model kinds to INCREMENTAL_BY_TIME_RANGE, using the load time (converted from the load ids, which are unix timestamps) as the time_column.
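The conversion described can be sketched as follows. The exact load id format is an assumption based on the comment above (dlt load ids as strings holding unix seconds); the helper name is hypothetical.

```python
from datetime import datetime, timezone

def load_id_to_datetime(load_id: str) -> datetime:
    # Assumption for illustration: a dlt load id looks like
    # "1728316800.123456" (unix seconds as a string). Converting it to a
    # UTC datetime yields a usable time_column value for an
    # INCREMENTAL_BY_TIME_RANGE model.
    return datetime.fromtimestamp(float(load_id), tz=timezone.utc)

print(load_id_to_datetime("1728316800.0").isoformat())
```

With the load time materialized as a column, each incremental run can then filter rows to the @start_ds/@end_ds window instead of scanning the whole source table.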

To load data from a dlt pipeline into SQLMesh, ensure the dlt pipeline has been run or restored locally. Then simply execute the sqlmesh `init` command *within the dlt project root directory* using the `dlt` template option and specifying the pipeline's name with the `dlt-pipeline` option:

```bash
$ sqlmesh init -t dlt --dlt-pipeline <pipeline-name> dialect
```
Contributor:

I think you'll probably also want to define a start date, no?

Member:

It is generated automatically

Themiscodes (author):

Yes, I also added a function to extract the start date directly from the pipeline, to be set in the SQLMesh config.yaml.
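One plausible way to derive such a start date is to take the earliest load of the pipeline. This is a hypothetical sketch of the idea only, not the PR's actual function, and it assumes load ids are unix-timestamp strings as discussed above.

```python
from datetime import datetime, timezone

def infer_start_date(load_ids: list[str]) -> str:
    # Hypothetical helper: pick the earliest load id (unix seconds as a
    # string) and format it as a YYYY-MM-DD start date suitable for
    # a SQLMesh config.yaml.
    earliest = min(float(load_id) for load_id in load_ids)
    return datetime.fromtimestamp(earliest, tz=timezone.utc).strftime("%Y-%m-%d")

print(infer_start_date(["1728316800.0", "1727712000.0"]))
```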

setup.py:

```diff
@@ -67,6 +67,7 @@
         "dbt-duckdb>=1.7.1",
         "dbt-snowflake",
         "dbt-bigquery",
+        "dlt",
```
Member:

I don't think we want this in 2 places. Let's just add the dlt target in Makefile where we need it.

Themiscodes (author):

Sure, removed it from the dev requirements and added it to the Makefile in install-dev and install-cicd-test, since test_cli is the test that requires it.

izeigerman (Member) left a review:

One small comment, LGTM otherwise

@Themiscodes Themiscodes merged commit 7029c40 into main Oct 8, 2024
23 checks passed
@Themiscodes Themiscodes deleted the themis/dlt_project branch October 8, 2024 15:13
justinjoseph89 added a commit to trygforsikring/sqlmesh that referenced this pull request Oct 21, 2024
* Fix: mark kind changes as breaking in forward only plan (TobikoData#3207)

* Feat: add support for parameterized python model names (TobikoData#3208)

* Fix: Bigquery support of complex nested types (TobikoData#3190)

* Feat: Snowflake: Handle forward_only changes to 'clustered_by' (TobikoData#3205)

* Docs: add gateway variables to jinja macros concepts doc (TobikoData#3210)

* Fix: avoid parsing column names into qualified columns in InsertOverwriteWithMergeMixin (TobikoData#3211)

* Chore: bump sqlglot to v25.24.2 (TobikoData#3213)

* Feat: Support INCREMENTAL_BY_TIME_RANGE models on Athena/Hive (TobikoData#3201)

* Fix: load custom materializations on run (TobikoData#3216)

* Fix: Infer column types when data type is omitted in dbt seeds (TobikoData#3215)

* Chore: bump sqlglot to v25.24.3 (TobikoData#3217)

* Fix: DBT seed column order (TobikoData#3221)

* fix: web reloading caused iteration error (TobikoData#3220)

* Fix: Make dbt adapter macros available in the local scope (TobikoData#3219)

* Feat: Support DBT Athena adapter (TobikoData#3222)

* chore: docs

* Feat: Support SQLMesh project generation from dlt pipeline (TobikoData#3218)

* Fix: Broken hive distro link in the test airflow image

* Fix: Prevent loaded context from being used concurrently (TobikoData#3229)

* Fix: Go back to using hive 3.1.3 for the Airflow test image

* Fix: Support of custom roles for Postgres (TobikoData#3230)

* Fix(redshift): regression in varchar length workaround (TobikoData#3225)

* Fix: Force the CircleCI's git to use https links when running pre-commit (TobikoData#3235)

* Fix: reset macro registry *after* loading models (TobikoData#3232)

* Fix: Modify dlt query filter not to use alias reference (TobikoData#3233)

* Fix: Support CLUSTER BY clause for the Databricks engine (TobikoData#3234)

* Feat: BigQuery - Handle forward_only changes to clustered_by (TobikoData#3231)

* chore: Fix typo in model_kinds.md (TobikoData#3239)

* Feat: support custom unit testing schema names (TobikoData#3238)

* Chore: Make the scheduler config extendable (TobikoData#3242)

* Fix: use parentheses for databricks' CLUSTER BY clause (TobikoData#3240)

* Fix: handle Paren in depends_on validator (TobikoData#3243)

* fix: data diff for bigquery project parsing (TobikoData#3248)

* Chore: Reintroduce parallelism in integration tests (TobikoData#3236)

* Feat(databricks): Add OAuth support (TobikoData#3250)

* Chore!: bump sqlglot to v25.25.0 (TobikoData#3252)

* Adding markdown feature to model description (TobikoData#3228)

* Fix: refactor table part parsing for Snowflake (TobikoData#3254)

* Fix: always warn when an audit has failed (TobikoData#3255)

* Chore: bump sqlglot to v25.25.1 (TobikoData#3256)

* Ensure using project instead of execution project for temp table as default (TobikoData#3249)

* Chore: Clarify that restatement plans ignore local changes (TobikoData#3257)

* feat!: run-all bot command errors if anything within it errors (TobikoData#3262)

* Fix(clickhouse): remove fractional seconds when time column is datetime/timestamp type (TobikoData#3261)

* remove risingwave configuration from dbt

* remove sink settings

* remove risingwave sink

* introducing risingwave as state sync engine

* add risingwave connection as test

* change test case

* Fix: Prevent extraction of dependencies from a rendered query for dbt models (TobikoData#3263)

---------

Co-authored-by: Ben <9087625+benfdking@users.noreply.github.com>
Co-authored-by: Jo <46752250+georgesittas@users.noreply.github.com>
Co-authored-by: Themis Valtinos <73662635+Themiscodes@users.noreply.github.com>
Co-authored-by: Erin Drummond <erin.dru@gmail.com>
Co-authored-by: Trey Spiller <1831878+treysp@users.noreply.github.com>
Co-authored-by: Alexander Butler <41213451+z3z1ma@users.noreply.github.com>
Co-authored-by: Toby Mao <toby.mao@gmail.com>
Co-authored-by: Iaroslav Zeigerman <zeigerman.ia@gmail.com>
Co-authored-by: Vincent Chan <vchan@users.noreply.github.com>
Co-authored-by: Vaggelis Danias <daniasevangelos@gmail.com>
Co-authored-by: Harmuth94 <86912694+Harmuth94@users.noreply.github.com>
Co-authored-by: Christophe Oudar <kayrnt@gmail.com>
Co-authored-by: Chris Rericha <67359577+crericha@users.noreply.github.com>
Co-authored-by: Ryan Eakman <6326532+eakmanrq@users.noreply.github.com>
Co-authored-by: Justin Joseph <justin.joseph@tryg.dk>