Feat: Support SQLMesh project generation from dlt pipeline #3218
Conversation
can we get an example dlt project and have sqlmesh run on it? we'll also need to update the docs
sqlmesh/integrations/dlt.py
Outdated
return f"""MODEL ( | ||
name {model_name}, | ||
kind INCREMENTAL_BY_UNIQUE_KEY ( |
Is it always unique by key?
I thought this was the simpler way, since each dlt table has a unique load id column that can be used as its key. And if someone wanted to implement some other kind of table, they could adjust these later. But I can change it if you feel another type would serve better as the default generated table.
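For reference, the default being discussed would render to something like the following; a rough sketch, not the PR's code, assuming the key is the row-level `_dlt_id` column that dlt adds to each table:

```python
# Rough sketch of the INCREMENTAL_BY_UNIQUE_KEY default under discussion;
# illustrative only. `_dlt_id` is the per-row identifier dlt adds to each
# table (an assumption about which column serves as the key here).
def generate_unique_key_model(model_name: str, from_table: str, columns: str) -> str:
    return f"""MODEL (
  name {model_name},
  kind INCREMENTAL_BY_UNIQUE_KEY (
    unique_key _dlt_id
  )
);

SELECT
  {columns}
FROM
  {from_table}
"""
```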
shouldn't you filter by dates here? otherwise you're always doing a full scan on from_table
Altered the model kinds to INCREMENTAL_BY_TIME_RANGE, using the load time (converted from the load ids, which are unix timestamps) as the time_column.
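A sketch of what the revised generation could produce; the `_dlt_load_time` alias and the DuckDB-flavored conversion are assumptions, though `@start_ts`/`@end_ts` are SQLMesh's built-in time-range macros:

```python
# Illustrative sketch, not the exact PR code: convert dlt's `_dlt_load_id`
# (a unix-timestamp string) into a time column and filter on it, so each
# run scans only the requested interval instead of the whole from_table.
def generate_incremental_model(model_name: str, from_table: str, columns: str) -> str:
    return f"""MODEL (
  name {model_name},
  kind INCREMENTAL_BY_TIME_RANGE (
    time_column _dlt_load_time
  )
);

SELECT
  {columns},
  TO_TIMESTAMP(CAST(_dlt_load_id AS DOUBLE)) AS _dlt_load_time
FROM
  {from_table}
WHERE
  TO_TIMESTAMP(CAST(_dlt_load_id AS DOUBLE)) BETWEEN @start_ts AND @end_ts
"""
```

The WHERE clause repeats the conversion rather than referencing the alias, since standard SQL does not allow a SELECT alias in WHERE.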
To load data from a dlt pipeline into SQLMesh, ensure the dlt pipeline has been run or restored locally. Then simply execute the sqlmesh `init` command *within the dlt project root directory* using the `dlt` template option and specifying the pipeline's name with the `dlt-pipeline` option:

```bash
$ sqlmesh init -t dlt --dlt-pipeline <pipeline-name> dialect
```
i think you'll probably also want to define a start date or no?
It is generated automatically
Yes, I also added a function to extract the start date directly from the pipeline, to be set in the SQLMesh config.yaml.
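A rough sketch of how such a start-date helper could work (the function name is hypothetical; it rests only on the fact, noted above, that dlt load ids are unix-timestamp strings):

```python
# Hypothetical helper, not the PR's exact code: derive a project start
# date from dlt load ids, which are unix-timestamp strings.
from datetime import datetime, timezone


def start_date_from_load_ids(load_ids: list[str]) -> str:
    """Return the date of the earliest load as 'YYYY-MM-DD' (UTC)."""
    earliest = min(float(load_id) for load_id in load_ids)
    return datetime.fromtimestamp(earliest, tz=timezone.utc).strftime("%Y-%m-%d")


# Example: start_date_from_load_ids(["1727257870.531"]) -> "2024-09-25"
```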
setup.py
Outdated
@@ -67,6 +67,7 @@
     "dbt-duckdb>=1.7.1",
     "dbt-snowflake",
     "dbt-bigquery",
+    "dlt",
I don't think we want this in 2 places. Let's just add the dlt target in the Makefile where we need it.
Sure, removed it from the dev requirements and added it to the Makefile in `install-dev` and `install-cicd-test`, since `test_cli` is the test that requires it.
One small comment, LGTM otherwise
WIP: This update introduces the ability to generate a SQLMesh project from a dlt pipeline. It creates the project directory scaffolding, inspects the pipeline's schema, automatically generates incremental models from its tables, and sets up the `config.yaml` connection configuration using the pipeline's credentials.

Importing a dlt project:

To import a dlt project into SQLMesh, ensure the dlt pipeline has been run or restored locally. Then, in the pipeline directory, use the `init` command, specifying the pipeline's name: `sqlmesh init -t dlt --dlt-pipeline "pipeline_name" dialect`

The resulting SQLMesh project can be executed as usual with `sqlmesh plan`.
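For a sense of how these pieces fit together, here is a minimal sketch of the generation flow. Helper and file names are assumptions, not the PR's exact implementation (which lives in sqlmesh/integrations/dlt.py); `dlt.attach` is dlt's public API for restoring a locally stored pipeline by name.

```python
# Minimal sketch of the generation flow; helper names are assumptions.
from pathlib import Path

import dlt


def generate_project_from_dlt(pipeline_name: str) -> None:
    # Restore the locally stored pipeline state by name.
    pipeline = dlt.attach(pipeline_name=pipeline_name)

    models_dir = Path("models")
    models_dir.mkdir(parents=True, exist_ok=True)

    # One incremental model per user table in the pipeline's schema,
    # skipping dlt's internal bookkeeping tables (prefixed with "_dlt").
    for table_name, table in pipeline.default_schema.tables.items():
        if table_name.startswith("_dlt"):
            continue
        columns = ", ".join(table.get("columns", {}))
        sql = generate_incremental_model(  # the sketch earlier in the thread
            model_name=f"{pipeline.dataset_name}_sqlmesh.incremental_{table_name}",
            from_table=f"{pipeline.dataset_name}.{table_name}",
            columns=columns,
        )
        (models_dir / f"incremental_{table_name}.sql").write_text(sql)

    # config.yaml would then be written with the pipeline's credentials and
    # a start date derived from its earliest load id (see the helper above).
```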