
Feat: Support SQLMesh project generation from dlt pipeline #3218

Merged (19 commits) on Oct 8, 2024

Conversation

Themiscodes (Contributor) commented Oct 3, 2024

WIP: This update introduces the ability to generate a SQLMesh project from a dlt pipeline. It creates the project directory scaffolding, inspects the pipeline's schema, automatically generates incremental models from its tables, and sets up the connection configuration in config.yaml using the pipeline's credentials.
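The model generation step described above can be sketched roughly as follows. This is an illustrative sketch only: the helper name and exact output format are assumptions for illustration, not the actual SQLMesh implementation. It mirrors the default discussed later in this thread, where each generated model keys on dlt's load id column.

```python
def render_incremental_model(schema: str, table: str) -> str:
    """Render a SQLMesh model definition string for one dlt table.

    Hypothetical helper: names, the `_sqlmesh` suffix, and the use of
    `_dlt_load_id` as the unique key are assumptions based on the
    discussion in this thread, not the real implementation.
    """
    model_name = f"{schema}_sqlmesh.incremental_{table}"
    return (
        f"MODEL (\n"
        f"  name {model_name},\n"
        f"  kind INCREMENTAL_BY_UNIQUE_KEY (\n"
        f"    unique_key _dlt_load_id\n"
        f"  )\n"
        f");\n"
        f"\n"
        f"SELECT * FROM {schema}.{table};\n"
    )

print(render_incremental_model("my_pipeline", "players"))
```

One such definition would be written per table found in the inspected schema.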

Importing a dlt project:

To import a dlt project into SQLMesh, ensure the dlt pipeline has been run or restored locally. Then, in the pipeline directory, run the init command, specifying the pipeline's name:

```bash
sqlmesh init -t dlt --dlt-pipeline "pipeline_name" dialect
```

The resulting SQLMesh project can be executed as usual with sqlmesh plan.

@tobymao (Contributor) commented Oct 3, 2024

can we get an example dlt project and have sqlmesh run on it? we'll also need to update the docs


sqlmesh/integrations/dlt.py (review context):

```python
    return f"""MODEL (
      name {model_name},
      kind INCREMENTAL_BY_UNIQUE_KEY (
```
Member:

Is it always unique by key?

Themiscodes (author):

I thought this was the simpler way, since each dlt table has a unique load id column that can be used as its key. If someone wanted to implement some other kind of table, they could adjust these later. But I can change it if you feel another type would serve better as the default generated table.

Contributor:

Shouldn't you filter by dates here? Otherwise you're always doing a full scan on from_table.

Themiscodes (author):

Altered the model kinds to INCREMENTAL_BY_TIME_RANGE, using the load time (converted from the load ids, which are unix timestamps) as the time_column.
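The conversion described can be sketched as follows. The exact load id format is an assumption based on the comment above (dlt load ids as strings holding unix seconds); the helper name is hypothetical.

```python
from datetime import datetime, timezone

def load_id_to_datetime(load_id: str) -> datetime:
    # Assumption for illustration: a dlt load id looks like
    # "1728316800.123456" (unix seconds as a string). Converting it to a
    # UTC datetime yields a usable time_column value for an
    # INCREMENTAL_BY_TIME_RANGE model.
    return datetime.fromtimestamp(float(load_id), tz=timezone.utc)

print(load_id_to_datetime("1728316800.0").isoformat())
```

With the load time materialized as a column, each incremental run can then filter rows to the @start_ds/@end_ds window instead of scanning the whole source table.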

To load data from a dlt pipeline into SQLMesh, ensure the dlt pipeline has been run or restored locally. Then simply execute the sqlmesh `init` command *within the dlt project root directory* using the `dlt` template option and specifying the pipeline's name with the `dlt-pipeline` option:

```bash
$ sqlmesh init -t dlt --dlt-pipeline <pipeline-name> dialect
```
Contributor:

I think you'll probably also want to define a start date, no?

Member:

It is generated automatically

Themiscodes (author):

Yes, I also added a function to extract the start date directly from the pipeline, to be set in the SQLMesh config.yaml.
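One plausible way to derive such a start date is to take the earliest load of the pipeline. This is a hypothetical sketch of the idea only, not the PR's actual function, and it assumes load ids are unix-timestamp strings as discussed above.

```python
from datetime import datetime, timezone

def infer_start_date(load_ids: list[str]) -> str:
    # Hypothetical helper: pick the earliest load id (unix seconds as a
    # string) and format it as a YYYY-MM-DD start date suitable for
    # a SQLMesh config.yaml.
    earliest = min(float(load_id) for load_id in load_ids)
    return datetime.fromtimestamp(earliest, tz=timezone.utc).strftime("%Y-%m-%d")

print(infer_start_date(["1728316800.0", "1727712000.0"]))
```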

setup.py:

```diff
@@ -67,6 +67,7 @@
         "dbt-duckdb>=1.7.1",
         "dbt-snowflake",
         "dbt-bigquery",
+        "dlt",
```
Member:

I don't think we want this in 2 places. Let's just add the dlt target in Makefile where we need it.

Themiscodes (author):

Sure, removed it from the dev requirements and added it to the Makefile in install-dev and install-cicd-test, since test_cli is the test that requires it.

izeigerman (Member) left a review:

One small comment, LGTM otherwise

@Themiscodes Themiscodes merged commit 7029c40 into main Oct 8, 2024
23 checks passed
@Themiscodes Themiscodes deleted the themis/dlt_project branch October 8, 2024 15:13
justinjoseph89 added a commit to trygforsikring/sqlmesh that referenced this pull request Oct 21, 2024
* Fix: mark kind changes as breaking in forward only plan (TobikoData#3207)

* Feat: add support for parameterized python model names (TobikoData#3208)

* Fix: Bigquery support of complex nested types (TobikoData#3190)

* Feat: Snowflake: Handle forward_only changes to 'clustered_by' (TobikoData#3205)

* Docs: add gateway variables to jinja macros concepts doc (TobikoData#3210)

* Fix: avoid parsing column names into qualified columns in InsertOverwriteWithMergeMixin (TobikoData#3211)

* Chore: bump sqlglot to v25.24.2 (TobikoData#3213)

* Feat: Support INCREMENTAL_BY_TIME_RANGE models on Athena/Hive (TobikoData#3201)

* Fix: load custom materializations on run (TobikoData#3216)

* Fix: Infer column types when data type is omitted in dbt seeds (TobikoData#3215)

* Chore: bump sqlglot to v25.24.3 (TobikoData#3217)

* Fix: DBT seed column order (TobikoData#3221)

* fix: web reloading caused iteration error (TobikoData#3220)

* Fix: Make dbt adapter macros available in the local scope (TobikoData#3219)

* Feat: Support DBT Athena adapter (TobikoData#3222)

* chore: docs

* Feat: Support SQLMesh project generation from dlt pipeline (TobikoData#3218)

* Fix: Broken hive distro link in the test airflow image

* Fix: Prevent loaded context from being used concurrently (TobikoData#3229)

* Fix: Go back to using hive 3.1.3 for the Airflow test image

* Fix: Support of custom roles for Postgres (TobikoData#3230)

* Fix(redshift): regression in varchar length workaround (TobikoData#3225)

* Fix: Force the CircleCI's git to use https links when running pre-commit (TobikoData#3235)

* Fix: reset macro registry *after* loading models (TobikoData#3232)

* Fix: Modify dlt query filter not to use alias reference (TobikoData#3233)

* Fix: Support CLUSTER BY clause for the Databricks engine (TobikoData#3234)

* Feat: BigQuery - Handle forward_only changes to clustered_by (TobikoData#3231)

* chore: Fix typo in model_kinds.md (TobikoData#3239)

* Feat: support custom unit testing schema names (TobikoData#3238)

* Chore: Make the scheduler config extendable (TobikoData#3242)

* Fix: use parentheses for databricks' CLUSTER BY clause (TobikoData#3240)

* Fix: handle Paren in depends_on validator (TobikoData#3243)

* fix: data diff for bigquery project parsing (TobikoData#3248)

* Chore: Reintroduce parallelism in integration tests (TobikoData#3236)

* Feat(databricks): Add OAuth support (TobikoData#3250)

* Chore!: bump sqlglot to v25.25.0 (TobikoData#3252)

* Adding markdown feature to model description (TobikoData#3228)

* Fix: refactor table part parsing for Snowflake (TobikoData#3254)

* Fix: always warn when an audit has failed (TobikoData#3255)

* Chore: bump sqlglot to v25.25.1 (TobikoData#3256)

* Ensure using project instead of execution project for temp table as default (TobikoData#3249)

* Chore: Clarify that restatement plans ignore local changes (TobikoData#3257)

* feat!: run-all bot command errors if anything within it errors (TobikoData#3262)

* Fix(clickhouse): remove fractional seconds when time column is datetime/timestamp type (TobikoData#3261)

* remove risingwave configuration from dbt

* remove sink settings

* remove risingwave sink

* introducing risingwave as state sync engine

* add risingwave connection as test

* change test case

* Fix: Prevent extraction of dependencies from a rendered query for dbt models (TobikoData#3263)

---------

Co-authored-by: Ben <9087625+benfdking@users.noreply.github.com>
Co-authored-by: Jo <46752250+georgesittas@users.noreply.github.com>
Co-authored-by: Themis Valtinos <73662635+Themiscodes@users.noreply.github.com>
Co-authored-by: Erin Drummond <erin.dru@gmail.com>
Co-authored-by: Trey Spiller <1831878+treysp@users.noreply.github.com>
Co-authored-by: Alexander Butler <41213451+z3z1ma@users.noreply.github.com>
Co-authored-by: Toby Mao <toby.mao@gmail.com>
Co-authored-by: Iaroslav Zeigerman <zeigerman.ia@gmail.com>
Co-authored-by: Vincent Chan <vchan@users.noreply.github.com>
Co-authored-by: Vaggelis Danias <daniasevangelos@gmail.com>
Co-authored-by: Harmuth94 <86912694+Harmuth94@users.noreply.github.com>
Co-authored-by: Christophe Oudar <kayrnt@gmail.com>
Co-authored-by: Chris Rericha <67359577+crericha@users.noreply.github.com>
Co-authored-by: Ryan Eakman <6326532+eakmanrq@users.noreply.github.com>
Co-authored-by: Justin Joseph <justin.joseph@tryg.dk>