Feat: Support SQLMesh project generation from dlt pipeline #3218

@@ -0,0 +1,87 @@
# dlt

SQLMesh enables effortless project generation using data ingested through [dlt](https://github.com/dlt-hub/dlt). This involves creating baseline project scaffolding, generating incremental models to process the data from the pipeline's tables by inspecting its schema, and configuring the gateway connection using the pipeline's credentials.

## Getting started

### Reading from a dlt pipeline

To load data from a dlt pipeline into SQLMesh, ensure the dlt pipeline has been run or restored locally. Then simply execute the `sqlmesh init` command *within the dlt project root directory*, using the `dlt` template option and specifying the pipeline's name with the `--dlt-pipeline` option:

```bash
$ sqlmesh init -t dlt --dlt-pipeline <pipeline-name> <dialect>
```

This will create the configuration file and directories that are found in all SQLMesh projects:

- config.yaml
    - The file for project configuration. Refer to [configuration](../reference/configuration.md).
- ./models
    - SQL and Python models. Refer to [models](../concepts/models/overview.md).
- ./seeds
    - Seed files. Refer to [seeds](../concepts/models/seed_models.md).
- ./audits
    - Shared audit files. Refer to [auditing](../concepts/audits.md).
- ./tests
    - Unit test files. Refer to [testing](../concepts/tests.md).
- ./macros
    - Macro files. Refer to [macros](../concepts/macros/overview.md).

SQLMesh will also automatically generate models to ingest data from the pipeline incrementally. Incremental loading is ideal for large datasets where recomputing entire tables is resource-intensive. In this case, the generated models use the [`INCREMENTAL_BY_UNIQUE_KEY` model kind](../concepts/models/model_kinds.md#incremental_by_unique_key) with the unique `_dlt_load_id` key present in each dlt table. However, these model definitions can be customized to meet your specific project needs.
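
As a rough sketch (the schema and table names below are hypothetical; the actual names are derived from your pipeline's dataset and tables), a generated incremental model definition might look something like this:

```sql
-- The `_dlt_load_id` column that dlt adds to every table is used as the unique key.
MODEL (
  name my_dataset_sqlmesh.incremental_my_table,
  kind INCREMENTAL_BY_UNIQUE_KEY (
    unique_key _dlt_load_id
  )
);

SELECT
  id,
  name,
  _dlt_load_id,
  _dlt_id
FROM
  my_dataset.my_table
```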

#### Configuration

SQLMesh will retrieve the data warehouse connection credentials from your dlt project and use them to configure the gateway in the `config.yaml` file. This configuration can be modified or customized as needed. For more details, refer to the [configuration guide](../guides/configuration.md).
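
As an illustration only (the database path and start date below are placeholders, not values produced by any particular pipeline), a generated `config.yaml` for a pipeline that loads into DuckDB might look roughly like:

```yaml
gateways:
  duckdb:
    connection:
      type: duckdb
      database: my_pipeline.duckdb

default_gateway: duckdb

model_defaults:
  dialect: duckdb
  start: 2024-10-03
```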

### Example

Generating a SQLMesh project from a dlt pipeline is quite simple. In this example, we'll use the example `sushi_pipeline.py` from the [sushi-dlt project](https://github.com/TobikoData/sqlmesh/tree/main/examples/sushi_dlt).

First, run the pipeline within the project directory:

```bash
$ python sushi_pipeline.py
Pipeline sushi load step completed in 2.09 seconds
Load package 1728074157.660565 is LOADED and contains no failed jobs
```

After the pipeline has run, generate a SQLMesh project by executing:

```bash
$ sqlmesh init -t dlt --dlt-pipeline sushi duckdb
```

The SQLMesh project is now all set up. You can then proceed to run the SQLMesh `plan` command to ingest the dlt pipeline data and populate the SQLMesh tables:

```bash
$ sqlmesh plan
New environment `prod` will be created from `prod`
Summary of differences against `prod`:
Models:
└── Added:
    ├── sushi_dataset_sqlmesh.incremental__dlt_loads
    ├── sushi_dataset_sqlmesh.incremental_sushi_types
    └── sushi_dataset_sqlmesh.incremental_waiters
Models needing backfill (missing dates):
├── sushi_dataset_sqlmesh.incremental__dlt_loads: 2024-10-03 - 2024-10-03
├── sushi_dataset_sqlmesh.incremental_sushi_types: 2024-10-03 - 2024-10-03
└── sushi_dataset_sqlmesh.incremental_waiters: 2024-10-03 - 2024-10-03
Apply - Backfill Tables [y/n]: y
Creating physical table ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% • 3/3 • 0:00:00

All model versions have been created successfully

[1/1] sushi_dataset_sqlmesh.incremental__dlt_loads evaluated in 0.01s
[1/1] sushi_dataset_sqlmesh.incremental_sushi_types evaluated in 0.00s
[1/1] sushi_dataset_sqlmesh.incremental_waiters evaluated in 0.01s
Evaluating models ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% • 3/3 • 0:00:00

All model batches have been executed successfully

Virtually Updating 'prod' ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% • 0:00:00

The target environment has been updated successfully
```

Once the models are planned and applied, you can continue as with any SQLMesh project, generating and applying [plans](../concepts/overview.md#make-a-plan), running [tests](../concepts/overview.md#tests) or [audits](../concepts/overview.md#audits), and executing models with a [scheduler](../guides/scheduling.md) if desired.
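
For example (assuming the standard CLI workflow; adapt the environment names to your own setup), subsequent iterations might look like:

```bash
# preview and apply changes in an isolated development environment
$ sqlmesh plan dev

# run the project's unit tests and audits
$ sqlmesh test
$ sqlmesh audit

# execute any models whose cron schedules are due
$ sqlmesh run
```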

@@ -0,0 +1,35 @@
import typing as t
import dlt


# Example sushi_types table
@dlt.resource(name="sushi_types", primary_key="id", write_disposition="merge")
def sushi_types() -> t.Iterator[t.Dict[str, t.Any]]:
    yield from [
        {"id": 0, "name": "Tobiko"},
        {"id": 1, "name": "Sashimi"},
        {"id": 2, "name": "Maki"},
        {"id": 3, "name": "Temaki"},
    ]


# Example waiters table
@dlt.resource(name="waiters", primary_key="id", write_disposition="merge")
def waiters() -> t.Iterator[t.Dict[str, t.Any]]:
    yield from [
        {"id": 0, "name": "Toby"},
        {"id": 1, "name": "Tyson"},
        {"id": 2, "name": "Ryan"},
        {"id": 3, "name": "George"},
        {"id": 4, "name": "Chris"},
        {"id": 5, "name": "Max"},
        {"id": 6, "name": "Vincent"},
        {"id": 7, "name": "Iaroslav"},
        {"id": 8, "name": "Emma"},
        {"id": 9, "name": "Maia"},
    ]


# Run the pipeline
p = dlt.pipeline(pipeline_name="sushi", destination="duckdb")
info = p.run([sushi_types(), waiters()])

@@ -67,6 +67,7 @@
"dbt-duckdb>=1.7.1", | ||
"dbt-snowflake", | ||
"dbt-bigquery", | ||
"dlt", | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think we want this in 2 places. Let's just add the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure, removed it from the |
||
"Faker", | ||
"google-auth", | ||
"google-cloud-bigquery", | ||
|

@@ -105,6 +106,9 @@
"dbt": [
    "dbt-core<2",
],
"dlt": [
    "dlt>=1.1.0",
],
"gcppostgres": [
    "cloud-sql-python-connector[pg8000]",
],

I think you'll probably also want to define a start date, or no?

It is generated automatically.

Yes, I also added a function to extract the start date from the pipeline directly, to be set in the SQLMesh config.yaml.