Feat: Managed models #2809

erindru · 2024-06-24T00:58:46Z

Initial implementation of Managed models. This defines the general semantics and contains an implementation for Snowflake, based on Dynamic Tables.

As a side effect, this PR also fixes issue 1049.

Follow-up PR's:

Support for other engines
Cleanup / compaction support, essentially the ability for something like sqlmesh clean to drop any objects that aren't referenced in a specific environment. The assumption up until now is that storage is cheap so keeping old snapshots around for a while isn't a problem. However, costs could add up quickly with managed models because users would be paying for models to be refreshed that aren't actually being referenced

erindru · 2024-06-24T01:05:12Z

sqlmesh/core/engine_adapter/base.py

            else None
        )
        return exp.Create(
            this=table_name_or_schema,
-            kind="TABLE",
+            kind=table_kind or "TABLE",


I noticed that I could leverage all the existing CREATE TABLE machinery for Snowflake if I was just able to change CREATE TABLE to CREATE DYNAMIC TABLE.

So I made the kind a parameter rather than being hardcoded to "TABLE"

sqlmesh/core/snapshot/evaluator.py

tobymao · 2024-06-24T01:58:11Z

not sure i like engine managed.

what about dynamic, or live, or auto

erindru · 2024-06-24T02:22:04Z

not sure i like engine managed.

what about dynamic, or live, or auto

Dynamic = Snowflake terminology
Live = Databricks terminology

I'm not against those names per se, but I did think we were trying to remain vendor neutral.

Auto = could work? "Automatic Models"?

The other way to do this could be to drop any pretense of a generic implementation have a Kind per Engine. That way, we can name them accordingly and also "strongly type" the physical_properties and make them part of the Kind instead of arbitrary key-value pairs, which would allow easier validation via pydantic.

For example:

MODEL (
   name foo,
   kind SNOWFLAKE_DYNAMIC_TABLE (
     warehouse "small",
     target_lag '20 minutes'
   ),
  ...

The downside is that engine-specific Kind's sort of defeat the purpose, every other Kind defines something generic and the engine-specific parts are implemented in the adapters

tobymao · 2024-06-24T15:45:47Z

maybe auto then

sqlmesh/core/engine_adapter/base.py

sqlmesh/core/engine_adapter/snowflake.py

izeigerman · 2024-06-24T21:03:14Z

sqlmesh/core/snapshot/evaluator.py

+        # vendors providing managed tables (such as Snowflake's dynamic tables) tend to charge more for them, they arent as cheap as normal tables
+        if is_table_deployable:
+            logger.info("Creating managed table: %s", table_name)
+            self.adapter.create_engine_managed_table(


Shouldn't we defer the creation of the deployable table until the change is promoted to production?

sqlmesh/core/model/definition.py

erindru · 2024-06-25T05:18:15Z

sqlmesh/core/snapshot/evaluator.py

@@ -1661,6 +1663,93 @@ def get_custom_materialization_type(name: str) -> t.Type[CustomMaterialization]:
    return strategy_type


+class EngineManagedStrategy(PromotableStrategy):


Right, so, I think I have figured out what to do here.

On create(), because the DeployabilityIndex always returns False to allow creating self-referential models, we just create the dev table and carry on. The call to create the prod table is a no-op because we dont yet know if it will be deployable.

On insert(), the DeployabilityIndex works as expected. If the current snapshot is deployable, we call create_managed_table. Otherwise, we replace the dev table with the preview data. We only do this for the "first insert" because managed models are not incremental, any subsequent calls will be a no-op.

This approach means that promotion works as expected with no need for a table creation side-effect and we dont create unnecessary managed tables if the plan is forward-only / the snapshot is not deployable

This sounds really good 👍

Actually, this won't work if the "MANAGED" model snapshot IS deployable and has downstream dependencies that are part of the same plan. The downstream models will attempt to create their own tables referencing the deployable version of the "MANAGED" model and it will be missing at that point.

What you probably need to do is to check whether the snapshot is deployable before we modify the deployability index to make the current snapshot non-deployable. Eg. add something like this:

is_snapshot_deployable = deployability_index.is_deployable(snapshot)

after this line: https://github.com/TobikoData/sqlmesh/blob/main/sqlmesh/core/snapshot/evaluator.py#L640

And then pass the flag into the create method of the strategy.

Of course 😞 , I was so focused on upstream models and if they were forward-only or not that I completely forgot about downstream models.

I've implemented your suggestion to pass the snapshot deployability to create() since we cant rely on the DeployabilityIndex there.

This changes the logic to:

on create(), if both the table and snapshot are deployable, we create a managed table. Otherwise, we fallback to the superclass logic which just creates a normal table

on insert(), we populate the table only if the snapshot is NOT deployable (which would have resulted in a normal table being created in create()). If the snapshot IS deployable, then it's a no-op because the data would have been populated in create().

There's now an extra case added to insert(). If a snapshot that was not deployable at creation (and thus would have been created as a normal table) suddenly becomes deployable - then we "upgrade" it to a managed table by dropping it and recreating it as a managed table

sqlmesh/core/model/definition.py

georgesittas · 2024-06-25T15:00:52Z

FYI in case it's somehow relevant to this PR: tobymao/sqlglot@60fa5e3.

erindru · 2024-06-25T21:48:56Z

FYI in case it's somehow relevant to this PR: tobymao/sqlglot@60fa5e3.

Ahh, thanks, good to know that exists now.

I think it's more relevant in a future PR. Using it now would involve modifying a bunch of method signatures (for example, drop_table would need to be modified to take table properties and the current mechanism of specifying table properties elsewhere assumes key-value pairs instead of individual properties for anything that isnt "top level")

izeigerman · 2024-06-26T16:51:36Z

sqlmesh/core/snapshot/evaluator.py

@@ -1071,6 +1081,7 @@ def create(
        table_name: str,
        model: Model,
        is_table_deployable: bool,
+        is_snapshot_deployable: bool,


Can we not make it a part of the public API to avoid making it more complex for users willing to implement a custom strategy. Can we hide this in kwargs?

Done, it's been banished back to kwargs

izeigerman · 2024-06-26T16:53:46Z

sqlmesh/core/snapshot/evaluator.py

+        snapshot: Snapshot = kwargs["snapshot"]
+        is_snapshot_deployable = deployability_index.is_deployable(snapshot)
+
+        if is_first_insert:


if is_first_insert and not is_snapshot_deployable:

Btw, why do we do this only on the first insert? Shouldn't we always update the dev table in dev?

My lack of knowledge around exactly when and how insert() is called multiple times.

Managed models are not incremental. The main model would only be created once if it was deployable, so I figured we should only populate the dev preview table once to match?

If the user changes something, isnt that a new snapshot / new table?

I guess there is no harm is doing it for all insert()'s, i've adjusted the logic

izeigerman · 2024-06-26T16:54:45Z

sqlmesh/core/snapshot/evaluator.py

+            if not is_snapshot_deployable:
+                # Snapshot isnt deployable; update the preview table instead
+                # If the snapshot was deployable, then data would have already been loaded in create() because a managed table would have been created
+                logger.info("Updating preview table: %s", table_name)


Let's clarify that the preview table is related to a managed model specifically.

sqlmesh/core/snapshot/evaluator.py

izeigerman · 2024-06-26T17:02:31Z

sqlmesh/core/snapshot/evaluator.py

+    ) -> None:
+        # Not entirely true, many engines support modifying some of the metadata fields on a managed table
+        # eg Snowflake allows you to ALTER DYNAMIC TABLE foo SET WAREHOUSE=my_other_wh;
+        raise ConfigError("Cannot mutate a managed table")


Let's add the target table name to the message

erindru · 2024-06-27T22:48:15Z

sqlmesh/core/snapshot/evaluator.py

+                    f"Table {table_name} exists but isnt being returned in the metadata. Cannot determine if it's a managed table or not"
+                )
+
+            if table_metadata[0].type != DataObjectType.MANAGED_TABLE:


This is where the "forward-only in a virtual environment that might eventually get deployed" edge case is resolved.

I added an integration test test_managed_model_upstream_forward_only to exercise this code

I'm not sure I understand how this scenario can happen.

To summarise: it was due to unecessarily creating the "main" table next to the dev table, when we only needed to create the dev table. This has now been adjusted

erindru · 2024-06-27T22:58:57Z

sqlmesh/core/engine_adapter/base.py

@@ -90,6 +90,7 @@ class EngineAdapter:
    SUPPORTS_MATERIALIZED_VIEWS = False
    SUPPORTS_MATERIALIZED_VIEW_SCHEMA = False
    SUPPORTS_CLONING = False
+    SUPPORTS_MANAGED_MODELS = False


I'm wondering if this should be SUPPORTS_MANAGED_TABLES instead. It's mainly just used to tell pytest to skip the managed model integration tests for adapters that dont support managed models

georgesittas · 2024-07-01T14:04:21Z

docs/concepts/models/managed_models.md

+    target_lag = '2 minutes',
+    data_retention_time_in_days = 2
+  )
+)


Suggested change

)

);

georgesittas · 2024-07-01T14:07:13Z

sqlmesh/core/engine_adapter/base.py

+        column_descriptions: t.Optional[t.Dict[str, str]] = None,
+        **kwargs: t.Any,
+    ) -> None:
+        """Create an managed table using a query.


Suggested change

"""Create an managed table using a query.

"""Create a managed table using a query.

georgesittas · 2024-07-01T14:14:09Z

sqlmesh/core/engine_adapter/snowflake.py

+        )
+
+        self._create_table_from_source_queries(
+            quote_identifiers(target_table),


Why are we quoting here? Note that by default _to_sql quotes the identifiers in the expression.

I think I saw it used elsewhere in a similar situation. Monkey see, monkey do :)

But you're right, it appears to be unnecessary. I've removed it

georgesittas · 2024-07-01T14:25:42Z

tests/core/engine_adapter/test_integration.py

@@ -2242,3 +2293,199 @@ def _mutate_config(current_gateway_name: str, config: Config):

    finally:
        ctx.cleanup(context)
+
+
+def test_managed_model_upstream_forward_only(ctx: TestContext):


Note: I liked how detailed this test is, thanks for providing context 👍

izeigerman · 2024-07-01T17:16:46Z

sqlmesh/core/snapshot/evaluator.py

+        render_kwargs: t.Dict[str, t.Any],
+        **kwargs: t.Any,
+    ) -> None:
+        is_snapshot_deployable: bool = render_kwargs["is_snapshot_deployable"]


This should be kwargs, right?

Ah, yes, since the merge that broke out render_kwargs and kwargs into two separate things. It looks like there is a missing test case, i'll add one

Actually, this was in the render_kwargs, but i'll move it

izeigerman · 2024-07-01T17:22:13Z

sqlmesh/core/snapshot/evaluator.py

+        snapshot: Snapshot = kwargs["snapshot"]
+        is_snapshot_deployable = deployability_index.is_deployable(snapshot)
+
+        if is_first_insert and is_snapshot_deployable and self.adapter.table_exists(table_name):


When do we handle the case when the deployable (prod) table doesn't exist?

The logic has been adjusted to create the managed table if it doesnt exist, rather than upgrading an existing non-managed table that didnt need to be created in the first place

izeigerman · 2024-07-01T21:13:50Z

sqlmesh/core/snapshot/evaluator.py

+                table_description=model.description,
+                column_descriptions=model.column_descriptions,
+            )
+        else:


else not is_table_deployable:

erindru · 2024-07-01T21:41:10Z

sqlmesh/core/snapshot/evaluator.py

@@ -652,6 +652,8 @@ def _create_snapshot(
            **common_render_kwargs,
            # Refers to self as non-deployable to successfully create self-referential tables / views.
            deployability_index=deployability_index.with_non_deployable(snapshot),
+            # However, it can still be useful to know if the snapshot was actually deployable
+            is_snapshot_deployable=deployability_index.is_deployable(snapshot),


The deployability index is already in the render kwargs which is why I put is_snapshot_deployable right next to it

Moved to kwargs

sqlmesh/core/snapshot/evaluator.py

izeigerman · 2024-07-01T22:54:53Z

sqlmesh/core/model/kind.py

@@ -755,6 +765,15 @@ def to_expression(
        )


+class ManagedKind(_ModelKind):
+    name: Literal[ModelKindName.MANAGED] = ModelKindName.MANAGED
+    disable_restatement: SQLGlotBool = True


Let's make it t.Literal[True] = True so that the value can't be overridden by a user.

Or make it a property method which returns True

izeigerman · 2024-07-01T23:00:42Z

docs/concepts/models/managed_models.md

+
+- Creating a Virtual Environment creates a pointer to the current model snapshot
+- Modifying the model causes a new snapshot to be created
+- Breaking upstream changes cause a new snapshot to be created


[Nit] Any upstream changes cause a new snapshot to be created

izeigerman

A few more smaller comments. LGTM otherwise!

erindru marked this pull request as draft June 24, 2024 00:58

erindru commented Jun 24, 2024

View reviewed changes

sqlmesh/core/snapshot/evaluator.py Outdated Show resolved Hide resolved

erindru force-pushed the erin/managed-tables branch 2 times, most recently from 9d5a107 to 2767e41 Compare June 24, 2024 02:04

izeigerman reviewed Jun 24, 2024

View reviewed changes

sqlmesh/core/engine_adapter/base.py Outdated Show resolved Hide resolved

izeigerman reviewed Jun 24, 2024

View reviewed changes

sqlmesh/core/engine_adapter/snowflake.py Outdated Show resolved Hide resolved

izeigerman reviewed Jun 24, 2024

View reviewed changes

sqlmesh/core/engine_adapter/snowflake.py Show resolved Hide resolved

izeigerman reviewed Jun 24, 2024

View reviewed changes

sqlmesh/core/model/definition.py Outdated Show resolved Hide resolved

izeigerman reviewed Jun 24, 2024

View reviewed changes

sqlmesh/core/model/definition.py Outdated Show resolved Hide resolved

erindru force-pushed the erin/managed-tables branch from 2767e41 to 9c9e7f8 Compare June 25, 2024 05:04

erindru changed the title ~~Feat: Engine-managed models~~ Feat: Managed models Jun 25, 2024

erindru commented Jun 25, 2024

View reviewed changes

georgesittas reviewed Jun 25, 2024

View reviewed changes

sqlmesh/core/model/definition.py Outdated Show resolved Hide resolved

erindru force-pushed the erin/managed-tables branch from b69f2f1 to d9753c4 Compare June 25, 2024 21:36

erindru marked this pull request as ready for review June 25, 2024 22:35

erindru requested review from izeigerman and georgesittas June 25, 2024 22:37

izeigerman reviewed Jun 26, 2024

View reviewed changes

sqlmesh/core/snapshot/evaluator.py Show resolved Hide resolved

izeigerman reviewed Jun 26, 2024

View reviewed changes

erindru force-pushed the erin/managed-tables branch from 502b67c to 0bed0c8 Compare June 27, 2024 22:44

erindru commented Jun 27, 2024

View reviewed changes

erindru force-pushed the erin/managed-tables branch 2 times, most recently from fb07bfa to 379153f Compare June 30, 2024 22:10

erindru requested a review from izeigerman June 30, 2024 22:20

georgesittas reviewed Jul 1, 2024

View reviewed changes

izeigerman reviewed Jul 1, 2024

View reviewed changes

erindru commented Jul 1, 2024

View reviewed changes

erindru added 2 commits July 1, 2024 22:28

Initial implementation of managed tables - Snowflake Dynamic Tables

5ce93e5

PR feedback

cb81b48

erindru force-pushed the erin/managed-tables branch from 4307201 to cb81b48 Compare July 1, 2024 22:29

izeigerman reviewed Jul 1, 2024

View reviewed changes

sqlmesh/core/snapshot/evaluator.py Show resolved Hide resolved

izeigerman reviewed Jul 1, 2024

View reviewed changes

izeigerman approved these changes Jul 1, 2024

View reviewed changes

Tweak docs and disable_restatement field definition on managed model

67e962c

erindru merged commit 606a9b4 into main Jul 1, 2024
15 checks passed

erindru deleted the erin/managed-tables branch July 1, 2024 23:49

		@@ -1661,6 +1663,93 @@ def get_custom_materialization_type(name: str) -> t.Type[CustomMaterialization]:
		return strategy_type


		class EngineManagedStrategy(PromotableStrategy):

	"""Create an managed table using a query.
	"""Create a managed table using a query.

Feat: Managed models #2809

Feat: Managed models #2809

Conversation

erindru commented Jun 24, 2024 • edited Loading

Choose a reason for hiding this comment

tobymao commented Jun 24, 2024

erindru commented Jun 24, 2024 • edited Loading

tobymao commented Jun 24, 2024

Choose a reason for hiding this comment

erindru Jun 25, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

izeigerman Jun 25, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

georgesittas commented Jun 25, 2024

erindru commented Jun 25, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

erindru Jul 1, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

izeigerman Jul 1, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

izeigerman Jul 1, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

izeigerman left a comment

Choose a reason for hiding this comment

erindru commented Jun 24, 2024 •

edited

Loading

erindru commented Jun 24, 2024 •

edited

Loading

erindru Jun 25, 2024 •

edited

Loading

izeigerman Jun 25, 2024 •

edited

Loading

erindru Jul 1, 2024 •

edited

Loading

izeigerman Jul 1, 2024 •

edited

Loading

izeigerman Jul 1, 2024 •

edited

Loading