Feat: Support INCREMENTAL_BY_TIME_RANGE models on Athena/Hive #3201

erindru · 2024-09-30T03:35:12Z

ref: issue #1315

erindru · 2024-10-01T04:01:59Z

.circleci/continue_config.yml

I'll revert these prior to merge.

As an aside, we need a better way of allowing cloud test runs to be triggered on the relevant PR's.

CircleCI appears to be fairly limited in this regard but we might be able to do something with GitHub Actions triggering a CircleCI workflow in response to a PR comment

tobymao · 2024-10-02T20:10:03Z

sqlmesh/core/engine_adapter/athena.py

+        # Truncating a partitioned Hive table is dropping all partitions and deleting the data from S3
+        if self._is_hive_partitioned_table(table_name):
+            self._clear_partition_data(table_name, exp.true())
+        else:


elif s3_location := self._query...

tobymao · 2024-10-02T20:11:58Z

sqlmesh/core/engine_adapter/athena.py

@@ -317,6 +342,8 @@ def _query_table_type(
        Hit the DB to check if this is a Hive or an Iceberg table
        """
        table_name = exp.to_table(table_name)
+        quote_identifiers(table_name)


why was this necessary?

It's because the query its used in, SHOW TBLPROPERTIES {table} is just a string and not an Expression.

since we are string building, we need to quote the identifiers manually. I couldn't figure out how to make it an Expression because sqlglot just turns it into an exp.Command string literal anyway

ok, we can remove this and on line 349 do sql(dialect='hive', identify=True)

Oh cool, forgot you could do that!

tobymao · 2024-10-02T20:12:38Z

sqlmesh/core/engine_adapter/athena.py

@@ -326,6 +353,15 @@ def _query_table_type(
                return "hive"
        return "iceberg"

+    def _is_hive_partitioned_table(self, table_name: exp.Table) -> bool:


if this takes a Table, maybe just call this arg table

Updated, yeah I kept switching between TableName and exp.Table because the parent classes use TableName but checking for strings all the time was getting tedious.

The general pattern throughout the codebase of making all the methods take a str or an exp.Expression is nice in some ways but annoying in others

sqlmesh/core/engine_adapter/athena.py

tobymao · 2024-10-02T20:15:00Z

sqlmesh/core/engine_adapter/athena.py

-        # Athena doesnt support TRUNCATE TABLE. The closest thing is DELETE FROM <table> but it only works on Iceberg
-        self.execute(f"DELETE FROM {table.sql(dialect=self.dialect, identify=True)}")
+        if isinstance(table_name, str):
+            table_name = exp.to_table(table_name)


i don't think this one is necessary, because you handle it it in all the other functions right?

_truncate_table needs this signature (or mypy will fail) because it's defined in the EngineAdapter superclass and im just overriding it

tobymao · 2024-10-02T20:15:12Z

sqlmesh/core/engine_adapter/athena.py

+            **kwargs,
+        )
+
+    def _clear_partition_data(self, table: TableName, where: t.Optional[exp.Condition]) -> None:


table -> table_name

or what you should do is clean up so all internal methods take in exp.Table instead

or what you should do is clean up so all internal methods take in exp.Table instead

Unfortunately a bunch are defined in the superclass which means they cant be easily cleaned up. I'll clean up the ones that only exist in this adapter class though

tobymao · 2024-10-02T20:18:23Z

sqlmesh/core/engine_adapter/athena.py

+
+        partition_values = [list(r) for r in self.fetchall(query, quote_identifiers=True)]
+
+        if len(partition_values) > 0:


if partition_values

tobymao · 2024-10-02T20:18:47Z

sqlmesh/core/engine_adapter/athena.py

+
+        return []
+
+    def _query_table_s3_location(self, table_name: exp.Table) -> str:


* Fix: mark kind changes as breaking in forward only plan (TobikoData#3207) * Feat: add support for parameterized python model names (TobikoData#3208) * Fix: Bigquery support of complex nested types (TobikoData#3190) * Feat: Snowflake: Handle forward_only changes to 'clustered_by' (TobikoData#3205) * Docs: add gateway variables to jinja macros concepts doc (TobikoData#3210) * Fix: avoid parsing column names into qualified columns in InsertOverwriteWithMergeMixin (TobikoData#3211) * Chore: bump sqlglot to v25.24.2 (TobikoData#3213) * Feat: Support INCREMENTAL_BY_TIME_RANGE models on Athena/Hive (TobikoData#3201) * Fix: load custom materializations on run (TobikoData#3216) * Fix: Infer column types when data type is omitted in dbt seeds (TobikoData#3215) * Chore: bump sqlglot to v25.24.3 (TobikoData#3217) * Fix: DBT seed column order (TobikoData#3221) * fix: web reloading caused iteration error (TobikoData#3220) * Fix: Make dbt adapter macros available in the local scope (TobikoData#3219) * Feat: Support DBT Athena adapter (TobikoData#3222) * chore: docs * Feat: Support SQLMesh project generation from dlt pipeline (TobikoData#3218) * Fix: Broken hive distro link in the test airflow image * Fix: Prevent loaded context from being used concurrently (TobikoData#3229) * Fix: Go back to using hive 3.1.3 for the Airflow test image * Fix: Support of custom roles for Postgres (TobikoData#3230) * Fix(redshift): regression in varchar length workaround (TobikoData#3225) * Fix: Force the CircleCI's git to use https links when running pre-commit (TobikoData#3235) * Fix: reset macro registry *after* loading models (TobikoData#3232) * Fix: Modify dlt query filter not to use alias reference (TobikoData#3233) * Fix: Support CLUSTER BY clause for the Databricks engine (TobikoData#3234) * Feat: BigQuery - Handle forward_only changes to clustered_by (TobikoData#3231) * chore: Fix typo in model_kinds.md (TobikoData#3239) * Feat: support custom unit testing schema names (TobikoData#3238) * Chore: Make the scheduler config extendable (TobikoData#3242) * Fix: use parentheses for databricks' CLUSTER BY clause (TobikoData#3240) * Fix: handle Paren in depends_on validator (TobikoData#3243) * fix: data diff for bigquery project parsing (TobikoData#3248) * Chore: Reintroduce parallelism in integration tests (TobikoData#3236) * Feat(databricks): Add OAuth support (TobikoData#3250) * Chore!: bump sqlglot to v25.25.0 (TobikoData#3252) * Adding markdown feature to model description (TobikoData#3228) * Fix: refactor table part parsing for Snowflake (TobikoData#3254) * Fix: always warn when an audit has failed (TobikoData#3255) * Chore: bump sqlglot to v25.25.1 (TobikoData#3256) * Ensure using project instead of execution project for temp table as default (TobikoData#3249) * Chore: Clarify that restatement plans ignore local changes (TobikoData#3257) * feat!: run-all bot command errors if anything within it errors (TobikoData#3262) * Fix(clickhouse): remove fractional seconds when time column is datetime/timestamp type (TobikoData#3261) * remove risingwave configuration from dbt * remove sink settings * remove risngwave sink * introducing risingwave as state syn engine * add risingwave connetion as test * change test case * Fix: Prevent extraction of dependencies from a rendered query for dbt models (TobikoData#3263) --------- Co-authored-by: Ben <9087625+benfdking@users.noreply.github.com> Co-authored-by: Jo <46752250+georgesittas@users.noreply.github.com> Co-authored-by: Themis Valtinos <73662635+Themiscodes@users.noreply.github.com> Co-authored-by: Erin Drummond <erin.dru@gmail.com> Co-authored-by: Trey Spiller <1831878+treysp@users.noreply.github.com> Co-authored-by: Alexander Butler <41213451+z3z1ma@users.noreply.github.com> Co-authored-by: Toby Mao <toby.mao@gmail.com> Co-authored-by: Iaroslav Zeigerman <zeigerman.ia@gmail.com> Co-authored-by: Vincent Chan <vchan@users.noreply.github.com> Co-authored-by: Vaggelis Danias <daniasevangelos@gmail.com> Co-authored-by: Harmuth94 <86912694+Harmuth94@users.noreply.github.com> Co-authored-by: Christophe Oudar <kayrnt@gmail.com> Co-authored-by: Chris Rericha <67359577+crericha@users.noreply.github.com> Co-authored-by: Ryan Eakman <6326532+eakmanrq@users.noreply.github.com> Co-authored-by: Justin Joseph <justin.joseph@tryg.dk>

erindru force-pushed the erin/athena-hive-incremental branch from 2906dbc to a30f1b4 Compare October 1, 2024 02:14

erindru marked this pull request as draft October 1, 2024 02:15

erindru force-pushed the erin/athena-hive-incremental branch 4 times, most recently from a2f9153 to b623489 Compare October 1, 2024 03:59

erindru commented Oct 1, 2024

View reviewed changes

erindru marked this pull request as ready for review October 1, 2024 04:50

erindru requested a review from izeigerman October 1, 2024 20:33

tobymao reviewed Oct 2, 2024

View reviewed changes

sqlmesh/core/engine_adapter/athena.py Show resolved Hide resolved

tobymao reviewed Oct 2, 2024

View reviewed changes

erindru added 2 commits October 2, 2024 20:25

Feat: Support INCREMENTAL_BY_TIME_RANGE models on Athena/Hive

c2d00f9

PR Feedback

94f6e9e

erindru force-pushed the erin/athena-hive-incremental branch from b623489 to 94f6e9e Compare October 2, 2024 20:56

tobymao approved these changes Oct 2, 2024

View reviewed changes

Revert circleci changes

ce2cb66

erindru merged commit bffefbe into main Oct 2, 2024
23 checks passed

erindru deleted the erin/athena-hive-incremental branch October 2, 2024 22:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feat: Support INCREMENTAL_BY_TIME_RANGE models on Athena/Hive #3201

Feat: Support INCREMENTAL_BY_TIME_RANGE models on Athena/Hive #3201

erindru commented Sep 30, 2024 •

edited

Loading

erindru Oct 1, 2024

tobymao Oct 2, 2024

tobymao Oct 2, 2024

erindru Oct 2, 2024

tobymao Oct 2, 2024

erindru Oct 2, 2024

tobymao Oct 2, 2024

erindru Oct 2, 2024

tobymao Oct 2, 2024

erindru Oct 2, 2024

tobymao Oct 2, 2024

tobymao Oct 2, 2024

erindru Oct 2, 2024

tobymao Oct 2, 2024

tobymao Oct 2, 2024


		partition_values = [list(r) for r in self.fetchall(query, quote_identifiers=True)]

		if len(partition_values) > 0:


		return []

		def _query_table_s3_location(self, table_name: exp.Table) -> str:

Feat: Support INCREMENTAL_BY_TIME_RANGE models on Athena/Hive #3201

Feat: Support INCREMENTAL_BY_TIME_RANGE models on Athena/Hive #3201

Conversation

erindru commented Sep 30, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

erindru commented Sep 30, 2024 •

edited

Loading