diff --git a/CHANGELOG.md b/CHANGELOG.md
index 5f4e6162..1ad6630f 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,13 @@
+### Release [1.6.0], 2023-11-30
+#### Improvements
+- Compatible with dbt 1.6.x. Note that the new dbt `clone` feature is not supported, as ClickHouse has no native "lightweight"
+clone functionality, and copying tables without actual data transfer is not possible in ClickHouse (barring file manipulation
+outside ClickHouse itself).
+- A new ClickHouse-specific Materialized View materialization contributed by [Rory Sawyer](https://github.com/SoryRawyer).
+This creates a ClickHouse materialized view using the `TO` form with the name `_mv` and the associated target
+table ``. It's highly recommended to fully understand how ClickHouse materialized views work before using
+this materialization.
+
 ### Release [1.5.2], 2023-11-28
 #### Bug Fixes
 - The `ON CLUSTER` clause was in the incorrect place for legacy incremental materializations. This has been fixed. Thanks to
diff --git a/README.md b/README.md
index b5c8b8b8..8022f214 100644
--- a/README.md
+++ b/README.md
@@ -22,6 +22,7 @@ pip install dbt-clickhouse
 - [x] Table materialization
 - [x] View materialization
 - [x] Incremental materialization
+- [x] Materialized View materializations (uses the `TO` form of MATERIALIZED VIEW, experimental)
 - [x] Seeds
 - [x] Sources
 - [x] Docs generate
@@ -102,16 +103,9 @@ your_profile_name:
 | settings | A map/dictionary of "TABLE" settings to be used to DDL statements like 'CREATE TABLE' with this model | |
 | query_settings | A map/dictionary of ClickHouse user level settings to be used with `INSERT` or `DELETE` statements in conjunction with this model | |
 
-## A Note on Model Settings
-ClickHouse has several types/levels of "settings". In the model configuration above, two types of these are configurable. `settings` means the `SETTINGS`
-clause used in `CREATE TABLE/VIEW` types of DDL statements, so this is generally settings that are specific to the specific ClickHouse table engine. The new
-`query_settings` is use to add a `SETTINGS` clause to the `INSERT` and `DELETE` queries used for model materialization (including incremental materializations).
-There are hundreds of ClickHouse settings, and it's not always clear which is a "table" setting and which is a "user" setting (although the latter are generally
-available in the `system.settings` table.) In general the defaults are recommended, and any use of these properties should be carefully researched and tested.
-
 ## ClickHouse Cluster
 
-`cluster` setting in profile enables dbt-clickhouse to run against a ClickHouse cluster.
+The `cluster` setting in profile enables dbt-clickhouse to run against a ClickHouse cluster.
 
 ### Effective Scope
 
@@ -130,6 +124,15 @@ table and incremental materializations with non-replicated engine will not be af
 If a model has been created without a `cluster` setting, dbt-clickhouse will detect the situation and run all DDL/DML
 without `on cluster` clause for this model.
 
+## A Note on Model Settings
+
+ClickHouse has several types/levels of "settings". In the model configuration above, two types of these are configurable. `settings` means the `SETTINGS`
+clause used in `CREATE TABLE/VIEW` types of DDL statements, so these are generally settings specific to the particular ClickHouse table engine. The new
+`query_settings` is used to add a `SETTINGS` clause to the `INSERT` and `DELETE` queries used for model materialization (including incremental materializations).
+There are hundreds of ClickHouse settings, and it's not always clear which is a "table" setting and which is a "user" setting (although the latter are generally
+available in the `system.settings` table). In general the defaults are recommended, and any use of these properties should be carefully researched and tested.
+
+
 ## Known Limitations
 
 * Ephemeral models/CTEs don't work if placed before the "INSERT INTO" in a ClickHouse insert statement, see https://github.com/ClickHouse/ClickHouse/issues/30323. This
@@ -192,10 +195,10 @@ keys used to populate the parameters of the S3 table function:
 | fmt | The expected ClickHouse input format (such as `TSV` or `CSVWithNames`) of the referenced S3 objects. |
 | structure | The column structure of the data in bucket, as a list of name/datatype pairs, such as `['id UInt32', 'date DateTime', 'value String']` If not provided ClickHouse will infer the structure. |
 | aws_access_key_id | The S3 access key id. |
-| aws_secret_access_key | The S3 secrete key. |
+| aws_secret_access_key | The S3 secret key. |
 | compression | The compression method used with the S3 objects. If not provided ClickHouse will attempt to determine compression based on the file name. |
 
-See the [S3 test file](https://github.com/ClickHouse/dbt-clickhouse/blob/main/tests/integration/adapter/test_s3.py) for examples of how to use this macro.
+See the [S3 test file](https://github.com/ClickHouse/dbt-clickhouse/blob/main/tests/integration/adapter/clickhouse/test_clickhouse_s3.py) for examples of how to use this macro.
 
 # Contracts and Constraints
 
@@ -203,6 +206,14 @@ Only exact column type contracts are supported. For example, a contract with a
 ClickHouse also support _only_ `CHECK` constraints on the entire table/model. Primary key, foreign key, unique, and column level CHECK constraints are not supported.
 (See ClickHouse documentation on primary/order by keys.)
 
+# Materialized Views (Experimental)
+A `materialized_view` materialization should be a `SELECT` from an existing (source) table. The adapter will create a target table with the model name
+and a ClickHouse MATERIALIZED VIEW with the name `_mv`. Unlike PostgreSQL, a ClickHouse materialized view is not "static" (and has
+no corresponding REFRESH operation). Instead, it acts as an "insert trigger", and will insert new rows into the target table using the defined `SELECT`
+"transformation" in the view definition on rows inserted into the source table. See the [test file]
+(https://github.com/ClickHouse/dbt-clickhouse/blob/main/tests/integration/adapter/materialized_view/test_materialized_view.py) for an introductory example
+of how to use this functionality.
+
 # Distributed materializations
 
 Notes:
diff --git a/dbt/adapters/clickhouse/__version__.py b/dbt/adapters/clickhouse/__version__.py
index e8b09c2b..f7c7de21 100644
--- a/dbt/adapters/clickhouse/__version__.py
+++ b/dbt/adapters/clickhouse/__version__.py
@@ -1 +1 @@
-version = '1.5.2'
+version = '1.6.0'
diff --git a/dbt/adapters/clickhouse/connections.py b/dbt/adapters/clickhouse/connections.py
index c4098649..dcb411f8 100644
--- a/dbt/adapters/clickhouse/connections.py
+++ b/dbt/adapters/clickhouse/connections.py
@@ -73,7 +73,7 @@ def get_table_from_response(cls, response, column_names) -> agate.Table:
         return dbt.clients.agate_helper.table_from_data_flat(data, column_names)
 
     def execute(
-        self, sql: str, auto_begin: bool = False, fetch: bool = False
+        self, sql: str, auto_begin: bool = False, fetch: bool = False, limit: Optional[int] = None
     ) -> Tuple[AdapterResponse, agate.Table]:
         # Don't try to fetch result of clustered DDL responses, we don't know what to do with them
         if fetch and ddl_re.match(sql):
diff --git a/dbt/adapters/clickhouse/dbclient.py b/dbt/adapters/clickhouse/dbclient.py
index ab5567e8..9b8e1ee1 100644
--- a/dbt/adapters/clickhouse/dbclient.py
+++ b/dbt/adapters/clickhouse/dbclient.py
@@ -169,7 +169,9 @@ def _ensure_database(self, database_engine, cluster_name) -> None:
                     if cluster_name is not None and cluster_name.strip() != ''
                     else ''
                 )
-                self.command(f'CREATE DATABASE {self.database}{cluster_clause}{engine_clause}')
+                self.command(
+                    f'CREATE DATABASE IF NOT EXISTS {self.database}{cluster_clause}{engine_clause}'
+                )
                 db_exists = self.command(check_db)
                 if not db_exists:
                     raise FailedToConnectError(
diff --git a/dbt/include/clickhouse/macros/materializations/materialized_view.sql b/dbt/include/clickhouse/macros/materializations/materialized_view.sql
index f3c66cfd..8ba96d02 100644
--- a/dbt/include/clickhouse/macros/materializations/materialized_view.sql
+++ b/dbt/include/clickhouse/macros/materializations/materialized_view.sql
@@ -7,7 +7,7 @@
 
   {%- set target_relation = this.incorporate(type='table') -%}
   {%- set mv_name = target_relation.name + '_mv' -%}
-  {%- set target_mv = api.Relation.create(identifier=mv_name, schema=schema, database=database, type='materializedview') -%}
+  {%- set target_mv = api.Relation.create(identifier=mv_name, schema=schema, database=database, type='materialized_view') -%}
   {%- set cluster_clause = on_cluster_clause(target_relation) -%}
 
   {# look for an existing relation for the target table and create backup relations if necessary #}
diff --git a/dev_requirements.txt b/dev_requirements.txt
index 5e1771ce..8906bfec 100644
--- a/dev_requirements.txt
+++ b/dev_requirements.txt
@@ -1,16 +1,16 @@
-dbt-core~=1.5.8
+dbt-core~=1.6.9
 clickhouse-connect>=0.6.21
 clickhouse-driver>=0.2.6
 pytest>=7.2.0
 pytest-dotenv==0.5.2
-dbt-tests-adapter~=1.5.8
-black==22.3.0
+dbt-tests-adapter~=1.6.9
+black==23.11.0
 isort==5.10.1
 mypy==0.991
 yamllint==1.26.3
 flake8==4.0.1
 types-requests==2.27.29
-agate~=1.6.3
+agate~=1.7.1
 requests~=2.27.1
 setuptools~=65.3.0
 types-setuptools==67.1.0.0
\ No newline at end of file
diff --git a/pyproject.toml b/pyproject.toml
index 34c3848d..68570715 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,7 +1,7 @@
 [tool.black]
 line-length = 100
 skip-string-normalization = true
-target-version = ['py38', 'py39']
+target-version = ['py310', 'py311']
 exclude = '(\.eggs|\.git|\.mypy_cache|\.venv|venv|env|_build|build|build|dist|)'
 
 [tool.isort]
diff --git a/setup.py b/setup.py
index 0bb32f68..fb2d5311 100644
--- a/setup.py
+++ b/setup.py
@@ -25,7 +25,7 @@ def _dbt_clickhouse_version():
 package_version = _dbt_clickhouse_version()
 description = '''The Clickhouse plugin for dbt (data build tool)'''
 
-dbt_version = '1.5.0'
+dbt_version = '1.6.0'
 dbt_minor = '.'.join(dbt_version.split('.')[0:2])
 
 if not package_version.startswith(dbt_minor):
@@ -58,7 +58,7 @@ def _dbt_clickhouse_version():
         'clickhouse-connect>=0.6.21',
         'clickhouse-driver>=0.2.6',
     ],
-    python_requires=">=3.7",
+    python_requires=">=3.8",
     platforms='any',
     classifiers=[
         'Development Status :: 5 - Production/Stable',
diff --git a/tests/integration/adapter/dbt_clone/test_dbt_clone.py b/tests/integration/adapter/dbt_clone/test_dbt_clone.py
new file mode 100644
index 00000000..0252a2f7
--- /dev/null
+++ b/tests/integration/adapter/dbt_clone/test_dbt_clone.py
@@ -0,0 +1,7 @@
+import pytest
+from dbt.tests.adapter.dbt_clone.test_dbt_clone import BaseClonePossible
+
+
+@pytest.mark.skip("clone not supported")
+class TestBaseClonePossible(BaseClonePossible):
+    pass
diff --git a/tests/integration/adapter/test_materialized_view.py b/tests/integration/adapter/materialized_view/test_materialized_view.py
similarity index 96%
rename from tests/integration/adapter/test_materialized_view.py
rename to tests/integration/adapter/materialized_view/test_materialized_view.py
index 23452c40..b5efb018 100644
--- a/tests/integration/adapter/test_materialized_view.py
+++ b/tests/integration/adapter/materialized_view/test_materialized_view.py
@@ -1,5 +1,6 @@
 """
-test materialized view creation
+test materialized view creation. This is ClickHouse-specific: ClickHouse has a significantly different implementation
+of materialized views than PostgreSQL or Oracle
 """
 
 import json