- Added support for range_hashed and complex_key_range_hashed layouts to the dictionary materialization. (#361)
- The S3 help macro now support a
role_arn
parameter as an alternative way to provide authentication for S3 based models. Thanks to Mitchell Bregman for the contribution!
- An issue was detected when using multiple projections. We solved it and added a test to cover that use case. (#349)
- A CONTRIBUTING.md file was added to the repo. Please follow the instructions prior contributing.
- ClickHouse projections are now fully supported for
table
materialization, and partly supported fordistributed_table
materialization. The projection config should be added to the model config (#342), for instance:{{ config( materialized='%s', projections=[ { 'name': 'your_projection_name', 'query': 'your_projection_query' } ] ) }}
- Until this release, when writing tests, it was not possible to pass empty seed data. (#341)
- When a cluster was used, the adapter left a few
__dbt_backup
tables in the schema. After the fix, these backup tables are now properly dropped. (#326) - Due to this GitHub change, we needed to adjust the
docker compose
command in our tests. (#334)
- Added a new experimental incremental strategy,
insert_overwrite
, which replaces existing data in a target table partition by partition. This ensures that only the specified partitions are overwritten, helping to maintain performance and data consistency. Note that this feature has not been tested with distributed models and may not work with such materializations. Anton Bryzgalov Thank you for your contribution! (#201) - Schema changes for incremental models were improved and now include a
sync_all_column
option. Can Bekleyici huge thanks for your contribution. (#332) - Previously, view settings were only applied during view creation, not when querying the view. Starting with this release, the settings attribute will now also be applied when running queries on a view. (#324)
- Enhanced the clickhouse__listagg macro to support single field ordering with optional direction, ensuring compatibility with ClickHouse's sorting limitations. Added validation to prevent the use of multiple order-by fields. (#318)
- Update the docs with the new
insert_overwrite
incremental strategy. (#331) - Add documentation for codec column configuration. (#317)
- Refresh materialized_view table only if
--full-refresh
is specified. - Fix temporary table creation to support dbt unit tests.
- ClickHouse dictionaries are now correctly created "on cluster" when a cluster is defined.
- Added database prefix option for local tables of distributed tables (until this change, only a suffix was supported).
- You can now customize tcp_keepalive configuration for native connections.
- Implement listagg (groupArray) macro.
- Upgrade the connector to use dbt-core 1.8.0. More info about this upgrade can be found here.
Beginning in v1.8, dbt-core and adapters are decoupled. Going forward, your installations should explicitly include both dbt-core and the desired adapter. The new pip installation command should look like this:
pip install dbt-core dbt-clickhouse
- Fix bool_or behavior (a cross-database dbt macro ).
- Fix query_settings for models with contracts.
- Fix the option to use Nullable and LowCardinality in column constraints.
- Specifying an empty or null value for a connection_overrides field produces invalid DDL for the dictionary materialization. A fix was introduced in this release.
- The connector is now supporting column codecs.
- A bug in (experimental) Distributed Table model creation could lead to errors when there was a change in the model definition (see, e.g., #226). Thanks to Thomas Schmidt for the Fix!
- A comment at the end of a model would break the query used to retrieve the result column datatypes. Thanks to triou for the bug report and the fix. Closes #256
- The new materialization for ClickHouse dictionaries now takes an optional "credentials dictionary" argument that overrides the global credentials values for user, password, database, host, and port (including removing any of those values by adding empty values if not needed). This allows better control over creating dictionaries on different server. Thanks to Cristhian Garcia for the PR!
- A new
ttl
setting has been added to model configuration that will insert the provided ClickHouse TTL expression in the appropriate place. Thanks to Evan Rusackas for the contribution! - The Agate library should now be lazy loaded. This should modestly improve dbt startup times (after dbt-clickhouse is upgraded to dbt 1.8.x). Thanks to Daniel Reeves for PR.
- Requirements and tests upgraded to include Python 3.12. Closes #264
- Model settings were not working correctly for custom materializations. Thanks to original dbt-clickhouse silentsokolov for the PR!
- Adds support for materializing ClickHouse dictionaries. Thanks to Rory Sawyer for the contribution! See his excellent tests for example usage.
- Fixed an issue where passing settings to on view creation didn't work.
- The
dbt test
command with a LIMIT clause were broken due to parsing error when having settings in the query (issue). We added a dedicated limit placer, that takes into account the settings section (using a comment flag-- end_of_sql
within the query).
- Fixed an issue where Materialize Views would break with a custom schema. Thanks to Rory Sawyer for the PR!
- Some models with LIMIT clauses were broken in recent releases. This has been fixed. Thanks to ptemarvelde for the PR!
- It was possible for incremental models with the delete+insert strategy to fail if ClickHouse "light weight deletes" were
not enabled or the required setting
allow_nondetermistic_mutations
was not enabled and the user did not have permission to apply it. This condition is now detected on startup, and an exception will be thrown ifuse_lw_deletes
is configured in the profile. Otherwise, a warning will be logged that incremental models will be slower (because such models will be downgraded to use thelegacy
incremental strategy). This should prevent the confusing behavior in #197 by throwing an early exception for an unsupported configuration.
- Minimal compatibility with dbt 1.7.x. The date_spine macro and additional automated tests have not been implemented, but are planned for a future patch release.
- DBT 1.7 introduces a (complex) optimization mechanism for retrieving a dbt catalog which is overkill for ClickHouse (which has no separate schema/database level), so this release includes some internal catalog changes to simplify that process.
- The dbt
on_schema_change
configuration value for incremental models was effectively being ignored. This has been fixed with a very limited implementation. Closes #199. Because of the way that ORDER BY/SORT BY/PARTITION BY/PRIMARY KEYS work in ClickHouse, plus the complexities of correctly transforming ClickHouse data types,sync_all_columns
is not currently supported (although an implementation that works for non-key columns is theoretically possible, such an enhancement is not currently planned). Accordingly, onlyignore
,fail
, andappend_new_columns
values are supported foron_schema_change
. It is also not currently supported for Distributed tables.
Note that actually appending new columns requires a fallback to the legacy
incremental strategy, which is quite inefficient,
so while theoretically possible, using append_new_columns
is not recommended except for very small data volumes.
- Identifier quoting was disabled for tables/databases etc. This would cause failures for schemas or tables using reserved words or containing special characters. This has been fixed and some macros have been updated to correctly handle such identifiers. Note that there still may be untested edge cases where nonstandard identifiers cause issues, so they are still not recommended. Closes #144. Thanks to Alexandru Pisarenco for the report and initial PR!
- The new
allow_automatic_deduplication
setting was not being correctly propagated to the adapter, so setting it toTrue
did not have the intended affect. In addition, this setting is now ignored for older ClickHouse versions that do not supportCREATE TABLE AS SELECT ... EMPTY
, since the automatic deduplication window is required to allow correct inserts in Replicated tables on those older versions. Fixes #216.
- Compatible with dbt 1.6.x. Note that dbt new
clone
feature is not supported, as ClickHouse has no native "light weight" clone functionality, and copying tables without actual data transfer is not possible in ClickHouse (barring file manipulation outside ClickHouse itself). - A new ClickHouse specific Materialized View materialization contributed by Rory Sawyer.
This creates a ClickHouse Materialized view using the
TO
form with the name<model_name>_mv
and the associated target table<model_name>
. It's highly recommended to fully understand how ClickHouse materialized views work before using this materialization.
- The
ON CLUSTER
clause was in the incorrect place for legacy incremental materializations. This has been fixed. Thanks to Steven Reitsma for the fix! - The
ON CLUSTER
DDL for drop tables did not include a SYNC modifier, which might be the cause of some "table already exists" errors. TheSYNC
modifier has been added to theon_cluster
macro when dropping relations. - Fixed a bug where using table settings such as
allow_nullable_key
would break "legacy" incremental materializations. Closes #209. Also see the new modelconfig
propertyinsert_settings
described below. - Fixed an issue where incremental materializations would incorrectly exclude duplicated inserted elements due to "automatic"
ClickHouse deduplication on replicated tables. Closes #213. The fix consists
of always sending a
replicated_deduplication_window=0
table setting when creating the incremental relations. This behavior can be overridden by setting the new profile parameterallow_automatic_deduplication
toTrue
, although for general dbt operations this is probably not necessary and not recommended. Finally thanks to Andy(https://github.com/andy-miracl) for the report and debugging help!
- Added a new profile property
allow_automatic_deduplication
, which defaults toFalse
. ClickHouse Replicated deduplication is now disable for incremental inserts, but this property can be set to true if for some reason the default ClickHouse behavior for inserted blocks is desired. - Added a new model
config
propertyquery_settings
for any ClickHouse settings that should be sent with theINSERT INTO
orDELETE_FROM
queries used with materializations. Note this is distinct from the existing propertysettings
which is used for ClickHouse "table" settings in DDL statements likeCREATE TABLE ... AS
.
- Fix table materialization for compatibility with SQLFluff. Thanks to Kristof Szaloki for the PR!
- Compatible with dbt 1.5.x
- Contract support (using exact column data types)
- Fix s3 macro when bucket includes
https://
prefix. Closes #192.
- Lots of work on Distributed table materializations. Big thanks to gfunc for the additional PR and Zhenbang for code review and suggestions. See the README for details on how to use the new functionality.
- dbt would fail if a cluster name contained a dash. This has been fixed. Thanks to [Andy](https://github.com/the4thamigo-uk for the PR
- Fixed issues with experimental Distributed table materializations. Closes #179. Thanks to Zhebnang for the report and for contributing to the fix with gfunc.
- Fixed an exception in "legacy" incremental materializations that are not distributed
- Lightweight deletes could fail in environments where the HTTP session was not preserved (such as clusters behind a non-sticky load balancer). This has been fixed by sending the required settings with every request instead of relying on a SET statement. A similar approach has been used to persist the 'insert_distributed_sync' setting for Distributed table materializations.
- Adds additional experimental support for Distributed table engine models and incremental materialization. See the README for details. Thanks to gladkikhtutu for the contribution!
- Fixed two logging/exception handling issues that would cause exception on startup or when handling some exceptions from the ClickHouse server. Partially addresses #169.
- Fixed issue with the
on_cluster
macro that would break the exchange tables step of incremental materializations with an active cluster. Thanks to Andrew Davis for the PR. Closes #167
- Use correct return value for
execute
. This would cause an exception when running hooks. Thanks to Sergey Reshetnikov for the PR. Closed #161
- Added macros for creating distributed tables. See the
distributed_table.sql
include file. Thanks to gladkikhtutu for the contribution.
- Create initial dbt database (if not found) on the defined cluster on first run, instead of just the execution node. Thanks to Jens Hoevenaars for the PR
- Fix the SYSTEM SYNC REPLICA statement when exchanging tables ON CLUSTER for incremental materializations. Thanks to Schum for PR. Closed #157.
- Reduce the number of SQL calls for Modify Comment operations. Thanks to Konstantin Ilchenko.
- Add "on cluster" to several additional macros to better support distributed tables. Thanks to Saurabh Bikram
- Add the unique invocation id to temporary "new data" used in
delete+insert
incremental materializations to allow parallel transformations on the same table. In general parallel transformations are risky, so this approach should only be used when transformations are explicitly limited to non-overlapping data ranges. Closes #150
- Support dbt [1.4.1] #135
- Adds support for Python 3.11
- Adds additional dbt 1.4.0 tests
- Adds support for incremental_predicates. This only applies to
delete+insert
incremental strategy. Note that incremental predicates that depend on "non-deterministic" data (such as a subquery using a table that is accepting inserts) could lead to unexpected results for ReplicatedMergeTree tables depending on the timing of the incremental materialization. - Replaces deprecated Exception classes
- Setting the
use_lw_deletes
profile value to True will now attempt to enable theallow_experimental_lightweight_delete
setting for the dbt session (if user has such permissions on the ClickHouse server). See #133
- Composite unique keys specified as a list would not work with incremental materializations. This has been fixed.
Release 1.3.3, 2023-01-18
- The documentation has been updated to reflect that dbt-clickhouse does support ephemeral models, and ephemeral model tests do pass. However, due to a ClickHouse limitation, CTEs will not work directly with INSERT statements so table models will fail if they include ephemeral models in the SELECT. View models and other SQL statements using ephemeral models should work correctly.
- Client connections would incorrectly reuse a previous session_id when initializing, causing session locked errors. This has been fixed. This
closes #127. Multiple threads should now work correctly in dbt projects,
and dbt-clickhouse automated tests now use
threads=4
(soon to be the dbt default).
Release 1.3.2, 2022-12-23
- Added experimental support for the
delete+insert
incremental strategy. In most cases this strategy should be significantly faster than the existing "legacy" custom strategy (which is still the default). To enabledelete+insert
incremental materialization, the flagallow_experimental_lightweight_delete
must be enabled on your ClickHouse server. This flag is NOT currently considered production ready, so use at your own risk. To use this strategy as the "default" incremental strategy, the new configuration valueuse_lw_deletes
must be set to True. Otherwise, to use it for a particular model, set the model'sincremental_strategy
todelete+insert
. Important caveats about this strategy:- This strategy directly uses lightweight deletes on the target model/table. It does not create a temporary or intermediate table. Accordingly, if there is an issue during that transformation, the materialization can not be rolled back and the model may contain inconsistent data.
- If the incremental materialization includes schema changes, or lightweight deletes are not available, dbt-clickhouse will fall back to the much slower "legacy" strategy.
- Allow
append
as an incremental strategy. This is the same as using the custom model configuration valueinserts_only
. - New s3source macro. This simplifies the process of using the ClickHouse s3 table function in queries. See the tests for example usage.
- The ON CLUSTER clause has been added to additional DDL statements including incremental models processing.
Closes #117 and should close #95
for Replicated tables that use the
{uuid}
macro in the path to avoid name conflicts. Thanks to Saurabh Bikram - The
apply
andrevoke
grants macros now correctly work with roles as well as user. Again thanks to Saurabh Bikram - A compound unique_key (such as
key1, key2
) now works correctly with incremental models
Release 1.3.1, 2022-11-17
- Improve error message when atomic "EXCHANGE TABLES" operation is not supported
Release 1.3.0, 2022-10-30
- Support dbt 1.3.0 #105
- Adds additional dbt 1.3.0 core tests
- Adds array utility macros ported from dbt-utils
- Does NOT add support for Python models (not implemented)
- Does NOT utilize default/standard incremental materialization macros (standard strategies do not work in ClickHouse)
- Require exact match for relations. ClickHouse databases and tables are all case-sensitive, so all searches are now case-sensitive. Closes #100 and #110
Release 1.2.1, 2022-09-19
- Support dbt 1.2.1 #79
- Grants support on models
- Support the cross database dbt-core macros (migrated from dbt-utils)
- Add docs tests
- Validate Python 3.10 support
- Implement retry logic for small set of "retryable" connection errors
- Don't close connection on release to improvement model performance
- Allow specifying database engine for schema creation
- Extend support for ClickHouse Cloud and Replicated databases
- Validate that atomic EXCHANGE TABLES is supported on the target ClickHouse server #94
Release 1.1.8, 2022-09-02
- Support use of atomic EXCHANGE TABLES when supported by ClickHouse
Release 1.1.7, 2022-07-11
- Support ClickHouse Arrays with Nullable value types
- Fix brackets regex in columns)
Release 1.1.6, 2022-07-04
- Add support for CREATE AS SELECT in Replicated Databases
Release 1.1.5, 2022-06-27
- Ensure database exists before connection
- Safely handle dropping working database
Release 1.1.4, 2022-06-27
- Allow selection of either native protocol driver (clickhouse-driver) or http driver (clickhouse-connect)
Release 1.1.2, 2022-06-22
Release 1.1.0.2, 2022-06-17
- Support for inserting SETTINGS section to CREATE AS and INSERT INTO queries through model configuration
- Support adding custom session settings to the connection through profile credentials.
- Allow using custom databases/schemas for connection
- Support Boolean fields in CSV seeds
- Fixed prehooks and empty fields in CSV seeds
Release 1.1.0, 2022-06-02
Release 1.1.0.1, 2022-06-09
- Fixed prehooks and empty fields in CSV seeds
Release 1.1.0, 2022-06-02
- 1.1.0 dbt support
- Snapshots timestamps
- Replace temporary in-memory tables with MergeTree table - removing memory limitations over Incremental model and snapshots creation
- Moved to use clickhouse-connect driver - an officially supported HTTP driver by ClickHouse Inc.
Release 1.0.4, 2022-04-02
- 1.0.4 dbt support
- New logger
Release 1.0.1 - 2022-02-09
- Support 1.0.1 dbt
- Skip the order columns if the engine is Distributed #14
- Fix missing optional "as" #32
- Fix cluster name quoted #31
Release 1.0.0, 2022-01-02
- dbt 1.0.0 support
Release 0.21.1, 2022-01-01
- dbt 0.21.1 support
- Extended settings for clickhouse-driver #27
- Fix types in CSV seed #24
Release 0.21.0, 2021-11-18
- Support 0.21.0 dbt
Release 0.20.2, 2021-10-16
- dbt 0.20.1 support
- Rewrite logic incremental materializations #12
Release 0.20.1, 2021-08-15
- dbt 0.20.1 support
Release 0.20.0, 2021-08-14
- dbt 0.20.0 support
Release 0.19.1.1, 2021-08-13
- Add verify and secure to connection configuration
- Fix the delete expression #12
Release 0.19.1, 2021-05-07
- Add support the
ON CLUSTER
clause for main cases
- Engine now require brackets
()
- Fix a missing sample profile
Release 0.19.0.2, 2021-04-03
- Fix name of partition
Release 0.19.0.1, 2021-03-30
- Fix version of clickhouse-driver in setup.py