fix: incremental on cluster clause #166

Savid · 2023-07-12T05:26:18Z

The ON CLUSTER clause is missing when the incremental materialization creates its temporary replicated table.

Error:

Code: 36. DB::Exception: Macro 'uuid' and empty arguments of ReplicatedMergeTree are supported only for ON CLUSTER queries with Atomic database engine. (BAD_ARGUMENTS) (version 23.6.2.18 (official build))

Debug SQL:

create table dbt.example__dbt_tmp as dbt.example__dbt_new_data

Example config:

{{ config(
    materialized="incremental",
    engine="ReplicatedMergeTree('/clickhouse/{installation}/{cluster}/tables/{shard}/{database}/{table}/{uuid}', '{replica}')",
    ...
) }}

What this fix does

Changes

create table dbt.example__dbt_tmp as dbt.example__dbt_new_data

to

create table dbt.example__dbt_tmp ON CLUSTER '{cluster}' as dbt.example__dbt_new_data

if a cluster has been set in the config.
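The effect of the change can be sketched in a few lines of Python. This is only an illustration of the intended behavior, not the adapter's actual Jinja macro; the function names `on_cluster_clause` and `create_table_as` and their parameters are hypothetical stand-ins.

```python
from typing import Optional


def on_cluster_clause(cluster: Optional[str]) -> str:
    """Return an ON CLUSTER fragment, or an empty string when no cluster is set."""
    if not cluster:
        return ""
    return f" ON CLUSTER '{cluster}'"


def create_table_as(new_table: str, source_table: str, cluster: Optional[str] = None) -> str:
    """Build the CREATE TABLE ... AS statement, adding ON CLUSTER only if a cluster is set."""
    return f"create table {new_table}{on_cluster_clause(cluster)} as {source_table}"


# Without a cluster the statement is unchanged.
print(create_table_as("dbt.example__dbt_tmp", "dbt.example__dbt_new_data"))
# With a cluster the clause is inserted before AS.
print(create_table_as("dbt.example__dbt_tmp", "dbt.example__dbt_new_data", cluster="{cluster}"))
```

With no cluster configured the generated statement is the same as before, so non-clustered setups should not be affected.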

CLAassistant · 2023-07-12T05:47:08Z

All committers have signed the CLA.

genzgd · 2023-07-12T12:33:31Z

What is the point of replicating a temporary table? I think we would have to update all of the temporary table operations (including the cleanup) as ON CLUSTER and the behavior would have to be different with ReplicatedMergeTrees versus a regular MergeTree.

Savid · 2023-07-12T23:11:38Z

> What is the point of replicating a temporary table? I think we would have to update all of the temporary table operations (including the cleanup) as ON CLUSTER and the behavior would have to be different with ReplicatedMergeTrees versus a regular MergeTree.

I might be misunderstanding the process, but isn't that table exchanged with the target table, so it should be replicated?

1. create example__dbt_new_data (using the model config engine and ON CLUSTER clause)
2. insert into example__dbt_new_data from model (with incremental filters)
3. create example__dbt_tmp as example__dbt_new_data (this currently fails as it's missing the ON CLUSTER clause)
4. insert into example__dbt_tmp from example excluding rows already in example__dbt_new_data
5. insert into example__dbt_tmp from example__dbt_new_data
6. drop example__dbt_new_data and example__dbt_backup if exists
7. rename example__dbt_tmp to example__dbt_backup
8. exchange example__dbt_backup with example

genzgd · 2023-07-18T14:57:23Z

Sorry about that, you're correct, this particular table is not actually "temporary".

However, none of the other operations involved are ON CLUSTER, so this code can really only work with a ReplicatedMergeTree engine. Have you confirmed that incremental materializations work correctly ON CLUSTER with a ReplicatedMergeTree table? Maybe we should add a check that the model uses a Replicated engine (not sure how that would work). At a minimum, we should probably explicitly document that incremental materializations do not work if a cluster is defined and the underlying model is not a *ReplicatedMergeTree table.

I think I'm okay with merging this if you've validated that the whole incremental materialization process works for ReplicatedMergeTrees (although automated tests would be really nice) and we have at least documented that ON CLUSTER must be combined with *ReplicatedMergeTrees for incremental materialization.

(Finally, just as an FYI, it looks like @gladkikhtutu is working on the related problem of incremental materialization of Distributed tables, which will also involve ON CLUSTER operations: #163.)
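One possible shape for the engine check mentioned above, as a minimal Python sketch. The helper names `is_replicated_merge_tree` and `check_incremental_config` are hypothetical and not part of the adapter; the sketch only inspects the engine string from the model config and assumes replicated engines are named like ReplicatedMergeTree or ReplicatedReplacingMergeTree.

```python
from typing import Optional


def is_replicated_merge_tree(engine: str) -> bool:
    """Check whether an engine string names a Replicated*MergeTree engine.

    Only the engine name before the first parenthesis is inspected.
    """
    name = engine.split("(", 1)[0].strip()
    return name.startswith("Replicated") and name.endswith("MergeTree")


def check_incremental_config(engine: str, cluster: Optional[str]) -> None:
    """Raise if a cluster is configured but the engine is not replicated."""
    if cluster and not is_replicated_merge_tree(engine):
        raise ValueError(
            f"incremental materialization with cluster '{cluster}' "
            f"requires a Replicated*MergeTree engine, got: {engine}"
        )


# Passes: replicated engine with a cluster, as in the example config.
check_incremental_config("ReplicatedMergeTree('/clickhouse/tables/{shard}/t', '{replica}')", "{cluster}")
# Passes: no cluster configured, so any engine is accepted.
check_incremental_config("MergeTree()", None)
```

A string check like this cannot see the engine of an already existing table, so it would only cover what the model config declares.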

gladkikhtutu · 2023-07-18T15:18:28Z

I can add that we already use the ON CLUSTER clause for the intermediate table, for example here, due to this statement.
And yes, I will rewrite and reuse this place in the distributed incremental materialization, since I need both the ON CLUSTER clause and a distributed table.

My only suggestion is to use the same method everywhere ({{ on_cluster_clause() }}).

Savid · 2023-07-18T21:37:29Z

I'm happy to close this and use my own fork for now. I can wait for the distributed incremental materialization changes.

Thanks all!

fix: incremental on cluster clause

989f39a

Savid closed this Jul 18, 2023
