Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: incremental on cluster clause #166

Closed
wants to merge 1 commit into from

Conversation

Savid
Copy link

@Savid Savid commented Jul 12, 2023

Missing ON CLUSTER clause when creating incremental temporary replicated table.

Error;

Code: 36. DB::Exception: Macro 'uuid' and empty arguments of ReplicatedMergeTree are supported only for ON CLUSTER queries with Atomic database engine. (BAD_ARGUMENTS) (version 23.6.2.18 (official build))

Debug sql

create table dbt.example__dbt_tmp as dbt.example__dbt_new_data

Example config;

{{ config(
    materialized="incremental",
    engine="ReplicatedMergeTree('/clickhouse/{installation}/{cluster}/tables/{shard}/{database}/{table}/{uuid}', '{replica}')",
    ...
) }}

What this fix does

Changes

create table dbt.example__dbt_tmp as dbt.example__dbt_new_data

to

create table dbt.example__dbt_tmp ON CLUSTER '{cluster}' as dbt.example__dbt_new_data

If cluster has been set in config.

@CLAassistant
Copy link

CLAassistant commented Jul 12, 2023

CLA assistant check
All committers have signed the CLA.

@genzgd
Copy link
Contributor

genzgd commented Jul 12, 2023

What is the point of replicating a temporary table? I think we would have to update all of the temporary table operations (including the cleanup) as ON CLUSTER and the behavior would have to be different with ReplicatedMergeTrees versus a regular MergeTrees.

@Savid
Copy link
Author

Savid commented Jul 12, 2023

What is the point of replicating a temporary table? I think we would have to update all of the temporary table operations (including the cleanup) as ON CLUSTER and the behavior would have to be different with ReplicatedMergeTrees versus a regular MergeTrees.

I might be misunderstanding the process but that table is exchanged and should be replicated?

  1. create example__dbt_new_data (using the model config engine and ON CLUSTER clause)
  2. insert into example__dbt_new_data from model (with incremental filters)
  3. create example__dbt_tmp as example__dbt_new_data (this currently fails as its missing the ON CLUSTER clause)
  4. insert into example__dbt_tmp from example excluding rows already in example__dbt_new_data
  5. insert into example__dbt_tmp from example__dbt_new_data
  6. drop example__dbt_new_data and example__dbt_backup if exists
  7. rename example__dbt_tmp to example__dbt_backup
  8. exchange example__dbt_backup to example

@genzgd
Copy link
Contributor

genzgd commented Jul 18, 2023

Sorry about that, you're correct, this particular table is not actually "temporary".

However, all of the other operations involved are not ON CLUSTER, so this code really can only work with a ReplicatedMergeTree engine. Have you confirmed that incremental materializations work correctly ON CLUSTER with a ReplicatedMergeTree table? Maybe we should add a check that the the model uses a Replicated Engine (not sure how that would work). At a minimum we should probably explicitly document that incremental materializations do not work if a cluster is defined and the underlying model is not a *ReplicatedMergeTree table.

I think I'm okay with merging this if you've validated that the whole incremental materialization process works for ReplicatedMergeTrees (although automated tests would be really nice) and we have at least documented that ON CLUSTER must be combined with *ReplicatedMergeTrees for incremental materialization.

(Finally just as an FYI it looks like @gladkikhtutu is working on the related problem of incremental materialization of Distributed Tables - which also will involve ON CLUSTER operations. #163)

@gladkikhtutu
Copy link
Contributor

gladkikhtutu commented Jul 18, 2023

I can add that we already use on cluster clause for intermediate table, for example here due to this statement.
And yes, I will rewrite and reuse this place in distributed incremental materialization, I need to have on cluster clause and distributed table.

Only I would suggest using the same method everywhere ({{ on_cluster_clause()}})

@Savid
Copy link
Author

Savid commented Jul 18, 2023

I'm happy to close this and use my own fork for now. I can wait for distributed incremental materialization changes.

Thanks all!

@Savid Savid closed this Jul 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants