ClickHouse Operator leaves orphan S3 files when scaling down replicas that use S3-backed MergeTree #1388

Closed
hodgesrm opened this issue Apr 7, 2024 · 4 comments

hodgesrm commented Apr 7, 2024

When scaling down replicas, clickhouse-operator does not ensure that the S3 files backing MergeTree tables are fully deleted. This leaves orphan files in the S3 bucket. The behavior was observed with ClickHouse 24.3.2.3 and clickhouse-operator 0.23.3.

Here's how to reproduce in general, followed by a detailed scenario.

  1. Create a ClickHouse cluster with two replicas (replicaCount=2) with a storage policy that allows data to be stored on S3.
  2. Run DDL to create a replicated table that uses S3 storage.
  3. Add data to the table.
  4. Confirm that data is stored in S3.
  5. Change the replicaCount to 1 and update the CHI resource definition.
  6. Drop the replicated table on the remaining replica.
  7. Check data in the S3 bucket. You will see orphan files.

To reproduce in detail, use the examples in https://github.com/Altinity/clickhouse-sql-examples/tree/main/using-s3-and-clickhouse. Here is a step-by-step script.

# Grab sample code. 
git clone https://github.com/Altinity/clickhouse-sql-examples
cd clickhouse-sql-examples/using-s3-and-clickhouse
# Generate S3 credentials in a secret. (See script header for instructions.)
./generate-s3-secret.sh
# Create the cluster. 
kubectl apply -f demo2-s3-01.yaml
# Wait for both pods to come up, then run the following commands. 
./port-forward-2.sh
alias cc-batch='clickhouse-client -m -n --verbose -t --echo -f Pretty'
cc-batch < sql-11-create-s3-tables.sql
cc-batch < sql-12-insert-data.sql
cc-batch < sql-03-statistics.sql
# Check the data in S3 using a command like the following. Note the number of objects. 
# Run this command until the number of S3 files stops growing. The sample inserts via a distributed table. 
# In my sample runs I get 3392 files and 4.3 GiB of data stored in S3. 
aws s3 ls --recursive --human-readable --summarize s3://<bucket>/clickhouse/mergetree/
# Scale down the replicaCount from 2 to 1 and apply. 
kubectl edit chi demo2
# Check the data in S3 again. It should not have changed.  
aws s3 ls --recursive --human-readable --summarize s3://<bucket>/clickhouse/mergetree/
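
The "run this command until the number of S3 files stops growing" step in the script above can be automated. The following is a minimal sketch, not part of the original reproduction: the helper names `s3_object_count` and `wait_for_stable_count` are hypothetical, and the parser assumes the `Total Objects:` line that `aws s3 ls --summarize` prints at the end of its output.

```shell
#!/usr/bin/env bash
# Read `aws s3 ls --summarize` output on stdin and print the object count
# taken from the "Total Objects: N" summary line.
s3_object_count() {
  awk '/Total Objects:/ { print $3 }'
}

# Run a listing command repeatedly until two consecutive runs report the
# same object count, then print that count.
# Usage: wait_for_stable_count <interval-seconds> <listing command...>
wait_for_stable_count() {
  local interval="$1"
  shift
  local prev="" cur
  while true; do
    cur="$("$@" | s3_object_count)"
    if [ -n "$cur" ] && [ "$cur" = "$prev" ]; then
      echo "$cur"
      return 0
    fi
    prev="$cur"
    sleep "$interval"
  done
}

# Example (replace <bucket> with the real bucket name before running):
# wait_for_stable_count 30 aws s3 ls --recursive --summarize "s3://<bucket>/clickhouse/mergetree/"
```

Recording the stable count before and after the scale-down makes the later comparison (3392 objects before, 1707 orphans after the truncate in my runs) a matter of comparing two numbers.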

You can now prove that S3 files are orphaned and see which ones they are. One way is as follows.

  1. On the remaining ClickHouse server, run: truncate table test_s3_direct_local;
  2. Check the S3 files. About half of them remain. In my sample runs there were 1707 files and 2.1 GiB of data remaining.

hodgesrm commented Apr 7, 2024

It appears that one workaround for this problem is to drop tables explicitly before decommissioning the replica. For example, you can log in to the departing replica and issue the following command:

DROP TABLE test_s3_direct_local SYNC

It's unclear whether SYNC is strictly required: it's not documented in the official docs, but the Altinity KB indicates that it drops table data synchronously. In any case, when I run this command before scaling down, the S3 files are properly removed.
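
For a replica with more than one table, the same workaround could be scripted. This is a hedged sketch only: `drop_sync_statements` is a hypothetical helper, the pod name in the usage comment is a placeholder, and the `system.tables` filter is an assumption about which tables need dropping. Review the generated statements before running them against a real replica.

```shell
#!/usr/bin/env bash
# Read tab-separated "database<TAB>table" lines on stdin and print one
# DROP TABLE ... SYNC statement per line. Lines with any other shape are skipped.
drop_sync_statements() {
  awk -F '\t' 'NF == 2 { printf "DROP TABLE IF EXISTS %s.%s SYNC;\n", $1, $2 }'
}

# Example usage on the departing replica (POD is a hypothetical pod name;
# substitute the pod that will be removed by the scale-down):
#
# POD=chi-demo2-<cluster>-0-1-0
# kubectl exec "$POD" -- clickhouse-client -q \
#   "SELECT database, name FROM system.tables
#    WHERE engine LIKE '%MergeTree' AND database NOT IN ('system')" \
#   > tables.tsv
# drop_sync_statements < tables.tsv        # inspect the statements first
# drop_sync_statements < tables.tsv | kubectl exec -i "$POD" -- clickhouse-client -n
```

Writing the table list to a file first keeps the destructive step separate from the discovery step, so nothing is dropped until the statements have been checked.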

hodgesrm commented Apr 7, 2024

Final notes:

  1. The reproduction described above did not use zero-copy replication.
  2. This issue also extends to ordinary MergeTree files. It appears the operator only deletes ReplicatedMergeTree tables, replicated databases, views, and dictionaries. See https://github.com/Altinity/clickhouse-operator/blob/master/pkg/model/chi/schemer/sql.go#L31 for details.

@alex-zaitsev

Fixed in 0.24.0
