
Inserts failing silently #458

Open
jonscyr opened this issue Oct 12, 2024 · 4 comments · Fixed by #461
Labels
bug Something isn't working

Comments


jonscyr commented Oct 12, 2024

Describe the bug

Missing events in the target table: 3 events at offsets 7407350 - 7407352 are not present in the ClickHouse table. The connector attempted to write them (we can see it in the DEBUG logs), but the insert actually failed. I found the insert in query_log under id "eed49adf-feb9-46a7-abec-e44d9bdc03c2"; it failed with MEMORY_LIMIT_EXCEEDED. Shouldn't this have made the sink connector retry, or put the events in a DLQ as it was configured to? Instead, I could see this in the connector's logs:

[task-0] Response Summary - Written Bytes: [7854], Written Rows: [3] - (QueryId: [eed49adf-feb9-46a7-abec-e44d9bdc03c2])
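
For reference, the failure can be confirmed server-side by looking the query id up in system.query_log. A sketch of the lookup (the id is the one from the log line above; exact columns can vary by ClickHouse version):

```sql
-- Follow the fate of one insert by the query id the connector logged.
-- query_log keeps a 'QueryStart' row and a terminal row per query:
-- 'QueryFinish' on success, 'ExceptionWhileProcessing' on failure.
SELECT
    type,
    event_time,
    written_rows,
    exception_code,
    exception
FROM system.query_log
WHERE query_id = 'eed49adf-feb9-46a7-abec-e44d9bdc03c2'
ORDER BY event_time;
```

Here the terminal row shows exception_code 241 (MEMORY_LIMIT_EXCEEDED) even though the connector reported success.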

Steps to reproduce

Not sure how to reproduce

Expected behaviour

Failed inserts should have raised an exception and caused the events to be retried or sent to the DLQ, as configured.

Available logs

Configuration

https://gist.github.com/jonscyr/ef2f400a30a6b63a019d77b8a77f23b4
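
The DLQ behaviour referenced above comes from the standard Kafka Connect error-handling properties. An illustrative sketch with placeholder values (the actual settings are in the gist):

```properties
# Standard Kafka Connect sink error handling (placeholder values, not our config).
errors.tolerance=all
errors.retry.timeout=60000
errors.retry.delay.max.ms=5000
errors.deadletterqueue.topic.name=clickhouse-sink-dlq
errors.deadletterqueue.context.headers.enable=true
```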

Environment

We have a half-stack setup (ClickHouse Cloud + Strimzi Kafka). We've been facing this issue where a batch of events gets lost every month or two. We have a validator script running daily which raises this.
Our Kafka Connect workers run with LOG_LEVEL=DEBUG.

ClickHouse server

jonscyr added the bug Something isn't working label Oct 12, 2024
jonscyr changed the title from "Losing events around once a month" to "Inserts failing silently" Oct 16, 2024

jonscyr commented Oct 24, 2024

Update: talking with the ClickHouse Cloud team, they were able to reproduce this. It's caused by the send_progress_in_http_headers flag: the HTTP response code is 200 even for an out-of-memory error, because once progress headers are being streamed the 200 status line has already been sent, and a later failure can only show up in the response body. @Paultagoras

https://github.com/ClickHouse/clickhouse-kafka-connect/pull/275/files#diff-05b2bb95bf4463df8acbff731837dc199b3c100bd6d498752d82291ea3c8e0dfR177
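
A minimal way to see this behaviour outside the connector, assuming a local ClickHouse server and a deliberately tiny memory limit (endpoint and values are illustrative, not from the thread):

```bash
# With progress headers enabled, the 200 status line and X-ClickHouse-Progress
# headers go out while the query is still running, so a later
# MEMORY_LIMIT_EXCEEDED can no longer change the status code -- the error is
# appended to the end of the response body instead.
curl -sS -i 'http://localhost:8123/?send_progress_in_http_headers=1&max_memory_usage=10000000' \
  --data-binary 'SELECT groupArray(number) FROM numbers(1000000000)'
# Tail of the (HTTP 200) response body:
#   Code: 241. DB::Exception: Memory limit (for query) exceeded: ...
```

A client that trusts the status code alone treats this as success, which matches the "Response Summary" success line in the connector logs above.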

Paultagoras (Contributor) commented:

@jonscyr We've released a "pre-release" version of the connector (https://github.com/ClickHouse/clickhouse-kafka-connect/releases/tag/v1.2.4) that updates the underlying client library to address this, so that folks can check whether it resolves their issue (it should, but wider testing is better) - the "official" release should happen next week.

Paultagoras linked a pull request Oct 25, 2024 that will close this issue

jonscyr commented Oct 28, 2024

Thank you @Paultagoras. We've upgraded to 1.2.4. Will keep this thread posted.

Paultagoras (Contributor) commented:

> Thank you @Paultagoras. We've upgraded to 1.2.4. Will keep this thread posted.

Hi @jonscyr, any word on how it's going?
