Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry for batch API #941

Merged
merged 7 commits into from
Jul 20, 2023
Merged

Retry for batch API #941

merged 7 commits into from
Jul 20, 2023

Conversation

djosephsen
Copy link
Contributor

Intent of this PR is to propose the addition of a Retry() method to the client-facing Batch interface. Today, in the event of a Batch.Send() error, the caller is left with an interface type they can neither apply a new conn to, nor derive their original rows from.

This pr extends the interface with a Retry() method, that re-calls acquire() on the underlying batch to derive a new conn (possibly to an alternate cluster member). Because of the way the connect-derived interfaces are intertwined, this isn't as clean as I would like it to be, perhaps there's a better way. In either case I'd like to have the conversation.

Thanks!

@CLAassistant
Copy link

CLAassistant commented Mar 23, 2023

CLA assistant check
All committers have signed the CLA.

@jkaflik
Copy link
Contributor

jkaflik commented Mar 24, 2023

@djosephsen, thanks for your contribution.

It's a good improvement. However, I think it would be more robust if we allowed to retry using Send function. WDYT?

@djosephsen
Copy link
Contributor Author

Sounds great. My read on Send() is that it defers setting it's internal sent bool until it succeeds, so you should already be able to call Send() again after a failure today. How were you thinking of implementing retry in Send? Maybe extending the type with some kind of state like a counter? And in the case sent == false && attempts >0 we call acquire() first?

@jkaflik
Copy link
Contributor

jkaflik commented Mar 27, 2023

@djosephsen I will have a look at this tomorrow. Perhaps you have an idea of how to implement it?

@djosephsen
Copy link
Contributor Author

Oh sure, I just thought maybe you had an implementation in mind. I'll rewire this into Send() instead and see what it looks like.

@ghost
Copy link

ghost commented Jun 9, 2023

When this PR will be merged? I'm pretty sure that any flow need to be ably to retry sending. I guess it should be inside 'Send', not a new method 'retry' but anyway it would be nice to have any solution to retry

@djosephsen
Copy link
Contributor Author

When this PR will be merged? I'm pretty sure that any flow need to be ably to retry sending. I guess it should be inside 'Send', not a new method 'retry' but anyway it would be nice to have any solution to retry

Hey, I'm sorry, this fell off my radar because we've been experimenting with kafka-engine at $currjob. I'll make an effort to get this cleaned up and ready for review this week. Thanks for the reminder

@djosephsen
Copy link
Contributor Author

So, going over this, it struck me that the simplest solution is to just un-export Retry as previously proposed, and call it internally from Send() whenever the batch is marked as already sent. Some caveats:

  • It is now a valid operation to call Send() on an already sent batch, which is, I think, intuitive. The end-user calls batch.Send() and upon receiving an error, simply calls batch.Send() again, and the client will seamlessly re-acquire a new conn under the hood.
  • this puts the onus of tracking retry attempts, backoff and etc.. on the user
  • since the unexported retry() essentially wraps Send(), the release() function should work as expected, and this won't leak conns
  • If the client calls b.Send() on an already successfully sent batch, it will receive a code: 101, message: Unexpected packet Data received from client error from the server, which will detect that the batch is a duplicate send. Not sure if we want to add guardrails for this case inside retry() or not
  • test 798 was modified to reflect that multiple Send() calls is valid except in the case of a re-send of an already successfully-sent batch.

@djosephsen
Copy link
Contributor Author

Sorry but I think I might need help clearing the integration tests.. running
CLICKHOUSE_DIAL_TIMEOUT=20 CLICKHOUSE_TEST_TIMEOUT=600s CLICKHOUSE_QUORUM_INSERT=3 make test on my workstation passes fine.

...
== RUN   TestStdNullableUUID
=== RUN   TestStdNullableUUID/Native_Protocol
=== RUN   TestStdNullableUUID/Http_Protocol
--- PASS: TestStdNullableUUID (0.02s)
    --- PASS: TestStdNullableUUID/Native_Protocol (0.01s)
    --- PASS: TestStdNullableUUID/Http_Protocol (0.01s)
PASS
ok  	github.com/ClickHouse/clickhouse-go/v2/tests/std	20.509s

@AndreiCrimezniuc
Copy link

Guys! It needs one little step to pass through tests. Maybe just try to run them one more time? Some connections refused to cloud.

@jkaflik
Copy link
Contributor

jkaflik commented Jul 10, 2023

@AndreiCrimezniuc all good Andrei. I will follow up on this PR soon.

@jkaflik jkaflik self-assigned this Jul 10, 2023
@jkaflik jkaflik changed the title implement Batch.Retry() Retry for batch API Jul 20, 2023
@jkaflik jkaflik merged commit 8d6339b into ClickHouse:main Jul 20, 2023
jkaflik added a commit that referenced this pull request Jul 21, 2023
@djosephsen djosephsen deleted the batchretry branch September 14, 2023 15:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants