
Retrying on connection closed #104

Open
jbransen opened this issue Jan 6, 2025 · 6 comments

jbransen commented Jan 6, 2025

I am running this library in a production setting, and on a regular basis I observe the following error:

RpcError(Status { code: Cancelled, message: "operation was canceled", source: Some(tonic::transport::Error(Transport, hyper::Error(Canceled, "connection closed"))) })

When reading the Bigtable connection pool documentation this seems to be expected, as connections are refreshed every hour. However, as a client user I am surprised: why does this error make its way to the end user? Is that a conscious choice, or is this an oversight? I would expect this library to handle this gracefully by retrying (which should always work fine), so that clients never end up with such errors.

liufuyang (Owner) commented Jan 6, 2025

Hey @jbransen, thanks for raising this issue. You might be the first person to use this crate in a production environment. I created this repo/crate while I worked at Spotify - at that moment I thought it might be handy in case we wanted to try Rust in some of our backend systems, as many of them needed to talk to Bigtable (however, that never really happened before I left Spotify). I do remember that in those days, Spotify engineers needed to add Bigtable channel refreshing (client swapping) manually in their code bases to deal with the hourly spikes, and these days that auto-refresh feature seems to have been merged into the standard Bigtable Java client (as described in the doc you linked). On the Go side, the doc you linked points to some simple example code that does the same thing, but users need to implement it themselves, I suppose.

So basically, without looking into all the details: if one wishes to have a similar feature (client refreshing), one might just need to implement it in a similar fashion in Rust. On your side, you could make it work by periodically creating a new BigTableConnection and replacing the old one. Or, as you said, this could be done inside BigTableConnection itself, with some settings/flags to turn the feature on and configure it.
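For illustration, here is a minimal sketch of that manual approach, assuming the tokio and arc-swap crates. `Client` and `connect()` are placeholders standing in for BigTableConnection and however you construct it, not this crate's actual API, and the 45-minute period is just an example value below the hourly server-side close:

```rust
use std::sync::Arc;
use std::time::Duration;

use arc_swap::ArcSwap; // lock-free pointer swap, so readers never block

// Placeholder for the real connection type (e.g. BigTableConnection).
struct Client;

// Placeholder for however the connection is actually created.
async fn connect() -> Client {
    Client
}

#[tokio::main]
async fn main() {
    let shared = Arc::new(ArcSwap::from_pointee(connect().await));

    // Background task: build the new client first, then swap it in,
    // so in-flight requests keep using the old (still warm) one.
    let refresher = Arc::clone(&shared);
    tokio::spawn(async move {
        let mut interval = tokio::time::interval(Duration::from_secs(45 * 60));
        interval.tick().await; // the first tick fires immediately; skip it
        loop {
            interval.tick().await;
            refresher.store(Arc::new(connect().await));
        }
    });

    // Request path: grab the current client without taking a lock.
    let client = shared.load();
    // ... use `client` for reads/writes ...
    let _ = client;
}
```

Old Arc handles stay alive until their last in-flight request finishes, so the swap never interrupts an ongoing call.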

Unfortunately, I no longer use this crate in my work, nor do I work with Bigtable anymore. So even if I try to implement a feature like this, it might be a bit hard for me to test it out (I am not sure whether the Bigtable emulator can simulate a connection close - perhaps it can?).

But if you want, you are very welcome to contribute to this crate: add such a feature and test it out in your environment before merging.

Let me know if this helps :)

jbransen (Author) commented Jan 6, 2025

Thank you for the fast reply, this helps. I am happy to craft a solution, try it out, and make a PR.

liufuyang (Owner) commented

Oh cool, thanks a lot. I haven't been following Rust gRPC development closely, so I am not sure whether there is some off-the-shelf implementation on the tonic/tokio side that could help in this case. You might want to check the API there a bit, or perhaps ask in the tonic Discord channel; people are helpful there, and I got a lot of good help while implementing this crate.

liufuyang (Owner) commented Jan 7, 2025

Thanks. In this case, though, I am not 100% sure that "retry-on-channel-close" is really what we need, or that it would solve your issue gracefully. From the linked Google doc, and from public information such as this Stack Overflow page, what I know is that reconnecting gRPC and opening warm channels is not cheap, so relying on any retry mechanism to fix your issue would cause hourly latency spikes against Bigtable.

What I mentioned above, and what the Go example in the linked Google doc suggests, is not a retry but rather "keep creating a new client (and opening new channels), and periodically replace the old client with the new one". So it is about auto client/channel refreshing rather than any sort of retry.
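To make that distinction concrete, here is a hedged variant of the earlier sketch where the replacement client is warmed up before the swap, so the expensive channel setup never sits on the request path. `probe()` is a hypothetical cheap call (e.g. reading a known row), not part of this crate:

```rust
use std::sync::Arc;

use arc_swap::ArcSwap;

struct Client; // placeholder for the real connection type

async fn connect() -> Client {
    Client // placeholder: open a fresh connection here
}

// Hypothetical warm-up call: any cheap request that forces the new
// channels to be fully established before real traffic hits them.
async fn probe(_client: &Client) {}

// Build and warm the replacement off the request path, then swap
// atomically; callers holding the old Arc finish their in-flight
// calls on the old connection undisturbed.
async fn refresh_once(shared: &ArcSwap<Client>) {
    let fresh = connect().await;
    probe(&fresh).await; // pay the warm-up cost here, not on a retry
    shared.store(Arc::new(fresh));
}
```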

liufuyang (Owner) commented
And you might want to update the issue title from "Retrying on connection closed" to something like "Auto-refreshing the client to deal with Bigtable's hourly server-side gRPC channel close".
