
Decrement the inflight counter on ConnectionRefused #184

Conversation

@jgaskins (Contributor) commented on Jul 9, 2023

I decided to take a swing at #169 after periodic forced DB failovers (chaos engineering ftw), in addition to occasional real-world DB failovers, showed that my app would never recover after a failover completed.

After a lot of debugging, I discovered this was only happening when I explicitly set max_pool_size. The number of DB::ConnectionRefused exceptions I caught happened to be exactly that number before they stopped occurring for the life of the process, so I started running pp db.pool in a loop and noticed that inflight was stuck at that value despite there being no in-flight connections: they were failing instantly because the failover hadn't yet completed, but @inflight still equaled @max_pool_size, so the pool had no way to know. It never tried to reconnect after those 30 failed attempts because can_increase_pool? could never return truthy, so it would just sit there waiting for a connection to become available, and that never happened because @total was empty and there were no actual in-flight connections.
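To make the failure mode concrete, here's a minimal, self-contained sketch. This is a toy pool, not the actual crystal-db code, though the names mirror its internals like @inflight and can_increase_pool?:

```crystal
# Toy pool illustrating the bug and the fix. Without the ensure, a refused
# connection skips the decrement, @inflight stays pinned at @max_pool_size,
# and can_increase_pool? never returns truthy again.
class TinyPool
  @inflight = 0
  @max_pool_size = 2

  def can_increase_pool?
    @inflight < @max_pool_size
  end

  def checkout
    raise "pool exhausted" unless can_increase_pool?
    @inflight += 1
    begin
      build_resource # may raise while the failover is still in progress
    ensure
      @inflight -= 1 # without this, every refused connection leaks a slot
    end
  end

  def build_resource
    raise IO::Error.new("connection refused") # simulate the failover window
  end
end

pool = TinyPool.new
3.times do
  begin
    pool.checkout
  rescue IO::Error
    # With the ensure in place, can_increase_pool? stays truthy and the
    # pool can try again once the database is reachable.
  end
end
```

The real change is essentially that ensure around the connection build, so the counter can't get stuck.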

I just started using this branch in my app and it now works after a database failover.

This is the same problem #170 is trying to resolve, and I mentioned in a comment on that PR that it sounded like the existing retry mechanism wasn't working as intended. I also followed the reproduction instructions @robcole posted in #169, and the app recovers immediately with this change.

Fixes #169
Closes #170

@bcardiff (Member) commented on Jul 9, 2023

Nice catch about the inflight getting out of sync when an exception occurs.

But I'm not convinced we should swallow the exception. After inflight is decremented, I think it's best to rethrow it.

Would your use case still work that way?

I think that when rethrowing, the retry will kick in and... retry.

The loop in checkout might loop indefinitely if there is a persistent problem with the connection build. That is not good.
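For context, the retry I'm referring to is driven by the pool options in the connection URI. A minimal usage sketch, with a hypothetical URI and values, using the PostgreSQL driver as an example:

```crystal
require "db"
require "pg" # assuming the PostgreSQL driver; any crystal-db driver works the same way

# Hypothetical connection string: max_pool_size caps concurrent connections,
# while retry_attempts / retry_delay bound how many times the pool retries a
# lost or refused connection before surfacing the exception to the caller.
db = DB.open("postgres://localhost/app?max_pool_size=30&retry_attempts=4&retry_delay=0.5")

db.scalar("SELECT 1") # raises only after the retries are exhausted
db.close
```

With the exception rethrown out of checkout, those bounded retries are what prevent an indefinite loop.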

@jgaskins (Contributor, Author) commented on Jul 9, 2023

> But I'm not convinced we should swallow the exception. After inflight is decremented, I think it's best to rethrow it.

I didn’t put a rescue in, just an ensure, so the exception should still bubble up the stack, right?
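To illustrate that ensure-vs-rescue distinction with a tiny standalone example (made-up method name, not pool code):

```crystal
# ensure runs the cleanup but does not swallow the exception; the caller
# still sees it, which is what lets the retry logic kick in.
def decrement_then_reraise
  raise "connection refused"
ensure
  puts "counter decremented"
end

begin
  decrement_then_reraise
rescue ex
  puts "caller still sees: #{ex.message}" # => caller still sees: connection refused
end
```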

@bcardiff (Member) commented on Jul 9, 2023

Oh yeah! My bad. It seems it was too early in the morning!

@bcardiff (Member) commented

Let's merge this :-) Thanks a lot @jgaskins 🏆

I don't rule out adding some mechanism to get rid of stale connections. As explained in #170 (comment), I'm a bit more inclined to have a TTL at the pool to recycle old resources rather than a health check.
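Just to sketch what I mean by a TTL (purely illustrative, nothing implemented in crystal-db, and the names are made up):

```crystal
# Illustrative only: stamp each pooled resource with its creation time and
# retire it at checkout once it exceeds a configured maximum lifetime.
class Recyclable(T)
  getter resource : T
  getter created_at : Time

  def initialize(@resource : T)
    @created_at = Time.utc
  end

  def expired?(ttl : Time::Span) : Bool
    Time.utc - @created_at > ttl
  end
end

entry = Recyclable.new("fake connection")
pp entry.expired?(30.minutes) # => false right after creation
```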

@bcardiff merged commit ce95cd2 into crystal-lang:master on Jul 10, 2023
@jgaskins deleted the decrement-inflight-if-we-cannot-connect-to-server branch on July 19, 2023 at 16:41