database/sql: reuses expired connections with strategy= alwaysNewConn #32530
Comments
To the best of my understanding, we can resolve the issue by modifying |
This issue also occurs and is seriously exacerbated if the session resetter channel |
CC @kardianos |
I have updated the issue with a more minimal failure case that can be dropped directly into |
I have put together a fix and submitted a PR: #32656 |
Change https://golang.org/cl/182599 mentions this issue: |
@gwax Let's ignore the session resetter issue for now, I will resolve that in the next release. Can you explain when you encounter this issue, and what it looks like for your application when you do? Thanks. |
In our application, with In debugging and tracing the issue, we found the code path is as follows:
Adding an expiry check to the Enabling session resetter hits the same race condition but spikes our error rate. In this case, the resetter was overloaded and we were having |
The PR that I have submitted is fairly large and is, perhaps, clearer to understand as two separate changes (I can resubmit as two PRs, if that makes it easier). The first change is quite large, and refactors The second change is quite small and makes sure that reused sessions always go to the connection pool, not |
@gwax Thank you for the bullet point description of events that lead up to this. Don't resend the PR yet. There is actually a deeper issue I would like to address here, I think. The propensity for this connection pool implementation to retain expired connections is a fundamental design issue. Any change we make won't make it into the Go1.13 release (this is on me, I'm sorry). So we are targeting Go1.14 here. Fundamentally, if a driver connection is expired then it should be closed and a new one should be returned, "atomically" within conn. If we factor out the code that opens a new connection at the end of db.conn, then allow it to be called from either the expired or the normal case, that seems better than worrying about how the connections are represented in a queue (and would touch less code). What do you think of that approach? |
I am happy to live with a Go1.14 target and patch our deployed version of the code. I far prefer a patch that will eventually be in mainline so that we can timebox deploying a fork on our end. Ultimately, I am in favor of any approach that fixes this bug, with a preference for solutions that minimize race conditions and complexity, while maximizing readability. If a small fix is possible, I will always prefer changing fewer lines of code over any other preferences. Ultimately, I don't think fully avoiding expired/broken connections in the pool is possible. A connection could expire at the very moment it is pulled from the pool no matter how aggressively we clean it. That, and, aggressive cleaning comes at a cost. I do, in principle, like the idea, which I think you're proposing, that we could close an expired connection and open a new one all within the same That said, I think we're still likely to run into race conditions and queuing fairness issues if we continue to use a slice for the connection pool and channels for new requests. Every other, smaller, attempt I made to fix this issue ran afoul of something. From a readability and maintainability standpoint, I would really prefer if My PR was written with the goal of having the In the end, I think we want a db.connRequests <- something
select {
case <-ctx.Done():
// bail
case conn, ok := <-db.connectionPool:
// if not ok return db closed
// if good connection and newOrCached: return
// else close, open and return a new one
case conn, ok := <-db.filledRequests:
// if not ok return db closed
// return conn
} If we take a broader approach, I would love to see If there's a plan, or architecture, that you're ok with, I have some time and energy available for implementation. |
@kardianos this issue still exists and is still causing errors for us. Could you provide some attention or direction to this? In the hopes of trying to get a fix into the standard library, I have reorganized and curated the commits in my earlier PR: #32656 The commits now reflect three separate steps in the fixing process:
If there is more that I can do or another approach that you think is better, please provide me with some direction. I am happy to devote time and energy to creating a fix, as long as my fix has a chance of landing. |
Change https://golang.org/cl/216197 mentions this issue: |
@gwax Please try out this CL: https://go-review.googlesource.com/c/go/+/216197 You'll need both in the series, and you should implement the ConnectionDiscarder on the driver. But it should work. |
CL Ready. Waiting for a reviewer. |
Change https://golang.org/cl/242101 mentions this issue: |
Change https://golang.org/cl/242102 mentions this issue: |
Change https://golang.org/cl/242522 mentions this issue: |
Manually backported the subject CLs because of a lack of Gerrit "forge-author" permissions, and also because the prior cherry picks didn't apply cleanly due to a tight relation chain.

The backport comprises:
* CL 174122
* CL 216197
* CL 223963
* CL 216240
* CL 216241

Note: Because we cannot retroactively introduce API changes to Go1.13.13 that weren't in Go1.13, the Conn.Validator interface (from CL 174122, CL 223963) isn't exposed; drivers will just be inspected for whether they implement an IsValid() bool method.

For a description of the content of each CL:

* CL 174122: database/sql: process all Session Resets synchronously

Adds a new interface, driver.ConnectionValidator, to allow drivers to signal that they should not be used again, separately from the session resetter interface. This is done now that the session reset is done after the connection is put into the connection pool.

Previous behavior attempted to run Session Resets in a background worker. This implementation had two problems: untested performance gains for additional complexity, and failures when the pool size exceeded the connection reset channel buffer size.

* CL 216197: database/sql: check conn expiry when returning to pool, not when handing it out

With the original connection reuse strategy, it was possible that when a new connection was requested, the pool would wait for an existing connection to return for re-use in a full connection pool, and then it would check if the returned connection was expired. If the returned connection expired while awaiting re-use, it would return an error to the location requesting the new connection. The existing call sites requesting a new connection were often the last attempt at returning a connection for a query. This would then result in a failed query.

This change ensures that we perform the expiry check right before a connection is inserted back into the connection pool while a new connection is being requested. A request for a new connection will no longer fail due to the connection expiring.

* CL 216240: database/sql: prevent Tx statement from committing after rollback

It was possible for a Tx that was aborted for rollback asynchronously to execute a query after the rollback had completed on the database, which often would auto commit the query outside of the transaction.

W-locking tx.closemu prior to issuing the rollback ensures that any Tx query either fails or finishes on the Tx, and never runs after the Tx has rolled back.

* CL 216241: database/sql: on Tx rollback, retain connection if driver can reset session

Previously the Tx would drop the connection after rolling back from a context cancel. Now if the driver can reset the session, keep the connection.

* CL 223963: database/sql: add test for Conn.Validator interface

This addresses comments made by Russ after https://golang.org/cl/174122 was merged. It adds a test for the connection validator and renames the interface to just "Validator".

Updates #31480
Updates #32530
Updates #32942
Updates #34775
Fixes #40205

Change-Id: I6d7307180b0db0bf159130d91161764cf0f18b58
Reviewed-on: https://go-review.googlesource.com/c/go/+/242522
Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Theophanes <kardianos@gmail.com>
Manually backported the subject CLs because of a lack of Gerrit "forge-author" permissions, and also because the prior cherry picks didn't apply cleanly due to a tight relation chain.

The backport comprises:
* CL 174122
* CL 216197
* CL 223963
* CL 216240
* CL 216241

Note: Because we cannot retroactively introduce API changes to Go1.14.6 that weren't in Go1.14, the Conn.Validator interface (from CL 174122, CL 223963) isn't exposed; drivers will just be inspected for whether they implement an IsValid() bool method.

For a description of the content of each CL:

* CL 174122: database/sql: process all Session Resets synchronously

Adds a new interface, driver.ConnectionValidator, to allow drivers to signal that they should not be used again, separately from the session resetter interface. This is done now that the session reset is done after the connection is put into the connection pool.

Previous behavior attempted to run Session Resets in a background worker. This implementation had two problems: untested performance gains for additional complexity, and failures when the pool size exceeded the connection reset channel buffer size.

* CL 216197: database/sql: check conn expiry when returning to pool, not when handing it out

With the original connection reuse strategy, it was possible that when a new connection was requested, the pool would wait for an existing connection to return for re-use in a full connection pool, and then it would check if the returned connection was expired. If the returned connection expired while awaiting re-use, it would return an error to the location requesting the new connection. The existing call sites requesting a new connection were often the last attempt at returning a connection for a query. This would then result in a failed query.

This change ensures that we perform the expiry check right before a connection is inserted back into the connection pool while a new connection is being requested. A request for a new connection will no longer fail due to the connection expiring.

* CL 216240: database/sql: prevent Tx statement from committing after rollback

It was possible for a Tx that was aborted for rollback asynchronously to execute a query after the rollback had completed on the database, which often would auto commit the query outside of the transaction.

W-locking tx.closemu prior to issuing the rollback ensures that any Tx query either fails or finishes on the Tx, and never runs after the Tx has rolled back.

* CL 216241: database/sql: on Tx rollback, retain connection if driver can reset session

Previously the Tx would drop the connection after rolling back from a context cancel. Now if the driver can reset the session, keep the connection.

* CL 223963: database/sql: add test for Conn.Validator interface

This addresses comments made by Russ after https://golang.org/cl/174122 was merged. It adds a test for the connection validator and renames the interface to just "Validator".

Updates #31480
Updates #32530
Updates #32942
Updates #34775
Fixes #39101

Change-Id: I043d2d724a367588689fd7d6f3cecb39abeb042c
Reviewed-on: https://go-review.googlesource.com/c/go/+/242102
Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Theophanes <kardianos@gmail.com>
Thanks for this, been chasing this in our apps for a while, thought it was a driver error but after adding logging to the driver it was never returning this, so this could be the cause of our issues too. |
Thanks to @gwax and @kardianos for their effort in addressing this; it helps a lot with troubleshooting my application. |
nb |
What version of Go are you using (go version)?

Does this issue reproduce with the latest release?
Yes

What operating system and processor architecture are you using (go env)?
go env Output

What did you do?
When connections are released, they may bypass the pool and be used to fulfill requests made with strategy=alwaysNewConn even if they are "expired" according to SetConnMaxLifetime, which allows the connections to fail on BeginTx with ErrBadConn despite passing through putConn with err == nil.

The following testcase (for database/sql_test.go) exhibits the failure case:

with output:

What did you expect to see?
BeginTx's third attempt with strategy=alwaysNewConn should not reuse an existing connection and should definitely not reuse an expired connection.

What did you see instead?
BeginTx's third attempt fails with ErrBadConn due to connection expiration.