-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
release-20.2: sql/pgwire: fix statement buffer memory leak when using suspended portals #67370
Conversation
Thank you for contributing to CockroachDB. Please ensure you have followed the guidelines for creating a PR. My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI. I have added a few people who may be able to assist in reviewing:
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
It looks like you may need to add this testutil function: cockroach/pkg/testutils/pgtest/pgtest.go Line 99 in 36ed61e
Also, the comment I left on the 21.1 backport PR holds here: I believe the test will fail because it depends on the change in #63677, but this change was not backported. Is it possible for the test to be updated to explicitly close the portal? |
ed73be3
to
2cd1ebd
Compare
Thank you for updating your pull request. My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
Explicitly closed the portal and back ported test utils. |
@@ -141,7 +141,7 @@ func RunTest(t *testing.T, path, addr, user string) { | |||
} | |||
} | |||
|
|||
func parseMessages(s string) []pgproto3.BackendMessage { | |||
func ParseMessages(s string) []pgproto3.BackendMessage { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you will also need to add comments to these functions now that they are exported (in order to pass linting checks)
// ParseMessages parses a string containing multiple pgproto3 messages separated
// by the newline symbol. See testdata for examples ("until" or "receive"
// commands).
// MsgsToJSONWithIgnore converts the pgproto3 messages to JSON format. The
// second argument can specify how to adjust the messages (e.g. to make them
// more deterministic) if needed, see testdata for examples.
…tals The connection statement buffer grows indefinitely when the client uses the execute portal with limit feature of the Postgres protocol, eventually causing the node to crash out of memory. Any long running query that uses the limit feature will cause this memory leak such as the `EXPERIMENTAL CHANGEFEED FOR` statement. The execute portal with limit feature of the Postgres protocol is used by the JDBC Postgres driver to fetch a limited number of rows at a time. The leak is caused by commands accumulating in the buffer and never getting cleared out. The client sends 2 commands every time it wants to receive more rows: - `Execute {"Portal": "C_1", "MaxRows": 1}` - `Sync` The server processes the commands and leaves them in the buffer, every iteration causes 2 more commands to leak. A similar memory leak was fixed by cockroachdb#48859, however the execute with limit feature is implemented by a side state machine in limitedCommandResult. The cleanup routine added by cockroachdb#48859 is never executed for suspended portals as they never return to the main conn_executor loop. After this change the statement buffer gets trimmed to reclaim memory after each client command is processed in the limitedCommandResult side state machine. The StmtBuf.Ltrim function was changed to be public visibility to enable this. While this is not ideal, it does scope the fix to the limitedCommandResult side state machine and could be addressed when the limitedCommandResult functionality is refactored into the conn_executor. Added a unit test which causes the leak, used the PGWire client in the test as neither the pg or pgx clients use execute with limit, so cant be used to demonstrate the leak. Also tested the fix in a cluster by following the steps outlined in cockroachdb#66849. Resolves: cockroachdb#66849 See also: cockroachdb#48859 Release note (bug fix): fix statement buffer memory leak when using suspended portals
2cd1ebd
to
a0fb94e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm! thanks
Backport 1/1 commits from #67135.
/cc @cockroachdb/release
The connection statement buffer grows indefinitely when the client uses the
execute portal with limit feature of the Postgres protocol, eventually causing
the node to crash out of memory. Any long running query that uses the limit
feature will cause this memory leak such as the
EXPERIMENTAL CHANGEFEED FOR
statement. The execute portal with limit feature of the Postgres protocol is
used by the JDBC Postgres driver to fetch a limited number of rows at a time.
The leak is caused by commands accumulating in the buffer and never getting
cleared out. The client sends 2 commands every time it wants to receive more
rows:
Execute {"Portal": "C_1", "MaxRows": 1}
Sync
The server processes the commands and leaves them in the buffer, every
iteration causes 2 more commands to leak.
A similar memory leak was fixed by #48859, however the execute with limit
feature is implemented by a side state machine in limitedCommandResult. The
cleanup routine added by #48859 is never executed for suspended portals as
they never return to the main conn_executor loop.
After this change the statement buffer gets trimmed to reclaim memory after
each client command is processed in the limitedCommandResult side state
machine. The StmtBuf.Ltrim function was changed to be public visibility to
enable this. While this is not ideal, it does scope the fix to the
limitedCommandResult side state machine and could be addressed when the
limitedCommandResult functionality is refactored into the conn_executor.
Added a unit test which causes the leak, used the PGWire client in the test as
neither the pg or pgx clients use execute with limit, so cant be used to
demonstrate the leak. Also tested the fix in a cluster by following the steps
outlined in #66849.
Resolves: #66849
See also: #48859
Release note (bug fix): fix statement buffer memory leak when using suspended
portals.
@rafiss