release-20.1: row: fix intra-query memory leaks in kvFetcher and txnKVFetcher #66171
Backport 2/2 commits from #65881.
/cc @cockroachdb/release
Updates #64906. The critical change is the first patch, the kvfetcher one.
The kvbatchfetcher one is theoretically good, but I haven't found it to make
as large a difference as the first patch.
With the first patch applied alone, I can no longer cause OOM conditions with
1024 concurrent TPCH query 18s sent at a single machine, which is a major
improvement. Prior to that patch, such a workload would overwhelm the machine
within 1-2 minutes.
This bug was found with the help of the new tooling I've been adding to
viewcore, mostly the new pprof output format, the existing HTML object
explorer, and the new type explorer. You can see these updates at
https://github.com/jordanlewis/debug/tree/crl-stuff.
The KVFetcher is the piece of code that does the first level of decoding
of a KV batch response, doling out slices of keys and values to higher
level code that further decodes the key values into formats that the SQL
engine can operate on.
The KVFetcher uses a slice into the batch response to keep track of
where it is during the decoding process. Once the slice is empty, it's
finished until someone asks it for a new batch.
However, the KVFetcher used to keep that empty slice pointer around for
its lifetime, or until it was asked for a new batch. This kept the batch
response un-garbage-collectable: a slice still pointed at it, even
though the slice was empty.
This causes queries to use up to 2x their accounted-for batch memory,
since the memory accounting system assumes that once data is transferred
out of the batch response into the SQL representation, the batch
response is freed - it assumes there's just 1 "copy" of this batch
response memory.
This is especially problematic for long queries (since they will not
allow that KVFetcher memory to be freed until they're finished).
In effect, this causes 1 extra batch per KVFetcher per query to be
retained in memory. This doesn't sound too bad, since a batch is of
fixed size. But the max batch size is 1 megabyte, so with 1024
concurrent queries, each with 3 KVFetchers, like we see in a TPCH
workload with 1024 concurrent query 18s, that's 1024 * 1MB * 3 = 3GB of
unaccounted-for memory. This is easily enough memory to push a node over
and cause it to OOM.
This patch nils the batch response pointer once the KVFetcher is
finished decoding it, which allows it to be garbage collected as soon as
possible. In practice, this seems to allow at least a single-node
concurrency-1024 TPCH query 18 workload to survive indefinitely (all
queries return out of budget errors) without OOMing.
Release note (bug fix): queries use up to 1MB less actual system memory
per scan, lookup join, index join, zigzag join, or inverted join in
their query plans. This will result in improved memory performance for
workloads with concurrent OLAP-style queries.
Previously, we could leave some dangling references to batch responses
around in the txnKVFetcher when we were fetching more than one batch at
a time. This would cause a delay in reclamation of memory for the
lifetime of a given query.
Release note (bug fix): use less memory in some queries, primarily
lookup joins.