[wip] engine: add byte limit to pebbleMVCCScanner
TODO: due to the two modes of operation (RocksDB and Pebble), there are
currently two ways to do anything. Thanks to the two possible
response types for batches (KVs vs. bytes), there is another
factor of two. In short, I expect to have to make three more
similar changes to be able to fully implement byte hints for
scans.
TODO: testing, and potentially rewrite the code in a saner way.

----

A fledgling step towards cockroachdb#19721 is allowing incoming KV requests to
bound the size of the response in terms of bytes rather than rows.
This commit provides the necessary functionality to pebbleMVCCScanner:
The new maxBytes field stops the scan once the size of the result
meets or exceeds the threshold (at least one key will be added,
regardless of its size).
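To make the intended semantics concrete, here is a minimal, self-contained sketch of the stopping rule (hypothetical names, not the pebbleMVCCScanner code itself): pairs are accumulated until their total size meets or exceeds the budget, and the first pair is always admitted.

package main

import "fmt"

// kvPair and scanLimited are illustrative stand-ins, not CockroachDB types.
type kvPair struct {
	key, val []byte
}

// scanLimited returns the prefix of kvs that fits the byte budget. At least
// one pair is returned regardless of its size, mirroring the maxBytes
// semantics described above. A zero maxBytes means "no byte limit".
func scanLimited(kvs []kvPair, maxBytes int64) []kvPair {
	var out []kvPair
	var bytes int64
	for _, pair := range kvs {
		out = append(out, pair)
		bytes += int64(len(pair.key) + len(pair.val))
		if maxBytes > 0 && bytes >= maxBytes {
			break
		}
	}
	return out
}

func main() {
	kvs := []kvPair{
		{key: []byte("a"), val: make([]byte, 1<<20)}, // ~1MB rows
		{key: []byte("b"), val: make([]byte, 1<<20)},
		{key: []byte("c"), val: make([]byte, 1<<20)},
	}
	// The first pair already meets a 1MB budget, so only one pair comes back.
	fmt.Println(len(scanLimited(kvs, 1<<20))) // 1
}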

The classic example of the problem this addresses is a table in which
each row is, say, ~1MB in size. A full table scan will currently fetch
data from KV in batches of [10k] rows, holding at least 10GB of data in
memory at any given moment. This sort of thing does happen in practice;
we have a long-failing roachtest, cockroachdb#33660, because of exactly that, and
anecdotally, OOMs in production clusters are regularly caused by
individual queries consuming excessive amounts of memory at the KV
level.

Plumbing this limit into a header field on BatchRequest and down to the
engine level will allow the batching in [10k] to become byte-sized in
nature, thus avoiding one obvious source of OOMs. This doesn't solve cockroachdb#19721
in general (many such requests could still work together to consume a lot of
memory), but it's a sane baby step that might already avoid a
large portion of OOMs.

[10k]: https://github.com/cockroachdb/cockroach/blob/0a658c19cd164e7c021eaff7f73db173f0650e8c/pkg/sql/row/kv_batch_fetcher.go#L25-L29
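
For a sense of how such a limit might be consumed higher up, here is a hedged sketch of a fetch loop that budgets each round trip by bytes instead of a fixed row count; fetchBatch, batchResult, and targetBytes are hypothetical names for illustration, not APIs added by this commit.

// Hypothetical types; the real plumbing through BatchRequest is future work.
type kvData struct {
	key, val []byte
}

type batchResult struct {
	kvs       []kvData
	resumeKey []byte // empty once the scan is exhausted
}

// fetchAll pages through a scan, bounding each batch by targetBytes so that
// wide rows no longer force many gigabytes to be held in memory at once.
func fetchAll(start []byte, targetBytes int64,
	fetchBatch func(key []byte, targetBytes int64) batchResult) []kvData {
	var all []kvData
	key := start
	for {
		res := fetchBatch(key, targetBytes)
		all = append(all, res.kvs...)
		if len(res.resumeKey) == 0 {
			return all
		}
		key = res.resumeKey
	}
}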

Release note: None
tbg committed Jan 24, 2020
1 parent c8c56af commit 4a8d690
Showing 1 changed file with 12 additions and 0 deletions.
pkg/storage/engine/pebble_mvcc_scanner.go
@@ -33,6 +33,7 @@ const (
 // expected by MVCCScanDecodeKeyValue.
 type pebbleResults struct {
 	count int64
+	bytes int64
 	repr  []byte
 	bufs  [][]byte
 }
@@ -76,6 +77,7 @@ func (p *pebbleResults) put(key MVCCKey, value []byte) {
 	encodeKeyToBuf(p.repr[startIdx+kvLenSize:startIdx+kvLenSize+lenKey], key, lenKey)
 	copy(p.repr[startIdx+kvLenSize+lenKey:], value)
 	p.count++
+	p.bytes += int64(lenToAdd)
 }
 
 func (p *pebbleResults) finish() [][]byte {
@@ -98,6 +100,9 @@ type pebbleMVCCScanner struct {
 	ts hlc.Timestamp
 	// Max number of keys to return.
 	maxKeys int64
+	// Stop adding keys once p.results.bytes matches or exceeds this threshold,
+	// if nonzero.
+	maxBytes int64
 	// Transaction epoch and sequence number.
 	txn *roachpb.Transaction
 	txnEpoch enginepb.TxnEpoch
@@ -524,6 +529,13 @@ func (p *pebbleMVCCScanner) addAndAdvance(val []byte) bool {
 	// to include tombstones in the results.
 	if len(val) > 0 || p.tombstones {
 		p.results.put(p.curMVCCKey(), val)
+		if p.maxBytes > 0 && p.results.bytes >= p.maxBytes {
+			// Hacky way to implement maxBytes: once we're above the threshold,
+			// pretend we reached maxKeys. (The resume key computation requires
+			// fetching an additional key after that, so exiting the iteration
+			// based on size alone is not trivial.)
+			p.maxKeys = p.results.count
+		}
 		if p.results.count == p.maxKeys {
 			return false
 		}
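
As a simplified, self-contained model of the control flow above (hypothetical names, not the scanner's real state): once the byte threshold is met, the key limit is clamped to the current count, so the existing maxKeys check and resume-key computation terminate the scan unchanged.

// resultsModel is an illustrative stand-in for the scanner's bookkeeping.
type resultsModel struct {
	count, bytes      int64
	maxKeys, maxBytes int64
}

// addOne records one key/value pair of the given encoded size and reports
// whether the scan should keep going, mirroring the "pretend we reached
// maxKeys" approach from the diff.
func (r *resultsModel) addOne(size int64) (keepGoing bool) {
	r.count++
	r.bytes += size
	if r.maxBytes > 0 && r.bytes >= r.maxBytes {
		// Over the byte budget: lower maxKeys to the current count so the
		// key-count check below fires and the resume key is computed exactly
		// as in the plain maxKeys case.
		r.maxKeys = r.count
	}
	return r.count != r.maxKeys
}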
