sql: improve inverted index validation performance #35742

thoszhang · 2019-03-14T17:17:29Z

To compute the expected number of keys in an inverted index, we currently
generate every key and count the number of unique keys, since array elements
can produce duplicate keys. This PR avoids some of these allocations by doing a
simple count for every non-array JSON type. For a 25GB table of random JSON
values generated by the workload, I got a ~15% speedup on the `select sum(crdb_internal.json_num_index_entries(v)) from json.j` query.

Release note: None


vivekmenezes · 2019-03-14T18:55:55Z

@lucy-zhang it's probably worth adding a benchmark so that future changes don't cause a regression in performance. Thanks!


thoszhang · 2019-05-13T21:48:50Z

I added a benchmark for a JSON object containing mostly scalars and few arrays:

```
name                           old time/op  new time/op  delta
JSONNumInvertedIndexEntries-8  6.93µs ± 0%  2.92µs ± 1%  -57.91%  (p=0.008 n=5+5)
```

thoszhang · 2019-05-13T21:51:33Z

bors r+

35742: sql: improve inverted index validation performance r=lucy-zhang a=lucy-zhang

Co-authored-by: Lucy Zhang <lucy-zhang@users.noreply.github.com>

craig · 2019-05-13T22:15:21Z

Build succeeded


thoszhang requested a review from dt March 14, 2019 17:17

dt approved these changes May 13, 2019


thoszhang force-pushed the faster-num-entries branch from 9c68d48 to a1fc446 May 13, 2019 21:44

craig bot merged commit a1fc446 into cockroachdb:master May 13, 2019

thoszhang deleted the faster-num-entries branch May 14, 2019 18:30

knz mentioned this pull request Nov 10, 2019 in cockroachdb/docs#5819, "User-facing changes in 19.2 that were not picked up in release notes" (closed)
